Quick Start¶
Environment¶
The framework already provides an environment that supports the use of options, called simple-intersection: an autonomous driving environment consisting of a single intersection with two-lane roads. The objective of the agent (the controllable vehicle) is to reach the other side of the intersection while obeying the defined traffic rules and without crashing. The agent's inputs are the rate of change of the steering angle and the acceleration.
The environment supports the use of options/maneuvers, which model certain favourable behaviours. The number and types of maneuvers are up to the user to define, but the framework comes with a few pre-defined maneuvers:
- keeplane (drive on the current lane)
- changelane (change to the other lane)
- stop (come to a stop at a stop sign)
- wait (wait at the stop sign until safe)
- follow (follow the vehicle in front).
Low-level Policies¶
The policy that executes a maneuver is referred to as its low-level policy; it maps the current state to an input for the agent. Low-level policies can be either manually defined or learned. For the simple-intersection environment, low-level policies learned using reinforcement learning are provided with the framework. Learning is done using low_level_policy_main.py. Use low_level_policy_main.py --help to view the supported arguments and defaults.
usage: low_level_policy_main.py [-h] [--train] [--option] [--test] [--evaluate]
[--saved_policy_in_root] [--load_weights]
[--tensorboard] [--visualize]
[--nb_steps NB_STEPS]
[--nb_episodes_for_test NB_EPISODES_FOR_TEST]
optional arguments:
-h, --help show this help message and exit
--train Train a low-level policy with default settings.
Always saved in the root folder. Always tests after
training
--option the option to train. Eg. stop, keeplane, wait,
changelane, follow. If not defined, trains all options
--test Test a saved low-level policy.
Uses saved policy in backends/trained_policies/OPTION_NAME/ by default
--evaluate Evaluate a saved low-level policy over 100 episodes.
Uses saved policy in backends/trained_policies/OPTION_NAME/ by default
--saved_policy_in_root
Use saved policies in the root of the project rather than
backends/trained_policies/highlevel/
--load_weights Load a saved policy first before training
--tensorboard Use tensorboard while training
--visualize Visualize the training. Testing is always visualized.
--nb_steps NB_STEPS Number of steps to train for. Default is 100000
--nb_episodes_for_test NB_EPISODES_FOR_TEST
The number of episodes to test. Default is 20
Training¶
Run low_level_policy_main.py --train --option=OPTION_NAME, where OPTION_NAME can be the key of any node defined in config.json, to learn that option using the default reinforcement learning settings and save the result to the root folder. If no option is specified, all options are trained. The training can be customized further using the other supported arguments. For example, to train the keeplane maneuver and visualize the training, run:
python3 low_level_policy_main.py --train --option=keeplane --visualize
Testing¶
Run low_level_policy_main.py --test --option=OPTION_NAME along with any other supported arguments to test a trained policy. By default, it uses the trained policies in backends/trained_policies/. For example:
python3 low_level_policy_main.py --test --option=wait --nb_episodes_for_test=20
Evaluating¶
Run low_level_policy_main.py --evaluate --option=OPTION_NAME along with any other supported arguments to evaluate a trained policy over 100 episodes. By default, it uses the trained policies in backends/trained_policies/. For example:
python3 low_level_policy_main.py --evaluate --option=follow --saved_policy_in_root
High-level Policy¶
The overall objective is achieved by composing the smaller maneuvers. It is also possible to define a single maneuver, say ‘drive’, and train it to achieve the goal directly, but this is much harder than defining smaller, easy-to-achieve maneuvers and composing them. This composition is done by the high-level policy, which decides which maneuver to execute according to the state of the environment.
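As a rough sketch of this idea (the class, method, and variable names below are illustrative assumptions, not the framework's actual API), execution alternates between the high-level policy choosing a maneuver and that maneuver's low-level policy driving the environment until the maneuver terminates:
def run_episode(env, high_level_policy, low_level_policies, max_steps=1000):
    # Illustrative sketch only; all names here are assumptions, not the framework's API.
    state = env.reset()
    steps = 0
    while not env.done and steps < max_steps:
        # The high-level policy picks a maneuver, e.g. "keeplane" or "wait".
        maneuver = high_level_policy.select_maneuver(state)
        low_level = low_level_policies[maneuver.name]
        # The maneuver's low-level policy produces primitive actions
        # (steering-angle rate, acceleration) until the maneuver terminates.
        while not maneuver.terminated(state) and not env.done and steps < max_steps:
            state = env.step(low_level.act(state))
            steps += 1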
The high-level policy can also be manually defined or learned. The framework comes with a high-level policy learned with reinforcement learning for the simple-intersection environment. Learning is done using high_level_policy_main.py. Use high_level_policy_main.py --help to view the supported arguments and defaults.
usage: high_level_policy_main.py [-h] [--train] [--test] [--evaluate]
[--saved_policy_in_root] [--load_weights]
[--tensorboard] [--visualize]
[--nb_steps NB_STEPS]
[--nb_episodes_for_test NB_EPISODES_FOR_TEST]
[--nb_trials NB_TRIALS]
[--save_file SAVE_FILE]
optional arguments:
-h, --help show this help message and exit
--train Train a high-level policy with default settings.
Always saved in the root folder. Always tests after
training
--test Test a saved high-level policy. Uses backends/trained_
policies/highlevel/highlevel_weights.h5f by default
--evaluate Evaluate a saved high level policy over n trials. Uses
backends/trained_policies/highlevel/highlevel_weights.
h5f by default
--saved_policy_in_root
Use saved policies in the root of the project rather than
backends/trained_policies/highlevel/ (which is default)
--load_weights Load a saved policy from root folder first before training
--tensorboard Use tensorboard while training
--visualize Visualize the training. Testing is always visualized.
Evaluation is not visualized by default
--nb_steps NB_STEPS Number of steps to train for. Default is 25000
--nb_episodes_for_test NB_EPISODES_FOR_TEST
The number of episodes to test/evaluate. Default is 20
--nb_trials NB_TRIALS
The number of trials to evaluate. Default is 10
--save_file SAVE_FILE
filename to save/load the trained policy. Location is
as specified by --saved_policy_in_root
Training¶
Run high_level_policy_main.py --train along with any other supported arguments to train a policy using the default reinforcement learning settings. By default, the result is saved to the root folder so as not to overwrite the already-trained policies. For example:
python3 high_level_policy_main.py --train --nb_steps=25000 --nb_episodes_for_test=20
Testing¶
Run high_level_policy_main.py --test or high_level_policy_main.py --evaluate along with any other supported arguments to test or evaluate the trained policy. By default, it uses the trained policy in backends/trained_policies/highlevel/. For example:
python3 high_level_policy_main.py --evaluate --nb_trials=5 --nb_episodes_for_test=20
Monte-Carlo Tree Search¶
Even with this hierarchical structure and learn-time verification, collisions and constraint violations can still occur because the learned policies are never perfect. The framework therefore supports safe execution of the high-level policy by using MCTS (Monte-Carlo Tree Search) to look ahead in time and choose paths that do not lead to a collision or a temporal-logic violation. mcts.py is used to execute the learned policies using MCTS.
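The following is a highly simplified sketch of the lookahead idea, using flat Monte-Carlo sampling over maneuver sequences rather than the full tree search implemented in mcts.py; all names are assumptions for exposition only:
import random

def lookahead_choose(sim, maneuvers, depth=5, nb_traversals=50):
    # Simplified illustration (flat Monte-Carlo sampling, not full MCTS/UCT);
    # `sim` is assumed to be a copyable simulator of the environment.
    scores = {m: 0.0 for m in maneuvers}
    for _ in range(nb_traversals):
        first = random.choice(maneuvers)
        node = sim.copy()                      # simulate without touching the real environment
        total, violated = node.execute(first)  # assumed to return (reward, constraint violated?)
        for _ in range(depth - 1):
            if violated or node.done:
                break
            reward, violated = node.execute(random.choice(maneuvers))
            total += reward
        if violated:
            total -= 1000.0                    # heavily penalise unsafe branches (illustrative value)
        scores[first] += total
    return max(scores, key=scores.get)         # maneuver with the best sampled outcome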
usage: mcts.py [-h] [--evaluate] [--test]
[--nb_episodes_for_test NB_EPISODES_FOR_TEST] [--visualize]
[--depth DEPTH] [--nb_traversals NB_TRAVERSALS]
[--nb_episodes NB_EPISODES] [--nb_trials NB_TRIALS] [--debug]
[--highlevel_policy_in_root]
optional arguments:
-h, --help show this help message and exit
--evaluate Evaluate over n trials, no visualization by default.
--test Tests MCTS for 100 episodes by default.
--nb_episodes_for_test NB_EPISODES_FOR_TEST
Number of episodes to test/evaluate. Default is 100
--visualize Visualize the training.
--depth DEPTH Max depth of tree per episode. Default is 5
--nb_traversals NB_TRAVERSALS
Number of traversals to perform per episode. Default
is 50
--nb_episodes NB_EPISODES
Number of episodes per trial to evaluate. Default is
100
--nb_trials NB_TRIALS
Number of trials to evaluate. Default is 10
--debug Show debug output. Default is false
--highlevel_policy_in_root
Use saved high-level policy in root of project rather
than backends/trained_policies/highlevel/
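For example, to test the learned policies under MCTS using the arguments listed above:
python3 mcts.py --test --nb_episodes_for_test=100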
Verifier¶
The low-level and high-level policies are verified against user-defined temporal logic constraints while they are being trained, and MCTS is verified against them during execution. The user can define global LTL properties that apply to all maneuvers, as well as maneuver-specific LTL constraints. The user can also choose to provide negative or positive reward feedback to the agent when a constraint is violated, to nudge the agent in the right direction.
The global constraints are also checked at run time of the trained agent. An example of a global constraint would be a traffic rule that ensures the vehicle always stops at the stop sign.
Atomic Propositions¶
Atomic propositions (APs) for the simple_intersection environment are defined as human-readable strings in verifier/simple_intersection/AP_dict.py. Each AP evaluates to True or False depending on the state of the environment and must be updated at every step of the environment. Temporal logic properties are constructed by combining atomic propositions with logic operators. For example, an AP over_speed_limit is set to True when the vehicle is above the speed limit.
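As a rough illustration (the function, attribute, and dictionary names below are assumptions, not the actual contents of AP_dict.py), an AP is simply a Boolean that is re-evaluated from the environment state on every step:
def update_atomic_propositions(env, ap_values):
    # Illustrative sketch only; attribute names are assumptions.
    ap_values["over_speed_limit"] = env.ego_vehicle.speed > env.speed_limit
    ap_values["in_stop_region"] = env.in_stop_region(env.ego_vehicle)
    return ap_values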
Linear Temporal Logic¶
Global temporal logic properties are defined in env/simple_intersection/simple_intersection_env.py, in the _init_LTL_preconditions() function. For example, to add a global constraint that the speed must not exceed the speed limit, you may add self._LTL_preconditions.append(LTLProperty("G( not over_speed_limit)", 200)) to give a penalty of -200 if the agent crosses the speed limit while using any maneuver.
Maneuver-specific LTL properties can be defined in the corresponding maneuver class in options/simple_intersection/maneuvers.py, in its _init_LTL_preconditions() function.
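For instance, a maneuver-specific constraint mirroring the global-constraint example above might look like the following inside the stop maneuver's _init_LTL_preconditions(); the exact property string and penalty here are illustrative placeholders:
self._LTL_preconditions.append(
    LTLProperty("G ( in_stop_region => (in_stop_region U has_stopped_in_stop_region) )", 200))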
Temporal logic properties are specified using the following syntactic forms, where literal strings are given within quotation marks.
Form | Meaning
---|---
“F” phi | Finally (eventually)
“G” phi | Globally (always)
“X” phi | neXt state
phi “U” phi | Until
phi “=>” phi | logical implication
phi “or” phi | logical or
phi “and” phi | logical and
“not” phi | logical negation
“(” phi “)” | parenthetic grouping
atomic_proposition_string | atomic proposition
Operator precedence (evaluation order), highest first, is: {atomic, “(”…“)”} > “not” > “and” > “or” > “=>” > {“F”, “G”, “X”, “U”}.
Nested temporal operators must be enclosed in parentheses, e.g. G in_stop_region => (in_stop_region U has_stopped_in_stop_region). Because the temporal operators bind loosest, this property is parsed as G ( in_stop_region => (in_stop_region U has_stopped_in_stop_region) ).
Note that the arguments of “U” should be predicates over atomic propositions.
Learning Backend¶
The framework also supports multiple learning backends, and it is easy to add and use others as necessary. The Keras-RL reinforcement learning library was used to learn and test policies for the simple-intersection environment via backends/kerasrl_learner.py. The stable-baselines library has also been incorporated in backends/baselines_learner.py.
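As an illustration of what adding a backend involves (the class and method names below are assumptions, not the framework's actual interface), each backend can be thought of as an adapter exposing a small common surface that the training scripts call into:
from abc import ABC, abstractmethod

class LearnerBase(ABC):
    # Illustrative sketch of an assumed backend interface, not the actual one.

    @abstractmethod
    def train(self, nb_steps):
        """Train a policy for the given number of steps."""

    @abstractmethod
    def test_model(self, nb_episodes):
        """Roll out the current policy for evaluation."""

    @abstractmethod
    def save_model(self, file_name):
        """Persist the trained weights."""

    @abstractmethod
    def load_model(self, file_name):
        """Restore previously trained weights."""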
Additional Options¶
For further investigation, we also provide additional options, e.g. the learnable option ‘halt’ together with its pretrained parameters, and ‘manualwait’, which is not trainable but manually crafted. To use these options, specify them in config.json along with the other options to be used and follow the instructions above to obtain the experimental results. This makes it possible to explore more of the WiseMove autonomous driving framework, with a variety of choices and combinations of options.