Quick Start

Environment

An environment that supports the use of options is already provided with the framework, called simple-intersection: an autonomous driving environment consisting of a single intersection with two-lane roads. The objective of the agent (the controllable vehicle) is to reach the other side of the intersection while obeying defined traffic rules and without crashing. The agent takes the rate of change of the steering angle and the acceleration as its inputs.

The environment supports the use of options/maneuvers, which model certain favourable behaviours. The number and types of maneuvers are up to the user to define, but the framework comes with a few pre-defined maneuvers:

  • keeplane (drive on the current lane)
  • changelane (change to the other lane)
  • stop (come to a stop at a stop sign)
  • wait (wait at the stop sign until safe)
  • follow (follow the vehicle in front).

Low-level Policies

The policy for executing each maneuver is referred to as its low-level policy; it provides the agent with an input depending on the current state. Low-level policies can be either manually defined or learned. For the simple-intersection environment, low-level policies learned using reinforcement learning are provided with the framework. Learning is done using low_level_policy_main.py. Use low_level_policy_main.py --help to view the supported arguments and defaults.

usage: low_level_policy_main.py [-h] [--train] [--option] [--test] [--evaluate]
                                [--saved_policy_in_root] [--load_weights]
                                [--tensorboard] [--visualize]
                                [--nb_steps NB_STEPS]
                                [--nb_episodes_for_test NB_EPISODES_FOR_TEST]

optional arguments:
  -h, --help            show this help message and exit
  --train               Train a low-level policy with default settings.
                        Always saved in the root folder. Always tests after
                        training
  --option              the option to train. Eg. stop, keeplane, wait,
                        changelane, follow. If not defined, trains all options
  --test                Test a saved low-level policy.
                        Uses saved policy in backends/trained_policies/OPTION_NAME/ by default
  --evaluate            Evaluate a saved low-level policy over 100 episodes.
                        Uses saved policy in backends/trained_policies/OPTION_NAME/ by default
  --saved_policy_in_root
                        Use saved policies in the root of the project rather than
                        backends/trained_policies/OPTION_NAME/
  --load_weights        Load a saved policy first before training
  --tensorboard         Use tensorboard while training
  --visualize           Visualize the training. Testing is always visualized.
  --nb_steps NB_STEPS   Number of steps to train for. Default is 100000
  --nb_episodes_for_test NB_EPISODES_FOR_TEST
                        The number of episodes to test. Default is 20

Training

Run low_level_policy_main.py --train --option=OPTION_NAME to learn an option using the default reinforcement-learning settings and save the result to the root folder; OPTION_NAME can be the key of any node defined in config.json. If no option is specified, all options are trained. Training can be customized further using the other supported arguments. For example, to train the keeplane maneuver and visualize the training, run:

python3 low_level_policy_main.py --train --option=keeplane --visualize
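
To continue training from a previously saved policy, the documented --load_weights flag can be combined with a larger --nb_steps; for example (the maneuver and step count below are arbitrary choices):

python3 low_level_policy_main.py --train --option=stop --load_weights --nb_steps=200000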

Testing

Run low_level_policy_main.py --test --option=OPTION_NAME along with other supported arguments to test a trained policy. By default, it uses the trained policies in backends/trained_policies/. For example:

python3 low_level_policy_main.py --test --option=wait --nb_episodes_for_test=20

Evaluating

Run low_level_policy_main.py --evaluate --option=OPTION_NAME along with other supported arguments to evaluate a trained policy over 100 episodes. By default, it uses the trained policies in backends/trained_policies/. For example:

python3 low_level_policy_main.py --evaluate --option=follow --saved_policy_in_root

High-level Policy

The overall objective can be achieved by composing smaller maneuvers. It is also possible to define just one maneuver, called ‘drive’, and train it to achieve the goal, but that is much harder than defining smaller, easy-to-achieve maneuvers and composing them. This composition of maneuvers is done by the high-level policy, which decides which maneuver to execute according to the state of the environment.

The high-level policy can likewise be either manually defined or learned. The framework comes with one learned using reinforcement learning for the simple-intersection environment. Learning is done using high_level_policy_main.py. Use high_level_policy_main.py --help to view the supported arguments and defaults.

usage: high_level_policy_main.py [-h] [--train] [--test] [--evaluate]
                                 [--saved_policy_in_root] [--load_weights]
                                 [--tensorboard] [--visualize]
                                 [--nb_steps NB_STEPS]
                                 [--nb_episodes_for_test NB_EPISODES_FOR_TEST]
                                 [--nb_trials NB_TRIALS]
                                 [--save_file SAVE_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --train               Train a high-level policy with default settings.
                        Always saved in the root folder. Always tests after
                        training
  --test                Test a saved high-level policy. Uses backends/trained_
                        policies/highlevel/highlevel_weights.h5f by default
  --evaluate            Evaluate a saved high level policy over n trials. Uses
                        backends/trained_policies/highlevel/highlevel_weights.
                        h5f by default
  --saved_policy_in_root
                        Use saved policies in the root of the project rather than
                        backends/trained_policies/highlevel/ (which is default)
  --load_weights        Load a saved policy from root folder first before training
  --tensorboard         Use tensorboard while training
  --visualize           Visualize the training. Testing is always visualized.
                        Evaluation is not visualized by default
  --nb_steps NB_STEPS   Number of steps to train for. Default is 25000
  --nb_episodes_for_test NB_EPISODES_FOR_TEST
                        The number of episodes to test/evaluate. Default is 20
  --nb_trials NB_TRIALS
                        The number of trials to evaluate. Default is 10
  --save_file SAVE_FILE
                        filename to save/load the trained policy. Location is
                        as specified by --saved_policy_in_root

Training

Run high_level_policy_main.py --train along with other supported arguments to train a policy using the default reinforcement-learning settings. By default, the result is saved to the root folder so as not to overwrite already trained policies. For example:

python3 high_level_policy_main.py --train --nb_steps=25000 --nb_episodes_for_test=20

Testing

Run high_level_policy_main.py --test or high_level_policy_main.py --evaluate along with other supported arguments to test or evaluate the trained policy. By default, it uses the trained policy in backends/trained_policies/highlevel/. For example:

python3 high_level_policy_main.py --evaluate --nb_trials=5 --nb_episodes_for_test=20

Verifier

The low-level and high-level policies during training, and MCTS during execution, are verified against user-defined temporal logic constraints. The user can define global LTL properties that apply to all maneuvers, as well as maneuver-specific LTL constraints. The user can also choose to give the agent a negative or positive reward when a constraint is violated, to nudge the agent in the right direction.

The global constraints are also checked at run-time of the trained agent. An example of a global constraint would be a traffic rule ensuring that the vehicle always stops at the stop sign.

Atomic Propositions

Atomic propositions (APs) for the simple_intersection environment are defined as human-readable strings in verifier/simple_intersection/AP_dict.py. Each AP evaluates to True or False depending on the state of the environment and needs to be updated at every step of the environment. Temporal logic properties are constructed from atomic propositions combined with logic operators. For example, the AP over_speed_limit is set to True if the vehicle is above the speed limit.
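
To make this concrete, below is a minimal, purely illustrative sketch of how an AP such as over_speed_limit could be represented and refreshed at each step. The dictionary layout, the update_aps helper, and the vehicle_speed/speed_limit values are assumptions for illustration, not the actual contents of AP_dict.py.

    # Illustrative sketch only: an atomic proposition is a named boolean flag
    # that the environment re-evaluates from its state on every step.
    AP_dict = {
        "over_speed_limit": "the vehicle is above the speed limit",
        # ... other atomic propositions ...
    }

    def update_aps(vehicle_speed, speed_limit, ap_values):
        # Re-evaluate the atomic propositions from the current environment state;
        # only over_speed_limit is shown here.
        ap_values["over_speed_limit"] = vehicle_speed > speed_limit
        return ap_values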

Linear Temporal Logic

Global temporal logic properties are defined in env/simple_intersection/simple_intersection_env.py, in the _init_LTL_preconditions() function. For example, to add a global constraint that the speed must not go beyond the speed limit, you may add self._LTL_preconditions.append(LTLProperty("G( not over_speed_limit)", 200)) to give a penalty of -200 whenever the agent crosses the speed limit, regardless of the maneuver in use.

Maneuver-specific LTL properties can be defined in the corresponding maneuver class in options/simple_intersection/manuevers.py, also in the _init_LTL_preconditions() function.
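
As a hedged illustration (not code taken from the framework), a maneuver-specific constraint can be appended in the same way as the global one above. The penalty value of 100 is an arbitrary assumption, the formula reuses the stop-region example given further below, and LTLProperty is called with the same (formula, penalty) form as above; imports and the surrounding maneuver class are omitted.

    def _init_LTL_preconditions(self):
        # Inside a stop-related maneuver class: once the vehicle enters the stop
        # region, it must remain there until it has stopped. The penalty of 100
        # applied on violation is illustrative.
        self._LTL_preconditions.append(
            LTLProperty(
                "G in_stop_region => (in_stop_region U has_stopped_in_stop_region)",
                100))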

Temporal logic properties are specified using the following syntactic forms, where literal strings are given within quotation marks.

Form                         Meaning
“F” phi                      Finally (eventually)
“G” phi                      Globally (always)
“X” phi                      neXt state
phi “U” phi                  Until
phi “=>” phi                 logical implication
phi “or” phi                 logical or
phi “and” phi                logical and
“not” phi                    logical negation
“(” phi “)”                  parenthetic grouping
atomic                       an atomic proposition string

Operator precedence (evaluation order), highest first, is: {atomic, “(”…“)”} > “not” > “and” > “or” > “=>” > {“F”, “G”, “X”, “U”}

Nested temporal operators must be enclosed in parentheses, e.g. G in_stop_region => (in_stop_region U has_stopped_in_stop_region).

Note that the arguments of “U” should be predicates over atomic propositions.

Learning Backend

The framework also supports multiple learning backends, and it is easy to add and use others as necessary. The KerasRL reinforcement learning framework was used to learn and test the policies for the simple-intersection environment, via backends/kerasrl_learner.py. The stable-baselines library has also been incorporated in backends/baselines_learner.py.

Additional Options

For further investigation, we also provide additional options: ‘halt’, a learnable option with pretrained parameters, and ‘manualwait’, which is not trainable but manually crafted, among others. To use these options, specify them in config.json along with the other options to be used, and follow the instructions above to obtain the experimental results. This makes it possible to explore more of the WiseMove autonomous driving framework, with a variety of choices and combinations of options.