WiseMove

WiseMove is a hierarchical framework to investigate safe reinforcement learning, using incremental "learntime" verification of temporal logic constraints.


Requirements

  • Python 3.6.8
  • Sphinx
  • See requirements.txt for the full list of required Python packages.

Installation

  • Run the dependency installation script ./scripts/install_dependencies.sh to install pip3 and the required Python packages.

Note: The script checks whether the dependencies folder exists in the project root. If it does, the script installs from the local packages in that folder; otherwise it installs the required packages from the internet.

If you do not have an internet connection and the dependencies folder does not exist, first run ./scripts/download_dependencies.sh on a machine with an internet connection, then transfer the resulting folder to the project root.
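
Below is a minimal sketch (not part of the repository) for checking whether the packages listed in requirements.txt are already installed. It assumes the file uses the standard pip requirement format and that setuptools is available.

    import pkg_resources

    # Read the requirement specifiers, skipping blank lines and comments.
    with open("requirements.txt") as f:
        requirements = [line.strip() for line in f
                        if line.strip() and not line.startswith("#")]

    missing = []
    for requirement in requirements:
        try:
            pkg_resources.require(requirement)  # raises if absent or the wrong version
        except (pkg_resources.DistributionNotFound, pkg_resources.VersionConflict):
            missing.append(requirement)

    print("Missing or mismatched packages:", missing if missing else "none")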

Documentation

  • Open ./documentation/index.html to view the documentation
  • If the file does not exist, first run ./scripts/generate_doc.sh build to generate the documentation. Note that this requires Sphinx to be installed.

Replicate Results

Given below are the minimum steps required to replicate the results reported in

Lee, J., Balakrishnan, A., Gaurav, A., Czarnecki, K. and Sedwards, S.,
"WiseMove: a framework to investigate safe deep reinforcement learning for autonomous driving,"
In: Parker D., Wolf V. (eds) Quantitative Evaluation of Systems. QEST 2019. Lecture Notes in Computer Science, vol. 11785. Springer, Cham, Sep., 2019.

For a detailed user guide, see the documentation.

  • Open terminal and navigate to the root of the project directory.

  • Low-level policies:

    • Use python3 low_level_policy_main.py --help to see all available commands.
    • You can choose to test the provided pre-trained options:
      • To visually inspect all pre-trained options: python3 low_level_policy_main.py --test
      • To evaluate all pre-trained options: python3 low_level_policy_main.py --evaluate
      • To visually inspect a specific pre-trained policy: python3 low_level_policy_main.py --option=wait --test.
      • To evaluate a specific pre-trained policy: python3 low_level_policy_main.py --option=wait --evaluate.
      • Available options are: wait, changelane, stop, keeplane, follow
    • Or, you can train and test all the options, noting that this may take some time. Newly trained policies are saved to the root folder by default.
      • To train all low-level policies from scratch (~40 minutes): python3 low_level_policy_main.py --train.
      • To visually inspect all the new low-level policies: python3 low_level_policy_main.py --test --saved_policy_in_root.
      • To evaluate all the new low-level policies: python3 low_level_policy_main.py --evaluate --saved_policy_in_root.
      • Make sure the training is fully complete before running the above test/evaluation.
    • Recommended: it is faster to verify training using only a few options, with the commands below:
      • To train a single low-level policy, e.g., changelane (~6 minutes): python3 low_level_policy_main.py --option=changelane --train. This is saved to the root folder.
      • To evaluate the new changelane: python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root
      • Available options are: wait, changelane, stop, keeplane, follow
    • To replicate the experiments without additional properties:
      • Note that we do not provide a pre-trained policy trained without the additional LTL properties.
      • You will need to train it by adding the argument --without_additional_ltl_properties to the above training procedures. For example, python3 low_level_policy_main.py --option=changelane --train --without_additional_ltl_properties
      • Now, use --evaluate to evaluate this new policy: python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root
    • The result of --evaluate here is a single trial. In the experiments reported in the paper, we conduct multiple such trials; a sketch of how such repetitions might be scripted is given after this list.
  • High-level policy:

    • Use python3 high_level_policy_main.py --help to see all available commands.
    • You can use the provided pre-trained high-level policy:
      • To visually inspect this policy: python3 high_level_policy_main.py --test
      • To replicate the experiment used for reported results (~5 minutes): python3 high_level_policy_main.py --evaluate
    • Or, you can train the high-level policy from scratch. Note that this takes some time:
      • To train using pre-trained low-level policies for 0.2 million steps (~50 minutes): python3 high_level_policy_main.py --train
      • To visually inspect this new policy: python3 high_level_policy_main.py --test --saved_policy_in_root
      • To replicate the experiment used for reported results (~5 minutes): python3 high_level_policy_main.py --evaluate --saved_policy_in_root.
    • Since the above training takes a long time, you can instead verify with a lower number of steps:
      • To train for 0.1 million steps (~25 minutes): python3 high_level_policy_main.py --train --nb_steps=100000
      • Note that this has a much lower success rate of ~75%. Using this for MCTS will not reproduce the reported results.
    • The average success rate and standard deviation reported by the evaluation correspond to the results of the high-level policy experiments.
  • MCTS:

    • Use python3 mcts.py --help to see all available commands.
    • You can run MCTS on the provided pre-trained high-level policy:
      • To visually inspect MCTS on the pre-trained policy: python3 mcts.py --test --nb_episodes=10
      • To replicate the experiment used for reported results: python3 mcts.py --evaluate. Note that this takes a very long time (~16 hours).
      • For a shorter version of the experiment: python3 mcts.py --evaluate --nb_trials=2 --nb_episodes=10 (~20 minutes)
    • Or, if you have trained a high-level policy from scratch, you can run MCTS on it:
      • To visually inspect MCTS on the new policy: python3 mcts.py --test --highlevel_policy_in_root --nb_episodes=10
      • To replicate the experiment used for reported results: python3 mcts.py --evaluate --highlevel_policy_in_root. Note that this takes a very long time (~16 hours).
      • For a shorter version of the experiment: python3 mcts.py --evaluate --highlevel_policy_in_root --nb_trials=2 --nb_episodes=10 (~20 minutes)
    • You can use the arguments --depth and --nb_traversals to vary the depth of the MCTS tree (default 5) and the number of traversals performed (default 50); a sketch of a simple parameter sweep is given after this list.
    • The average success rate and standard deviation reported by the evaluation correspond to the results of the MCTS experiments.
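
As noted above, a single --evaluate run corresponds to one trial, while the paper aggregates several trials. The following is a minimal sketch (not part of the repository) of how such repetitions might be scripted; the trial count, the choice of the changelane option, and the success-rate values in the aggregation step are illustrative only.

    import statistics
    import subprocess

    N_TRIALS = 5  # hypothetical number of repetitions
    cmd = ["python3", "low_level_policy_main.py",
           "--option=changelane", "--evaluate", "--saved_policy_in_root"]

    for trial in range(N_TRIALS):
        print("--- evaluation trial {}/{} ---".format(trial + 1, N_TRIALS))
        subprocess.run(cmd, check=True)  # each run is one independent trial

    # Hypothetical per-trial success rates, recorded manually from each run's output;
    # the reported results are the average success rate and standard deviation.
    success_rates = [0.92, 0.95, 0.90, 0.93, 0.94]
    print("mean:", statistics.mean(success_rates))
    print("stdev:", statistics.stdev(success_rates))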
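
Similarly, here is a minimal sketch (not part of the repository) of a simple sweep over the documented --depth and --nb_traversals arguments, combined with the shorter MCTS evaluation settings above; the specific values are examples, not the settings used in the paper.

    import subprocess

    for depth in (3, 5, 7):          # the default tree depth is 5
        for traversals in (25, 50):  # the default number of traversals is 50
            cmd = ["python3", "mcts.py", "--evaluate",
                   "--nb_trials=2", "--nb_episodes=10",
                   "--depth={}".format(depth),
                   "--nb_traversals={}".format(traversals)]
            print("Running:", " ".join(cmd))
            subprocess.run(cmd, check=True)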

The time taken to execute the above scripts may vary depending on your configuration. The reported results were obtained on a system with the following specifications:

Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
16GB memory
Nvidia GeForce GTX 1080 Ti
Ubuntu 16.04

Coding Standards

We follow the PEP 8 style guidelines for code and PEP 257 for documentation. You do not need to keep these in mind while coding, but before submitting a pull request, run the following two steps on each Python file you have modified:

  1. yapf -i YOUR_MODIFIED_FILE.py
  2. docformatter --in-place YOUR_MODIFIED_FILE.py

yapf formats the code and docformatter formats the docstrings.
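
Below is a minimal sketch (not part of the repository) that applies both steps to every modified Python file reported by git; the diff target (HEAD) is an example and may differ from your workflow.

    import subprocess

    # List modified files relative to HEAD (adjust the diff target to your workflow).
    diff = subprocess.run(["git", "diff", "--name-only", "HEAD"],
                          stdout=subprocess.PIPE, universal_newlines=True, check=True)
    changed = [f for f in diff.stdout.splitlines() if f.endswith(".py")]

    for path in changed:
        subprocess.run(["yapf", "-i", path], check=True)                  # format the code
        subprocess.run(["docformatter", "--in-place", path], check=True)  # format the docstrings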