WiseMove is a safe reinforcement learning framework that combines hierarchical reinforcement learning with model checking using temporal logic constraints.

Requirements

  • Python 3.6
  • Sphinx
  • See requirements.txt for the full list of required Python packages.

Installation

  • Run the dependency installation script ./scripts/install_dependencies.sh to install pip3 and the required Python packages.

Note: The script checks whether a dependencies folder exists in the project root. If it does, packages are installed from that local folder; otherwise they are downloaded from the internet. If you have no internet connection and the dependencies folder does not exist, first run ./scripts/download_dependencies.sh on a machine with internet access, then transfer the resulting folder.
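
As a rough illustration of that fallback, the selection logic can be sketched in Python as follows. The dependencies folder name comes from this README, but the exact pip3 flags here are assumptions, not taken from the actual script:

```python
import os


def pip_install_command(project_root="."):
    """Return the pip3 command an installer script could run.

    If a local ``dependencies`` folder exists under ``project_root``,
    install offline from it; otherwise fetch packages from the internet.
    """
    deps = os.path.join(project_root, "dependencies")
    if os.path.isdir(deps):
        # Offline: use only the locally downloaded packages.
        return ["pip3", "install", "--no-index", "--find-links", deps,
                "-r", "requirements.txt"]
    # Online: resolve packages from PyPI as usual.
    return ["pip3", "install", "-r", "requirements.txt"]
```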

Documentation

  • Open ./documentation/index.html to view the documentation.
  • If the file does not exist, first run ./scripts/generate_doc.sh build to generate the documentation. Note that this requires Sphinx to be installed.

Replicate Results

These are the minimum steps required to replicate the results for the simple_intersection environment. For a detailed user guide, see the documentation.

  • Run ./scripts/install_dependencies.sh to install the Python dependencies.
  • Low-level policies:
    • You can train and test all the maneuvers, but this takes a long time and is not recommended.
      • To train all low-level policies from scratch: python3 low_level_policy_main.py --train.
      • To test all the trained low-level policies: python3 low_level_policy_main.py --test --saved_policy_in_root.
      • Make sure training has fully completed before running the test above.
    • It is easier to verify a few of the maneuvers using the commands below:
      • To train a single low-level policy, for example wait: python3 low_level_policy_main.py --option=wait --train.
      • To test one of these trained low-level policies, for example wait: python3 low_level_policy_main.py --option=wait --test --saved_policy_in_root.
      • The available maneuvers are: wait, changelane, stop, keeplane, follow.
    • These results are evaluated visually.
    • Note: Training has high variance due to the continuous action space, especially for the stop and keeplane maneuvers. It may help to train for 0.2 million steps instead of the default 0.1 million by adding the argument --nb_steps=200000 when training.
  • High-level policy:
    • To train the high-level policy from scratch using the given low-level policies: python3 high_level_policy_main.py --train.
    • To evaluate the trained high-level policy: python3 high_level_policy_main.py --evaluate --saved_policy_in_root.
    • The average success rate and its standard deviation correspond to the results of the high-level policy experiments.
  • To run MCTS using the high-level policy:
    • To build a probabilities tree and save it: python3 mcts.py --train.
    • To evaluate using the saved tree: python3 mcts.py --evaluate --saved_policy_in_root.
    • The average success rate and its standard deviation correspond to the results of the MCTS experiments.
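
Taken together, the steps above form a fixed pipeline. A hypothetical driver (the command lists below are copied from this README; the run_all helper and its injectable runner are just for illustration) could replay them in order:

```python
import subprocess

# Replication commands in the order given above. Low-level training is
# omitted here, since the given low-level policies can be used directly.
REPLICATION_STEPS = [
    ["python3", "high_level_policy_main.py", "--train"],
    ["python3", "high_level_policy_main.py", "--evaluate", "--saved_policy_in_root"],
    ["python3", "mcts.py", "--train"],
    ["python3", "mcts.py", "--evaluate", "--saved_policy_in_root"],
]


def run_all(steps, runner=subprocess.run):
    """Run each step in order, stopping at the first failure."""
    for cmd in steps:
        runner(cmd, check=True)
```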

Coding Standards

We follow the PEP8 style guidelines for code and PEP257 for docstrings. You do not need to keep these in mind while coding, but before submitting a pull request, run the following two steps on each Python file you have modified.

  1. yapf -i YOUR_MODIFIED_FILE.py
  2. docformatter --in-place YOUR_MODIFIED_FILE.py

yapf formats the code and docformatter formats the docstrings.
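
As a small, hypothetical example of the target style, the function below is already in the form those tools would leave it in: PEP8-compliant layout, and a PEP257 docstring with a one-line summary followed by a blank line:

```python
def normalize(values):
    """Scale a sequence of numbers so that they sum to one.

    Args:
        values: A non-empty sequence of numbers with a positive sum.

    Returns:
        A list of floats summing to one.
    """
    total = sum(values)
    return [v / total for v in values]
```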