WiseMove is a safe reinforcement learning framework that combines hierarchical reinforcement learning with model checking using temporal logic constraints.

Requirements
* Python 3.6
* Sphinx
* See `requirements.txt` for the full list of required Python packages.

Installation
* Run the install-dependencies script `./scripts/` to install pip3 and the required Python packages.

Note: The script checks whether a dependencies folder exists in the project root. If it does, it installs from the local packages in that folder; otherwise it installs the required packages from the internet. If you do not have an internet connection and the dependencies folder does not exist, first run `./scripts/` on a machine with an internet connection and transfer the resulting folder.
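The offline/online fallback described in the note can be sketched in shell. The `pip3` commands below are an assumption about what the elided install script does, not its actual contents:

```shell
#!/bin/sh
# Sketch of the dependency-check logic described in the note above;
# the pip3 commands are assumptions, not the actual script contents.
if [ -d "./dependencies" ]; then
    MODE="offline"
    INSTALL_CMD="pip3 install --no-index --find-links=./dependencies -r requirements.txt"
else
    MODE="online"
    INSTALL_CMD="pip3 install -r requirements.txt"
fi
echo "dependency install mode: $MODE"
```

In the offline case, `--no-index --find-links` tells pip3 to resolve packages only from the local folder rather than from PyPI.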

Documentation
* Open `./documentation/index.html` to view the documentation.
* If the file does not exist, generate the documentation first with `./scripts/ build`. Note that this requires Sphinx to be installed.

Replicate Results
These are the minimum steps required to replicate the results for the simple_intersection environment. For a detailed user guide, see the documentation.

* Run `./scripts/` to install the Python dependencies.
* Low-level policies:
    * You can choose to train and test all the maneuvers, but this may take some time and is not recommended.
        * To train all low-level policies from scratch: `python3 --train`. This may take some time.
        * To test all of these trained low-level policies: `python3 --test --saved_policy_in_root`.
        * Make sure the training is fully complete before running the test above.
    * It is easier to verify a few of the maneuvers using the commands below:
        * To train a single low-level policy, for example wait: `python3 --option=wait --train`.
        * To test one of these trained low-level policies, for example wait: `python3 --option=wait --test --saved_policy_in_root`.
        * The available maneuvers are: wait, changelane, stop, keeplane, follow.
    * These results are evaluated visually.
    * Note: This training has high variance due to the continuous action space, especially for the stop and keeplane maneuvers. It may help to train for 0.2 million steps rather than the default 0.1 million by adding the argument `--nb_steps=200000` while training.
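The single-maneuver train/test cycle above can be scripted. The script name `low_level_policy_main.py` below is a placeholder, since the actual file name is elided in this README:

```shell
#!/bin/sh
# Hypothetical wrapper around the single-maneuver workflow; the script name
# "low_level_policy_main.py" is a placeholder, not the actual file name.
SCRIPT="low_level_policy_main.py"
for MANEUVER in wait stop keeplane; do
    # Train with the longer 0.2-million-step budget suggested in the note.
    TRAIN_CMD="python3 $SCRIPT --option=$MANEUVER --train --nb_steps=200000"
    TEST_CMD="python3 $SCRIPT --option=$MANEUVER --test --saved_policy_in_root"
    echo "$TRAIN_CMD"
    echo "$TEST_CMD"
done
```

As noted above, each maneuver's training must be fully complete before its test command is run.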
* High-level policy:
    * To train the high-level policy from scratch using the given low-level policies: `python3 --train`.
    * To evaluate this trained high-level policy: `python3 --evaluate --saved_policy_in_root`.
    * The success average and standard deviation correspond to the results from the high-level policy experiments.
* To run MCTS using the high-level policy:
    * To obtain a probabilities tree and save it: `python3 --train`.
    * To evaluate using this saved tree: `python3 --evaluate --saved_policy_in_root`.
    * The success average and standard deviation correspond to the results from the MCTS experiments.
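The success average and standard deviation reported above can be computed from per-episode success flags. A minimal sketch using Python's standard library (the episode data here is illustrative, not actual experiment output):

```python
import statistics


def summarize_success(outcomes):
    """Return (mean, sample stdev) of a list of 1/0 episode success flags."""
    mean = statistics.mean(outcomes)
    stdev = statistics.stdev(outcomes) if len(outcomes) > 1 else 0.0
    return mean, stdev


# Illustrative data only, not actual high-level policy or MCTS results.
episode_success = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
avg, std = summarize_success(episode_success)
print(f"success rate: {avg:.2f} +/- {std:.2f}")  # -> success rate: 0.80 +/- 0.42
```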

Coding Standards

We follow the PEP8 style guidelines for code and PEP257 for documentation.
It is not necessary to keep these in mind while coding, but before
submitting a pull request, run the following two steps on each Python file
you have modified.

1. `yapf -i`
2. `docformatter --in-place`

`yapf` formats the code and `docformatter` formats the docstrings.
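The two-step pass can be applied to every modified file at once. A sketch assuming a git checkout with yapf and docformatter installed (this helper loop is not part of the project's tooling):

```shell
#!/bin/sh
# Run both formatting steps on every modified Python file before a pull
# request; assumes yapf and docformatter are installed.
format_file() {
    yapf -i "$1"                  # PEP8 code formatting, in place
    docformatter --in-place "$1"  # PEP257 docstring formatting, in place
}

# git diff --name-only lists the Python files changed in the working tree.
for FILE in $(git diff --name-only -- '*.py'); do
    format_file "$FILE"
done
```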