WISEMove is a safe reinforcement learning framework that combines hierarchical reinforcement learning and safety verification using temporal logic constraints.

(Demo animations: High-level Policy; Low-level Training, Before and After)

Requirements

  • Python 3.6
  • Sphinx
  • See requirements.txt for the full list of required Python packages.

Installation

  • Run the dependency installation script ./scripts/install_dependencies.sh to install pip3 and the required Python packages.

Note: The script checks whether a dependencies folder exists in the project root. If it does, packages are installed from that local folder; otherwise, the required packages are downloaded from the internet.

If you do not have an internet connection and the dependencies folder does not exist, first run ./scripts/download_dependencies.sh on a machine with an internet connection, then transfer the resulting folder to the project root.
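
For a machine without internet access, the workflow below is a minimal sketch; the script names and the dependencies folder come from the instructions above, while offline-machine and the destination path are placeholders you would replace:

    # On a machine with an internet connection: download the required packages
    ./scripts/download_dependencies.sh

    # Transfer the dependencies folder into the project root on the offline machine
    # (offline-machine and /path/to/project/ are placeholders)
    scp -r dependencies offline-machine:/path/to/project/

    # On the offline machine: installs from the local dependencies folder
    ./scripts/install_dependencies.sh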

Documentation

  • Open ./documentation/index.html to view the documentation
  • If the file does not exist, generate the documentation first with ./scripts/generate_doc.sh build. Note that this requires Sphinx to be installed.
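
A minimal sketch of the build-and-view step; xdg-open is a standard Linux opener and not part of the project, so substitute your preferred browser:

    # Generate the HTML documentation (requires Sphinx)
    ./scripts/generate_doc.sh build

    # Open the generated entry page
    xdg-open ./documentation/index.html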

Replicate Results

Given below are the minimum steps required to replicate the results for the simple_intersection environment. For a detailed user guide, see the documentation.

  • Open terminal and navigate to the root of the project directory.
  • Low-level policies:

    • Use python3 low_level_policy_main.py --help to see all available commands.
    • You can choose to test the provided pre-trained options:
      • To visually inspect all pre-trained options: python3 low_level_policy_main.py --test
      • To evaluate all pre-trained options: python3 low_level_policy_main.py --evaluate
      • To visually inspect a specific pre-trained policy: python3 low_level_policy_main.py --option=wait --test.
      • To evaluate a specific pre-trained policy: python3 low_level_policy_main.py --option=wait --evaluate.
      • Available options are: wait, changelane, stop, keeplane, follow
    • Or, you can train and test all the options, but this may take some time. Newly trained policies are saved to the root folder by default.
      • To train all low-level policies from scratch (~40 minutes): python3 low_level_policy_main.py --train.
      • To visually inspect all these new low-level policies: python3 low_level_policy_main.py --test --saved_policy_in_root.
      • To evaluate all these new low-level policies: python3 low_level_policy_main.py --evaluate --saved_policy_in_root.
      • Make sure the training is fully complete before running the test/evaluation commands above.
    • It is faster to verify the training of a few options using the commands below (recommended; a combined sketch also appears after this list):
      • To train a single low-level policy, for example changelane (~6 minutes): python3 low_level_policy_main.py --option=changelane --train. This is saved to the root folder.
      • To evaluate one of these new low-level policies, for example changelane: python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root
      • Available options are: wait, changelane, stop, keeplane, follow
    • To replicate the experiments without additional properties:
      • Note that we have not provided a pre-trained policy that is trained without additional LTL.
      • You will need to train it by adding the argument --without_additional_ltl_properties to the above training procedures. For example, python3 low_level_policy_main.py --option=changelane --train --without_additional_ltl_properties
      • Now, use --evaluate to evaluate this new policy: python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root
    • The result of --evaluate here is a single trial. In the experiments reported in the paper, we conduct multiple such trials.
  • High-level policy:

    • Use python3 high_level_policy_main.py --help to see all available commands.
    • You can use the provided pre-trained high-level policy:
      • To visually inspect this policy: python3 high_level_policy_main.py --test
      • To replicate the experiment used for reported results (~5 minutes): python3 high_level_policy_main.py --evaluate
    • Or, you can train the high-level policy from scratch (Note that this takes some time):
      • To train using pre-trained low-level policies for 0.2 million steps (~50 minutes): python3 high_level_policy_main.py --train
      • To visually inspect this new policy: python3 high_level_policy_main.py --test --saved_policy_in_root
      • To replicate the experiment used for reported results (~5 minutes): python3 high_level_policy_main.py --evaluate --saved_policy_in_root.
    • Since the above training takes a long time, you can instead verify with a smaller number of steps (see the sketch after this list):
      • To train for 0.1 million steps (~25 minutes): python3 high_level_policy_main.py --train --nb_steps=100000
      • Note that this has a much lower success rate (~75%), so using it for MCTS will not reproduce the reported results.
    • The success average and standard deviation reported during evaluation correspond to the results of the high-level policy experiments.
  • MCTS:

    • Use python3 mcts.py --help to see all available commands.
    • You can run MCTS on the provided pre-trained high-level policy:
      • To visually inspect MCTS on the pre-trained policy: python3 mcts.py --test --nb_episodes=10
      • To replicate the experiment used for reported results: python3 mcts.py --evaluate. Note that this takes a very long time (~16 hours).
      • For a shorter version of the experiment (~20 minutes; see the sketch after this list): python3 mcts.py --evaluate --nb_trials=2 --nb_episodes=10
    • Or, if you have trained a high-level policy from scratch, you can run MCTS on it:
      • To visually inspect MCTS on the new policy: python3 mcts.py --test --highlevel_policy_in_root --nb_episodes=10
      • To replicate the experiment used for reported results: python3 mcts.py --evaluate --highlevel_policy_in_root. Note that this takes a very long time (~16 hours).
      • For a shorter version of the experiment: python3 mcts.py --evaluate --highlevel_policy_in_root --nb_trials=2 --nb_episodes=10 (~20 minutes)
    • You can use the arguments --depth and --nb_traversals to vary the depth of the MCTS tree (default 5) and the number of traversals performed (default 50).
    • The success average and standard deviation reported during evaluation correspond to the results of the MCTS experiments.
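
As referenced in the low-level policy steps above, a minimal command sequence for quickly verifying a single option (changelane), using only the flags documented in this section:

    # Train the changelane option from scratch (~6 minutes); the policy is saved to the project root
    python3 low_level_policy_main.py --option=changelane --train

    # Evaluate the newly trained policy (a single trial)
    python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root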
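
Likewise, the shorter high-level verification path mentioned above; note the lower expected success rate of roughly 75%:

    # Train the high-level policy for 0.1 million steps (~25 minutes)
    python3 high_level_policy_main.py --train --nb_steps=100000

    # Inspect and evaluate the newly trained policy
    python3 high_level_policy_main.py --test --saved_policy_in_root
    python3 high_level_policy_main.py --evaluate --saved_policy_in_root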
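
And the shortened MCTS run on the provided pre-trained high-level policy, as referenced above (~20 minutes instead of ~16 hours):

    # Visual check of MCTS on the pre-trained high-level policy
    python3 mcts.py --test --nb_episodes=10

    # Shortened version of the reported MCTS experiment
    python3 mcts.py --evaluate --nb_trials=2 --nb_episodes=10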

The time taken to execute the above scripts may vary depending on your configuration. The reported results were obtained on a system with the following specifications:

  • Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  • 16 GB memory
  • Nvidia GeForce GTX 1080 Ti
  • Ubuntu 16.04

Coding Standards

We follow PEP 8 style guidelines for code and PEP 257 for documentation. It is not necessary to keep these in mind while coding, but before submitting a pull request, run these two steps on each Python file you have modified:

  1. yapf -i YOUR_MODIFIED_FILE.py
  2. docformatter --in-place YOUR_MODIFIED_FILE.py

yapf formats the code and docformatter formats the docstrings.
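
As a convenience, assuming your changes are tracked by git, the loop below is a sketch (not part of the project scripts) that runs both formatters over every modified Python file:

    # Format all modified, tracked Python files in one pass
    for f in $(git diff --name-only -- '*.py'); do
        yapf -i "$f"
        docformatter --in-place "$f"
    done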