Commit 9c5bcb29 by Sean Sedwards

Merge branch 'gif_readme' into 'master'

Gif readme

See merge request !12
parents 6a7cd92a 09777068
WiseMove is safe reinforcement learning framework that combines hierarchical reinforcement learning and safety verification using temporal logic constraints.
# WiseMove
WiseMove is a safe reinforcement learning framework that combines hierarchical reinforcement learning with safety verification using temporal logic constraints.
<br/>
<div align="center">
<img src="documentation/figures/highlevel.gif"/>
</div>
<br/>
Requirements
============
------------
* Python 3.6
* Sphinx
......@@ -9,23 +17,23 @@ Requirements
Installation
============
------------
* Run the install dependencies script: `./scripts/install_dependencies.sh` to install pip3 and required python packages.
* Run the install dependencies script: `./scripts/install_dependencies.sh` to install pip3 and the required python packages.
Note: The script checks if the dependencies folder exists in the project root folder. If it does, it will install from the local packages in that folder, else will install required packages from the internet.
Note: The script checks if the dependencies folder exists in the project root folder. If it does, it will install from the local packages in that folder, otherwise it will install the required packages from the internet.
If you do not have an internet connection and the dependencies folder does not exist, you will need to run `./scripts/download_dependencies.sh` using a machine with an internet connection first and transfer that folder.
If you do not have an internet connection and the dependencies folder does not exist, you will need to run `./scripts/download_dependencies.sh` using a machine with an internet connection first, then transfer that folder.
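The decision described in the note above (install from the local folder if it exists, otherwise from the internet) can be sketched as follows. This is only an illustration of the logic, not the contents of `install_dependencies.sh`; the folder name `dependencies` and the function name `choose_install_source` are assumptions made for the example.

```python
from pathlib import Path


def choose_install_source(project_root):
    """Return 'local' if <project_root>/dependencies exists, else 'internet'.

    Hypothetical helper: it mirrors the check described in the note,
    assuming the folder is literally named 'dependencies'.
    """
    deps_dir = Path(project_root) / "dependencies"
    return "local" if deps_dir.is_dir() else "internet"


if __name__ == "__main__":
    # Report which source would be used for the current directory.
    print(choose_install_source("."))
```

On a machine without internet access, the `local` branch is only reachable after the folder has been downloaded elsewhere and copied into the project root.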
Documentation
=============
-------------
* Open `./documentation/index.html` to view the documentation
* If the file does not exist, use command `./scripts/generate_doc.sh build` to generate documentation first. Note that this requires Sphinx to be installed.
* If the file does not exist, use command `./scripts/generate_doc.sh build` to generate the documentation first. Note that this requires Sphinx to be installed.
Replicate Results
=================
Given below are the minimum steps required to replicate the results for simple_intersection environment. For a detailed user guide, it is recommended to view the documentation.
-----------------
Given below are the minimum steps required to replicate the results for the simple_intersection environment. For a detailed user guide, it is recommended to view the documentation.
* Open terminal and navigate to the root of the project directory.
* Low-level policies:
* Use `python3 low_level_policy_main.py --help` to see all available commands.
......@@ -35,34 +43,34 @@ Given below are the minimum steps required to replicate the results for simple_i
* To visually inspect a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --test`.
* To evaluate a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --evaluate`.
* Available options are: wait, changelane, stop, keeplane, follow
* Or, you can train and test all the options. But this may take some time. Newly trained policies are saved to the root folder by default.
* Or, you can train and test all the options, noting that this may take some time. Newly trained policies are saved to the root folder by default.
* To train all low-level policies from scratch (~40 minutes): `python3 low_level_policy_main.py --train`.
* To visually inspect all these new low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
* To evaluate all these new low-level policies: `python3 low_level_policy_main.py --evaluate --saved_policy_in_root`.
* Make sure the training is fully complete before running above test/evaluation.
* It is faster to verify the training of a few options using below commands (**Recommended**):
* To train a single low-level, for example, *changelane* (~6 minutes): `python3 low_level_policy_main.py --option=changelane --train`. This is saved to the root folder.
* To evaluate one of these new low-level policies, for example *changelane*: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
* To visually inspect all the new low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
* To evaluate all the new low-level policies: `python3 low_level_policy_main.py --evaluate --saved_policy_in_root`.
* Make sure the training is fully complete before running the above test/evaluation.
* It is faster to verify the training of a few options using the commands below (**Recommended**):
* To train a single low-level policy, e.g., *changelane* (~6 minutes): `python3 low_level_policy_main.py --option=changelane --train`. This is saved to the root folder.
* To evaluate the new *changelane*: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
* Available options are: wait, changelane, stop, keeplane, follow
* **To replicate the experiments without additional properties:**
* Note that we have not provided a pre-trained policy that is trained without the additional LTL properties.
* You will need to train it by adding the argument `--without_additional_ltl_properties` to the above *training* procedures. For example, `python3 low_level_policy_main.py --option=changelane --train --without_additional_ltl_properties`
* Now, use `--evaluate` to evaluate this new policy: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
* **The results of `--evaluate` here is one trial.** In the experiments reported in the paper, we conduct multiple such trials.
* **The results of `--evaluate` here are from a single trial.** In the experiments reported in the paper, we conduct multiple such trials.
* High-level policy:
* Use `python3 high_level_policy_main.py --help` to see all available commands.
* You can use the provided pre-trained high-level policy:
* To visually inspect this policy: `python3 high_level_policy_main.py --test`
* To **replicate the experiment** used for reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate`
* Or, you can train the high-level policy from scratch (Note that this takes some time):
* Or, you can train the high-level policy from scratch. Note that this takes some time:
* To train using pre-trained low-level policies for 0.2 million steps (~50 minutes): `python3 high_level_policy_main.py --train`
* To visually inspect this new policy: `python3 high_level_policy_main.py --test --saved_policy_in_root`
* To **replicate the experiment** used for reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate --saved_policy_in_root`.
* Since the above training takes a long time, you can instead verify it using a lower number of steps:
* To train for 0.1 million steps (~25 minutes): `python3 high_level_policy_main.py --train --nb_steps=100000`
* Note that this has a much lower success rate of ~75%. So using this for MCTS will not give reported results.
* The success average and standard deviation in evaluation corresponds to the result from high-level policy experiments.
* Note that this has a much lower success rate of ~75%. Using this for MCTS will not reproduce the reported results.
* The average success and standard deviation in the evaluation correspond to the results of the high-level policy experiments.
* MCTS:
* Use `python3 mcts.py --help` to see all available commands.
* You can run MCTS on the provided pre-trained high-level policy:
......@@ -74,10 +82,10 @@ Given below are the minimum steps required to replicate the results for simple_i
* To **replicate the experiment** used for reported results: `python3 mcts.py --evaluate --highlevel_policy_in_root`. Note that this takes a very long time (~16 hours).
* For a shorter version of the experiment: `python3 mcts.py --evaluate --highlevel_policy_in_root --nb_trials=2 --nb_episodes=10` (~20 minutes)
* You can use the arguments `--depth` and `--nb_traversals` to vary the depth of the MCTS tree (default is 5) and number of traversals done (default is 50).
* The success average and standard deviation in the evaluation corresponds to the results from MCTS experiments.
* The average success and standard deviation in the evaluation correspond to the results from the MCTS experiments.
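The low-level policy commands listed above follow a regular pattern: an optional `--option=...`, one of `--train`, `--test` or `--evaluate`, and optional extra flags. A minimal sketch of a helper that assembles these command lines is given below. The flags are the ones quoted in the steps above; the function `low_level_command` and its parameter names are hypothetical and not part of the repository.

```python
def low_level_command(option=None, mode="train", saved_policy_in_root=False,
                      without_additional_ltl_properties=False):
    """Build the argument list for low_level_policy_main.py.

    Hypothetical helper: it only assembles flags quoted in the README
    and does not run anything.
    """
    if mode not in ("train", "test", "evaluate"):
        raise ValueError("mode must be 'train', 'test' or 'evaluate'")
    cmd = ["python3", "low_level_policy_main.py"]
    if option is not None:
        # Omitting the option applies the mode to all low-level policies.
        cmd.append("--option=" + option)
    cmd.append("--" + mode)
    if saved_policy_in_root:
        cmd.append("--saved_policy_in_root")
    if without_additional_ltl_properties:
        cmd.append("--without_additional_ltl_properties")
    return cmd


if __name__ == "__main__":
    # Print the quick verification pair for the changelane option.
    print(" ".join(low_level_command("changelane", "train")))
    print(" ".join(low_level_command("changelane", "evaluate",
                                     saved_policy_in_root=True)))
```

The returned list can be passed to `subprocess.run(..., check=True)` from the project root if you prefer to script the runs rather than type them.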
The time taken to execute above scripts may vary depending on your configuration. The reported results were obtained using a system of the following specs:
The time taken to execute the above scripts may vary depending on your configuration. The reported results were obtained using a system of the following specs:
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
16GB memory
......@@ -86,7 +94,7 @@ Ubuntu 16.04
Coding Standards
================
----------------
We follow PEP8 style guidelines for coding and PEP257 for documentation.
It is not necessary to keep these in mind while coding, but before
......
......@@ -22,7 +22,10 @@ class VehicleNetworkCross(Shape):
        self.env = env
        self.vehs_sprites = []
        for i, veh in enumerate(self.env.vehs):
            veh_pic = Image('car_agent.png', image_type="car", tile_scale=True)
            if i == EGO_INDEX:
                veh_pic = Image('ego_vehicle.png', image_type="car", tile_scale=True)
            else:
                veh_pic = Image('car_agent.png', image_type="car", tile_scale=True)
            self.vehs_sprites.append(veh_pic)
    def draw(self, ego_info_text):
......
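The effect of the change above, where the ego vehicle gets its own sprite and every other vehicle keeps the generic one, can be illustrated in isolation. This is a standalone sketch rather than the project's code: it returns file names instead of constructing `Image` objects, and the value `EGO_INDEX = 0` is an assumption, since the diff does not show where the constant is defined.

```python
EGO_INDEX = 0  # assumption: the real value is defined elsewhere in the project


def sprite_filename(index, ego_index=EGO_INDEX):
    """Return the sprite file name for the vehicle at the given index."""
    return 'ego_vehicle.png' if index == ego_index else 'car_agent.png'


# One file name per vehicle, for an illustrative fleet of three vehicles.
filenames = [sprite_filename(i) for i in range(3)]
print(filenames)
```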