* Run the install dependencies script, `./scripts/install_dependencies.sh`, to install pip3 and the required Python packages.
Note: The script checks whether the dependencies folder exists in the project root. If it does, it installs from the local packages in that folder; otherwise, it installs the required packages from the internet.
If you do not have an internet connection and the dependencies folder does not exist, first run `./scripts/download_dependencies.sh` on a machine with an internet connection, then transfer that folder (a sketch of this workflow follows below).
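For the offline case, the end-to-end workflow looks roughly like the sketch below. The two scripts are the ones named above; the `scp` transfer, the host name, and the destination path are placeholders for whatever fits your setup, and the folder is assumed to be named `dependencies`, as the note above describes.

```bash
# On a machine WITH internet access, from the project root:
# download the packages into the local dependencies folder.
./scripts/download_dependencies.sh

# Transfer the folder to the offline machine (scp, the host name, and
# the path are placeholders; use whatever transfer method you have).
scp -r ./dependencies user@offline-host:/path/to/project/

# On the offline machine, from the project root: the install script
# detects the local dependencies folder and installs from it.
./scripts/install_dependencies.sh
```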
Documentation
-------------
* Open `./documentation/index.html` to view the documentation
* If the file does not exist, use the command `./scripts/generate_doc.sh build` to generate the documentation first. Note that this requires Sphinx to be installed (see the sketch below).
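If Sphinx is not available, a minimal sequence might look like the following sketch. Installing Sphinx via pip3 and opening the page with `xdg-open` are assumptions about your environment, not steps prescribed by the project.

```bash
# Install Sphinx (assumption: not already pulled in by install_dependencies.sh).
pip3 install sphinx

# Build the documentation, then open the generated entry page
# (xdg-open is a Linux convenience; any browser works).
./scripts/generate_doc.sh build
xdg-open ./documentation/index.html
```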
Replicate Results
-----------------
The minimum steps required to replicate the results for the simple_intersection environment are given below. For a detailed user guide, see the documentation. Combined shell sketches of the recommended low-level, high-level, and MCTS commands follow the list.
* Open a terminal and navigate to the root of the project directory.
* Low-level policies:
    * Use `python3 low_level_policy_main.py --help` to see all available commands.
    * You can use the provided pre-trained low-level policies:
        * To visually inspect a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --test`.
        * To evaluate a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --evaluate`.
        * Available options are: wait, changelane, stop, keeplane, follow
    * Or, you can train and test all the options, though this may take some time. Newly trained policies are saved to the root folder by default.
        * To train all low-level policies from scratch (~40 minutes): `python3 low_level_policy_main.py --train`.
        * To visually inspect all the new low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
        * To evaluate all the new low-level policies: `python3 low_level_policy_main.py --evaluate --saved_policy_in_root`.
        * Make sure the training is fully complete before running the above test/evaluation.
    * It is faster to verify the training of a few options using the commands below (**Recommended**; see the sketch after this list):
        * To train a single low-level policy, e.g., *changelane* (~6 minutes): `python3 low_level_policy_main.py --option=changelane --train`. This is saved to the root folder.
        * To evaluate one of these newly trained policies, e.g., *changelane*: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
        * Available options are: wait, changelane, stop, keeplane, follow
    * **To replicate the experiments without additional properties:**
        * Note that we have not provided a pre-trained policy that is trained without the additional LTL properties.
        * You will need to train it by adding the argument `--without_additional_ltl_properties` to the above *training* procedures. For example: `python3 low_level_policy_main.py --option=changelane --train --without_additional_ltl_properties`
        * Now, use `--evaluate` to evaluate this new policy: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
    * **The result of `--evaluate` here is one trial.** In the experiments reported in the paper, we conduct multiple such trials.
* High-level policy:
    * Use `python3 high_level_policy_main.py --help` to see all available commands.
    * You can use the provided pre-trained high-level policy:
        * To visually inspect this policy: `python3 high_level_policy_main.py --test`
        * To **replicate the experiment** used for the reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate`
    * Or, you can train the high-level policy from scratch (note that this takes some time):
        * To train using pre-trained low-level policies for 0.2 million steps (~50 minutes): `python3 high_level_policy_main.py --train`
        * To visually inspect this new policy: `python3 high_level_policy_main.py --test --saved_policy_in_root`
        * To **replicate the experiment** used for the reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate --saved_policy_in_root`.
        * Since the above training takes a long time, you can instead verify using a lower number of steps:
            * To train for 0.1 million steps (~25 minutes): `python3 high_level_policy_main.py --train --nb_steps=100000`
            * Note that this has a much lower success rate of ~75%, so using it for MCTS will not reproduce the reported results.
    * The average success and standard deviation in the evaluation correspond to the results of the high-level policy experiments.
* MCTS:
    * Use `python3 mcts.py --help` to see all available commands.
    * You can run MCTS on the provided pre-trained high-level policy:
        * To **replicate the experiment** used for the reported results: `python3 mcts.py --evaluate --highlevel_policy_in_root`. Note that this takes a very long time (~16 hours).
        * For a shorter version of the experiment: `python3 mcts.py --evaluate --highlevel_policy_in_root --nb_trials=2 --nb_episodes=10` (~20 minutes)
        * You can use the arguments `--depth` and `--nb_traversals` to vary the depth of the MCTS tree (default: 5) and the number of traversals done (default: 50).
    * The average success and standard deviation in the evaluation correspond to the results from the MCTS experiments.
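The recommended low-level verification path can be chained into one short shell session, as sketched below. This simply strings together the documented commands and assumes it is run from the project root; it is not a script shipped with the repository.

```bash
# Train a single low-level policy, changelane (~6 minutes);
# the trained policy is saved to the project root.
python3 low_level_policy_main.py --option=changelane --train

# Evaluate the newly trained policy from the root folder.
python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root

# Variant: train without the additional LTL properties (no pre-trained
# policy is provided for this case), then evaluate as above.
python3 low_level_policy_main.py --option=changelane --train --without_additional_ltl_properties
python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root
```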
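Similarly, a quick high-level check might look like the following sketch; note the caveat above that the 0.1-million-step policy reaches only ~75% success.

```bash
# Shorter high-level training run (0.1 million steps, ~25 minutes).
python3 high_level_policy_main.py --train --nb_steps=100000

# Replicate the evaluation experiment on the newly trained policy (~5 minutes).
python3 high_level_policy_main.py --evaluate --saved_policy_in_root
```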
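Finally, the shorter MCTS experiment, with the tree-shape arguments made explicit. The `--depth` and `--nb_traversals` values shown are just the documented defaults, so listing them does not change the run; adjust them to explore other settings.

```bash
# Shorter version of the MCTS experiment (~20 minutes).
python3 mcts.py --evaluate --highlevel_policy_in_root --nb_trials=2 --nb_episodes=10

# The same run with the default tree depth (5) and traversal count (50)
# spelled out explicitly.
python3 mcts.py --evaluate --highlevel_policy_in_root --nb_trials=2 --nb_episodes=10 --depth=5 --nb_traversals=50
```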
The time taken to execute the above scripts may vary depending on your configuration. The reported results were obtained using a system with the following specs: