diff --git a/README.md b/README.md
index 8afb6f2360eb26016fda4170a535895557272014..e6a35a92c26d7233eb71a9cd7c68971394553af2 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-WiseMove is safe reinforcement learning framework that combines hierarchical reinforcement learning and model-checking using temporal logic constraints.
+WiseMove is a safe reinforcement learning framework that combines hierarchical reinforcement learning and safety verification using temporal logic constraints.
Requirements
============
@@ -13,9 +13,9 @@ Installation
* Run the install dependencies script: `./scripts/install_dependencies.sh` to install pip3 and required python packages.
-Note: The script checks if dependencies folder exists in the project root folder. If it does, it will install from the local packages in that folder,
-else will install required packages from the internet. If you do not have an internet connection and the dependencies folder does not exist,
-you will need to run `./scripts/download_dependencies.sh` using a machine with an internet connection first and transfer that folder.
+Note: The script checks if the dependencies folder exists in the project root folder. If it does, it will install from the local packages in that folder; otherwise, it will install the required packages from the internet.
+
+If you do not have an internet connection and the dependencies folder does not exist, you will need to run `./scripts/download_dependencies.sh` on a machine with an internet connection first and then transfer that folder.
Documentation
=============
@@ -25,28 +25,65 @@ Documentation
Replicate Results
=================
-These are the minimum steps required to replicate the results for simple_intersection environment. For a detailed user guide, it is recommended to view the documentation.
-
-* Run `./scripts/install_dependencies.sh` to install python dependencies.
+Given below are the minimum steps required to replicate the results for the simple_intersection environment. For a detailed user guide, see the documentation.
+* Open a terminal and navigate to the root of the project directory.
* Low-level policies:
- * You can choose to train and test all the maneuvers. But this may take some time and is not recommended.
- * To train all low-level policies from scratch: `python3 low_level_policy_main.py --train`. This may take some time.
- * To test all these trained low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
- * Make sure the training is fully complete before running above test.
- * It is easier to verify few of the maneuvers using below commands:
- * To train a single low-level, for example wait: `python3 low_level_policy_main.py --option=wait --train`.
- * To test one of these trained low-level policies, for example wait: `python3 low_level_policy_main.py --option=wait --test --saved_policy_in_root`
- * Available maneuvers are: wait, changelane, stop, keeplane, follow
- * These results are visually evaluated.
- * Note: This training has a high variance issue due to the continuous action space, especially for stop and keeplane maneuvers. It may help to train for 0.2 million steps than the default 0.1 million by adding argument '--nb_steps=200000' while training.
+ * Use `python3 low_level_policy_main.py --help` to see all available commands.
+ * You can choose to test the provided pre-trained options:
+ * To visually inspect all pre-trained options: `python3 low_level_policy_main.py --test`
+ * To evaluate all pre-trained options: `python3 low_level_policy_main.py --evaluate`
+ * To visually inspect a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --test`.
+ * To evaluate a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --evaluate`.
+ * Available options are: wait, changelane, stop, keeplane, follow
+ * Or, you can train and test all the options. But this may take some time. Newly trained policies are saved to the root folder by default.
+ * To train all low-level policies from scratch (~40 minutes): `python3 low_level_policy_main.py --train`.
+ * To visually inspect all these new low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
+ * To evaluate all these new low-level policies: `python3 low_level_policy_main.py --evaluate --saved_policy_in_root`.
+        * Make sure the training is fully complete before running the above test/evaluation.
+    * It is faster to verify the training of a few options using the commands below (**Recommended**):
+        * To train a single low-level policy, for example *changelane* (~6 minutes): `python3 low_level_policy_main.py --option=changelane --train`. The policy is saved to the root folder.
+ * To evaluate one of these new low-level policies, for example *changelane*: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
+ * Available options are: wait, changelane, stop, keeplane, follow
+ * **To replicate the experiments without additional properties:**
+    * Note that we have not provided a pre-trained policy that was trained without the additional LTL properties.
+ * You will need to train it by adding the argument `--without_additional_ltl_properties` to the above *training* procedures. For example, `python3 low_level_policy_main.py --option=changelane --train --without_additional_ltl_properties`
+ * Now, use `--evaluate` to evaluate this new policy: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
+    * **The result of `--evaluate` here is one trial.** In the experiments reported in the paper, we conduct multiple such trials.
+
* High-level policy:
- * To train high-level policy from scratch using the given low-level policies: `python3 high_level_policy_main.py --train`
- * To evaluate this trained high-level policy: `python3 high_level_policy_main.py --evaluate --saved_policy_in_root`.
- * The success average and standard deviation corresponds to the result from high-level policy experiments.
-* To run MCTS using the high-level policy:
- * To obtain a probabilites tree and save it: `python3 mcts.py --train`
- * To evaluate using this saved tree: `python3 mcts.py --evaluate --saved_policy_in_root`.
- * The success average and standard deviation corresponds to the results from MCTS experiments.
+ * Use `python3 high_level_policy_main.py --help` to see all available commands.
+ * You can use the provided pre-trained high-level policy:
+ * To visually inspect this policy: `python3 high_level_policy_main.py --test`
+ * To **replicate the experiment** used for reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate`
+ * Or, you can train the high-level policy from scratch (Note that this takes some time):
+ * To train using pre-trained low-level policies for 0.2 million steps (~50 minutes): `python3 high_level_policy_main.py --train`
+ * To visually inspect this new policy: `python3 high_level_policy_main.py --test --saved_policy_in_root`
+ * To **replicate the experiment** used for reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate --saved_policy_in_root`.
+    * Since the above training takes a long time, you can instead verify using a lower number of steps:
+ * To train for 0.1 million steps (~25 minutes): `python3 high_level_policy_main.py --train --nb_steps=100000`
+      * Note that this has a much lower success rate of ~75%, so using it for MCTS will not reproduce the reported results.
+    * The success average and standard deviation in the evaluation correspond to the results from the high-level policy experiments.
+* MCTS:
+ * Use `python3 mcts.py --help` to see all available commands.
+ * You can run MCTS on the provided pre-trained high-level policy:
+ * To visually inspect MCTS on the pre-trained policy: `python3 mcts.py --test --nb_episodes=10`
+ * To **replicate the experiment** used for reported results: `python3 mcts.py --evaluate`. Note that this takes a very long time (~16 hours).
+ * For a shorter version of the experiment: `python3 mcts.py --evaluate --nb_trials=2 --nb_episodes=10` (~20 minutes)
+ * Or, if you have trained a high-level policy from scratch, you can run MCTS on it:
+ * To visually inspect MCTS on the new policy: `python3 mcts.py --test --highlevel_policy_in_root --nb_episodes=10`
+ * To **replicate the experiment** used for reported results: `python3 mcts.py --evaluate --highlevel_policy_in_root`. Note that this takes a very long time (~16 hours).
+ * For a shorter version of the experiment: `python3 mcts.py --evaluate --highlevel_policy_in_root --nb_trials=2 --nb_episodes=10` (~20 minutes)
+ * You can use the arguments `--depth` and `--nb_traversals` to vary the depth of the MCTS tree (default is 5) and number of traversals done (default is 50).
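+        * For example, to run the shorter evaluation with a deeper tree and more traversals (illustrative values): `python3 mcts.py --evaluate --nb_trials=2 --nb_episodes=10 --depth=10 --nb_traversals=100`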
+    * The success average and standard deviation in the evaluation correspond to the results from the MCTS experiments.
+
+
+The time taken to execute the above scripts may vary depending on your configuration. The reported results were obtained on a system with the following specs:
+
+* Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
+* 16GB memory
+* Nvidia GeForce GTX 1080 Ti
+* Ubuntu 16.04
+
Coding Standards
================
diff --git a/backends/kerasrl_learner.py b/backends/kerasrl_learner.py
index a341711f4f6dc571d8386ded1907d8d464f733d3..bd1bb82bf9f8ca8acd2311b4581bfcccafad6b45 100644
--- a/backends/kerasrl_learner.py
+++ b/backends/kerasrl_learner.py
@@ -206,11 +206,13 @@ class DDPGLearner(LearnerBase):
def test_model(self,
env,
nb_episodes=50,
+ callbacks=None,
visualize=True,
nb_max_episode_steps=200):
self.agent_model.test(
env,
nb_episodes=nb_episodes,
+ callbacks=callbacks,
visualize=visualize,
nb_max_episode_steps=nb_max_episode_steps)
diff --git a/documentation/sphinx/.build/.doc/backends.html b/documentation/sphinx/.build/.doc/backends.html
index ffc37fa577b3379e6e713902af2fc773ec15cd2b..65ada1e1e22e1fb2d0f01d8eea101dd08aa9af1b 100644
--- a/documentation/sphinx/.build/.doc/backends.html
+++ b/documentation/sphinx/.build/.doc/backends.html
@@ -46,15 +46,8 @@
Use the option_alias from the root node and reposition the tree
+such that the new root node is the node reached by following option_alias
+from the current root node.
A property from the base class which is True if the goal
-of the road scenario is achieved, otherwise False. This property is
-used in both step of EpisodicEnvBase and the implementation of
-the high-level reinforcement learning and execution.
+
A property from the base class which is True if the goal of the road
+scenario is achieved, otherwise False.
+
This property is used in both step of EpisodicEnvBase and the
+implementation of the high-level reinforcement learning and
+execution.
@@ -629,6 +623,11 @@ required to be initialzied.
the number of the other vehicles, initialized in generate_scenario
Normalizes each element in a tuple according to ranges defined in self.cost_normalization_ranges.
-Normalizes between 0 and 1. And the scales by scale_factor
+
Normalizes each element in a tuple according to ranges defined in
+self.cost_normalization_ranges. Normalizes between 0 and 1 and then
+scales by scale_factor.
Do RL of the high-level policy and test it.
-:param nb_steps: the number of steps to perform RL
-:param load_weights: True if the pre-learned NN weights are loaded (for initializations of NNs)
-:param training: True to enable training
-:param testing: True to enable testing
-:param nb_episodes_for_test: the number of episodes for testing
Do RL of the low-level policy of the given maneuver and test it.
-:param maneuver: the name of the maneuver defined in config.json (e.g., ‘default’).
-:param nb_steps: the number of steps to perform RL.
-:param RL_method: either DDPG or PPO2.
-:param load_weights: True if the pre-learned NN weights are loaded (for initializations of NNs).
-:param training: True to enable training.
-:param testing: True to enable testing.
-:param visualize: True to see the graphical outputs during training.
-:param nb_episodes_for_test: the number of episodes for testing.
Do RL of the low-level policy of the given maneuver and test it.
-:param nb_traversals: number of MCTS traversals
-:param save_every: save at every these many traversals
-:param visualize: visualization / rendering
Do RL of the low-level policy of the given maneuver and test it.
-:param nb_traversals: number of MCTS traversals
-:param save_every: save at every these many traversals
-:param visualize: visualization / rendering
This is a base class that contains information of an LTL property.
-
It encapsulates the model-checking part (see check / check_incremental),
-and contains additional information. The subclass needs to describe
-specific APdict to be used.
Checks a new state w.r.t. an existing property Scanner.
-Constructs a trace from new states using a new or previous trace list.
-If a previous trace list is used, states after index self.step are not valid.
generates the scenario for low-level policy learning and validation. This method
-
will be used in generate_learning_scenario and generate_validation_scenario in
-the subclasses.
+
generates the scenario for low-level policy learning and validation.
+This method will be used in generate_learning_scenario and
+generate_validation_scenario in the subclasses.
+
Param:
enable_LTL_preconditions: whether to enable LTL preconditions in the maneuver or not
timeout: the timeout for the scenario (which is infinity by default)
@@ -88,8 +88,16 @@ timeout: the timeout for the scenario (which is infinity by default)
the low level policy as a map from a feature vector to an action
-(a, dot_psi). By default, it’ll call low_level_manual_policy below
-if it’s implemented in the subclass.
+
the low level policy as a map from a feature vector to an action (a,
+dot_psi).
+
By default, it’ll call low_level_manual_policy below if it’s
+implemented in the subclass.
@@ -140,8 +149,8 @@ if it’s implemented in the subclass.
a virtual function (property) from ManeuverBase.
+As KeepLane is a default maneuver, it has to be activated to be
+chosen at any time, state, and condition (refer to
+initiation_condition of ManeuverBase for the usual case).
+:returns True.
This is a base class that contains information of an LTL property.
+
It encapsulates the model-checking part (see check /
+check_incremental), and contains additional information. The
+subclass needs to describe specific APdict to be used.
an existing property Scanner. Constructs a trace from new states
+using a new or previous trace list. If a previous trace list is
+used, states after index self.step are not valid.
+
+
+
+
\ No newline at end of file
diff --git a/documentation/sphinx/.build/.doctrees/.doc/backends.doctree b/documentation/sphinx/.build/.doctrees/.doc/backends.doctree
index 54ed373191d590a651e90ca9e5cefc70103a7c6f..4f419c4894cc8cac5b79670f3508ece045b95852 100644
Binary files a/documentation/sphinx/.build/.doctrees/.doc/backends.doctree and b/documentation/sphinx/.build/.doctrees/.doc/backends.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/env.doctree b/documentation/sphinx/.build/.doctrees/.doc/env.doctree
index 1c932b51458ba40de60f64a1af78953cb6af8ab9..d5850fdcf5f1ae2fb9f17a911cb47e97d300ba47 100644
Binary files a/documentation/sphinx/.build/.doctrees/.doc/env.doctree and b/documentation/sphinx/.build/.doctrees/.doc/env.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/env.simple_intersection.doctree b/documentation/sphinx/.build/.doctrees/.doc/env.simple_intersection.doctree
index 2d6b56f2f075019de9fc9606a07abbb87b7c1b1c..86b0310f0112bb2f05488c6a06ca81e0d8227c13 100644
Binary files a/documentation/sphinx/.build/.doctrees/.doc/env.simple_intersection.doctree and b/documentation/sphinx/.build/.doctrees/.doc/env.simple_intersection.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/high_level_policy_main.doctree b/documentation/sphinx/.build/.doctrees/.doc/high_level_policy_main.doctree
index e0156dc307cfd77996e42c975f5d492dcd1a6a29..9c1b43bd4cfce1172cccf2584ac28f50703d67c3 100644
Binary files a/documentation/sphinx/.build/.doctrees/.doc/high_level_policy_main.doctree and b/documentation/sphinx/.build/.doctrees/.doc/high_level_policy_main.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/low_level_policy_main.doctree b/documentation/sphinx/.build/.doctrees/.doc/low_level_policy_main.doctree
index 27c03da8f6d7e8cafc2013d482f5959a95aa467c..162c48e5c31e718862de95e21ee8817878f4e87a 100644
Binary files a/documentation/sphinx/.build/.doctrees/.doc/low_level_policy_main.doctree and b/documentation/sphinx/.build/.doctrees/.doc/low_level_policy_main.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/mcts.doctree b/documentation/sphinx/.build/.doctrees/.doc/mcts.doctree
index 24b9665115187d8227050d5da781e1825fc98075..c8c70ed601f2089de82dc60e831c827e29575ef4 100644
Binary files a/documentation/sphinx/.build/.doctrees/.doc/mcts.doctree and b/documentation/sphinx/.build/.doctrees/.doc/mcts.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/model_checker.doctree b/documentation/sphinx/.build/.doctrees/.doc/model_checker.doctree
deleted file mode 100644
index 45f07f2ff35ae85c068394697e5ec736790e15f2..0000000000000000000000000000000000000000
Binary files a/documentation/sphinx/.build/.doctrees/.doc/model_checker.doctree and /dev/null differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/model_checker.simple_intersection.doctree b/documentation/sphinx/.build/.doctrees/.doc/model_checker.simple_intersection.doctree
deleted file mode 100644
index 74fd9cd758553805c2dad3b24459f55ad71b1721..0000000000000000000000000000000000000000
Binary files a/documentation/sphinx/.build/.doctrees/.doc/model_checker.simple_intersection.doctree and /dev/null differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/modules.doctree b/documentation/sphinx/.build/.doctrees/.doc/modules.doctree
index 6f6ea44362e6dff3a75f4d91dbc07609bca81830..a09f777e2294c88c2907d63b6768cffaaee25902 100644
Binary files a/documentation/sphinx/.build/.doctrees/.doc/modules.doctree and b/documentation/sphinx/.build/.doctrees/.doc/modules.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/options.doctree b/documentation/sphinx/.build/.doctrees/.doc/options.doctree
index 2af45197bb558f31afa26ec4dce5208d05d4482c..39f9d4cff1b3e24c7e2c90e7bdce76d09940aaac 100644
Binary files a/documentation/sphinx/.build/.doctrees/.doc/options.doctree and b/documentation/sphinx/.build/.doctrees/.doc/options.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/options.simple_intersection.doctree b/documentation/sphinx/.build/.doctrees/.doc/options.simple_intersection.doctree
index 47fb3231e29b9df8dbd224a0eccf1e6e186cd6db..c5f57e9cafb8eb8732b9cab865c17a0ae65fedd5 100644
Binary files a/documentation/sphinx/.build/.doctrees/.doc/options.simple_intersection.doctree and b/documentation/sphinx/.build/.doctrees/.doc/options.simple_intersection.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/ppo2_training.doctree b/documentation/sphinx/.build/.doctrees/.doc/ppo2_training.doctree
index fc91215a7f084853fd8947d963268c9c39933992..5c59c246587ded5dbaaf4c8b7195f01e503f8ed6 100644
Binary files a/documentation/sphinx/.build/.doctrees/.doc/ppo2_training.doctree and b/documentation/sphinx/.build/.doctrees/.doc/ppo2_training.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/verifier.doctree b/documentation/sphinx/.build/.doctrees/.doc/verifier.doctree
new file mode 100644
index 0000000000000000000000000000000000000000..54b926e90ab037ad4d716b5bfb9fec314b961a33
Binary files /dev/null and b/documentation/sphinx/.build/.doctrees/.doc/verifier.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/.doc/verifier.simple_intersection.doctree b/documentation/sphinx/.build/.doctrees/.doc/verifier.simple_intersection.doctree
new file mode 100644
index 0000000000000000000000000000000000000000..ec805b5108cac054d198f3b37fc1198bc04810e2
Binary files /dev/null and b/documentation/sphinx/.build/.doctrees/.doc/verifier.simple_intersection.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/environment.pickle b/documentation/sphinx/.build/.doctrees/environment.pickle
index 82aca56c04f4313da34b1ce053b9ec0b3552a4ec..336abc7fbb674d816113c213970e71f762251c7e 100644
Binary files a/documentation/sphinx/.build/.doctrees/environment.pickle and b/documentation/sphinx/.build/.doctrees/environment.pickle differ
diff --git a/documentation/sphinx/.build/.doctrees/index.doctree b/documentation/sphinx/.build/.doctrees/index.doctree
index 0c65cbcd9da0ab7884ebdd41f87586e726eaea56..11bc11884c3a85d50c02dbee3f90c5d683fe643a 100644
Binary files a/documentation/sphinx/.build/.doctrees/index.doctree and b/documentation/sphinx/.build/.doctrees/index.doctree differ
diff --git a/documentation/sphinx/.build/.doctrees/usage/config.doctree b/documentation/sphinx/.build/.doctrees/usage/config.doctree
deleted file mode 100644
index 1137b5480589561033102cae42e512c24a409836..0000000000000000000000000000000000000000
Binary files a/documentation/sphinx/.build/.doctrees/usage/config.doctree and /dev/null differ
diff --git a/documentation/sphinx/.build/.doctrees/usage/quickstart.doctree b/documentation/sphinx/.build/.doctrees/usage/quickstart.doctree
index b11cf681cae88029ff5f34ffe2ccb204439c07da..0f030368ea7302356d45eb767bf1647325088283 100644
Binary files a/documentation/sphinx/.build/.doctrees/usage/quickstart.doctree and b/documentation/sphinx/.build/.doctrees/usage/quickstart.doctree differ
diff --git a/documentation/sphinx/.build/_sources/.doc/backends.rst.txt b/documentation/sphinx/.build/_sources/.doc/backends.rst.txt
index 367db99c0322ede126d227162e4437fdcc56330e..f828e4c63893331fae42a8dd5217187dac6c0de7 100644
--- a/documentation/sphinx/.build/_sources/.doc/backends.rst.txt
+++ b/documentation/sphinx/.build/_sources/.doc/backends.rst.txt
@@ -44,18 +44,18 @@ backends.manual\_policy module
:undoc-members:
:show-inheritance:
-backends.mcts\_learner module
------------------------------
+backends.mcts\_controller module
+--------------------------------
-.. automodule:: backends.mcts_learner
+.. automodule:: backends.mcts_controller
:members:
:undoc-members:
:show-inheritance:
-backends.online\_mcts\_controller module
-----------------------------------------
+backends.mcts\_learner module
+-----------------------------
-.. automodule:: backends.online_mcts_controller
+.. automodule:: backends.mcts_learner
:members:
:undoc-members:
:show-inheritance:
diff --git a/documentation/sphinx/.build/_sources/.doc/model_checker.rst.txt b/documentation/sphinx/.build/_sources/.doc/model_checker.rst.txt
deleted file mode 100644
index 01b5acedfaa48cb1e906e5b5379a8bbaac65f699..0000000000000000000000000000000000000000
--- a/documentation/sphinx/.build/_sources/.doc/model_checker.rst.txt
+++ /dev/null
@@ -1,53 +0,0 @@
-model\_checker package
-======================
-
-Subpackages
------------
-
-.. toctree::
-
- model_checker.simple_intersection
-
-Submodules
-----------
-
-model\_checker.LTL\_property\_base module
------------------------------------------
-
-.. automodule:: model_checker.LTL_property_base
- :members:
- :undoc-members:
- :show-inheritance:
-
-model\_checker.atomic\_propositions\_base module
-------------------------------------------------
-
-.. automodule:: model_checker.atomic_propositions_base
- :members:
- :undoc-members:
- :show-inheritance:
-
-model\_checker.parser module
-----------------------------
-
-.. automodule:: model_checker.parser
- :members:
- :undoc-members:
- :show-inheritance:
-
-model\_checker.scanner module
------------------------------
-
-.. automodule:: model_checker.scanner
- :members:
- :undoc-members:
- :show-inheritance:
-
-
-Module contents
----------------
-
-.. automodule:: model_checker
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/documentation/sphinx/.build/_sources/.doc/model_checker.simple_intersection.rst.txt b/documentation/sphinx/.build/_sources/.doc/model_checker.simple_intersection.rst.txt
deleted file mode 100644
index 0859f202f9266745e1455af901512cd85f66d136..0000000000000000000000000000000000000000
--- a/documentation/sphinx/.build/_sources/.doc/model_checker.simple_intersection.rst.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-model\_checker.simple\_intersection package
-===========================================
-
-Submodules
-----------
-
-model\_checker.simple\_intersection.AP\_dict module
----------------------------------------------------
-
-.. automodule:: model_checker.simple_intersection.AP_dict
- :members:
- :undoc-members:
- :show-inheritance:
-
-model\_checker.simple\_intersection.LTL\_test module
-----------------------------------------------------
-
-.. automodule:: model_checker.simple_intersection.LTL_test
- :members:
- :undoc-members:
- :show-inheritance:
-
-model\_checker.simple\_intersection.classes module
---------------------------------------------------
-
-.. automodule:: model_checker.simple_intersection.classes
- :members:
- :undoc-members:
- :show-inheritance:
-
-
-Module contents
----------------
-
-.. automodule:: model_checker.simple_intersection
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/documentation/sphinx/.build/_sources/.doc/modules.rst.txt b/documentation/sphinx/.build/_sources/.doc/modules.rst.txt
index 6f654d3e26c07ee7cd061a2fc00cec3b54426490..78ccb4dced9cadec5b90a7e87de9f6a7bb47394a 100644
--- a/documentation/sphinx/.build/_sources/.doc/modules.rst.txt
+++ b/documentation/sphinx/.build/_sources/.doc/modules.rst.txt
@@ -1,5 +1,5 @@
-wisemove
-========
+wise-move-dev
+=============
.. toctree::
:maxdepth: 4
@@ -9,6 +9,6 @@ wisemove
high_level_policy_main
low_level_policy_main
mcts
- model_checker
options
ppo2_training
+ verifier
diff --git a/documentation/sphinx/.build/_sources/.doc/options.simple_intersection.rst.txt b/documentation/sphinx/.build/_sources/.doc/options.simple_intersection.rst.txt
index 888c74c9861b4dbf579eb7966ab8348165cccde1..3197e0b235c02fe0d28491822ad0d1268dcc132c 100644
--- a/documentation/sphinx/.build/_sources/.doc/options.simple_intersection.rst.txt
+++ b/documentation/sphinx/.build/_sources/.doc/options.simple_intersection.rst.txt
@@ -20,14 +20,6 @@ options.simple\_intersection.maneuvers module
:undoc-members:
:show-inheritance:
-options.simple\_intersection.mcts\_maneuvers module
----------------------------------------------------
-
-.. automodule:: options.simple_intersection.mcts_maneuvers
- :members:
- :undoc-members:
- :show-inheritance:
-
Module contents
---------------
diff --git a/documentation/sphinx/.build/_sources/.doc/verifier.rst.txt b/documentation/sphinx/.build/_sources/.doc/verifier.rst.txt
new file mode 100644
index 0000000000000000000000000000000000000000..a27164aa45743afba531d61750057af815f69ce9
--- /dev/null
+++ b/documentation/sphinx/.build/_sources/.doc/verifier.rst.txt
@@ -0,0 +1,53 @@
+verifier package
+================
+
+Subpackages
+-----------
+
+.. toctree::
+
+ verifier.simple_intersection
+
+Submodules
+----------
+
+verifier.LTL\_property\_base module
+-----------------------------------
+
+.. automodule:: verifier.LTL_property_base
+ :members:
+ :undoc-members:
+ :show-inheritance:
+
+verifier.atomic\_propositions\_base module
+------------------------------------------
+
+.. automodule:: verifier.atomic_propositions_base
+ :members:
+ :undoc-members:
+ :show-inheritance:
+
+verifier.parser module
+----------------------
+
+.. automodule:: verifier.parser
+ :members:
+ :undoc-members:
+ :show-inheritance:
+
+verifier.scanner module
+-----------------------
+
+.. automodule:: verifier.scanner
+ :members:
+ :undoc-members:
+ :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: verifier
+ :members:
+ :undoc-members:
+ :show-inheritance:
diff --git a/documentation/sphinx/.build/_sources/.doc/verifier.simple_intersection.rst.txt b/documentation/sphinx/.build/_sources/.doc/verifier.simple_intersection.rst.txt
new file mode 100644
index 0000000000000000000000000000000000000000..a90b203ec3128814989b50a7398bebc72fe6e6aa
--- /dev/null
+++ b/documentation/sphinx/.build/_sources/.doc/verifier.simple_intersection.rst.txt
@@ -0,0 +1,38 @@
+verifier.simple\_intersection package
+=====================================
+
+Submodules
+----------
+
+verifier.simple\_intersection.AP\_dict module
+---------------------------------------------
+
+.. automodule:: verifier.simple_intersection.AP_dict
+ :members:
+ :undoc-members:
+ :show-inheritance:
+
+verifier.simple\_intersection.LTL\_test module
+----------------------------------------------
+
+.. automodule:: verifier.simple_intersection.LTL_test
+ :members:
+ :undoc-members:
+ :show-inheritance:
+
+verifier.simple\_intersection.classes module
+--------------------------------------------
+
+.. automodule:: verifier.simple_intersection.classes
+ :members:
+ :undoc-members:
+ :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: verifier.simple_intersection
+ :members:
+ :undoc-members:
+ :show-inheritance:
diff --git a/documentation/sphinx/.build/_sources/usage/config.rst.txt b/documentation/sphinx/.build/_sources/usage/config.rst.txt
deleted file mode 100644
index 913e769e177b28edf98b4cb3c61bb3b2466b57c6..0000000000000000000000000000000000000000
--- a/documentation/sphinx/.build/_sources/usage/config.rst.txt
+++ /dev/null
@@ -1,5 +0,0 @@
-==================
-Configuration File
-==================
-
-:code:`config.json` can be used to select the maneuvers and high_level policy for fast experimentation.
\ No newline at end of file
diff --git a/documentation/sphinx/.build/_sources/usage/quickstart.rst.txt b/documentation/sphinx/.build/_sources/usage/quickstart.rst.txt
index d1f825abca85a1afb51112e0f6c81bc670ab1d96..5b1e853ab7da26104997dbe6f453bfac4aff7651 100644
--- a/documentation/sphinx/.build/_sources/usage/quickstart.rst.txt
+++ b/documentation/sphinx/.build/_sources/usage/quickstart.rst.txt
@@ -23,7 +23,7 @@ The policy for executing each maneuver is referred to as the low-level policy, w
::
- usage: low_level_policy_main.py [-h] [--train] [--option] [--test]
+ usage: low_level_policy_main.py [-h] [--train] [--option] [--test] [--evaluate]
[--saved_policy_in_root] [--load_weights]
[--tensorboard] [--visualize]
[--nb_steps NB_STEPS]
@@ -31,13 +31,15 @@ The policy for executing each maneuver is referred to as the low-level policy, w
optional arguments:
-h, --help show this help message and exit
- --train Train a high-level policy with default settings.
+ --train Train a low-level policy with default settings.
Always saved in the root folder. Always tests after
training
--option the option to train. Eg. stop, keeplane, wait,
changelane, follow. If not defined, trains all options
- --test Test a saved high-level policy. Uses backends/trained_
- policies/highlevel/highlevel_weights.h5f by default
+ --test Test a saved low-level policy.
+ Uses saved policy in backends/trained_policies/OPTION_NAME/ by default
+ --evaluate Evaluate a saved low-level policy over 100 episodes.
+ Uses saved policy in backends/trained_policies/OPTION_NAME/ by default
--saved_policy_in_root
Use saved policies in the root of the project rather than
backends/trained_policies/highlevel/
@@ -54,7 +56,7 @@ Run :code:`low_level_policy_main.py --train --option=OPTION_NAME`, where OPTION_
::
- python low_level_policy_main.py --train --option=keeplane --visualize
+ python3 low_level_policy_main.py --train --option=keeplane --visualize
Testing
=======
@@ -62,7 +64,16 @@ Run :code:`low_level_policy_main.py --test --option=OPTION_NAME` along with othe
::
- python low_level_policy_main.py --test --option=wait --nb_episodes_for_test=20
+ python3 low_level_policy_main.py --test --option=wait --nb_episodes_for_test=20
+
+Evaluating
+==========
+Run :code:`low_level_policy_main.py --evaluate --option=OPTION_NAME` along with other supported arguments to evaluate the trained policy over 100 episodes. By default, uses the trained policies in `/backends/trained_policies`. For example:
+
+::
+
+    python3 low_level_policy_main.py --evaluate --option=wait --saved_policy_in_root
+
=================
High-level Policy
@@ -113,7 +124,7 @@ Run :code:`high_level_policy_main.py --train` along with other supported argumen
::
- high_level_policy_training.py --train --nb_steps=25000 --nb_episodes_for_test=20
+    python3 high_level_policy_main.py --train --nb_steps=25000 --nb_episodes_for_test=20
Testing
=======
@@ -121,17 +132,53 @@ Run :code:`high_level_policy_main.py --test` or :code:`high_level_policy_main.py
::
- high_level_policy_training.py --evaluate --nb_trials=5 --nb_episodes_for_test=20
+    python3 high_level_policy_main.py --evaluate --nb_trials=5 --nb_episodes_for_test=20
+
+=======================
+Monte-Carlo Tree Search
+=======================
+Even after using such a hierarchical structure and learn-time verification, collisions and constraint violations can still occur because the policies are never perfect. This framework supports the safe execution of the high-level policy by using MCTS to look ahead in time and choose paths that do not lead to a collision or a temporal logic violation. `mcts.py` is used to execute the learned policies using MCTS.
+
+::
+
+ usage: mcts.py [-h] [--evaluate] [--test]
+ [--nb_episodes_for_test NB_EPISODES_FOR_TEST] [--visualize]
+ [--depth DEPTH] [--nb_traversals NB_TRAVERSALS]
+ [--nb_episodes NB_EPISODES] [--nb_trials NB_TRIALS] [--debug]
+ [--highlevel_policy_in_root]
+
+ optional arguments:
+ -h, --help show this help message and exit
+ --evaluate Evaluate over n trials, no visualization by default.
+ --test Tests MCTS for 100 episodes by default.
+ --nb_episodes_for_test NB_EPISODES_FOR_TEST
+ Number of episodes to test/evaluate. Default is 100
+ --visualize Visualize the training.
+ --depth DEPTH Max depth of tree per episode. Default is 5
+ --nb_traversals NB_TRAVERSALS
+ Number of traversals to perform per episode. Default
+ is 50
+ --nb_episodes NB_EPISODES
+ Number of episodes per trial to evaluate. Default is
+ 100
+ --nb_trials NB_TRIALS
+ Number of trials to evaluate. Default is 10
+ --debug Show debug output. Default is false
+ --highlevel_policy_in_root
+ Use saved high-level policy in root of project rather
+ than backends/trained_policies/highlevel/
-==============
-Model Checking
-==============
+========
+Verifier
+========
-Both low-level and high-level policies, while being trained, use model checking to ensure that user-defined temporal logic constraints are not violated. The user can define global LTL properties that apply to all maneuvers as well as maneuver-specific LTL constraints.
+The low-level and high-level policies, while being trained, and MCTS, during execution, are verified against user-defined temporal logic constraints. The user can define global LTL properties that apply to all maneuvers as well as maneuver-specific LTL constraints. The user can also choose to provide negative or positive reward feedback to the agent when a constraint is violated, to nudge the agent in the right direction.
+
+The global constraints are also checked at run-time when the trained agent is executed. An example of a global constraint is a traffic rule that ensures the vehicle always stops at the stop sign.
Atomic Propositions
===================
-Atomic propositions for the simple_intersection environment are defined as human-readable strings in `model_checker/simple_intersection/AP_dict.py`. These should evaluate to True or False depending on the state of the environment and they need to be updated in every step of the environment. The temporal logic properties are constructed using a combination of atomic propositions and logic operators.
+Atomic propositions (AP) for the simple_intersection environment are defined as human-readable strings in `verifier/simple_intersection/AP_dict.py`. These should evaluate to True or False depending on the state of the environment and they need to be updated in every step of the environment. The temporal logic properties are constructed using a combination of atomic propositions and logic operators. For example, an AP `over_speed_limit` is set to True if the vehicle is above the speed limit.
Linear Temporal Logic
=====================
@@ -162,13 +209,14 @@ Nested temporal operators must be enclosed in parentheses, e.g. G in_stop_region
Note that the arguments of "U" should be predicates over atomic propositions.
-=======================
-Monte-Carlo Tree Search
-=======================
-Even after using such a hierarchical structure and model-checking during learning, collisions and constraint violations can be inevitable because the policies are never perfect. This framework supports the safe execution of the high-level policy by using an MCTS to look ahead in time and choose paths that do not lead to a collision or a temporal logic violation. `mcts.py` is used to execute the learned policies using MCTS. As online execution can be computationally expensive the framework also provides an offline version of MCTS that learns probabilities of safe paths and uses it to enhance trained policies.
-
================
Learning Backend
================
-The framework also supports multiple learning backends and it is easy to add and use other ones as necessary. The KerasRL reinforcement learning framework was used to learn and test policies for the simple-intersection environment using `backends/kerasrl_learner.py`. The stable-baselines library has also been incorporated in `backends/baselines_learner.py`.
\ No newline at end of file
+The framework also supports multiple learning backends and it is easy to add and use other ones as necessary. The KerasRL reinforcement learning framework was used to learn and test policies for the simple-intersection environment using `backends/kerasrl_learner.py`. The stable-baselines library has also been incorporated in `backends/baselines_learner.py`.
+
+==================
+Additional Options
+==================
+
+For further investigation, we also provide additional options: 'halt', a learnable option with pre-trained parameters, and 'manualwait', a manually crafted option that is not trainable, among others. To use these options, specify them in config.json along with the other options to be used and follow the instructions above to obtain the experimental results. This makes it possible to explore more of the WiseMove autonomous driving framework, with a variety of choices and combinations of options.
diff --git a/documentation/sphinx/.build/genindex.html b/documentation/sphinx/.build/genindex.html
index 4e9343e85407bee72ecaa20e57ae7d23fe43e81a..86a238ecd3eaa3c18f836efa3f8334fd1adc3aa1 100644
--- a/documentation/sphinx/.build/genindex.html
+++ b/documentation/sphinx/.build/genindex.html
@@ -70,31 +70,29 @@
diff --git a/documentation/sphinx/.build/objects.inv b/documentation/sphinx/.build/objects.inv
index 46806fc1af587d76e16ad36dcbd84db4da68c39a..aa144c308c74ed6a4ce8b60d033ecd6ffbbe36c1 100644
Binary files a/documentation/sphinx/.build/objects.inv and b/documentation/sphinx/.build/objects.inv differ
diff --git a/documentation/sphinx/.build/py-modindex.html b/documentation/sphinx/.build/py-modindex.html
index 5ae8eb173b271898c63720820e98d9f888734a69..ab826d2382f5df0971d46abedbf72c4966a85fee 100644
--- a/documentation/sphinx/.build/py-modindex.html
+++ b/documentation/sphinx/.build/py-modindex.html
@@ -45,7 +45,8 @@
l |
m |
o |
- p
+ p |
+ v
The policy for executing each maneuver is referred to as the low-level policy, which provides the agent with an input depending on the current state. They can be both manually-defined or learned. For the simple-intersection environment, low-level policies learned using reinforcement learning are provided in the framework. Learning is done using low_level_policy_main.py. Use low_level_policy_main.py--help to view supported arguments and defaults.
Run low_level_policy_main.py--train--option=OPTION_NAME, where OPTION_NAME can be the key of any node defined in config.json to learn the option using reinforcement learning default settings and save the result to root folder. If no option is specified, all options are trained. The training can be customized further using other supported arguments. For example, to train keeplane maneuver and visualize the training, run:
Run low_level_policy_main.py--test--option=OPTION_NAME along with other supported arguments to test the trained policy. By default, uses the trained policies in /backends/trained_policies. For example:
Run low_level_policy_main.py--evaluate--option=OPTION_NAME along with other supported arguments to evaluate the trained policy over 100 episodes. By default, uses the trained policies in /backends/trained_policies. For example:
Run high_level_policy_main.py--train along with other supported arguments to train a policy using reinforcement learning default settings. By default, it is saved to the root folder so as not to overwrite already trained policies. For example:
Run high_level_policy_main.py--test or high_level_policy_main.py--evaluate along with other supported arguments to test the trained policy. By default, uses the trained policies in /backends/trained_policies/highlevel. For example:
Both low-level and high-level policies, while being trained, use model checking to ensure that user-defined temporal logic constraints are not violated. The user can define global LTL properties that apply to all maneuvers as well as maneuver-specific LTL constraints.
Even after using such a hierarchical structure and learn-time verification, collisions and constraint violations can be inevitable because the policies are never perfect. This framework supports the safe execution of the high-level policy by using an MCTS to look ahead in time and choose paths that do not lead to a collision or a temporal logic violation. mcts.py is used to execute the learned policies using MCTS.
The low-level and high-level policies while being trained, and MCTS during execution, are verified using user-defined temporal logic constraints. The user can define global LTL properties that apply to all maneuvers as well as maneuver-specific LTL constraints. They can also choose to provide a negative or positive reward feedback to the agent on violating a constraint to nudge the agent in the right direction.
+
The global constraints are also checked during run-time of the trained agent. An example of a global constraint would be traffic rule that ensures the vehicle always stops at the stop sign.
Atomic propositions for the simple_intersection environment are defined as human-readable strings in model_checker/simple_intersection/AP_dict.py. These should evaluate to True or False depending on the state of the environment and they need to be updated in every step of the environment. The temporal logic properties are constructed using a combination of atomic propositions and logic operators.
+
Atomic propositions (AP) for the simple_intersection environment are defined as human-readable strings in verifier/simple_intersection/AP_dict.py. These should evaluate to True or False depending on the state of the environment and they need to be updated in every step of the environment. The temporal logic properties are constructed using a combination of atomic propositions and logic operators. For example, an AP over_speed_limit is set to True if the vehicle is above the speed limit.
Even after using such a hierarchical structure and model-checking during learning, collisions and constraint violations can be inevitable because the policies are never perfect. This framework supports the safe execution of the high-level policy by using an MCTS to look ahead in time and choose paths that do not lead to a collision or a temporal logic violation. mcts.py is used to execute the learned policies using MCTS. As online execution can be computationally expensive the framework also provides an offline version of MCTS that learns probabilities of safe paths and uses it to enhance trained policies.
The framework also supports multiple learning backends and it is easy to add and use other ones as necessary. The KerasRL reinforcement learning framework was used to learn and test policies for the simple-intersection environment using backends/kerasrl_learner.py. The stable-baselines library has also been incorporated in backends/baselines_learner.py.
For further investigation, we also provide additional options: ‘halt’, a learnable option with pre-trained parameters, and ‘manualwait’, a manually crafted option that is not trainable, among others. To use these options, specify them in config.json along with the other options to be used and follow the instructions above to obtain the experimental results. This makes it possible to explore more of the WiseMove autonomous driving framework, with a variety of choices and combinations of options.
+
diff --git a/documentation/sphinx/logfile.log b/documentation/sphinx/logfile.log
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..55020e5af0f4b7205f5b86fad3242e146a14fa43 100644
--- a/documentation/sphinx/logfile.log
+++ b/documentation/sphinx/logfile.log
@@ -0,0 +1,270 @@
+True: 2
+Undecided: 1
+False: 0
+[0, 0, 0, 0, 0, 0, 0, 3]
+incremental result:1
+batch result:1
+
+[0, 0, 0, 0, 0, 0, 0, 3, 2]
+incremental result:0
+batch result:0
+
+[9, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:2
+
+[1, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:0
+
+[8, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+[0, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+True: 2
+Undecided: 1
+False: 0
+[0, 0, 0, 0, 0, 0, 0, 3]
+incremental result:1
+batch result:1
+
+[0, 0, 0, 0, 0, 0, 0, 3, 2]
+incremental result:0
+batch result:0
+
+[9, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:2
+
+[1, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:0
+
+[8, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+[0, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+True: 2
+Undecided: 1
+False: 0
+[0, 0, 0, 0, 0, 0, 0, 3]
+incremental result:1
+batch result:1
+
+[0, 0, 0, 0, 0, 0, 0, 3, 2]
+incremental result:0
+batch result:0
+
+[9, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:2
+
+[1, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:0
+
+[8, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+[0, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+True: 2
+Undecided: 1
+False: 0
+[0, 0, 0, 0, 0, 0, 0, 3]
+incremental result:1
+batch result:1
+
+[0, 0, 0, 0, 0, 0, 0, 3, 2]
+incremental result:0
+batch result:0
+
+[9, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:2
+
+[1, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:0
+
+[8, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+[0, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+True: 2
+Undecided: 1
+False: 0
+[0, 0, 0, 0, 0, 0, 0, 3]
+incremental result:1
+batch result:1
+
+[0, 0, 0, 0, 0, 0, 0, 3, 2]
+incremental result:0
+batch result:0
+
+[9, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:2
+
+[1, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:0
+
+[8, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+[0, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+True: 2
+Undecided: 1
+False: 0
+[0, 0, 0, 0, 0, 0, 0, 3]
+incremental result:1
+batch result:1
+
+[0, 0, 0, 0, 0, 0, 0, 3, 2]
+incremental result:0
+batch result:0
+
+[9, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:2
+
+[1, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:0
+
+[8, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+[0, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+True: 2
+Undecided: 1
+False: 0
+[0, 0, 0, 0, 0, 0, 0, 3]
+incremental result:1
+batch result:1
+
+[0, 0, 0, 0, 0, 0, 0, 3, 2]
+incremental result:0
+batch result:0
+
+[9, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:2
+
+[1, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:0
+
+[8, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+[0, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+True: 2
+Undecided: 1
+False: 0
+[0, 0, 0, 0, 0, 0, 0, 3]
+incremental result:1
+batch result:1
+
+[0, 0, 0, 0, 0, 0, 0, 3, 2]
+incremental result:0
+batch result:0
+
+[9, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:2
+
+[1, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:0
+
+[8, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+[0, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+True: 2
+Undecided: 1
+False: 0
+[0, 0, 0, 0, 0, 0, 0, 3]
+incremental result:1
+batch result:1
+
+[0, 0, 0, 0, 0, 0, 0, 3, 2]
+incremental result:0
+batch result:0
+
+[9, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:2
+
+[1, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:0
+
+[8, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+[0, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+True: 2
+Undecided: 1
+False: 0
+[0, 0, 0, 0, 0, 0, 0, 3]
+incremental result:1
+batch result:1
+
+[0, 0, 0, 0, 0, 0, 0, 3, 2]
+incremental result:0
+batch result:0
+
+[9, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:2
+
+[1, 0, 0, 0, 0, 0, 0]
+incremental result:2
+batch result:0
+
+[8, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
+[0, 0, 0, 0, 0, 0, 0]
+incremental result:0
+batch result:0
+
diff --git a/documentation/sphinx/usage/config.rst b/documentation/sphinx/usage/config.rst
deleted file mode 100644
index 913e769e177b28edf98b4cb3c61bb3b2466b57c6..0000000000000000000000000000000000000000
--- a/documentation/sphinx/usage/config.rst
+++ /dev/null
@@ -1,5 +0,0 @@
-==================
-Configuration File
-==================
-
-:code:`config.json` can be used to select the maneuvers and high_level policy for fast experimentation.
\ No newline at end of file
diff --git a/documentation/sphinx/usage/quickstart.rst b/documentation/sphinx/usage/quickstart.rst
index d1f825abca85a1afb51112e0f6c81bc670ab1d96..5b1e853ab7da26104997dbe6f453bfac4aff7651 100644
--- a/documentation/sphinx/usage/quickstart.rst
+++ b/documentation/sphinx/usage/quickstart.rst
@@ -23,7 +23,7 @@ The policy for executing each maneuver is referred to as the low-level policy, w
::
- usage: low_level_policy_main.py [-h] [--train] [--option] [--test]
+ usage: low_level_policy_main.py [-h] [--train] [--option] [--test] [--evaluate]
[--saved_policy_in_root] [--load_weights]
[--tensorboard] [--visualize]
[--nb_steps NB_STEPS]
@@ -31,13 +31,15 @@ The policy for executing each maneuver is referred to as the low-level policy, w
optional arguments:
-h, --help show this help message and exit
- --train Train a high-level policy with default settings.
+ --train Train a low-level policy with default settings.
Always saved in the root folder. Always tests after
training
--option the option to train. Eg. stop, keeplane, wait,
changelane, follow. If not defined, trains all options
- --test Test a saved high-level policy. Uses backends/trained_
- policies/highlevel/highlevel_weights.h5f by default
+ --test Test a saved low-level policy.
+ Uses saved policy in backends/trained_policies/OPTION_NAME/ by default
+ --evaluate Evaluate a saved low-level policy over 100 episodes.
+ Uses saved policy in backends/trained_policies/OPTION_NAME/ by default
--saved_policy_in_root
Use saved policies in the root of the project rather than
backends/trained_policies/highlevel/
@@ -54,7 +56,7 @@ Run :code:`low_level_policy_main.py --train --option=OPTION_NAME`, where OPTION_
::
- python low_level_policy_main.py --train --option=keeplane --visualize
+ python3 low_level_policy_main.py --train --option=keeplane --visualize
Testing
=======
@@ -62,7 +64,16 @@ Run :code:`low_level_policy_main.py --test --option=OPTION_NAME` along with othe
::
- python low_level_policy_main.py --test --option=wait --nb_episodes_for_test=20
+ python3 low_level_policy_main.py --test --option=wait --nb_episodes_for_test=20
+
+Evaluating
+==========
+Run :code:`low_level_policy_main.py --evaluate --option=OPTION_NAME` along with other supported arguments to evaluate the trained policy over 100 episodes. By default, uses the trained policies in `/backends/trained_policies`. For example:
+
+::
+
+    python3 low_level_policy_main.py --evaluate --option=wait --saved_policy_in_root
+
=================
High-level Policy
@@ -113,7 +124,7 @@ Run :code:`high_level_policy_main.py --train` along with other supported argumen
::
- high_level_policy_training.py --train --nb_steps=25000 --nb_episodes_for_test=20
+    python3 high_level_policy_main.py --train --nb_steps=25000 --nb_episodes_for_test=20
Testing
=======
@@ -121,17 +132,53 @@ Run :code:`high_level_policy_main.py --test` or :code:`high_level_policy_main.py
::
- high_level_policy_training.py --evaluate --nb_trials=5 --nb_episodes_for_test=20
+    python3 high_level_policy_main.py --evaluate --nb_trials=5 --nb_episodes_for_test=20
+
+=======================
+Monte-Carlo Tree Search
+=======================
+Even after using such a hierarchical structure and learn-time verification, collisions and constraint violations can still occur because the policies are never perfect. This framework supports the safe execution of the high-level policy by using MCTS to look ahead in time and choose paths that do not lead to a collision or a temporal logic violation. `mcts.py` is used to execute the learned policies using MCTS.
+
+::
+
+ usage: mcts.py [-h] [--evaluate] [--test]
+ [--nb_episodes_for_test NB_EPISODES_FOR_TEST] [--visualize]
+ [--depth DEPTH] [--nb_traversals NB_TRAVERSALS]
+ [--nb_episodes NB_EPISODES] [--nb_trials NB_TRIALS] [--debug]
+ [--highlevel_policy_in_root]
+
+ optional arguments:
+ -h, --help show this help message and exit
+ --evaluate Evaluate over n trials, no visualization by default.
+ --test Tests MCTS for 100 episodes by default.
+ --nb_episodes_for_test NB_EPISODES_FOR_TEST
+ Number of episodes to test/evaluate. Default is 100
+ --visualize Visualize the training.
+ --depth DEPTH Max depth of tree per episode. Default is 5
+ --nb_traversals NB_TRAVERSALS
+ Number of traversals to perform per episode. Default
+ is 50
+ --nb_episodes NB_EPISODES
+ Number of episodes per trial to evaluate. Default is
+ 100
+ --nb_trials NB_TRIALS
+ Number of trials to evaluate. Default is 10
+ --debug Show debug output. Default is false
+ --highlevel_policy_in_root
+ Use saved high-level policy in root of project rather
+ than backends/trained_policies/highlevel/
-==============
-Model Checking
-==============
+========
+Verifier
+========
-Both low-level and high-level policies, while being trained, use model checking to ensure that user-defined temporal logic constraints are not violated. The user can define global LTL properties that apply to all maneuvers as well as maneuver-specific LTL constraints.
+The low-level and high-level policies, while being trained, and MCTS, during execution, are verified against user-defined temporal logic constraints. The user can define global LTL properties that apply to all maneuvers as well as maneuver-specific LTL constraints. The user can also choose to provide negative or positive reward feedback to the agent when a constraint is violated, to nudge the agent in the right direction.
+
+The global constraints are also checked at run-time when the trained agent is executed. An example of a global constraint is a traffic rule that ensures the vehicle always stops at the stop sign.
Atomic Propositions
===================
-Atomic propositions for the simple_intersection environment are defined as human-readable strings in `model_checker/simple_intersection/AP_dict.py`. These should evaluate to True or False depending on the state of the environment and they need to be updated in every step of the environment. The temporal logic properties are constructed using a combination of atomic propositions and logic operators.
+Atomic propositions (AP) for the simple_intersection environment are defined as human-readable strings in `verifier/simple_intersection/AP_dict.py`. These should evaluate to True or False depending on the state of the environment and they need to be updated in every step of the environment. The temporal logic properties are constructed using a combination of atomic propositions and logic operators. For example, an AP `over_speed_limit` is set to True if the vehicle is above the speed limit.
Linear Temporal Logic
=====================
@@ -162,13 +209,14 @@ Nested temporal operators must be enclosed in parentheses, e.g. G in_stop_region
Note that the arguments of "U" should be predicates over atomic propositions.
-=======================
-Monte-Carlo Tree Search
-=======================
-Even after using such a hierarchical structure and model-checking during learning, collisions and constraint violations can be inevitable because the policies are never perfect. This framework supports the safe execution of the high-level policy by using an MCTS to look ahead in time and choose paths that do not lead to a collision or a temporal logic violation. `mcts.py` is used to execute the learned policies using MCTS. As online execution can be computationally expensive the framework also provides an offline version of MCTS that learns probabilities of safe paths and uses it to enhance trained policies.
-
================
Learning Backend
================
-The framework also supports multiple learning backends and it is easy to add and use other ones as necessary. The KerasRL reinforcement learning framework was used to learn and test policies for the simple-intersection environment using `backends/kerasrl_learner.py`. The stable-baselines library has also been incorporated in `backends/baselines_learner.py`.
\ No newline at end of file
+The framework supports multiple learning backends, and others can easily be added as necessary. The KerasRL reinforcement learning library was used to learn and test policies for the simple_intersection environment via `backends/kerasrl_learner.py`. The stable-baselines library has also been incorporated in `backends/baselines_learner.py`.
+
+==================
+Additional Options
+==================
+
+For further investigation, we also provide additional options: 'halt', a learnable option with pre-trained parameters, and 'manualwait', a manually crafted option that is not trainable. To use these options, list them in config.json along with the other options and follow the instructions above to obtain the experimental results (see the example below). This makes it possible to explore more of the WiseMove autonomous driving framework, with a wider variety of choices and combinations of options.
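+
+For example, once 'halt' is listed in config.json, it can be inspected with the same commands as the default maneuvers, e.g. `python3 low_level_policy_main.py --option=halt --test` (assuming, as for the other options, that its pre-trained weights are available under backends/trained_policies/).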
diff --git a/high_level_policy_main.py b/high_level_policy_main.py
index 5d74544eab678708a41ffffdec048210e1ea7f9f..e24b8e96f6f57fb6247eb99c2f1b2ee0c76fcc45 100644
--- a/high_level_policy_main.py
+++ b/high_level_policy_main.py
@@ -196,8 +196,8 @@ if __name__ == "__main__":
action="store_true")
parser.add_argument(
"--nb_steps",
- help="Number of steps to train for. Default is 25000",
- default=25000,
+ help="Number of steps to train for. Default is 200000",
+ default=200000,
type=int)
parser.add_argument(
"--nb_episodes_for_test",
diff --git a/low_level_policy_main.py b/low_level_policy_main.py
index c6dd33599e88eef64d4ac39f49048dd34a478bba..d5cded2ef0e8e3f1c7dd092dce5fbfb051aa12b1 100644
--- a/low_level_policy_main.py
+++ b/low_level_policy_main.py
@@ -2,10 +2,45 @@ from env.simple_intersection import SimpleIntersectionEnv
from env.simple_intersection.constants import *
from options.options_loader import OptionsGraph
from backends.kerasrl_learner import DDPGLearner
+from rl.callbacks import Callback
import argparse
+class ManeuverEvaluateCallback(Callback):
+ def __init__(self, maneuver):
+ self.low_reward_count = 0
+ self.mid_reward_count = 0
+ self.high_reward_count = 0
+ self.maneuver = maneuver
+ super().__init__()
+
+ def on_episode_end(self, episode, logs={}):
+ """Called at end of each episode"""
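+        # Bucket the episode by its total reward, using thresholds at -150 and +150.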
+ if logs['episode_reward'] < -150:
+ self.low_reward_count += 1
+ elif logs['episode_reward'] > 150:
+ self.high_reward_count += 1
+ else:
+ self.mid_reward_count += 1
+
+ super().on_episode_end(episode, logs)
+
+ def on_train_end(self, logs=None):
+        print("\nThe total # of episodes: " + str(self.low_reward_count +
+                                                   self.mid_reward_count +
+                                                   self.high_reward_count))
+        print("  # of episodes with reward < -150: " + str(self.low_reward_count) +
+              "\n  # of episodes with -150 <= reward <= 150: " + str(self.mid_reward_count) +
+              "\n  # of episodes with reward > 150: " + str(self.high_reward_count))
+
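+        # For 'follow', success is measured by the middle reward band; for every
+        # other maneuver, only episodes in the high reward band count as successes.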
+ success_count = \
+ self.mid_reward_count if self.maneuver == 'follow' else \
+ self.high_reward_count
+
+        print("\n  # of successful episodes: " + str(success_count) + '\n')
+
+
# TODO: make a separate file for this function.
def low_level_policy_training(maneuver,
nb_steps,
@@ -15,7 +50,8 @@ def low_level_policy_training(maneuver,
testing=True,
visualize=False,
nb_episodes_for_test=10,
- tensorboard=False):
+ tensorboard=False,
+ without_ltl=False):
"""Do RL of the low-level policy of the given maneuver and test it.
Args:
@@ -54,6 +90,12 @@ def low_level_policy_training(maneuver,
options.set_current_node(maneuver)
options.current_node.reset()
+ # TODO: make this into a training/testing flag in optionsloader?
+ if without_ltl:
+ options.current_node._enable_low_level_training_properties = False
+ else:
+ options.current_node._enable_low_level_training_properties = True
+
# TODO: add PPO2 case.
# Use this code when you train a specific maneuver for the first time.
agent = DDPGLearner(
@@ -88,6 +130,7 @@ def low_level_policy_training(maneuver,
def low_level_policy_testing(maneuver,
pretrained=False,
+ visualize=True,
nb_episodes_for_test=20):
# initialize the numpy random number generator
@@ -113,7 +156,19 @@ def low_level_policy_testing(maneuver,
agent.load_model(maneuver + "_weights.h5f")
options.current_node.learning_mode = 'testing'
- agent.test_model(options.current_node, nb_episodes=nb_episodes_for_test)
+ agent.test_model(options.current_node,
+ nb_episodes=nb_episodes_for_test,
+ callbacks=[ManeuverEvaluateCallback(maneuver)],
+ visualize=visualize)
+
+
+def evaluate_low_level_policy(maneuver,
+ pretrained=False,
+ nb_episodes_for_eval=100):
+
+ low_level_policy_testing(maneuver, pretrained,
+ nb_episodes_for_test=nb_episodes_for_eval,
+ visualize=False)
if __name__ == "__main__":
@@ -121,18 +176,31 @@ if __name__ == "__main__":
parser.add_argument(
"--train",
help=
- "Train a high level policy with default settings. Always saved in root folder. Always tests after training",
+ "Train a low level policy with default settings. Always saved in root folder. Always tests after training",
action="store_true")
parser.add_argument(
"--option",
help=
- "the option to train. Eg. stop, keeplane, wait, changelane, follow. If not defined, trains all options"
+    "the option to train. E.g. stop, keeplane, wait, changelane, follow. If not defined, trains all five options"
)
parser.add_argument(
"--test",
help=
- "Test a saved high level policy. Uses saved policy in backends/trained_policies/OPTION_NAME/ by default",
+ "Test a saved low level policy. Uses saved policy in backends/trained_policies/OPTION_NAME/ by default",
+ action="store_true")
+
+ parser.add_argument(
+ "--without_additional_ltl_properties",
+ help=
+ "Train a low level policy without additional LTL constraints.",
action="store_true")
+
+ parser.add_argument(
+ "--evaluate",
+ help="Evaluate a saved low level policy over 100 episodes. "
+        "Uses saved policy in backends/trained_policies/OPTION_NAME/ by default",
+ action="store_true")
+
parser.add_argument(
"--saved_policy_in_root",
help=
@@ -157,8 +225,8 @@ if __name__ == "__main__":
type=int)
parser.add_argument(
"--nb_episodes_for_test",
- help="Number of episodes to test. Default is 20",
- default=20,
+ help="Number of episodes to test. Default is 10",
+ default=10,
type=int)
args = parser.parse_args()
@@ -176,7 +244,8 @@ if __name__ == "__main__":
nb_steps=args.nb_steps,
nb_episodes_for_test=args.nb_episodes_for_test,
visualize=args.visualize,
- tensorboard=args.tensorboard)
+ tensorboard=args.tensorboard,
+ without_ltl=args.without_additional_ltl_properties)
else:
for option_key in options.maneuvers.keys():
print("Training {} maneuver...".format(option_key))
@@ -186,7 +255,8 @@ if __name__ == "__main__":
nb_steps=args.nb_steps,
nb_episodes_for_test=args.nb_episodes_for_test,
visualize=args.visualize,
- tensorboard=args.tensorboard)
+ tensorboard=args.tensorboard,
+ without_ltl=args.without_additional_ltl_properties)
if args.test:
if args.option:
@@ -194,11 +264,26 @@ if __name__ == "__main__":
low_level_policy_testing(
args.option,
pretrained=not args.saved_policy_in_root,
+ visualize=True,
nb_episodes_for_test=args.nb_episodes_for_test)
else:
for option_key in options.maneuvers.keys():
print("Testing {} maneuver...".format(option_key))
low_level_policy_testing(
- args.option,
+ option_key,
pretrained=not args.saved_policy_in_root,
+ visualize=True,
nb_episodes_for_test=args.nb_episodes_for_test)
+
+ if args.evaluate:
+ if args.option:
+ print("Evaluating {} maneuver...".format(args.option))
+ evaluate_low_level_policy(
+ args.option,
+ pretrained=not args.saved_policy_in_root)
+ else:
+ for option_key in options.maneuvers.keys():
+ print("Evaluating {} maneuver...".format(option_key))
+ evaluate_low_level_policy(
+ option_key,
+ pretrained=not args.saved_policy_in_root)
diff --git a/mcts.py b/mcts.py
index 2020834acf2fff9b5a4caafa8e4475e6c6a88d0f..021e37a685fe95280c529463d38fa34269096e38 100644
--- a/mcts.py
+++ b/mcts.py
@@ -33,7 +33,9 @@ def mcts_evaluation(depth,
nb_episodes,
nb_trials,
visualize=False,
- debug=False):
+ debug=False,
+ pretrained=True,
+ highlevel_policy_file="highlevel_weights.h5f"):
"""Do RL of the low-level policy of the given maneuver and test it.
Args:
@@ -58,8 +60,10 @@ def mcts_evaluation(depth,
input_shape=(50, ),
nb_actions=options.get_number_of_nodes(),
low_level_policies=options.maneuvers)
- agent.load_model(
- "backends/trained_policies/highlevel/highlevel_weights.h5f")
+
+ if pretrained:
+ highlevel_policy_file = "backends/trained_policies/highlevel/" + highlevel_policy_file
+ agent.load_model(highlevel_policy_file)
# set predictor
options.set_controller_args(
@@ -69,14 +73,15 @@ def mcts_evaluation(depth,
debug=debug)
# Evaluate
- success_list = []
print("\nConducting {} trials of {} episodes each".format(
nb_trials, nb_episodes))
overall_reward_list = []
overall_success_accuracy = []
+ overall_termination_reason_list = {}
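+    # Maps each termination reason to the list of its per-trial episode counts.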
for num_tr in range(nb_trials):
num_successes = 0
reward_list = []
+ trial_termination_reason_counter = {}
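+        # Number of episodes in this trial that ended for each termination reason.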
for num_ep in range(nb_episodes):
init_obs = options.reset()
episode_reward = 0
@@ -93,6 +98,13 @@ def mcts_evaluation(depth,
# print('Intermediate Reward: %f (ego x = %f)' %
# (R, options.env.vehs[0].x))
# print('')
+ if terminal:
+ if 'episode_termination_reason' in info:
+ termination_reason = info['episode_termination_reason']
+ if termination_reason in trial_termination_reason_counter:
+ trial_termination_reason_counter[termination_reason] += 1
+ else:
+ trial_termination_reason_counter[termination_reason] = 1
if options.controller.can_transition():
options.controller.do_transition()
end_time = time.time()
@@ -102,21 +114,52 @@ def mcts_evaluation(depth,
print('Episode {}: Reward = {} ({})'.format(num_ep, episode_reward,
datetime.timedelta(seconds=total_time)))
reward_list += [episode_reward]
- print("Trial {}: Reward = (Avg: {}, Std: {}), Successes: {}/{}".\
+
+ for reason, count in trial_termination_reason_counter.items():
+ if reason in overall_termination_reason_list:
+ overall_termination_reason_list[reason].append(count)
+ else:
+ overall_termination_reason_list[reason] = [count]
+
+ print("\nTrial {}: Reward = (Avg: {}, Std: {}), Successes: {}/{}".\
format(num_tr, np.mean(reward_list), np.std(reward_list), \
num_successes, nb_episodes))
+ print("Trial {} Termination reason(s):".format(num_tr))
+        # Each value here is a single per-trial count, so report it directly.
+        for reason, count in trial_termination_reason_counter.items():
+            print("{}: {}".format(reason, count))
+ print("\n")
+
overall_reward_list += reward_list
overall_success_accuracy += [num_successes * 1.0 / nb_episodes]
- print('Overall: Reward = (Avg: {}, Std: {}), Success = (Avg: {}, Std: {})'.\
+
+ print("===========================")
+ print('Overall: Reward = (Avg: {}, Std: {}), Success = (Avg: {}, Std: {})\n'.\
format(np.mean(overall_reward_list), np.std(overall_reward_list),
np.mean(overall_success_accuracy), np.std(overall_success_accuracy)))
+ print("Termination reason(s):")
+ for reason, count_list in overall_termination_reason_list.items():
+ count_list = np.array(count_list)
+ print("{}: Avg: {}, Std: {}".format(reason, np.mean(count_list),
+ np.std(count_list)))
+
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--evaluate",
help="Evaluate over n trials, no visualization by default.",
action="store_true")
+ parser.add_argument(
+ "--test",
+ help="Tests MCTS for 100 episodes by default.",
+ action="store_true")
+ parser.add_argument(
+ "--nb_episodes_for_test",
+ help="Number of episodes to test/evaluate. Default is 100",
+        default=100,
+ type=int)
parser.add_argument(
"--visualize",
help=
@@ -124,28 +167,33 @@ if __name__ == "__main__":
action="store_true")
parser.add_argument(
"--depth",
- help="Max depth of tree per episode. Default is 10",
- default=10,
+ help="Max depth of tree per episode. Default is 5",
+ default=5,
type=int)
parser.add_argument(
"--nb_traversals",
- help="Number of traversals to perform per episode. Default is 100",
- default=100,
+ help="Number of traversals to perform per episode. Default is 50",
+ default=50,
type=int)
parser.add_argument(
"--nb_episodes",
- help="Number of episodes per trial to evaluate. Default is 10",
- default=10,
+ help="Number of episodes per trial to evaluate. Default is 100",
+ default=100,
type=int)
parser.add_argument(
"--nb_trials",
- help="Number of trials to evaluate. Default is 1",
- default=1,
+ help="Number of trials to evaluate. Default is 10",
+ default=10,
type=int)
parser.add_argument(
"--debug",
help="Show debug output. Default is false",
action="store_true")
+ parser.add_argument(
+ "--highlevel_policy_in_root",
+ help=
+ "Use saved high-level policy in root of project rather than backends/trained_policies/highlevel/",
+ action="store_true")
args = parser.parse_args()
@@ -156,4 +204,15 @@ if __name__ == "__main__":
nb_episodes=args.nb_episodes,
nb_trials=args.nb_trials,
visualize=args.visualize,
- debug=args.debug)
+ debug=args.debug,
+ pretrained=not args.highlevel_policy_in_root)
+ elif args.test:
+ mcts_evaluation(
+ depth=args.depth,
+ nb_traversals=args.nb_traversals,
+ nb_episodes=args.nb_episodes,
+ nb_trials=1,
+ visualize=True,
+ debug=args.debug,
+ pretrained=not args.highlevel_policy_in_root)
+
diff --git a/options/simple_intersection/maneuvers.py b/options/simple_intersection/maneuvers.py
index 2902949f12657e222144fde055db290ce800f88c..b532dcc9181458368fada05c18edc9d2f7bb03f1 100644
--- a/options/simple_intersection/maneuvers.py
+++ b/options/simple_intersection/maneuvers.py
@@ -8,14 +8,6 @@ import numpy as np
# TODO: separate out into different files?? is it really needed?
-"""
-enable_additional_properties=
- True means the maneuver uses the additional properties,
- False means not.
-"""
-enable_additional_properties = False
-
-
class KeepLane(ManeuverBase):
def _init_param(self):
self._v_ref = rd.speed_limit
@@ -42,8 +34,6 @@ class KeepLane(ManeuverBase):
# the goal reward and termination is led by the SimpleIntersectionEnv
self.env._terminate_in_goal = False
self.env._reward_in_goal = None
- global enable_additional_properties
- self._enable_low_level_training_properties = enable_additional_properties
self._extra_action_weights_flag = True
def generate_validation_scenario(self):
@@ -124,8 +114,6 @@ class Halt(ManeuverBase):
self.env._terminate_in_goal = False
self.env._reward_in_goal = None
self._reward_in_goal = 200
- global enable_additional_properties
- self._enable_low_level_training_properties = enable_additional_properties
self._extra_action_weights_flag = True
def generate_validation_scenario(self):
@@ -214,8 +202,6 @@ class Stop(ManeuverBase):
ego_heading_towards_lane_centre=True)
self._reward_in_goal = 200
self._penalty_in_violation = 150
- global enable_additional_properties
- self._enable_low_level_training_properties = enable_additional_properties
self._extra_action_weights_flag = True
def _low_level_manual_policy(self):
@@ -316,8 +302,6 @@ class Wait(ManeuverBase):
self.env.init_APs(False)
self.env._terminate_in_goal = False
self._reward_in_goal = 200
- global enable_additional_properties
- self._enable_low_level_training_properties = enable_additional_properties
self._extra_action_weights_flag = False
@property
@@ -467,8 +451,6 @@ class ChangeLane(ManeuverBase):
# print('our range was %s, %s, ego at %s' % (before_intersection, after_intersection, self.env.ego.x))
self._reward_in_goal = 200
self._violation_penalty_in_low_level_training = 150
- global enable_additional_properties
- self._enable_low_level_training_properties = enable_additional_properties
self._extra_action_weights_flag = True
self.env._terminate_in_goal = False
@@ -532,8 +514,6 @@ class Follow(ManeuverBase):
self.env._terminate_in_goal = False
self._penalty_for_out_of_range = 200
self._penalty_for_change_lane = 200
- global enable_additional_properties
- self._enable_low_level_training_properties = enable_additional_properties
self._extra_action_weights_flag = True
def _init_param(self):
diff --git a/scripts/install_dependencies.sh b/scripts/install_dependencies.sh
index ace347ccbfa7b71a33f94360d473d496b3ab3f59..d0b7bd33ed409b3d24cf6363214dd412b3b1b06b 100644
--- a/scripts/install_dependencies.sh
+++ b/scripts/install_dependencies.sh
@@ -21,9 +21,9 @@ fi
# Install packages in requirements.txt
if [ -d "$DEPENDENCY_DIRECTORY" ]; then
- pip3 install --no-index --find-links=$PIP_PACKAGE_FOLDER setuptools
- pip3 install --no-index --find-links=$PIP_PACKAGE_FOLDER wheel
- pip3 install --no-index --find-links=$PIP_PACKAGE_FOLDER -r $REQUIREMENTS_FILE
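+    # --user installs packages into the user's site-packages, so no sudo is required.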
+ pip3 install --user --no-index --find-links=$PIP_PACKAGE_FOLDER setuptools
+ pip3 install --user --no-index --find-links=$PIP_PACKAGE_FOLDER wheel
+ pip3 install --user --no-index --find-links=$PIP_PACKAGE_FOLDER -r $REQUIREMENTS_FILE
else
pip3 install -r $REQUIREMENTS_FILE --user
fi