Commit 6a7cd92a authored by Aravind Balakrishnan's avatar Aravind Balakrishnan

Misc updates to docs and scripts

* Updated readme
* Updated Sphinx docs
* Updated scripts
* Built latest docs
* Updated low level policies
parent 7a31ba33
WiseMove is safe reinforcement learning framework that combines hierarchical reinforcement learning and model-checking using temporal logic constraints.
WiseMove is safe reinforcement learning framework that combines hierarchical reinforcement learning and safety verification using temporal logic constraints.
Requirements
============
......@@ -13,9 +13,9 @@ Installation
* Run the install dependencies script: `./scripts/install_dependencies.sh` to install pip3 and required python packages.
Note: The script checks if dependencies folder exists in the project root folder. If it does, it will install from the local packages in that folder,
else will install required packages from the internet. If you do not have an internet connection and the dependencies folder does not exist,
you will need to run `./scripts/download_dependencies.sh` using a machine with an internet connection first and transfer that folder.
Note: The script checks if the dependencies folder exists in the project root folder. If it does, it will install from the local packages in that folder, else will install required packages from the internet.
If you do not have an internet connection and the dependencies folder does not exist, you will need to run `./scripts/download_dependencies.sh` using a machine with an internet connection first and transfer that folder.
Documentation
=============
......@@ -25,28 +25,65 @@ Documentation
Replicate Results
=================
These are the minimum steps required to replicate the results for simple_intersection environment. For a detailed user guide, it is recommended to view the documentation.
* Run `./scripts/install_dependencies.sh` to install python dependencies.
Given below are the minimum steps required to replicate the results for simple_intersection environment. For a detailed user guide, it is recommended to view the documentation.
* Open terminal and navigate to the root of the project directory.
* Low-level policies:
* You can choose to train and test all the maneuvers. But this may take some time and is not recommended.
* To train all low-level policies from scratch: `python3 low_level_policy_main.py --train`. This may take some time.
* To test all these trained low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
* Make sure the training is fully complete before running above test.
* It is easier to verify few of the maneuvers using below commands:
* To train a single low-level, for example wait: `python3 low_level_policy_main.py --option=wait --train`.
* To test one of these trained low-level policies, for example wait: `python3 low_level_policy_main.py --option=wait --test --saved_policy_in_root`
* Available maneuvers are: wait, changelane, stop, keeplane, follow
* These results are visually evaluated.
* Note: This training has a high variance issue due to the continuous action space, especially for stop and keeplane maneuvers. It may help to train for 0.2 million steps than the default 0.1 million by adding argument '--nb_steps=200000' while training.
* Use `python3 low_level_policy_main.py --help` to see all available commands.
* You can choose to test the provided pre-trained options:
* To visually inspect all pre-trained options: `python3 low_level_policy_main.py --test`
* To evaluate all pre-trained options: `python3 low_level_policy_main.py --evaluate`
* To visually inspect a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --test`.
* To evaluate a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --evaluate`.
* Available options are: wait, changelane, stop, keeplane, follow
* Or, you can train and test all the options. But this may take some time. Newly trained policies are saved to the root folder by default.
* To train all low-level policies from scratch (~40 minutes): `python3 low_level_policy_main.py --train`.
* To visually inspect all these new low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
* To evaluate all these new low-level policies: `python3 low_level_policy_main.py --evaluate --saved_policy_in_root`.
* Make sure the training is fully complete before running above test/evaluation.
* It is faster to verify the training of a few options using below commands (**Recommended**):
* To train a single low-level, for example, *changelane* (~6 minutes): `python3 low_level_policy_main.py --option=changelane --train`. This is saved to the root folder.
* To evaluate one of these new low-level policies, for example *changelane*: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
* Available options are: wait, changelane, stop, keeplane, follow
* **To replicate the experiments without additional properties:**
* Note that we have not provided a pre-trained policy that is trained without additional LTL.
* You will need to train it by adding the argument `--without_additional_ltl_properties` to the above *training* procedures. For example, `python3 low_level_policy_main.py --option=changelane --train --without_additional_ltl_properties`
* Now, use `--evaluate` to evaluate this new policy: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
* **The results of `--evaluate` here is one trial.** In the experiments reported in the paper, we conduct multiple such trials.
* High-level policy:
* To train high-level policy from scratch using the given low-level policies: `python3 high_level_policy_main.py --train`
* To evaluate this trained high-level policy: `python3 high_level_policy_main.py --evaluate --saved_policy_in_root`.
* The success average and standard deviation corresponds to the result from high-level policy experiments.
* To run MCTS using the high-level policy:
* To obtain a probabilites tree and save it: `python3 mcts.py --train`
* To evaluate using this saved tree: `python3 mcts.py --evaluate --saved_policy_in_root`.
* The success average and standard deviation corresponds to the results from MCTS experiments.
* Use `python3 high_level_policy_main.py --help` to see all available commands.
* You can use the provided pre-trained high-level policy:
* To visually inspect this policy: `python3 high_level_policy_main.py --test`
* To **replicate the experiment** used for reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate`
* Or, you can train the high-level policy from scratch (Note that this takes some time):
* To train using pre-trained low-level policies for 0.2 million steps (~50 minutes): `python3 high_level_policy_main.py --train`
* To visually inspect this new policy: `python3 high_level_policy_main.py --test --saved_policy_in_root`
* To **replicate the experiment** used for reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate --saved_policy_in_root`.
* Since above training takes a long time, you can instead verify using a lower number of steps:
* To train for 0.1 million steps (~25 minutes): `python3 high_level_policy_main.py --train --nb_steps=100000`
* Note that this has a much lower success rate of ~75%. So using this for MCTS will not give reported results.
* The success average and standard deviation in evaluation corresponds to the result from high-level policy experiments.
* MCTS:
* Use `python3 mcts.py --help` to see all available commands.
* You can run MCTS on the provided pre-trained high-level policy:
* To visually inspect MCTS on the pre-trained policy: `python3 mcts.py --test --nb_episodes=10`
* To **replicate the experiment** used for reported results: `python3 mcts.py --evaluate`. Note that this takes a very long time (~16 hours).
* For a shorter version of the experiment: `python3 mcts.py --evaluate --nb_trials=2 --nb_episodes=10` (~20 minutes)
* Or, if you have trained a high-level policy from scratch, you can run MCTS on it:
* To visually inspect MCTS on the new policy: `python3 mcts.py --test --highlevel_policy_in_root --nb_episodes=10`
* To **replicate the experiment** used for reported results: `python3 mcts.py --evaluate --highlevel_policy_in_root`. Note that this takes a very long time (~16 hours).
* For a shorter version of the experiment: `python3 mcts.py --evaluate --highlevel_policy_in_root --nb_trials=2 --nb_episodes=10` (~20 minutes)
* You can use the arguments `--depth` and `--nb_traversals` to vary the depth of the MCTS tree (default is 5) and number of traversals done (default is 50).
* The success average and standard deviation in the evaluation corresponds to the results from MCTS experiments.
The time taken to execute above scripts may vary depending on your configuration. The reported results were obtained using a system of the following specs:
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
16GB memory
Nvidia GeForce GTX 1080 Ti
Ubuntu 16.04
Coding Standards
================
......
......@@ -206,11 +206,13 @@ class DDPGLearner(LearnerBase):
def test_model(self,
env,
nb_episodes=50,
callbacks=None,
visualize=True,
nb_max_episode_steps=200):
self.agent_model.test(
env,
nb_episodes=nb_episodes,
callbacks=callbacks,
visualize=visualize,
nb_max_episode_steps=nb_max_episode_steps)
......
......@@ -67,7 +67,8 @@
<dl class="method">
<dt id="env.env_base.EpisodicEnvBase.current_model_checking_result">
<code class="descname">current_model_checking_result</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#env.env_base.EpisodicEnvBase.current_model_checking_result" title="Permalink to this definition"></a></dt>
<dd><p>Returns whether or not any of the conditions is currently violated.</p>
<dd><p>Returns whether or not any of the conditions is currently
violated.</p>
</dd></dl>
<dl class="method">
......@@ -97,20 +98,20 @@ it (i.e., goal_achieved is always False).</p>
<dl class="method">
<dt id="env.env_base.EpisodicEnvBase.step">
<code class="descname">step</code><span class="sig-paren">(</span><em>u</em><span class="sig-paren">)</span><a class="headerlink" href="#env.env_base.EpisodicEnvBase.step" title="Permalink to this definition"></a></dt>
<dd><p>Gym compliant step function which
will be implemented in the subclass.</p>
<dd><p>Gym compliant step function which will be implemented in the
subclass.</p>
</dd></dl>
<dl class="attribute">
<dt id="env.env_base.EpisodicEnvBase.terminal_reward_type">
<code class="descname">terminal_reward_type</code><em class="property"> = 'max'</em><a class="headerlink" href="#env.env_base.EpisodicEnvBase.terminal_reward_type" title="Permalink to this definition"></a></dt>
<code class="descname">terminal_reward_type</code><em class="property"> = 'min'</em><a class="headerlink" href="#env.env_base.EpisodicEnvBase.terminal_reward_type" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="attribute">
<dt id="env.env_base.EpisodicEnvBase.termination_condition">
<code class="descname">termination_condition</code><a class="headerlink" href="#env.env_base.EpisodicEnvBase.termination_condition" title="Permalink to this definition"></a></dt>
<dd><p>In the subclass, specify the condition for termination of the episode
(or the maneuver).</p>
<dd><p>In the subclass, specify the condition for termination of the
episode (or the maneuver).</p>
</dd></dl>
<dl class="attribute">
......@@ -127,22 +128,22 @@ will be implemented in the subclass.</p>
<dl class="method">
<dt id="env.env_base.GymCompliantEnvBase.render">
<code class="descname">render</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#env.env_base.GymCompliantEnvBase.render" title="Permalink to this definition"></a></dt>
<dd><p>Gym compliant step function which
will be implemented in the subclass.</p>
<dd><p>Gym compliant step function which will be implemented in the
subclass.</p>
</dd></dl>
<dl class="method">
<dt id="env.env_base.GymCompliantEnvBase.reset">
<code class="descname">reset</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#env.env_base.GymCompliantEnvBase.reset" title="Permalink to this definition"></a></dt>
<dd><p>Gym compliant reset function which
will be implemented in the subclass.</p>
<dd><p>Gym compliant reset function which will be implemented in the
subclass.</p>
</dd></dl>
<dl class="method">
<dt id="env.env_base.GymCompliantEnvBase.step">
<code class="descname">step</code><span class="sig-paren">(</span><em>action</em><span class="sig-paren">)</span><a class="headerlink" href="#env.env_base.GymCompliantEnvBase.step" title="Permalink to this definition"></a></dt>
<dd><p>Gym compliant step function which
will be implemented in the subclass.</p>
<dd><p>Gym compliant step function which will be implemented in the
subclass.</p>
</dd></dl>
</dd></dl>
......@@ -154,7 +155,7 @@ will be implemented in the subclass.</p>
<dt id="env.road_env.RoadEnv">
<em class="property">class </em><code class="descclassname">env.road_env.</code><code class="descname">RoadEnv</code><a class="headerlink" href="#env.road_env.RoadEnv" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
<p>The generic road env</p>
<p>The generic road env.</p>
<p>TODO: Implement this generic road env for plugging-in other road envs.
TODO: roadEnv also having a step() function can cause a problem.</p>
</dd></dl>
......
......@@ -175,7 +175,7 @@
<dl class="method">
<dt id="env.simple_intersection.features.Features.get_features_tuple">
<code class="descname">get_features_tuple</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#env.simple_intersection.features.Features.get_features_tuple" title="Permalink to this definition"></a></dt>
<dd><p>continuous + discrete features</p>
<dd><p>continuous + discrete features.</p>
</dd></dl>
<dl class="method">
......@@ -391,10 +391,8 @@ boxes Checks all vehicles other than ego.</p>
<dl class="method">
<dt id="env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.cost">
<code class="descname">cost</code><span class="sig-paren">(</span><em>u</em><span class="sig-paren">)</span><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.cost" title="Permalink to this definition"></a></dt>
<dd><dl class="docutils">
<dt>Calculate the driving cost of the ego, i.e.,</dt>
<dd>the negative reward for the ego-driving.</dd>
</dl>
<dd><p>Calculate the driving cost of the ego, i.e., the negative reward for
the ego-driving.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
......@@ -418,11 +416,6 @@ boxes Checks all vehicles other than ego.</p>
<code class="descname">cost_normalization_ranges</code><em class="property"> = [[-3.5, 3.5], [-6.2, 6.2], [-10.081044753473748, -10.081044753473748], [-12, 12], [0, 12], [-2, 2], [-16, 16], [-2, 2]]</em><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.cost_normalization_ranges" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="attribute">
<dt id="env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.cost_weights">
<code class="descname">cost_weights</code><em class="property"> = (1.0, 0.25, 0.1, 1.0, 100.0, 0.1, 0.25, 0.1)</em><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.cost_weights" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="method">
<dt id="env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.draw">
<code class="descname">draw</code><span class="sig-paren">(</span><em>info=None</em><span class="sig-paren">)</span><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.draw" title="Permalink to this definition"></a></dt>
......@@ -452,7 +445,7 @@ information as passed on by info.</p>
<dl class="method">
<dt id="env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.generate_scenario">
<code class="descname">generate_scenario</code><span class="sig-paren">(</span><em>**kwargs</em><span class="sig-paren">)</span><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.generate_scenario" title="Permalink to this definition"></a></dt>
<dd><p>Randomly generate a road scenario with</p>
<dd><p>Randomly generate a road scenario with.</p>
<blockquote>
<div>“the N-number of vehicles + an ego vehicle”</div></blockquote>
<p>(n_others_range[0] &lt;= N &lt;= n_others_range[1]),
......@@ -563,17 +556,18 @@ when there is no vehicle ahead.</td>
<dl class="method">
<dt id="env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.get_features_tuple">
<code class="descname">get_features_tuple</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.get_features_tuple" title="Permalink to this definition"></a></dt>
<dd><p>Get/calculate the features wrt. the current state variables</p>
<dd><p>Get/calculate the features wrt. the current state variables.</p>
<p>Returns features tuple</p>
</dd></dl>
<dl class="attribute">
<dt id="env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.goal_achieved">
<code class="descname">goal_achieved</code><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.goal_achieved" title="Permalink to this definition"></a></dt>
<dd><p>A property from the base class which is True if the goal
of the road scenario is achieved, otherwise False. This property is
used in both step of EpisodicEnvBase and the implementation of
the high-level reinforcement learning and execution.</p>
<dd><p>A property from the base class which is True if the goal of the road
scenario is achieved, otherwise False.</p>
<p>This property is used in both step of EpisodicEnvBase and the
implementation of the high-level reinforcement learning and
execution.</p>
</dd></dl>
<dl class="method">
......@@ -629,6 +623,11 @@ required to be initialzied.</td>
<dd><p>the number of the other vehicles, initialized in generate_scenario</p>
</dd></dl>
<dl class="attribute">
<dt id="env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.n_others_with_higher_priority">
<code class="descname">n_others_with_higher_priority</code><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.n_others_with_higher_priority" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="attribute">
<dt id="env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.normalize_cost">
<code class="descname">normalize_cost</code><em class="property"> = False</em><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.normalize_cost" title="Permalink to this definition"></a></dt>
......@@ -637,8 +636,9 @@ required to be initialzied.</td>
<dl class="method">
<dt id="env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.normalize_tuple">
<code class="descname">normalize_tuple</code><span class="sig-paren">(</span><em>vec</em>, <em>scale_factor=10</em><span class="sig-paren">)</span><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.normalize_tuple" title="Permalink to this definition"></a></dt>
<dd><p>Normalizes each element in a tuple according to ranges defined in self.cost_normalization_ranges.
Normalizes between 0 and 1. And the scales by scale_factor</p>
<dd><p>Normalizes each element in a tuple according to ranges defined in
self.cost_normalization_ranges. Normalizes between 0 and 1. And the
scales by scale_factor.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
......@@ -722,8 +722,8 @@ info: information variables</td>
<dl class="attribute">
<dt id="env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.termination_condition">
<code class="descname">termination_condition</code><a class="headerlink" href="#env.simple_intersection.simple_intersection_env.SimpleIntersectionEnv.termination_condition" title="Permalink to this definition"></a></dt>
<dd><p>In the subclass, specify the condition for termination of the episode
(or the maneuver).</p>
<dd><p>In the subclass, specify the condition for termination of the
episode (or the maneuver).</p>
</dd></dl>
<dl class="method">
......@@ -909,10 +909,8 @@ returns 0 if it is on the line</p>
<dl class="function">
<dt id="env.simple_intersection.utilities.calculate_s">
<code class="descclassname">env.simple_intersection.utilities.</code><code class="descname">calculate_s</code><span class="sig-paren">(</span><em>v</em><span class="sig-paren">)</span><a class="headerlink" href="#env.simple_intersection.utilities.calculate_s" title="Permalink to this definition"></a></dt>
<dd><dl class="docutils">
<dt>Calculate the distance traveling when the vehicle is maximally</dt>
<dd>de-accelerating for a complete stop.</dd>
</dl>
<dd><p>Calculate the distance traveling when the vehicle is maximally de-
accelerating for a complete stop.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
......@@ -929,7 +927,8 @@ de-acceleration to stop completely.</td>
<dl class="function">
<dt id="env.simple_intersection.utilities.calculate_v_max">
<code class="descclassname">env.simple_intersection.utilities.</code><code class="descname">calculate_v_max</code><span class="sig-paren">(</span><em>dist</em><span class="sig-paren">)</span><a class="headerlink" href="#env.simple_intersection.utilities.calculate_v_max" title="Permalink to this definition"></a></dt>
<dd><p>Calculate the maximum velocity you can reach at the given position ahead.</p>
<dd><p>Calculate the maximum velocity you can reach at the given position
ahead.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
......
......@@ -51,13 +51,23 @@
<dl class="function">
<dt id="high_level_policy_main.high_level_policy_training">
<code class="descclassname">high_level_policy_main.</code><code class="descname">high_level_policy_training</code><span class="sig-paren">(</span><em>nb_steps=25000</em>, <em>load_weights=False</em>, <em>training=True</em>, <em>testing=True</em>, <em>nb_episodes_for_test=10</em>, <em>max_nb_steps=100</em>, <em>visualize=False</em>, <em>tensorboard=False</em>, <em>save_path='highlevel_weights.h5f'</em><span class="sig-paren">)</span><a class="headerlink" href="#high_level_policy_main.high_level_policy_training" title="Permalink to this definition"></a></dt>
<dd><p>Do RL of the high-level policy and test it.
:param nb_steps: the number of steps to perform RL
:param load_weights: True if the pre-learned NN weights are loaded (for initializations of NNs)
:param training: True to enable training
:param testing: True to enable testing
:param nb_episodes_for_test: the number of episodes for testing</p>
<code class="descclassname">high_level_policy_main.</code><code class="descname">high_level_policy_training</code><span class="sig-paren">(</span><em>nb_steps=25000</em>, <em>load_weights=False</em>, <em>training=True</em>, <em>testing=True</em>, <em>nb_episodes_for_test=20</em>, <em>max_nb_steps=100</em>, <em>visualize=False</em>, <em>tensorboard=False</em>, <em>save_path='highlevel_weights.h5f'</em><span class="sig-paren">)</span><a class="headerlink" href="#high_level_policy_main.high_level_policy_training" title="Permalink to this definition"></a></dt>
<dd><p>Do RL of the high-level policy and test it.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>nb_steps</strong> – the number of steps to perform RL</li>
<li><strong>load_weights</strong> – True if the pre-learned NN weights are loaded (for initializations of NNs)</li>
<li><strong>training</strong> – True to enable training</li>
<li><strong>testing</strong> – True to enable testing</li>
<li><strong>nb_episodes_for_test</strong> – the number of episodes for testing</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
</div>
......
......@@ -34,23 +34,55 @@
<div class="section" id="module-low_level_policy_main">
<span id="low-level-policy-main-module"></span><h1>low_level_policy_main module<a class="headerlink" href="#module-low_level_policy_main" title="Permalink to this headline"></a></h1>
<dl class="class">
<dt id="low_level_policy_main.ManeuverEvaluateCallback">
<em class="property">class </em><code class="descclassname">low_level_policy_main.</code><code class="descname">ManeuverEvaluateCallback</code><span class="sig-paren">(</span><em>maneuver</em><span class="sig-paren">)</span><a class="headerlink" href="#low_level_policy_main.ManeuverEvaluateCallback" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">rl.callbacks.Callback</span></code></p>
<dl class="method">
<dt id="low_level_policy_main.ManeuverEvaluateCallback.on_episode_end">
<code class="descname">on_episode_end</code><span class="sig-paren">(</span><em>episode</em>, <em>logs={}</em><span class="sig-paren">)</span><a class="headerlink" href="#low_level_policy_main.ManeuverEvaluateCallback.on_episode_end" title="Permalink to this definition"></a></dt>
<dd><p>Called at end of each episode</p>
</dd></dl>
<dl class="method">
<dt id="low_level_policy_main.ManeuverEvaluateCallback.on_train_end">
<code class="descname">on_train_end</code><span class="sig-paren">(</span><em>logs=None</em><span class="sig-paren">)</span><a class="headerlink" href="#low_level_policy_main.ManeuverEvaluateCallback.on_train_end" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</dd></dl>
<dl class="function">
<dt id="low_level_policy_main.evaluate_low_level_policy">
<code class="descclassname">low_level_policy_main.</code><code class="descname">evaluate_low_level_policy</code><span class="sig-paren">(</span><em>maneuver</em>, <em>pretrained=False</em>, <em>nb_episodes_for_eval=100</em><span class="sig-paren">)</span><a class="headerlink" href="#low_level_policy_main.evaluate_low_level_policy" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="function">
<dt id="low_level_policy_main.low_level_policy_testing">
<code class="descclassname">low_level_policy_main.</code><code class="descname">low_level_policy_testing</code><span class="sig-paren">(</span><em>maneuver</em>, <em>pretrained=False</em>, <em>nb_episodes_for_test=20</em><span class="sig-paren">)</span><a class="headerlink" href="#low_level_policy_main.low_level_policy_testing" title="Permalink to this definition"></a></dt>
<code class="descclassname">low_level_policy_main.</code><code class="descname">low_level_policy_testing</code><span class="sig-paren">(</span><em>maneuver</em>, <em>pretrained=False</em>, <em>visualize=True</em>, <em>nb_episodes_for_test=20</em><span class="sig-paren">)</span><a class="headerlink" href="#low_level_policy_main.low_level_policy_testing" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="function">
<dt id="low_level_policy_main.low_level_policy_training">
<code class="descclassname">low_level_policy_main.</code><code class="descname">low_level_policy_training</code><span class="sig-paren">(</span><em>maneuver</em>, <em>nb_steps</em>, <em>RL_method='DDPG'</em>, <em>load_weights=False</em>, <em>training=True</em>, <em>testing=True</em>, <em>visualize=False</em>, <em>nb_episodes_for_test=10</em>, <em>tensorboard=False</em><span class="sig-paren">)</span><a class="headerlink" href="#low_level_policy_main.low_level_policy_training" title="Permalink to this definition"></a></dt>
<dd><p>Do RL of the low-level policy of the given maneuver and test it.
:param maneuver: the name of the maneuver defined in config.json (e.g., ‘default’).
:param nb_steps: the number of steps to perform RL.
:param RL_method: either DDPG or PPO2.
:param load_weights: True if the pre-learned NN weights are loaded (for initializations of NNs).
:param training: True to enable training.
:param testing: True to enable testing.
:param visualize: True to see the graphical outputs during training.
:param nb_episodes_for_test: the number of episodes for testing.</p>
<code class="descclassname">low_level_policy_main.</code><code class="descname">low_level_policy_training</code><span class="sig-paren">(</span><em>maneuver</em>, <em>nb_steps</em>, <em>RL_method='DDPG'</em>, <em>load_weights=False</em>, <em>training=True</em>, <em>testing=True</em>, <em>visualize=False</em>, <em>nb_episodes_for_test=10</em>, <em>tensorboard=False</em>, <em>without_ltl=False</em><span class="sig-paren">)</span><a class="headerlink" href="#low_level_policy_main.low_level_policy_training" title="Permalink to this definition"></a></dt>
<dd><p>Do RL of the low-level policy of the given maneuver and test it.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>maneuver</strong> – the name of the maneuver defined in config.json (e.g., ‘default’).</li>
<li><strong>nb_steps</strong> – the number of steps to perform RL.</li>
<li><strong>RL_method</strong> – either DDPG or PPO2.</li>
<li><strong>load_weights</strong> – True if the pre-learned NN weights are loaded (for initializations of NNs).</li>
<li><strong>training</strong> – True to enable training.</li>
<li><strong>testing</strong> – True to enable testing.</li>
<li><strong>visualize</strong> – True to see the graphical outputs during training.</li>
<li><strong>nb_episodes_for_test</strong> – the number of episodes for testing.</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
</div>
......
......@@ -50,39 +50,28 @@
</dd></dl>
<dl class="function">
<dt id="mcts.evaluate_online_mcts">
<code class="descclassname">mcts.</code><code class="descname">evaluate_online_mcts</code><span class="sig-paren">(</span><em>nb_episodes=20</em>, <em>nb_trials=5</em><span class="sig-paren">)</span><a class="headerlink" href="#mcts.evaluate_online_mcts" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="function">
<dt id="mcts.mcts_evaluation">
<code class="descclassname">mcts.</code><code class="descname">mcts_evaluation</code><span class="sig-paren">(</span><em>nb_traversals</em>, <em>num_trials=5</em>, <em>visualize=False</em><span class="sig-paren">)</span><a class="headerlink" href="#mcts.mcts_evaluation" title="Permalink to this definition"></a></dt>
<dd><p>Do RL of the low-level policy of the given maneuver and test it.
:param nb_traversals: number of MCTS traversals
:param save_every: save at every these many traversals
:param visualize: visualization / rendering</p>
</dd></dl>
<dl class="function">
<dt id="mcts.mcts_training">
<code class="descclassname">mcts.</code><code class="descname">mcts_training</code><span class="sig-paren">(</span><em>nb_traversals</em>, <em>save_every=20</em>, <em>visualize=False</em><span class="sig-paren">)</span><a class="headerlink" href="#mcts.mcts_training" title="Permalink to this definition"></a></dt>
<dd><p>Do RL of the low-level policy of the given maneuver and test it.
:param nb_traversals: number of MCTS traversals
:param save_every: save at every these many traversals
:param visualize: visualization / rendering</p>
<code class="descclassname">mcts.</code><code class="descname">mcts_evaluation</code><span class="sig-paren">(</span><em>depth</em>, <em>nb_traversals</em>, <em>nb_episodes</em>, <em>nb_trials</em>, <em>visualize=False</em>, <em>debug=False</em>, <em>pretrained=True</em>, <em>highlevel_policy_file='highlevel_weights.h5f'</em><span class="sig-paren">)</span><a class="headerlink" href="#mcts.mcts_evaluation" title="Permalink to this definition"></a></dt>
<dd><p>Do RL of the low-level policy of the given maneuver and test it.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>depth</strong> – depth of each tree search</li>
<li><strong>nb_traversals</strong> – number of MCTS traversals per episodes</li>
<li><strong>nb_episodes</strong> – number of episodes per trial</li>
<li><strong>nb_trials</strong> – number of trials</li>
<li><strong>visualize</strong> – visualization / rendering</li>
<li><strong>debug</strong> – whether or not to show debug information</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function">
<dt id="mcts.mcts_visualize">
<code class="descclassname">mcts.</code><code class="descname">mcts_visualize</code><span class="sig-paren">(</span><em>file_name</em><span class="sig-paren">)</span><a class="headerlink" href="#mcts.mcts_visualize" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="function">
<dt id="mcts.online_mcts">
<code class="descclassname">mcts.</code><code class="descname">online_mcts</code><span class="sig-paren">(</span><em>nb_episodes=10</em><span class="sig-paren">)</span><a class="headerlink" href="#mcts.online_mcts" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</div>
......
......@@ -6,7 +6,7 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>wisemove &#8212; WiseMove documentation</title>
<title>wise-move-dev &#8212; WiseMove documentation</title>
<link rel="stylesheet" href="../_static/haiku.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/css/fonts.css" type="text/css" />
......@@ -20,7 +20,7 @@
</head><body>
<div class="header" role="banner"><h1 class="heading"><a href="../index.html">
<span>WiseMove documentation</span></a></h1>
<h2 class="heading"><span>wisemove</span></h2>
<h2 class="heading"><span>wise-move-dev</span></h2>
</div>
<div class="topnav" role="navigation" aria-label="top navigation">
......@@ -32,8 +32,8 @@
<div class="content">
<div class="section" id="wisemove">
<h1>wisemove<a class="headerlink" href="#wisemove" title="Permalink to this headline"></a></h1>
<div class="section" id="wise-move-dev">
<h1>wise-move-dev<a class="headerlink" href="#wise-move-dev" title="Permalink to this headline"></a></h1>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="backends.html">backends package</a><ul>
......@@ -43,8 +43,8 @@
<li class="toctree-l2"><a class="reference internal" href="backends.html#module-backends.kerasrl_learner">backends.kerasrl_learner module</a></li>
<li class="toctree-l2"><a class="reference internal" href="backends.html#module-backends.learner_base">backends.learner_base module</a></li>
<li class="toctree-l2"><a class="reference internal" href="backends.html#module-backends.manual_policy">backends.manual_policy module</a></li>
<li class="toctree-l2"><a class="reference internal" href="backends.html#module-backends.mcts_controller">backends.mcts_controller module</a></li>
<li class="toctree-l2"><a class="reference internal" href="backends.html#module-backends.mcts_learner">backends.mcts_learner module</a></li>
<li class="toctree-l2"><a class="reference internal" href="backends.html#module-backends.online_mcts_controller">backends.online_mcts_controller module</a></li>
<li class="toctree-l2"><a class="reference internal" href="backends.html#module-backends.policy_base">backends.policy_base module</a></li>
<li class="toctree-l2"><a class="reference internal" href="backends.html#module-backends.rl_controller">backends.rl_controller module</a></li>
<li class="toctree-l2"><a class="reference internal" href="backends.html#module-backends">Module contents</a></li>
......@@ -77,33 +77,12 @@
<li class="toctree-l1"><a class="reference internal" href="high_level_policy_main.html">high_level_policy_main module</a></li>
<li class="toctree-l1"><a class="reference internal" href="low_level_policy_main.html">low_level_policy_main module</a></li>
<li class="toctree-l1"><a class="reference internal" href="mcts.html">mcts module</a></li>
<li class="toctree-l1"><a class="reference internal" href="model_checker.html">model_checker package</a><ul>
<li class="toctree-l2"><a class="reference internal" href="model_checker.html#subpackages">Subpackages</a><ul>
<li class="toctree-l3"><a class="reference internal" href="model_checker.simple_intersection.html">model_checker.simple_intersection package</a><ul>
<li class="toctree-l4"><a class="reference internal" href="model_checker.simple_intersection.html#submodules">Submodules</a></li>
<li class="toctree-l4"><a class="reference internal" href="model_checker.simple_intersection.html#module-model_checker.simple_intersection.AP_dict">model_checker.simple_intersection.AP_dict module</a></li>
<li class="toctree-l4"><a class="reference internal" href="model_checker.simple_intersection.html#model-checker-simple-intersection-ltl-test-module">model_checker.simple_intersection.LTL_test module</a></li>
<li class="toctree-l4"><a class="reference internal" href="model_checker.simple_intersection.html#module-model_checker.simple_intersection.classes">model_checker.simple_intersection.classes module</a></li>
<li class="toctree-l4"><a class="reference internal" href="model_checker.simple_intersection.html#module-model_checker.simple_intersection">Module contents</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="model_checker.html#submodules">Submodules</a></li>
<li class="toctree-l2"><a class="reference internal" href="model_checker.html#module-model_checker.LTL_property_base">model_checker.LTL_property_base module</a></li>
<li class="toctree-l2"><a class="reference internal" href="model_checker.html#module-model_checker.atomic_propositions_base">model_checker.atomic_propositions_base module</a></li>
<li class="toctree-l2"><a class="reference internal" href="model_checker.html#module-model_checker.parser">model_checker.parser module</a></li>
<li class="toctree-l2"><a class="reference internal" href="model_checker.html#module-model_checker.scanner">model_checker.scanner module</a></li>
<li class="toctree-l2"><a class="reference internal" href="model_checker.html#module-model_checker">Module contents</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="options.html">options package</a><ul>
<li class="toctree-l2"><a class="reference internal" href="options.html#subpackages">Subpackages</a><ul>
<li class="toctree-l3"><a class="reference internal" href="options.simple_intersection.html">options.simple_intersection package</a><ul>
<li class="toctree-l4"><a class="reference internal" href="options.simple_intersection.html#submodules">Submodules</a></li>
<li class="toctree-l4"><a class="reference internal" href="options.simple_intersection.html#module-options.simple_intersection.maneuver_base">options.simple_intersection.maneuver_base module</a></li>
<li class="toctree-l4"><a class="reference internal" href="options.simple_intersection.html#module-options.simple_intersection.maneuvers">options.simple_intersection.maneuvers module</a></li>
<li class="toctree-l4"><a class="reference internal" href="options.simple_intersection.html#module-options.simple_intersection.mcts_maneuvers">options.simple_intersection.mcts_maneuvers module</a></li>
<li class="toctree-l4"><a class="reference internal" href="options.simple_intersection.html#module-options.simple_intersection">Module contents</a></li>
</ul>
</li>
......@@ -115,6 +94,26 @@
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="ppo2_training.html">ppo2_training module</a></li>
<li class="toctree-l1"><a class="reference internal" href="verifier.html">verifier package</a><ul>
<li class="toctree-l2"><a class="reference internal" href="verifier.html#subpackages">Subpackages</a><ul>
<li class="toctree-l3"><a class="reference internal" href="verifier.simple_intersection.html">verifier.simple_intersection package</a><ul>
<li class="toctree-l4"><a class="reference internal" href="verifier.simple_intersection.html#submodules">Submodules</a></li>
<li class="toctree-l4"><a class="reference internal" href="verifier.simple_intersection.html#module-verifier.simple_intersection.AP_dict">verifier.simple_intersection.AP_dict module</a></li>
<li class="toctree-l4"><a class="reference internal" href="verifier.simple_intersection.html#module-verifier.simple_intersection.LTL_test">verifier.simple_intersection.LTL_test module</a></li>
<li class="toctree-l4"><a class="reference internal" href="verifier.simple_intersection.html#module-verifier.simple_intersection.classes">verifier.simple_intersection.classes module</a></li>
<li class="toctree-l4"><a class="reference internal" href="verifier.simple_intersection.html#module-verifier.simple_intersection">Module contents</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="verifier.html#submodules">Submodules</a></li>
<li class="toctree-l2"><a class="reference internal" href="verifier.html#module-verifier.LTL_property_base">verifier.LTL_property_base module</a></li>
<li class="toctree-l2"><a class="reference internal" href="verifier.html#module-verifier.atomic_propositions_base">verifier.atomic_propositions_base module</a></li>
<li class="toctree-l2"><a class="reference internal" href="verifier.html#module-verifier.parser">verifier.parser module</a></li>
<li class="toctree-l2"><a class="reference internal" href="verifier.html#module-verifier.scanner">verifier.scanner module</a></li>
<li class="toctree-l2"><a class="reference internal" href="verifier.html#module-verifier">Module contents</a></li>
</ul>
</li>
</ul>
</div>
</div>
......
......@@ -42,7 +42,6 @@
<li class="toctree-l2"><a class="reference internal" href="options.simple_intersection.html#submodules">Submodules</a></li>
<li class="toctree-l2"><a class="reference internal" href="options.simple_intersection.html#module-options.simple_intersection.maneuver_base">options.simple_intersection.maneuver_base module</a></li>
<li class="toctree-l2"><a class="reference internal" href="options.simple_intersection.html#module-options.simple_intersection.maneuvers">options.simple_intersection.maneuvers module</a></li>
<li class="toctree-l2"><a class="reference internal" href="options.simple_intersection.html#module-options.simple_intersection.mcts_maneuvers">options.simple_intersection.mcts_maneuvers module</a></li>
<li class="toctree-l2"><a class="reference internal" href="options.simple_intersection.html#module-options.simple_intersection">Module contents</a></li>
</ul>
</li>
......@@ -89,8 +88,10 @@ is specified in a json file and consists of the following specific values:</p>
<dl class="method">
<dt id="options.options_loader.OptionsGraph.execute_controller_policy">
<code class="descname">execute_controller_policy</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#options.options_loader.OptionsGraph.execute_controller_policy" title="Permalink to this definition"></a></dt>
<dd><p>Performs low level steps and transitions to other nodes using controller transition policy.</p>
<p>Returns state after one step, step reward, episode_termination_flag, info</p>
<dd><p>Performs low level steps and transitions to other nodes using
controller transition policy.</p>
<p>Returns state after one step, step reward,
episode_termination_flag, info</p>
</dd></dl>
<dl class="method">
......@@ -126,7 +127,7 @@ controller to root node.</p>
<dl class="method">
<dt id="options.options_loader.OptionsGraph.set_controller_args">
<code class="descname">set_controller_args</code><span class="sig-paren">(</span><em>**kwargs</em><span class="sig-paren">)</span><a class="headerlink" href="#options.options_loader.OptionsGraph.set_controller_args" title="Permalink to this definition"></a></dt>
<dd><p>Sets custom arguments depending on the chosen controller</p>
<dd><p>Sets custom arguments depending on the chosen controller.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
......@@ -140,8 +141,9 @@ controller to root node.</p>
<dl class="method">
<dt id="options.options_loader.OptionsGraph.set_controller_policy">
<code class="descname">set_controller_policy</code><span class="sig-paren">(</span><em>policy</em><span class="sig-paren">)</span><a class="headerlink" href="#options.options_loader.OptionsGraph.set_controller_policy" title="Permalink to this definition"></a></dt>
<dd><p>Sets the trained controller policy as a function which takes in feature vector and returns
an option index(int). By default, trained_policy is None</p>
<dd><p>Sets the trained controller policy as a function which takes in
feature vector and returns an option index(int). By default,
trained_policy is None.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
......@@ -155,7 +157,8 @@ an option index(int). By default, trained_policy is None</p>
<dl class="method">
<dt id="options.options_loader.OptionsGraph.set_current_node">
<code class="descname">set_current_node</code><span class="sig-paren">(</span><em>node_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#options.options_loader.OptionsGraph.set_current_node" title="Permalink to this definition"></a></dt>
<dd><p>Sets the current node for controller. Used for training/testing a particular node</p>
<dd><p>Sets the current node for controller. Used for training/testing a
particular node.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
......@@ -169,9 +172,9 @@ an option index(int). By default, trained_policy is None</p>
<dl class="method">
<dt id="options.options_loader.OptionsGraph.step">
<code class="descname">step</code><span class="sig-paren">(</span><em>option</em><span class="sig-paren">)</span><a class="headerlink" href="#options.options_loader.OptionsGraph.step" title="Permalink to this definition"></a></dt>
<dd><p>Complete an episode using specified option. This assumes that
the manager’s env is at a place where the option’s initiation
condition is met.</p>
<dd><p>Complete an episode using specified option. This assumes that the
manager’s env is at a place where the option’s initiation condition is
met.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
......
......@@ -44,18 +44,18 @@ backends.manual\_policy module
:undoc-members:
:show-inheritance:
backends.mcts\_learner module
-----------------------------
backends.mcts\_controller module