Commit 6a7cd92a authored by Aravind Balakrishnan

Misc updates to docs and scripts

* Updated readme
* Updated Sphinx docs
* Updated scripts
* Built latest docs
* Updated low level policies
parent 7a31ba33
WiseMove is a safe reinforcement learning framework that combines hierarchical reinforcement learning and model-checking using temporal logic constraints.
WiseMove is a safe reinforcement learning framework that combines hierarchical reinforcement learning and safety verification using temporal logic constraints.
Requirements
============
......@@ -13,9 +13,9 @@ Installation
* Run the install dependencies script: `./scripts/install_dependencies.sh` to install pip3 and required python packages.
Note: The script checks if the dependencies folder exists in the project root folder. If it does, it will install from the local packages in that folder,
otherwise it will install the required packages from the internet. If you do not have an internet connection and the dependencies folder does not exist,
you will need to run `./scripts/download_dependencies.sh` on a machine with an internet connection first and transfer that folder.
Note: The script checks whether the dependencies folder exists in the project root folder. If it does, it installs from the local packages in that folder; otherwise it installs the required packages from the internet.
If you do not have an internet connection and the dependencies folder does not exist, you will need to run `./scripts/download_dependencies.sh` on a machine with an internet connection first and transfer that folder.
Documentation
=============
......@@ -25,28 +25,65 @@ Documentation
Replicate Results
=================
These are the minimum steps required to replicate the results for the simple_intersection environment. For a detailed user guide, it is recommended to view the documentation.
* Run `./scripts/install_dependencies.sh` to install python dependencies.
Given below are the minimum steps required to replicate the results for the simple_intersection environment. For a detailed user guide, see the documentation.
* Open a terminal and navigate to the root of the project directory.
* Low-level policies:
* You can choose to train and test all the maneuvers, but this may take some time and is not recommended.
* To train all low-level policies from scratch: `python3 low_level_policy_main.py --train`. This may take some time.
* To test all these trained low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
* Make sure the training is fully complete before running the above test.
* It is easier to verify a few of the maneuvers using the commands below:
* To train a single low-level policy, for example wait: `python3 low_level_policy_main.py --option=wait --train`.
* To test one of these trained low-level policies, for example wait: `python3 low_level_policy_main.py --option=wait --test --saved_policy_in_root`.
* Available maneuvers are: wait, changelane, stop, keeplane, follow
* These results are visually evaluated.
* Note: This training has a high-variance issue due to the continuous action space, especially for the stop and keeplane maneuvers. It may help to train for 0.2 million steps rather than the default 0.1 million by adding the argument `--nb_steps=200000` while training.
* Use `python3 low_level_policy_main.py --help` to see all available commands.
* You can choose to test the provided pre-trained options:
* To visually inspect all pre-trained options: `python3 low_level_policy_main.py --test`
* To evaluate all pre-trained options: `python3 low_level_policy_main.py --evaluate`
* To visually inspect a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --test`.
* To evaluate a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --evaluate`.
* Available options are: wait, changelane, stop, keeplane, follow
* Or, you can train and test all the options, but this may take some time. Newly trained policies are saved to the root folder by default. (A minimal batch sketch that loops over all the options is given after these steps.)
* To train all low-level policies from scratch (~40 minutes): `python3 low_level_policy_main.py --train`.
* To visually inspect all these new low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
* To evaluate all these new low-level policies: `python3 low_level_policy_main.py --evaluate --saved_policy_in_root`.
* Make sure the training is fully complete before running the above test/evaluation.
* It is faster to verify the training of a few options using the commands below (**Recommended**):
* To train a single low-level policy, for example *changelane* (~6 minutes): `python3 low_level_policy_main.py --option=changelane --train`. The trained policy is saved to the root folder.
* To evaluate one of these new low-level policies, for example *changelane*: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`.
* Available options are: wait, changelane, stop, keeplane, follow
* **To replicate the experiments without additional properties:**
* Note that we have not provided a pre-trained policy that is trained without the additional LTL properties.
* You will need to train it by adding the argument `--without_additional_ltl_properties` to the above *training* procedures. For example, `python3 low_level_policy_main.py --option=changelane --train --without_additional_ltl_properties`
* Now, use `--evaluate` to evaluate this new policy: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
* **The result of `--evaluate` here is one trial.** In the experiments reported in the paper, we conduct multiple such trials.
* High-level policy:
* To train the high-level policy from scratch using the given low-level policies: `python3 high_level_policy_main.py --train`
* To evaluate this trained high-level policy: `python3 high_level_policy_main.py --evaluate --saved_policy_in_root`.
* The success average and standard deviation correspond to the results from the high-level policy experiments.
* To run MCTS using the high-level policy:
* To obtain a probabilities tree and save it: `python3 mcts.py --train`
* To evaluate using this saved tree: `python3 mcts.py --evaluate --saved_policy_in_root`.
* The success average and standard deviation correspond to the results from the MCTS experiments.
* Use `python3 high_level_policy_main.py --help` to see all available commands.
* You can use the provided pre-trained high-level policy:
* To visually inspect this policy: `python3 high_level_policy_main.py --test`
* To **replicate the experiment** used for reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate`
* Or, you can train the high-level policy from scratch (Note that this takes some time):
* To train using pre-trained low-level policies for 0.2 million steps (~50 minutes): `python3 high_level_policy_main.py --train`
* To visually inspect this new policy: `python3 high_level_policy_main.py --test --saved_policy_in_root`
* To **replicate the experiment** used for reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate --saved_policy_in_root`.
* Since the above training takes a long time, you can instead verify using a lower number of steps:
* To train for 0.1 million steps (~25 minutes): `python3 high_level_policy_main.py --train --nb_steps=100000`
* Note that this gives a much lower success rate of ~75%, so using it for MCTS will not reproduce the reported results.
* The success average and standard deviation in the evaluation correspond to the results from the high-level policy experiments.
* MCTS:
* Use `python3 mcts.py --help` to see all available commands.
* You can run MCTS on the provided pre-trained high-level policy:
* To visually inspect MCTS on the pre-trained policy: `python3 mcts.py --test --nb_episodes=10`
* To **replicate the experiment** used for reported results: `python3 mcts.py --evaluate`. Note that this takes a very long time (~16 hours).
* For a shorter version of the experiment: `python3 mcts.py --evaluate --nb_trials=2 --nb_episodes=10` (~20 minutes)
* Or, if you have trained a high-level policy from scratch, you can run MCTS on it:
* To visually inspect MCTS on the new policy: `python3 mcts.py --test --highlevel_policy_in_root --nb_episodes=10`
* To **replicate the experiment** used for reported results: `python3 mcts.py --evaluate --highlevel_policy_in_root`. Note that this takes a very long time (~16 hours).
* For a shorter version of the experiment: `python3 mcts.py --evaluate --highlevel_policy_in_root --nb_trials=2 --nb_episodes=10` (~20 minutes)
* You can use the arguments `--depth` and `--nb_traversals` to vary the depth of the MCTS tree (default is 5) and the number of traversals performed (default is 50).
* The success average and standard deviation in the evaluation correspond to the results from the MCTS experiments. (A small sketch of this aggregation is given after the system specifications below.)
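If you want to run the per-option training and evaluation commands above in one go, a minimal batch sketch is given below. It only shells out to `low_level_policy_main.py` with the option names and flags listed in these steps; everything else (error handling, logging) is left out, and the script is illustrative rather than part of WiseMove.

```python
# Illustrative batch runner for the low-level options listed above.
# Run from the project root; the flags mirror the commands in this README.
import subprocess

OPTIONS = ["wait", "changelane", "stop", "keeplane", "follow"]

for option in OPTIONS:
    # Train the option from scratch (the new policy is saved to the root folder).
    subprocess.run(
        ["python3", "low_level_policy_main.py", "--option=" + option, "--train"],
        check=True)
    # Evaluate the newly trained policy.
    subprocess.run(
        ["python3", "low_level_policy_main.py", "--option=" + option,
         "--evaluate", "--saved_policy_in_root"],
        check=True)
```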
The time taken to execute the above scripts may vary depending on your configuration. The reported results were obtained on a system with the following specifications:
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
16GB memory
Nvidia GeForce GTX 1080 Ti
Ubuntu 16.04
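The high-level and MCTS evaluations above report a success average and standard deviation over multiple trials. The snippet below is only a sketch of that aggregation, assuming you have collected one success rate per trial yourself; the trial values shown are placeholders, not reported results.

```python
# Sketch: aggregate per-trial success rates into a mean and standard deviation.
# The values below are placeholders; substitute the rates printed by --evaluate.
import numpy as np

trial_success_rates = [0.90, 0.85, 0.95]  # hypothetical per-trial success rates

mean = np.mean(trial_success_rates)
std = np.std(trial_success_rates)  # population std; use ddof=1 for the sample std
print("success average: {:.3f}, standard deviation: {:.3f}".format(mean, std))
```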
Coding Standards
================
......
......@@ -206,11 +206,13 @@ class DDPGLearner(LearnerBase):
def test_model(self,
env,
nb_episodes=50,
callbacks=None,
visualize=True,
nb_max_episode_steps=200):
self.agent_model.test(
env,
nb_episodes=nb_episodes,
callbacks=callbacks,
visualize=visualize,
nb_max_episode_steps=nb_max_episode_steps)
......
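The diff above adds a `callbacks` argument to `DDPGLearner.test_model`, which is passed straight through to the underlying keras-rl agent's `test()` call. Below is a minimal usage sketch: the keras-rl `Callback` subclass and the `test_model` call reflect the diff, while `make_env()` and the bare `DDPGLearner()` constructor are placeholders whose real arguments depend on the WiseMove setup.

```python
# Sketch: passing a keras-rl callback through the updated test_model signature.
# make_env() and the DDPGLearner constructor arguments are placeholders.
from rl.callbacks import Callback
from backends.kerasrl_learner import DDPGLearner

class EpisodeLogger(Callback):
    def on_episode_end(self, episode, logs={}):
        # keras-rl reports the accumulated reward of each test episode in `logs`.
        print("episode {}: reward {}".format(episode, logs.get("episode_reward")))

env = make_env()         # placeholder: construct the environment to test on
learner = DDPGLearner()  # placeholder: real constructor arguments omitted
learner.test_model(
    env,
    nb_episodes=10,
    callbacks=[EpisodeLogger()],
    visualize=False,
    nb_max_episode_steps=200)
```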
......@@ -46,15 +46,8 @@
<dl class="method">
<dt id="backends.baselines_learner.PPO2Agent.create_agent">
<code class="descname">create_agent</code><span class="sig-paren">(</span><em>policy</em>, <em>tensorboard</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.baselines_learner.PPO2Agent.create_agent" title="Permalink to this definition"></a></dt>
<dd><p>Creates a PPO agent</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Returns:</th><td class="field-body">stable_baselines PPO2 object</td>
</tr>
</tbody>
</table>
<dd><p>Creates a PPO agent.</p>
<p>Returns: stable_baselines PPO2 object</p>
</dd></dl>
<dl class="method">
......@@ -122,15 +115,15 @@
<dl class="method">
<dt id="backends.controller_base.ControllerBase.can_transition">
<code class="descname">can_transition</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.controller_base.ControllerBase.can_transition" title="Permalink to this definition"></a></dt>
<dd><p>Returns boolean signifying whether we can transition. To be
implemented in subclass.</p>
<dd><p>Returns boolean signifying whether we can transition.</p>
<p>To be implemented in subclass.</p>
</dd></dl>
<dl class="method">
<dt id="backends.controller_base.ControllerBase.do_transition">
<code class="descname">do_transition</code><span class="sig-paren">(</span><em>observation</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.controller_base.ControllerBase.do_transition" title="Permalink to this definition"></a></dt>
<dd><p>Do a transition, assuming we can transition. To be
implemented in subclass.</p>
<dd><p>Do a transition, assuming we can transition. To be implemented in
subclass.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
......@@ -154,7 +147,7 @@ implemented in subclass.</p>
<dl class="method">
<dt id="backends.controller_base.ControllerBase.set_current_node">
<code class="descname">set_current_node</code><span class="sig-paren">(</span><em>node_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.controller_base.ControllerBase.set_current_node" title="Permalink to this definition"></a></dt>
<dd><p>Sets the current node which is being executed</p>
<dd><p>Sets the current node which is being executed.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
......@@ -281,7 +274,7 @@ current observation.</p>
<dl class="method">
<dt id="backends.kerasrl_learner.DDPGLearner.test_model">
<code class="descname">test_model</code><span class="sig-paren">(</span><em>env</em>, <em>nb_episodes=50</em>, <em>visualize=True</em>, <em>nb_max_episode_steps=200</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DDPGLearner.test_model" title="Permalink to this definition"></a></dt>
<code class="descname">test_model</code><span class="sig-paren">(</span><em>env</em>, <em>nb_episodes=50</em>, <em>callbacks=None</em>, <em>visualize=True</em>, <em>nb_max_episode_steps=200</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DDPGLearner.test_model" title="Permalink to this definition"></a></dt>
<dd><p>Test the agent on the environment.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
......@@ -347,11 +340,11 @@ If the policy is implemented by a neural network, this corresponds to a forward
<dl class="class">
<dt id="backends.kerasrl_learner.DQNLearner">
<em class="property">class </em><code class="descclassname">backends.kerasrl_learner.</code><code class="descname">DQNLearner</code><span class="sig-paren">(</span><em>input_shape=(48</em>, <em>)</em>, <em>nb_actions=5</em>, <em>low_level_policies=None</em>, <em>model=None</em>, <em>policy=None</em>, <em>memory=None</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DQNLearner" title="Permalink to this definition"></a></dt>
<em class="property">class </em><code class="descclassname">backends.kerasrl_learner.</code><code class="descname">DQNLearner</code><span class="sig-paren">(</span><em>input_shape=(48</em>, <em>)</em>, <em>nb_actions=5</em>, <em>low_level_policies=None</em>, <em>model=None</em>, <em>policy=None</em>, <em>memory=None</em>, <em>test_policy=None</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DQNLearner" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#backends.learner_base.LearnerBase" title="backends.learner_base.LearnerBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">backends.learner_base.LearnerBase</span></code></a></p>
<dl class="method">
<dt id="backends.kerasrl_learner.DQNLearner.create_agent">
<code class="descname">create_agent</code><span class="sig-paren">(</span><em>model</em>, <em>policy</em>, <em>memory</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DQNLearner.create_agent" title="Permalink to this definition"></a></dt>
<code class="descname">create_agent</code><span class="sig-paren">(</span><em>model</em>, <em>policy</em>, <em>memory</em>, <em>test_policy</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DQNLearner.create_agent" title="Permalink to this definition"></a></dt>
<dd><p>Creates a KerasRL DQNAgent with the given components.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
......@@ -389,6 +382,11 @@ If the policy is implemented by a neural network, this corresponds to a forward
<code class="descname">get_default_policy</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DQNLearner.get_default_policy" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="method">
<dt id="backends.kerasrl_learner.DQNLearner.get_default_test_policy">
<code class="descname">get_default_test_policy</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DQNLearner.get_default_test_policy" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="method">
<dt id="backends.kerasrl_learner.DQNLearner.get_q_value">
<code class="descname">get_q_value</code><span class="sig-paren">(</span><em>observation</em>, <em>action</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DQNLearner.get_q_value" title="Permalink to this definition"></a></dt>
......@@ -474,7 +472,7 @@ current observation.</p>
<dl class="method">
<dt id="backends.kerasrl_learner.DQNLearner.train">
<code class="descname">train</code><span class="sig-paren">(</span><em>env</em>, <em>nb_steps=1000000</em>, <em>visualize=False</em>, <em>nb_max_episode_steps=200</em>, <em>tensorboard=False</em>, <em>model_checkpoints=False</em>, <em>checkpoint_interval=10000</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DQNLearner.train" title="Permalink to this definition"></a></dt>
<code class="descname">train</code><span class="sig-paren">(</span><em>env</em>, <em>nb_steps=1000000</em>, <em>visualize=False</em>, <em>verbose=1</em>, <em>log_interval=10000</em>, <em>nb_max_episode_steps=200</em>, <em>tensorboard=False</em>, <em>model_checkpoints=False</em>, <em>checkpoint_interval=10000</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.DQNLearner.train" title="Permalink to this definition"></a></dt>
<dd><p>Train the learning agent on the environment.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
......@@ -494,6 +492,48 @@ current observation.</p>
</dd></dl>
<dl class="class">
<dt id="backends.kerasrl_learner.RestrictedEpsGreedyQPolicy">
<em class="property">class </em><code class="descclassname">backends.kerasrl_learner.</code><code class="descname">RestrictedEpsGreedyQPolicy</code><span class="sig-paren">(</span><em>eps=0.1</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.RestrictedEpsGreedyQPolicy" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">rl.policy.EpsGreedyQPolicy</span></code></p>
<p>Implements the epsilon-greedy policy.</p>
<p>Restricted Eps Greedy policy.
This policy ensures that it never chooses an action whose value is -inf.</p>
<dl class="method">
<dt id="backends.kerasrl_learner.RestrictedEpsGreedyQPolicy.select_action">
<code class="descname">select_action</code><span class="sig-paren">(</span><em>q_values</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.RestrictedEpsGreedyQPolicy.select_action" title="Permalink to this definition"></a></dt>
<dd><p>Return the selected action</p>
<dl class="docutils">
<dt># Arguments</dt>
<dd>q_values (np.ndarray): List of the estimations of Q for each action</dd>
<dt># Returns</dt>
<dd>Selected action</dd>
</dl>
</dd></dl>
</dd></dl>
<dl class="class">
<dt id="backends.kerasrl_learner.RestrictedGreedyQPolicy">
<em class="property">class </em><code class="descclassname">backends.kerasrl_learner.</code><code class="descname">RestrictedGreedyQPolicy</code><a class="headerlink" href="#backends.kerasrl_learner.RestrictedGreedyQPolicy" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">rl.policy.GreedyQPolicy</span></code></p>
<p>Implements the greedy policy.</p>
<p>Restricted Greedy policy.
This policy ensures that it never chooses an action whose value is -inf.</p>
<dl class="method">
<dt id="backends.kerasrl_learner.RestrictedGreedyQPolicy.select_action">
<code class="descname">select_action</code><span class="sig-paren">(</span><em>q_values</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.kerasrl_learner.RestrictedGreedyQPolicy.select_action" title="Permalink to this definition"></a></dt>
<dd><p>Return the selected action</p>
<dl class="docutils">
<dt># Arguments</dt>
<dd>q_values (np.ndarray): List of the estimations of Q for each action</dd>
<dt># Returns</dt>
<dd>Selected action</dd>
</dl>
</dd></dl>
</dd></dl>
</div>
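The two restricted policies above differ from the stock keras-rl policies only in that they never select an action whose Q-value is -inf (i.e. an option that is currently not allowed). The snippet below is a sketch of that idea rather than the project's actual implementation: it subclasses keras-rl's `EpsGreedyQPolicy` and restricts both the exploration and the greedy branch to actions with finite Q-values.

```python
# Sketch of a restricted epsilon-greedy selection: never return an action whose
# Q-value is -inf. Illustrative only; not the WiseMove implementation.
import numpy as np
from rl.policy import EpsGreedyQPolicy

class RestrictedEpsGreedySketch(EpsGreedyQPolicy):
    def select_action(self, q_values):
        q_values = np.asarray(q_values)
        # Indices of permitted actions; assumes at least one finite Q-value exists.
        allowed = np.where(q_values != -np.inf)[0]
        if np.random.uniform() < self.eps:
            return int(np.random.choice(allowed))          # explore among allowed actions
        return int(allowed[np.argmax(q_values[allowed])])  # exploit the best allowed action
```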
<div class="section" id="module-backends.learner_base">
<span id="backends-learner-base-module"></span><h2>backends.learner_base module<a class="headerlink" href="#module-backends.learner_base" title="Permalink to this headline"></a></h2>
......@@ -625,171 +665,283 @@ current observation.</p>
</dd></dl>
</div>
<div class="section" id="module-backends.mcts_learner">
<span id="backends-mcts-learner-module"></span><h2>backends.mcts_learner module<a class="headerlink" href="#module-backends.mcts_learner" title="Permalink to this headline"></a></h2>
<div class="section" id="module-backends.mcts_controller">
<span id="backends-mcts-controller-module"></span><h2>backends.mcts_controller module<a class="headerlink" href="#module-backends.mcts_controller" title="Permalink to this headline"></a></h2>
<dl class="class">
<dt id="backends.mcts_learner.MCTSLearner">
<em class="property">class </em><code class="descclassname">backends.mcts_learner.</code><code class="descname">MCTSLearner</code><span class="sig-paren">(</span><em>env</em>, <em>low_level_policies</em>, <em>start_node_alias</em>, <em>max_depth=10</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner" title="Permalink to this definition"></a></dt>
<dt id="backends.mcts_controller.MCTSController">
<em class="property">class </em><code class="descclassname">backends.mcts_controller.</code><code class="descname">MCTSController</code><span class="sig-paren">(</span><em>env</em>, <em>low_level_policies</em>, <em>start_node_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_controller.MCTSController" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#backends.controller_base.ControllerBase" title="backends.controller_base.ControllerBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">backends.controller_base.ControllerBase</span></code></a></p>
<p>Monte Carlo Tree Search implementation using the UCB1 and
progressive widening approach as explained in Paxton et al (2017).</p>
<dl class="attribute">
<dt id="backends.mcts_learner.MCTSLearner.M">
<code class="descname">M</code><em class="property"> = None</em><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.M" title="Permalink to this definition"></a></dt>
<dd><p>visitation count of discrete observation with option</p>
<p>MCTS Controller.</p>
<dl class="method">
<dt id="backends.mcts_controller.MCTSController.can_transition">
<code class="descname">can_transition</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_controller.MCTSController.can_transition" title="Permalink to this definition"></a></dt>
<dd><p>Returns boolean signifying whether we can transition.</p>
<p>To be implemented in subclass.</p>
</dd></dl>
<dl class="attribute">
<dt id="backends.mcts_learner.MCTSLearner.N">
<code class="descname">N</code><em class="property"> = None</em><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.N" title="Permalink to this definition"></a></dt>
<dd><p>visitation count of discrete observations</p>
<dl class="method">
<dt id="backends.mcts_controller.MCTSController.change_low_level_references">
<code class="descname">change_low_level_references</code><span class="sig-paren">(</span><em>env_copy</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_controller.MCTSController.change_low_level_references" title="Permalink to this definition"></a></dt>
<dd><p>Change references in low level policies by updating the environment
with the copy of the environment.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>env_copy</strong> – reference to copy of the environment</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="attribute">
<dt id="backends.mcts_learner.MCTSLearner.TR">
<code class="descname">TR</code><em class="property"> = None</em><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.TR" title="Permalink to this definition"></a></dt>
<dd><p>total reward from given discrete observation with option</p>
<dl class="method">
<dt id="backends.mcts_controller.MCTSController.check_env">
<code class="descname">check_env</code><span class="sig-paren">(</span><em>x</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_controller.MCTSController.check_env" title="Permalink to this definition"></a></dt>
<dd><p>Prints the object id of the environment. Debugging function.</p>
</dd></dl>
<dl class="attribute">
<dt id="backends.mcts_learner.MCTSLearner.adj">
<code class="descname">adj</code><em class="property"> = None</em><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.adj" title="Permalink to this definition"></a></dt>
<dd><p>adjacency list</p>
<dl class="method">
<dt id="backends.mcts_controller.MCTSController.do_transition">
<code class="descname">do_transition</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_controller.MCTSController.do_transition" title="Permalink to this definition"></a></dt>
<dd><p>Do a transition, assuming we can transition. To be implemented in
subclass.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>observation</strong> – final observation from episodic step</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="attribute">
<dt id="backends.mcts_learner.MCTSLearner.curr_node_alias">
<code class="descname">curr_node_alias</code><em class="property"> = None</em><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.curr_node_alias" title="Permalink to this definition"></a></dt>
<dd><p>store current node alias</p>
<dl class="method">
<dt id="backends.mcts_controller.MCTSController.set_current_node">
<code class="descname">set_current_node</code><span class="sig-paren">(</span><em>node_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_controller.MCTSController.set_current_node" title="Permalink to this definition"></a></dt>
<dd><p>Sets the current node which is being executed.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>node</strong> – node alias of the node to be set</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="attribute">
<dt id="backends.mcts_learner.MCTSLearner.curr_node_num">
<code class="descname">curr_node_num</code><em class="property"> = None</em><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.curr_node_num" title="Permalink to this definition"></a></dt>
<dd><p>store current node’s id</p>
</dd></dl>
</div>
<div class="section" id="module-backends.mcts_learner">
<span id="backends-mcts-learner-module"></span><h2>backends.mcts_learner module<a class="headerlink" href="#module-backends.mcts_learner" title="Permalink to this headline"></a></h2>
<dl class="class">
<dt id="backends.mcts_learner.MCTSLearner">
<em class="property">class </em><code class="descclassname">backends.mcts_learner.</code><code class="descname">MCTSLearner</code><span class="sig-paren">(</span><em>env</em>, <em>low_level_policies</em>, <em>max_depth=10</em>, <em>debug=False</em>, <em>rollout_timeout=500</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#backends.controller_base.ControllerBase" title="backends.controller_base.ControllerBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">backends.controller_base.ControllerBase</span></code></a></p>
<p>MCTS Logic.</p>
<dl class="method">
<dt id="backends.mcts_learner.MCTSLearner.do_transition">
<code class="descname">do_transition</code><span class="sig-paren">(</span><em>observation</em>, <em>visualize=False</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.do_transition" title="Permalink to this definition"></a></dt>
<dd><p>Do a transition using UCB metric, with the latest observation
from the episodic step.</p>
<dt id="backends.mcts_learner.MCTSLearner.backup">
<code class="descname">backup</code><span class="sig-paren">(</span><em>rollout_reward</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.backup" title="Permalink to this definition"></a></dt>
<dd><p>Reward backup strategy.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>rollout_reward</strong> – reward to back up</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="backends.mcts_learner.MCTSLearner.best_action">
<code class="descname">best_action</code><span class="sig-paren">(</span><em>node</em>, <em>C</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.best_action" title="Permalink to this definition"></a></dt>
<dd><p>Find the best option to execute from a given node. The constant
C determines the coefficient of the uncertainty estimate.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>observation</strong> – final observation from episodic step</li>
<li><strong>visualize</strong> – whether or not to visualize low level steps</li>
<li><strong>node</strong> – node from which the best action needs to be found</li>
<li><strong>C</strong> – coefficient of uncertainty term</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Returns o_star using UCB metric</p>
<p>Returns the best possible option alias from given node.</p>
</dd></dl>
<dl class="method">
<dt id="backends.mcts_learner.MCTSLearner.get_best_node">
<code class="descname">get_best_node</code><span class="sig-paren">(</span><em>observation</em>, <em>use_ucb=False</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.get_best_node" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dt id="backends.mcts_learner.MCTSLearner.def_policy">
<code class="descname">def_policy</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.def_policy" title="Permalink to this definition"></a></dt>
<dd><p>Default policy, used for rollouts.</p>
</dd></dl>
<dl class="method">
<dt id="backends.mcts_learner.MCTSLearner.load_model">
<code class="descname">load_model</code><span class="sig-paren">(</span><em>file_name='mcts.pickle'</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.load_model" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dt id="backends.mcts_learner.MCTSLearner.expand">
<code class="descname">expand</code><span class="sig-paren">(</span><em>node</em>, <em>not_visited</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.expand" title="Permalink to this definition"></a></dt>
<dd><p>Create a new node from the given node. Chooses an option from the
not_visited list. Also moves to the newly created node.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>node</strong> – node to expand from</li>
<li><strong>not_visited</strong> – new possible edges or options</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="backends.mcts_learner.MCTSLearner.move">
<code class="descname">move</code><span class="sig-paren">(</span><em>option_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.move" title="Permalink to this definition"></a></dt>
<dd><p>Move in the MCTS tree. This means updating
state information and stepping through option_alias.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>option_alias</strong> – edge or option to execute to reach a next node</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="attribute">
<dt id="backends.mcts_learner.MCTSLearner.nodes">
<code class="descname">nodes</code><em class="property"> = None</em><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.nodes" title="Permalink to this definition"></a></dt>
<dd><p>node properties</p>
<dl class="method">
<dt id="backends.mcts_learner.MCTSLearner.option_step">
<code class="descname">option_step</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.option_step" title="Permalink to this definition"></a></dt>
<dd><p>Step through the current option_alias.</p>
</dd></dl>
<dl class="method">
<dt id="backends.mcts_learner.MCTSLearner.save_model">
<code class="descname">save_model</code><span class="sig-paren">(</span><em>file_name='mcts.pickle'</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.save_model" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dt id="backends.mcts_learner.MCTSLearner.reset">
<code class="descname">reset</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.reset" title="Permalink to this definition"></a></dt>
<dd><p>Resets maneuvers and sets current node to root.</p>
</dd></dl>
<dl class="method">
<dt id="backends.mcts_learner.MCTSLearner.set_current_node">
<code class="descname">set_current_node</code><span class="sig-paren">(</span><em>new_node_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.set_current_node" title="Permalink to this definition"></a></dt>
<dd><p>Sets the current node which is being executed</p>
<dt id="backends.mcts_learner.MCTSLearner.search">
<code class="descname">search</code><span class="sig-paren">(</span><em>obs</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.search" title="Permalink to this definition"></a></dt>
<dd><p>Perform a traversal from the root node.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>node</strong> – node alias of the node to be set</td>
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>obs</strong> – current observation</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="backends.mcts_learner.MCTSLearner.traverse">
<code class="descname">traverse</code><span class="sig-paren">(</span><em>observation</em>, <em>visualize=False</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.traverse" title="Permalink to this definition"></a></dt>
<dd><p>Do a complete traversal from root to leaf. Assumes the
environment is reset and we are at the root node.</p>
<dt id="backends.mcts_learner.MCTSLearner.set_current_node">
<code class="descname">set_current_node</code><span class="sig-paren">(</span><em>node_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.set_current_node" title="Permalink to this definition"></a></dt>
<dd><p>Set current node so that option_step can execute it later.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>observation</strong> – observation from the environment</li>
<li><strong>visualize</strong> – whether or not to visualize low level steps</li>
</ul>
</td>
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>node_alias</strong> – option alias to execute next</td>
</tr>
</tbody>
</table>
<p>Returns value of root node</p>
</dd></dl>
<dl class="method">
<dt id="backends.mcts_learner.MCTSLearner.tree_policy">
<code class="descname">tree_policy</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.MCTSLearner.tree_policy" title="Permalink to this definition"></a></dt>
<dd><p>Policy that determines how to move through the MCTS tree.
Terminates either when the environment reaches a terminal state or we
reach a leaf node.</p>
</dd></dl>
</dd></dl>
</div>
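The MCTSLearner methods documented above (`search`, `tree_policy`, `expand`, `def_policy`, `backup`, `best_action`) follow the usual selection / expansion / rollout / backup cycle of Monte Carlo Tree Search. The outline below is only a schematic of how one `search` traversal typically fits together; it is not the WiseMove code, and the stand-in method bodies are placeholders.

```python
# Schematic of one MCTS traversal using the documented method names.
# The real logic lives in backends.mcts_learner; the bodies here are stand-ins.

class MCTSTraversalSketch:
    def __init__(self, max_depth=10, rollout_timeout=500):
        self.max_depth = max_depth
        self.rollout_timeout = rollout_timeout

    def search(self, obs):
        """One traversal from the root, in the spirit of MCTSLearner.search(obs)."""
        self.tree_policy()                  # selection: walk the tree to a leaf/terminal node
        # (in the documented class, expand() adds a child for an unvisited option here)
        rollout_reward = self.def_policy()  # simulation: rollout with the default policy
        self.backup(rollout_reward)         # backup: propagate the reward up the visited path

    # --- trivial stand-ins ---
    def tree_policy(self):
        return None

    def def_policy(self):
        return 0.0

    def backup(self, rollout_reward):
        pass
```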
<div class="section" id="module-backends.online_mcts_controller">
<span id="backends-online-mcts-controller-module"></span><h2>backends.online_mcts_controller module<a class="headerlink" href="#module-backends.online_mcts_controller" title="Permalink to this headline"></a></h2>
<dl class="class">
<dt id="backends.online_mcts_controller.OnlineMCTSController">
<em class="property">class </em><code class="descclassname">backends.online_mcts_controller.</code><code class="descname">OnlineMCTSController</code><span class="sig-paren">(</span><em>env</em>, <em>low_level_policies</em>, <em>start_node_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.online_mcts_controller.OnlineMCTSController" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#backends.controller_base.ControllerBase" title="backends.controller_base.ControllerBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">backends.controller_base.ControllerBase</span></code></a></p>
<p>Online MCTS</p>
<dt id="backends.mcts_learner.Node">
<em class="property">class </em><code class="descclassname">backends.mcts_learner.</code><code class="descname">Node</code><span class="sig-paren">(</span><em>node_num</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.Node" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
<p>Represents a node in a tree.</p>
<dl class="method">
<dt id="backends.online_mcts_controller.OnlineMCTSController.can_transition">
<code class="descname">can_transition</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.online_mcts_controller.OnlineMCTSController.can_transition" title="Permalink to this definition"></a></dt>
<dd><p>Returns boolean signifying whether we can transition. To be
implemented in subclass.</p>
<dt id="backends.mcts_learner.Node.is_terminal">
<code class="descname">is_terminal</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.Node.is_terminal" title="Permalink to this definition"></a></dt>
<dd><p>Check whether this node is a leaf node or not.</p>
</dd></dl>
</dd></dl>
<dl class="class">
<dt id="backends.mcts_learner.Tree">
<em class="property">class </em><code class="descclassname">backends.mcts_learner.</code><code class="descname">Tree</code><span class="sig-paren">(</span><em>max_depth</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.Tree" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
<p>Tree representation used for MCTS.</p>
<dl class="method">
<dt id="backends.online_mcts_controller.OnlineMCTSController.change_low_level_references">
<code class="descname">change_low_level_references</code><span class="sig-paren">(</span><em>env_copy</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.online_mcts_controller.OnlineMCTSController.change_low_level_references" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dt id="backends.mcts_learner.Tree.add_state">
<code class="descname">add_state</code><span class="sig-paren">(</span><em>obs</em>, <em>dis_obs</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.Tree.add_state" title="Permalink to this definition"></a></dt>
<dd><p>Associates an observation and a discrete observation with the current
node. Useful to keep track of the last known observation(s).</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>obs</strong> – observation to save to current node’s cstate</li>
<li><strong>dis_obs</strong> – observation to save to current node’s state</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="backends.online_mcts_controller.OnlineMCTSController.do_transition">
<code class="descname">do_transition</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#backends.online_mcts_controller.OnlineMCTSController.do_transition" title="Permalink to this definition"></a></dt>
<dd><p>Do a transition, assuming we can transition. To be
implemented in subclass.</p>
<dt id="backends.mcts_learner.Tree.move">
<code class="descname">move</code><span class="sig-paren">(</span><em>option_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.Tree.move" title="Permalink to this definition"></a></dt>
<dd><p>Use the edge option_alias and move from the current node to
the next node.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>observation</strong> – final observation from episodic step</td>
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>option_alias</strong> – edge to move along</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="backends.online_mcts_controller.OnlineMCTSController.set_current_node">
<code class="descname">set_current_node</code><span class="sig-paren">(</span><em>node_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.online_mcts_controller.OnlineMCTSController.set_current_node" title="Permalink to this definition"></a></dt>
<dd><p>Sets the current node which is being executed</p>
<dt id="backends.mcts_learner.Tree.new_node">
<code class="descname">new_node</code><span class="sig-paren">(</span><em>option_alias</em><span class="sig-paren">)</span><a class="headerlink" href="#backends.mcts_learner.Tree.new_node" title="Permalink to this definition"></a></dt>
<dd><p>Creates a new node that is a child of the current node.
The new node can be reached by the option_alias edge from
the current node.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>node</strong> – node alias of the node to be set</td>
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>option_alias</strong> – the edge between current node and new node</td>
</tr>
</tbody>
</table>