diff --git a/README.md b/README.md
index 8afb6f2360eb26016fda4170a535895557272014..e6a35a92c26d7233eb71a9cd7c68971394553af2 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-WiseMove is safe reinforcement learning framework that combines hierarchical reinforcement learning and model-checking using temporal logic constraints.
+WiseMove is a safe reinforcement learning framework that combines hierarchical reinforcement learning and safety verification using temporal logic constraints.
 Requirements
 ============
@@ -13,9 +13,9 @@ Installation
 * Run the install dependencies script: `./scripts/install_dependencies.sh` to install pip3 and required python packages.
-Note: The script checks if dependencies folder exists in the project root folder. If it does, it will install from the local packages in that folder,
-else will install required packages from the internet. If you do not have an internet connection and the dependencies folder does not exist,
-you will need to run `./scripts/download_dependencies.sh` using a machine with an internet connection first and transfer that folder.
+Note: The script checks if the dependencies folder exists in the project root folder. If it does, it will install from the local packages in that folder; otherwise, it will install the required packages from the internet.
+
+If you do not have an internet connection and the dependencies folder does not exist, you will first need to run `./scripts/download_dependencies.sh` on a machine with an internet connection and transfer that folder.
 Documentation
 =============
@@ -25,28 +25,65 @@ Documentation
 Replicate Results
 =================
-These are the minimum steps required to replicate the results for simple_intersection environment. For a detailed user guide, it is recommended to view the documentation.
-
-* Run `./scripts/install_dependencies.sh` to install python dependencies.
+Given below are the minimum steps required to replicate the results for the simple_intersection environment. For a detailed user guide, it is recommended to view the documentation.
+* Open a terminal and navigate to the root of the project directory.
 * Low-level policies:
-    * You can choose to train and test all the maneuvers. But this may take some time and is not recommended.
-        * To train all low-level policies from scratch: `python3 low_level_policy_main.py --train`. This may take some time.
-        * To test all these trained low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
-        * Make sure the training is fully complete before running above test.
-    * It is easier to verify few of the maneuvers using below commands:
-        * To train a single low-level, for example wait: `python3 low_level_policy_main.py --option=wait --train`.
-        * To test one of these trained low-level policies, for example wait: `python3 low_level_policy_main.py --option=wait --test --saved_policy_in_root`
-        * Available maneuvers are: wait, changelane, stop, keeplane, follow
-    * These results are visually evaluated.
-    * Note: This training has a high variance issue due to the continuous action space, especially for stop and keeplane maneuvers. It may help to train for 0.2 million steps than the default 0.1 million by adding argument '--nb_steps=200000' while training.
+    * Use `python3 low_level_policy_main.py --help` to see all available commands.
+    * You can choose to test the provided pre-trained options:
+        * To visually inspect all pre-trained options: `python3 low_level_policy_main.py --test`
+        * To evaluate all pre-trained options: `python3 low_level_policy_main.py --evaluate`
+        * To visually inspect a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --test`.
+        * To evaluate a specific pre-trained policy: `python3 low_level_policy_main.py --option=wait --evaluate`.
+        * Available options are: wait, changelane, stop, keeplane, follow
+    * Or, you can train and test all the options, but this may take some time. Newly trained policies are saved to the root folder by default.
+        * To train all low-level policies from scratch (~40 minutes): `python3 low_level_policy_main.py --train`.
+        * To visually inspect all these new low-level policies: `python3 low_level_policy_main.py --test --saved_policy_in_root`.
+        * To evaluate all these new low-level policies: `python3 low_level_policy_main.py --evaluate --saved_policy_in_root`.
+        * Make sure the training is fully complete before running the above test/evaluation.
+    * It is faster to verify the training of a few options using the commands below (**Recommended**):
+        * To train a single low-level policy, for example *changelane* (~6 minutes): `python3 low_level_policy_main.py --option=changelane --train`. This is saved to the root folder.
+        * To evaluate one of these new low-level policies, for example *changelane*: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
+        * Available options are: wait, changelane, stop, keeplane, follow
+    * **To replicate the experiments without additional properties:**
+        * Note that we have not provided a pre-trained policy trained without the additional LTL properties.
+        * You will need to train it by adding the argument `--without_additional_ltl_properties` to the above *training* procedures. For example, `python3 low_level_policy_main.py --option=changelane --train --without_additional_ltl_properties`
+        * Now, use `--evaluate` to evaluate this new policy: `python3 low_level_policy_main.py --option=changelane --evaluate --saved_policy_in_root`
+        * **The result of `--evaluate` here is one trial.** In the experiments reported in the paper, we conduct multiple such trials.
+
 * High-level policy:
-    * To train high-level policy from scratch using the given low-level policies: `python3 high_level_policy_main.py --train`
-    * To evaluate this trained high-level policy: `python3 high_level_policy_main.py --evaluate --saved_policy_in_root`.
-    * The success average and standard deviation corresponds to the result from high-level policy experiments.
-* To run MCTS using the high-level policy:
-    * To obtain a probabilites tree and save it: `python3 mcts.py --train`
-    * To evaluate using this saved tree: `python3 mcts.py --evaluate --saved_policy_in_root`.
-    * The success average and standard deviation corresponds to the results from MCTS experiments.
+    * Use `python3 high_level_policy_main.py --help` to see all available commands.
+    * You can use the provided pre-trained high-level policy:
+        * To visually inspect this policy: `python3 high_level_policy_main.py --test`
+        * To **replicate the experiment** used for reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate`
+    * Or, you can train the high-level policy from scratch (note that this takes some time):
+        * To train using pre-trained low-level policies for 0.2 million steps (~50 minutes): `python3 high_level_policy_main.py --train`
+        * To visually inspect this new policy: `python3 high_level_policy_main.py --test --saved_policy_in_root`
+        * To **replicate the experiment** used for reported results (~5 minutes): `python3 high_level_policy_main.py --evaluate --saved_policy_in_root`.
+    * Since the above training takes a long time, you can instead verify using a lower number of steps:
+        * To train for 0.1 million steps (~25 minutes): `python3 high_level_policy_main.py --train --nb_steps=100000`
+        * Note that this has a much lower success rate of ~75%, so using it for MCTS will not reproduce the reported results.
+    * The success average and standard deviation in the evaluation correspond to the results from the high-level policy experiments.
+* MCTS:
+    * Use `python3 mcts.py --help` to see all available commands.
+    * You can run MCTS on the provided pre-trained high-level policy:
+        * To visually inspect MCTS on the pre-trained policy: `python3 mcts.py --test --nb_episodes=10`
+        * To **replicate the experiment** used for reported results: `python3 mcts.py --evaluate`. Note that this takes a very long time (~16 hours).
+        * For a shorter version of the experiment: `python3 mcts.py --evaluate --nb_trials=2 --nb_episodes=10` (~20 minutes)
+    * Or, if you have trained a high-level policy from scratch, you can run MCTS on it:
+        * To visually inspect MCTS on the new policy: `python3 mcts.py --test --highlevel_policy_in_root --nb_episodes=10`
+        * To **replicate the experiment** used for reported results: `python3 mcts.py --evaluate --highlevel_policy_in_root`. Note that this takes a very long time (~16 hours).
+        * For a shorter version of the experiment: `python3 mcts.py --evaluate --highlevel_policy_in_root --nb_trials=2 --nb_episodes=10` (~20 minutes)
+    * You can use the arguments `--depth` and `--nb_traversals` to vary the depth of the MCTS tree (default is 5) and the number of traversals performed (default is 50).
+    * The success average and standard deviation in the evaluation correspond to the results from the MCTS experiments.
+
+
+The time taken to execute the above scripts may vary depending on your configuration.
+The reported results were obtained using a system of the following specs:
+
+Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
+16GB memory
+Nvidia GeForce GTX 1080 Ti
+Ubuntu 16.04
+
 Coding Standards
 ================
diff --git a/backends/kerasrl_learner.py b/backends/kerasrl_learner.py
index a341711f4f6dc571d8386ded1907d8d464f733d3..bd1bb82bf9f8ca8acd2311b4581bfcccafad6b45 100644
--- a/backends/kerasrl_learner.py
+++ b/backends/kerasrl_learner.py
@@ -206,11 +206,13 @@ class DDPGLearner(LearnerBase):
     def test_model(self,
                    env,
                    nb_episodes=50,
+                   callbacks=None,
                    visualize=True,
                    nb_max_episode_steps=200):
         self.agent_model.test(
             env,
             nb_episodes=nb_episodes,
+            callbacks=callbacks,
             visualize=visualize,
             nb_max_episode_steps=nb_max_episode_steps)
diff --git a/documentation/sphinx/.build/.doc/backends.html b/documentation/sphinx/.build/.doc/backends.html
index ffc37fa577b3379e6e713902af2fc773ec15cd2b..65ada1e1e22e1fb2d0f01d8eea101dd08aa9af1b 100644
--- a/documentation/sphinx/.build/.doc/backends.html
+++ b/documentation/sphinx/.build/.doc/backends.html
@@ -46,15 +46,8 @@
create_agent(policy, tensorboard)
-

Creates a PPO agent

- --- - - - -
Returns:stable_baselines PPO2 object
+

Creates a PPO agent.

+

Returns: stable_baselines PPO2 object

@@ -122,15 +115,15 @@
can_transition()
-

Returns boolean signifying whether we can transition. To be -implemented in subclass.

+

Returns boolean signifying whether we can transition.

+

To be implemented in subclass.

do_transition(observation)
-

Do a transition, assuming we can transition. To be -implemented in subclass.

+

Do a transition, assuming we can transition. To be implemented in +subclass.

@@ -154,7 +147,7 @@ implemented in subclass.

set_current_node(node_alias)
-

Sets the current node which is being executed

+

Sets the current node which is being executed.

@@ -281,7 +274,7 @@ current observation.

-test_model(env, nb_episodes=50, visualize=True, nb_max_episode_steps=200)
+test_model(env, nb_episodes=50, callbacks=None, visualize=True, nb_max_episode_steps=200)

Test the agent on the environment.

@@ -347,11 +340,11 @@ If the policy is implemented by a neural network, this corresponds to a forward
-class backends.kerasrl_learner.DQNLearner(input_shape=(48, ), nb_actions=5, low_level_policies=None, model=None, policy=None, memory=None, **kwargs)
+class backends.kerasrl_learner.DQNLearner(input_shape=(48, ), nb_actions=5, low_level_policies=None, model=None, policy=None, memory=None, test_policy=None, **kwargs)

Bases: backends.learner_base.LearnerBase

-create_agent(model, policy, memory)
+create_agent(model, policy, memory, test_policy)

Creates a KerasRL DDPGAgent with given components.

@@ -389,6 +382,11 @@ If the policy is implemented by a neural network, this corresponds to a forward get_default_policy()
+
+
+get_default_test_policy()
+
+
get_q_value(observation, action)
@@ -474,7 +472,7 @@ current observation.

-train(env, nb_steps=1000000, visualize=False, nb_max_episode_steps=200, tensorboard=False, model_checkpoints=False, checkpoint_interval=10000)
+train(env, nb_steps=1000000, visualize=False, verbose=1, log_interval=10000, nb_max_episode_steps=200, tensorboard=False, model_checkpoints=False, checkpoint_interval=10000)

Train the learning agent on the environment.

@@ -494,6 +492,48 @@ current observation.

+
+
+class backends.kerasrl_learner.RestrictedEpsGreedyQPolicy(eps=0.1)
+

Bases: rl.policy.EpsGreedyQPolicy

+

Implement the epsilon greedy policy

+

Restricted Eps Greedy policy. +This policy ensures that it never chooses the action whose value is -inf

+
+
+select_action(q_values)
+

Return the selected action

+
+
# Arguments
+
q_values (np.ndarray): List of the estimations of Q for each action
+
# Returns
+
Selection action
+
+
+ +
+ +
+
+class backends.kerasrl_learner.RestrictedGreedyQPolicy
+

Bases: rl.policy.GreedyQPolicy

+

Implement the epsilon greedy policy

+

Restricted Greedy policy. +This policy ensures that it never chooses the action whose value is -inf

+
+
+select_action(q_values)
+

Return the selected action

+
+
# Arguments
+
q_values (np.ndarray): List of the estimations of Q for each action
+
# Returns
+
Selection action
+
+
+ +
+

backends.learner_base module

@@ -625,171 +665,283 @@ current observation.

-
-

backends.mcts_learner module

+
+

backends.mcts_controller module

-
-class backends.mcts_learner.MCTSLearner(env, low_level_policies, start_node_alias, max_depth=10)
+
+class backends.mcts_controller.MCTSController(env, low_level_policies, start_node_alias)

Bases: backends.controller_base.ControllerBase

-

Monte Carlo Tree Search implementation using the UCB1 and -progressive widening approach as explained in Paxton et al (2017).

-
-
-M = None
-

visitation count of discrete observation with option

+

MCTS Controller.

+
+
+can_transition()
+

Returns boolean signifying whether we can transition.

+

To be implemented in subclass.

-
-
-N = None
-

visitation count of discrete observations

+
+
+change_low_level_references(env_copy)
+

Change references in low level policies by updating the environment +with the copy of the environment.

+
+++ + + + +
Parameters:env_copy – reference to copy of the environment
-
-
-TR = None
-

total reward from given discrete observation with option

+
+
+check_env(x)
+

Prints the object id of the environment. Debugging function.

-
-
-adj = None
-

adjacency list

+
+
+do_transition()
+

Do a transition, assuming we can transition. To be implemented in +subclass.

+ +++ + + + +
Parameters:observation – final observation from episodic step
-
-
-curr_node_alias = None
-

store current node alias

+
+
+set_current_node(node_alias)
+

Sets the current node which is being executed.

+ +++ + + + +
Parameters:node – node alias of the node to be set
-
-
-curr_node_num = None
-

store current node’s id

+ +
+

backends.mcts_learner module

+
+
+class backends.mcts_learner.MCTSLearner(env, low_level_policies, max_depth=10, debug=False, rollout_timeout=500)
+

Bases: backends.controller_base.ControllerBase

+

MCTS Logic.

-
-do_transition(observation, visualize=False)
-

Do a transition using UCB metric, with the latest observation -from the episodic step.

+
+backup(rollout_reward)
+

Reward backup strategy.

+ +++ + + + +
Parameters:rollout_reward – reward to back up
+
+ +
+
+best_action(node, C)
+

Find the best option to execute from a given node. The constant +C determines the coefficient of the uncertainty estimate.

Parameters:
    -
  • observation – final observation from episodic step
  • -
  • visualize – whether or not to visualize low level steps
  • +
  • node – node from which the best action needs to be found
  • +
  • C – coefficient of uncertainty term
-

Returns o_star using UCB metric

+

Returns the best possible option alias from given node.

-
-get_best_node(observation, use_ucb=False)
-
+
+def_policy()
+

Default policy, used for rollouts.

+
-
-load_model(file_name='mcts.pickle')
-
+
+expand(node, not_visited)
+

Create a new node from the given node. Chooses an option from the +not_visited list. Also moves to the newly created node.

+ +++ + + + +
Parameters:
    +
  • node – node to expand from
  • +
  • not_visited – new possible edges or options
  • +
+
+
+ +
+
+move(option_alias)
+

Move in the MCTS tree. This means moving in the tree, updating +state information and stepping through option_alias.

+ +++ + + + +
Parameters:option_alias – edge or option to execute to reach a next node
+
-
-
-nodes = None
-

node properties

+
+
+option_step()
+

Step through the current option_alias.

-
-save_model(file_name='mcts.pickle')
-
+
+reset()
+

Resets maneuvers and sets current node to root.

+
-
-set_current_node(new_node_alias)
-

Sets the current node which is being executed

+
+search(obs)
+

Perform a traversal from the root node.

- +
Parameters:node – node alias of the node to be set
Parameters:obs – current observation
-
-traverse(observation, visualize=False)
-

Do a complete traversal from root to leaf. Assumes the -environment is reset and we are at the root node.

+
+set_current_node(node_alias)
+

Set current node so that option_step can execute it later.

- +
Parameters:
    -
  • observation – observation from the environment
  • -
  • visualize – whether or not to visualize low level steps
  • -
-
Parameters:node_alias – option alias to execute next
-

Returns value of root node

+
+ +
+
+tree_policy()
+

Policy that determines how to move through the MCTS tree. +Terminates either when environment reaches a terminal state, or we +reach a leaf node.

- -
-

backends.online_mcts_controller module

-
-class backends.online_mcts_controller.OnlineMCTSController(env, low_level_policies, start_node_alias)
-

Bases: backends.controller_base.ControllerBase

-

Online MCTS

+
+class backends.mcts_learner.Node(node_num)
+

Bases: object

+

Represents a node in a tree.

-
-can_transition()
-

Returns boolean signifying whether we can transition. To be -implemented in subclass.

+
+is_terminal()
+

Check whether this node is a leaf node or not.

+
+
+
+
+class backends.mcts_learner.Tree(max_depth)
+

Bases: object

+

Tree representation used for MCTS.

-
-change_low_level_references(env_copy)
-
+
+add_state(obs, dis_obs)
+

Associates observation and discrete observation to the current +node. Useful to keep track of the last known observation(s).

+ +++ + + + +
Parameters:
    +
  • obs – observation to save to current node’s cstate
  • +
  • dis_obs – observation to save to current node’s state
  • +
+
+
-
-do_transition()
-

Do a transition, assuming we can transition. To be -implemented in subclass.

+
+move(option_alias)
+

Use the edge option_alias and move from current node to +a next node.

- +
Parameters:observation – final observation from episodic step
Parameters:option_alias – edge to move along
-
-set_current_node(node_alias)
-

Sets the current node which is being executed

+
+new_node(option_alias)
+

Creates a new node that is a child of the current node. +The new node can be reached by the option_alias edge from +the current node.

- + + + +
Parameters:node – node alias of the node to be set
Parameters:option_alias – the edge between current node and new node
+
+ +
+
+reconstruct(option_alias)
+

Use the option_alias from the root node and reposition the tree +such that the new root node is the node reached by following option_alias +from the current root node.

+ +++ +
Parameters:option_alias – edge to follow from the root node
@@ -804,8 +956,8 @@ implemented in subclass.

class backends.policy_base.PolicyBase

Bases: object

-

Abstract policy base from which every policy backend is defined -and inherited.

+

Abstract policy base from which every policy backend is defined and +inherited.

@@ -819,15 +971,15 @@ and inherited.

can_transition()
-

Returns boolean signifying whether we can transition. To be -implemented in subclass.

+

Returns boolean signifying whether we can transition.

+

To be implemented in subclass.

do_transition()
-

Do a transition, assuming we can transition. To be -implemented in subclass.

+

Do a transition, assuming we can transition. To be implemented in +subclass.

@@ -841,7 +993,7 @@ implemented in subclass.

set_current_node(node_alias)
-

Sets the current node which is being executed

+

Sets the current node which is being executed.

diff --git a/documentation/sphinx/.build/.doc/env.html b/documentation/sphinx/.build/.doc/env.html index c6b76ef48d6c77bdc491defd42e96fb17e168865..10d826e8ae9c3700d9e032bc75671cbf102cef0a 100644 --- a/documentation/sphinx/.build/.doc/env.html +++ b/documentation/sphinx/.build/.doc/env.html @@ -67,7 +67,8 @@
current_model_checking_result()
-

Returns whether or not any of the conditions is currently violated.

+

Returns whether or not any of the conditions is currently +violated.

@@ -97,20 +98,20 @@ it (i.e., goal_achieved is always False).

step(u)
-

Gym compliant step function which -will be implemented in the subclass.

+

Gym compliant step function which will be implemented in the +subclass.

-terminal_reward_type = 'max'
+terminal_reward_type = 'min'
termination_condition
-

In the subclass, specify the condition for termination of the episode -(or the maneuver).

+

In the subclass, specify the condition for termination of the +episode (or the maneuver).

@@ -127,22 +128,22 @@ will be implemented in the subclass.

render()
-

Gym compliant step function which -will be implemented in the subclass.

+

Gym compliant step function which will be implemented in the +subclass.

reset()
-

Gym compliant reset function which -will be implemented in the subclass.

+

Gym compliant reset function which will be implemented in the +subclass.

step(action)
-

Gym compliant step function which -will be implemented in the subclass.

+

Gym compliant step function which will be implemented in the +subclass.

@@ -154,7 +155,7 @@ will be implemented in the subclass.

class env.road_env.RoadEnv

Bases: object

-

The generic road env

+

The generic road env.

TODO: Implement this generic road env for plugging-in other road envs. TODO: roadEnv also having a step() function can cause a problem.

diff --git a/documentation/sphinx/.build/.doc/env.simple_intersection.html b/documentation/sphinx/.build/.doc/env.simple_intersection.html index 61459cf480d757d891957fbab6ae64b4b9808638..8653370d01091c1c33b855cbbde6130e1b81334a 100644 --- a/documentation/sphinx/.build/.doc/env.simple_intersection.html +++ b/documentation/sphinx/.build/.doc/env.simple_intersection.html @@ -175,7 +175,7 @@
get_features_tuple()
-

continuous + discrete features

+

continuous + discrete features.

@@ -391,10 +391,8 @@ boxes Checks all vehicles other than ego.

cost(u)
-
-
Calculate the driving cost of the ego, i.e.,
-
the negative reward for the ego-driving.
-
+

Calculate the driving cost of the ego, i.e., the negative reward for +the ego-driving.

@@ -418,11 +416,6 @@ boxes Checks all vehicles other than ego.

cost_normalization_ranges = [[-3.5, 3.5], [-6.2, 6.2], [-10.081044753473748, -10.081044753473748], [-12, 12], [0, 12], [-2, 2], [-16, 16], [-2, 2]]
-
-
-cost_weights = (1.0, 0.25, 0.1, 1.0, 100.0, 0.1, 0.25, 0.1)
-
-
draw(info=None)
@@ -452,7 +445,7 @@ information as passed on by info.

generate_scenario(**kwargs)
-

Randomly generate a road scenario with

+

Randomly generate a road scenario with.

“the N-number of vehicles + an ego vehicle”

(n_others_range[0] <= N <= n_others_range[1]), @@ -563,17 +556,18 @@ when there is no vehicle ahead.

get_features_tuple()
-

Get/calculate the features wrt. the current state variables

+

Get/calculate the features wrt. the current state variables.

Returns features tuple

goal_achieved
-

A property from the base class which is True if the goal -of the road scenario is achieved, otherwise False. This property is -used in both step of EpisodicEnvBase and the implementation of -the high-level reinforcement learning and execution.

+

A property from the base class which is True if the goal of the road +scenario is achieved, otherwise False.

+

This property is used in both step of EpisodicEnvBase and the +implementation of the high-level reinforcement learning and +execution.

@@ -629,6 +623,11 @@ required to be initialzied.

the number of the other vehicles, initialized in generate_scenario

+
+
+n_others_with_higher_priority
+
+
normalize_cost = False
@@ -637,8 +636,9 @@ required to be initialzied.
normalize_tuple(vec, scale_factor=10)
-

Normalizes each element in a tuple according to ranges defined in self.cost_normalization_ranges. -Normalizes between 0 and 1. And the scales by scale_factor

+

Normalizes each element in a tuple according to ranges defined in +self.cost_normalization_ranges. Normalizes between 0 and 1. And the +scales by scale_factor.

@@ -722,8 +722,8 @@ info: information variables
termination_condition
-

In the subclass, specify the condition for termination of the episode -(or the maneuver).

+

In the subclass, specify the condition for termination of the +episode (or the maneuver).

@@ -909,10 +909,8 @@ returns 0 if it is on the line

env.simple_intersection.utilities.calculate_s(v)
-
-
Calculate the distance traveling when the vehicle is maximally
-
de-accelerating for a complete stop.
-
+

Calculate the distance traveling when the vehicle is maximally de- +accelerating for a complete stop.

@@ -929,7 +927,8 @@ de-acceleration to stop completely.
env.simple_intersection.utilities.calculate_v_max(dist)
-

Calculate the maximum velocity you can reach at the given position ahead.

+

Calculate the maximum velocity you can reach at the given position +ahead.

diff --git a/documentation/sphinx/.build/.doc/high_level_policy_main.html b/documentation/sphinx/.build/.doc/high_level_policy_main.html index efbdcf75591a6c1dd33bef89b17142f4f9dd504c..abaf5f678da66148a66395c8bf0deac408dda7c8 100644 --- a/documentation/sphinx/.build/.doc/high_level_policy_main.html +++ b/documentation/sphinx/.build/.doc/high_level_policy_main.html @@ -51,13 +51,23 @@
-high_level_policy_main.high_level_policy_training(nb_steps=25000, load_weights=False, training=True, testing=True, nb_episodes_for_test=10, max_nb_steps=100, visualize=False, tensorboard=False, save_path='highlevel_weights.h5f')
-

Do RL of the high-level policy and test it. -:param nb_steps: the number of steps to perform RL -:param load_weights: True if the pre-learned NN weights are loaded (for initializations of NNs) -:param training: True to enable training -:param testing: True to enable testing -:param nb_episodes_for_test: the number of episodes for testing

+high_level_policy_main.high_level_policy_training(nb_steps=25000, load_weights=False, training=True, testing=True, nb_episodes_for_test=20, max_nb_steps=100, visualize=False, tensorboard=False, save_path='highlevel_weights.h5f') +

Do RL of the high-level policy and test it.

+
+++ + + + +
Parameters:
    +
  • nb_steps – the number of steps to perform RL
  • +
  • load_weights – True if the pre-learned NN weights are loaded (for initializations of NNs)
  • +
  • training – True to enable training
  • +
  • testing – True to enable testing
  • +
  • nb_episodes_for_test – the number of episodes for testing
  • +
+
diff --git a/documentation/sphinx/.build/.doc/low_level_policy_main.html b/documentation/sphinx/.build/.doc/low_level_policy_main.html index 5867273e07626f31021a60cb2ed14756236c9fed..ba46003d5f8a84787da1724dc1ded92a05e780bb 100644 --- a/documentation/sphinx/.build/.doc/low_level_policy_main.html +++ b/documentation/sphinx/.build/.doc/low_level_policy_main.html @@ -34,23 +34,55 @@

low_level_policy_main module

+
+
+class low_level_policy_main.ManeuverEvaluateCallback(maneuver)
+

Bases: rl.callbacks.Callback

+
+
+on_episode_end(episode, logs={})
+

Called at end of each episode

+
+ +
+
+on_train_end(logs=None)
+
+ +
+ +
+
+low_level_policy_main.evaluate_low_level_policy(maneuver, pretrained=False, nb_episodes_for_eval=100)
+
+
-low_level_policy_main.low_level_policy_testing(maneuver, pretrained=False, nb_episodes_for_test=20)
+low_level_policy_main.low_level_policy_testing(maneuver, pretrained=False, visualize=True, nb_episodes_for_test=20)
-low_level_policy_main.low_level_policy_training(maneuver, nb_steps, RL_method='DDPG', load_weights=False, training=True, testing=True, visualize=False, nb_episodes_for_test=10, tensorboard=False)
-

Do RL of the low-level policy of the given maneuver and test it. -:param maneuver: the name of the maneuver defined in config.json (e.g., ‘default’). -:param nb_steps: the number of steps to perform RL. -:param RL_method: either DDPG or PPO2. -:param load_weights: True if the pre-learned NN weights are loaded (for initializations of NNs). -:param training: True to enable training. -:param testing: True to enable testing. -:param visualize: True to see the graphical outputs during training. -:param nb_episodes_for_test: the number of episodes for testing.

+low_level_policy_main.low_level_policy_training(maneuver, nb_steps, RL_method='DDPG', load_weights=False, training=True, testing=True, visualize=False, nb_episodes_for_test=10, tensorboard=False, without_ltl=False) +

Do RL of the low-level policy of the given maneuver and test it.

+ +++ + + + +
Parameters:
    +
  • maneuver – the name of the maneuver defined in config.json (e.g., ‘default’).
  • +
  • nb_steps – the number of steps to perform RL.
  • +
  • RL_method – either DDPG or PPO2.
  • +
  • load_weights – True if the pre-learned NN weights are loaded (for initializations of NNs).
  • +
  • training – True to enable training.
  • +
  • testing – True to enable testing.
  • +
  • visualize – True to see the graphical outputs during training.
  • +
  • nb_episodes_for_test – the number of episodes for testing.
  • +
+
diff --git a/documentation/sphinx/.build/.doc/mcts.html b/documentation/sphinx/.build/.doc/mcts.html index 4f31cb1bc948bef8ba5fb734f6978771ab48cbc7..97fdadb5eb0d51465375405b0d781f455d167656 100644 --- a/documentation/sphinx/.build/.doc/mcts.html +++ b/documentation/sphinx/.build/.doc/mcts.html @@ -50,39 +50,28 @@
-
-
-mcts.evaluate_online_mcts(nb_episodes=20, nb_trials=5)
-
-
-mcts.mcts_evaluation(nb_traversals, num_trials=5, visualize=False)
-

Do RL of the low-level policy of the given maneuver and test it. -:param nb_traversals: number of MCTS traversals -:param save_every: save at every these many traversals -:param visualize: visualization / rendering

-
- -
-
-mcts.mcts_training(nb_traversals, save_every=20, visualize=False)
-

Do RL of the low-level policy of the given maneuver and test it. -:param nb_traversals: number of MCTS traversals -:param save_every: save at every these many traversals -:param visualize: visualization / rendering

+mcts.mcts_evaluation(depth, nb_traversals, nb_episodes, nb_trials, visualize=False, debug=False, pretrained=True, highlevel_policy_file='highlevel_weights.h5f') +

Do RL of the low-level policy of the given maneuver and test it.

+ +++ + + + +
Parameters:
    +
  • depth – depth of each tree search
  • +
  • nb_traversals – number of MCTS traversals per episodes
  • +
  • nb_episodes – number of episodes per trial
  • +
  • nb_trials – number of trials
  • +
  • visualize – visualization / rendering
  • +
  • debug – whether or not to show debug information
  • +
+
-
-
-mcts.mcts_visualize(file_name)
-
- -
-
-mcts.online_mcts(nb_episodes=10)
-
- diff --git a/documentation/sphinx/.build/.doc/model_checker.html b/documentation/sphinx/.build/.doc/model_checker.html deleted file mode 100644 index eefdd7550dd9ae37cf194e124138f4e726a0bd83..0000000000000000000000000000000000000000 --- a/documentation/sphinx/.build/.doc/model_checker.html +++ /dev/null @@ -1,606 +0,0 @@ - - - - - - - - model_checker package — WiseMove documentation - - - - - - - - - - - - - -
- - -
-

model_checker package

- -
-

Submodules

-
-
-

model_checker.LTL_property_base module

-
-
-class model_checker.LTL_property_base.LTLPropertyBase(LTL_str, penalty, enabled=True)
-

Bases: object

-

This is a base class that contains information of an LTL property.

-

It encapsulates the model-checking part (see check / check_incremental), -and contains additional information. The subclass needs to describe -specific APdict to be used.

-
-
-APdict = None
-

The atomic propositions dict you must set in the subclass.

-
- -
-
-check(trace)
-

Checks the LTL property w.r.t. the given trace.

- --- - - - - - -
Parameters:trace – a sequence of states
Returns:result w.r.t. entire trace, in {TRUE, FALSE, UNDECIDED}
-
- -
-
-check_incremental(state)
-
-
Checks an initialised property w.r.t. the next state in a trace.
-
Assumes init_property or check were previously called.
-
- --- - - - - - -
Parameters:state – next state (an integer)
Returns:incremental result, in {TRUE, FALSE, UNDECIDED}
-
- -
-
-parser = None
-

This property’s checker virtual machine

-
- -
-
-reset_property()
-

Resets existing property so that it can be applied to a new sequence of states. -Assumes init_property or check were previously called.

-
- -
- -
-
-

model_checker.atomic_propositions_base module

-
-
-class model_checker.atomic_propositions_base.AtomicPropositionsBase
-

Bases: model_checker.atomic_propositions_base.Bits

-

An AP-control base class for AP-wise manipulation. -the dictionary APdict and its length APdict_len has -to be given in the subclass

-
-
-APdict = None
-
- -
- -
-
-class model_checker.atomic_propositions_base.Bits(value=0)
-

Bases: object

-

A bit-control class that allows us bit-wise manipulation as shown in the -example:

-
bits = Bits()
-bits[0] = False
-bits[2] = bits[0]
-
-
-
- -
-
-

model_checker.parser module

-
-
-class model_checker.parser.ErrorRec(l, c, s)
-

Bases: object

-
- -
-
-class model_checker.parser.Errors
-

Bases: object

-
-
-static Exception(errMsg)
-
- -
-
-static Init(fn, dir, merge, getParsingPos, errorMessages)
-
- -
-
-static SemErr(errMsg, errPos=None)
-
- -
-
-static Summarize(sourceBuffer)
-
- -
-
-static SynErr(errNum, errPos=None)
-
- -
-
-static Warn(errMsg, errPos=None)
-
- -
-
-count = 0
-
- -
-
-static display(s, e)
-
- -
-
-eof = False
-
- -
-
-errDist = 2
-
- -
-
-errMsgFormat = '%(file)s : (%(line)d, %(col)d) %(text)s\n'
-
- -
-
-errors = []
-
- -
-
-fileName = ''
-
- -
-
-listName = ''
-
- -
-
-mergeErrors = False
-
- -
-
-mergedList = None
-
- -
-
-minErrDist = 2
-
- -
-
-static printMsg(fileName, line, column, msg)
-
- -
-
-static storeError(line, col, s)
-
- -
- -
-
-class model_checker.parser.Parser
-

Bases: object

-
-
-Check(trace)
-

Checks an entire trace w.r.t. an existing property Scanner. -Includes ResetProperty, but not SetProperty.

-
- -
-
-CheckIncremental(state)
-

Checks a new state w.r.t. an existing property Scanner. -Constructs a trace from new states using a new or previous trace list. -If a previous trace list is used, states after index self.step are not valid.

-
- -
-
-Check_old(propscanner, trace)
-

Deprecated method to check an entire trace with a new property Scanner. -Includes SetProperty, which includes ResetProperty.

-
- -
-
-Conjunction(index)
-
- -
-
-Disjunction(index)
-
- -
-
-Expect(n)
-
- -
-
-ExpectWeak(n, follow)
-
- -
-
-FALSE = 0
-
- -
-
-Factor(index)
-
- -
-
-Get()
-
- -
-
-Implication(index)
-
- -
-
-LexString()
-
- -
-
-LookAheadString()
-
- -
-
-Parse(scanner)
-
- -
-
-Property(index)
-

Main property entry point.

-
- -
-
-PyCheck()
-

Unused dummy method.

-
- -
-
-ResetProperty()
-

Re-iniitializes an existing property Scanner.

-
- -
-
-SemErr(msg)
-
- -
-
-SetProperty(propscanner)
-

Sets the property Scanner that tokenizes the property.

-
- -
-
-StartOf(s)
-
- -
-
-Successful()
-
- -
-
-SynErr(errNum)
-
- -
-
-T = True
-
- -
-
-TRUE = 2
-
- -
-
-UNDECIDED = 1
-
- -
-
-UNDEFINED = -1
-
- -
-
-Warning(msg)
-
- -
-
-WeakSeparator(n, syFol, repFol)
-
- -
-
-errorMessages = {0: 'EOF expected', 1: 'proposition expected', 2: '"U" expected', 3: '"F" expected', 4: '"G" expected', 5: '"X" expected', 6: '"=>" expected', 7: '"or" expected', 8: '"and" expected', 9: '"not" expected', 10: '"true" expected', 11: '"false" expected', 12: '"(" expected', 13: '")" expected', 14: '??? expected', 15: 'invalid Property', 16: 'invalid Factor'}
-
- -
-
-getParsingPos()
-
- -
-
-maxT = 14
-
- -
-
-minErrDist = 2
-
- -
-
-set = [[True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, True, False, False, False, False, False, False, False, True, True, True, True, False, False, False]]
-
- -
-
-x = False
-
- -
- -
-
-

model_checker.scanner module

-
-
-class model_checker.scanner.Buffer(s)
-

Bases: object

-
-
-EOF = 'Ā'
-
- -
-
-Peek()
-
- -
-
-Read()
-
- -
-
-ReadChars(numBytes=1)
-
- -
-
-getPos()
-
- -
-
-getString(beg, end)
-
- -
-
-readPosition(pos)
-
- -
-
-setPos(value)
-
- -
- -
-
-class model_checker.scanner.Position(buf, beg, len, col)
-

Bases: object

-
-
-getSubstring()
-
- -
- -
-
-class model_checker.scanner.Scanner(s)
-

Bases: object

-
-
-CheckLiteral()
-
- -
-
-Comment0()
-
- -
-
-EOL = '\n'
-
- -
-
-NextCh()
-
- -
-
-NextToken()
-
- -
-
-Peek()
-
- -
-
-ResetPeek()
-
- -
-
-Scan()
-
- -
-
-charSetSize = 256
-
- -
-
-eofSym = 0
-
- -
-
-maxT = 14
-
- -
-
-noSym = 14
-
- -
-
-start = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1]
-
- -
- -
-
-class model_checker.scanner.Token
-

Bases: object

-
- -
-
-

Module contents

-
-
- - -
- - - - - \ No newline at end of file diff --git a/documentation/sphinx/.build/.doc/model_checker.simple_intersection.html b/documentation/sphinx/.build/.doc/model_checker.simple_intersection.html deleted file mode 100644 index be797d77d135c3387879762c6f99dd29bc847b4c..0000000000000000000000000000000000000000 --- a/documentation/sphinx/.build/.doc/model_checker.simple_intersection.html +++ /dev/null @@ -1,108 +0,0 @@ - - - - - - - - model_checker.simple_intersection package — WiseMove documentation - - - - - - - - - - - - - -
- - -
-

model_checker.simple_intersection package

-
-

Submodules

-
-
-

model_checker.simple_intersection.AP_dict module

-
-
-

model_checker.simple_intersection.LTL_test module

-
-
-

model_checker.simple_intersection.classes module

-
-
-class model_checker.simple_intersection.classes.AtomicPropositions
-

Bases: model_checker.atomic_propositions_base.AtomicPropositionsBase

-
-
An AP-control class for AP-wise manipulation as shown in the
-

example:

-

APs = AtomicPropositions() -APs[0] = False # this is same as

-
-
# APs[AP_dict[‘stop_region’]] = False
-

APs[‘in_stop_region’] = False # both expressions are the same

-
-
Requires:
-
Index in […] is an integer should be in the range -{0,1,2, …, AP_dict_len}.
-
-
-
-APdict = {'before_but_close_to_stop_region': 2, 'before_intersection': 13, 'has_entered_stop_region': 1, 'has_stopped_in_stop_region': 4, 'highest_priority': 8, 'in_intersection': 5, 'in_stop_region': 0, 'intersection_is_clear': 9, 'lane': 11, 'on_route': 7, 'over_speed_limit': 6, 'parallel_to_lane': 12, 'stopped_now': 3, 'target_lane': 14, 'veh_ahead': 10, 'veh_ahead_stopped_now': 15, 'veh_ahead_too_close': 16}
-
- -
- -
-
-class model_checker.simple_intersection.classes.LTLProperty(LTL_str, penalty, enabled=True)
-

Bases: model_checker.LTL_property_base.LTLPropertyBase

-

is a class that contains information of an LTL property in -simple_intersection road scenario.

-

It encapsulates the model-checking part and contains additional -information.

-
-
-APdict = {'before_but_close_to_stop_region': 2, 'before_intersection': 13, 'has_entered_stop_region': 1, 'has_stopped_in_stop_region': 4, 'highest_priority': 8, 'in_intersection': 5, 'in_stop_region': 0, 'intersection_is_clear': 9, 'lane': 11, 'on_route': 7, 'over_speed_limit': 6, 'parallel_to_lane': 12, 'stopped_now': 3, 'target_lane': 14, 'veh_ahead': 10, 'veh_ahead_stopped_now': 15, 'veh_ahead_too_close': 16}
-
- -
- -
-
-

Module contents

-
-
- - -
- - - - - \ No newline at end of file diff --git a/documentation/sphinx/.build/.doc/modules.html b/documentation/sphinx/.build/.doc/modules.html index 6409144b21bb3e9c930fa9da078f0b10d49626ba..71c3ec5c58fad95053b01f8dee34ccdf9277c72b 100644 --- a/documentation/sphinx/.build/.doc/modules.html +++ b/documentation/sphinx/.build/.doc/modules.html @@ -6,7 +6,7 @@ - wisemove — WiseMove documentation + wise-move-dev — WiseMove documentation @@ -20,7 +20,7 @@