backends package¶
Submodules¶
backends.baselines_learner module¶
class backends.baselines_learner.PPO2Agent(input_shape, nb_actions, env, policy=None, tensorboard=False, log_path='./logs', **kwargs)¶
Bases: backends.learner_base.LearnerBase

create_agent(policy, tensorboard)¶
Creates a PPO agent.
Returns: stable_baselines PPO2 object

fit(env=None, nb_steps=1000000, visualize=False, nb_max_episode_steps=200)¶

forward(observation)¶

get_default_policy()¶
Creates the default policy.
Returns: stable_baselines Policy object; the default is MlpPolicy.

load_weights(file_name='test_weights.h5f')¶

save_weights(file_name='test_weights.h5f', overwrite=True)¶

set_environment(env)¶

test_model(env=None, nb_episodes=50, visualize=True, nb_max_episode_steps=200)¶
Test the agent on the environment.
Parameters:
- env – the environment instance. Should contain step(), reset() and optionally render()
- nb_episodes – Number of episodes to run
- visualize – If True, visualizes the test. Works only if render() is present in env
- nb_max_episode_steps – Maximum number of steps per episode
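A minimal usage sketch based on the signatures above; make_env() is a hypothetical factory and is only assumed to return an environment providing step(), reset() and optionally render().

    from backends.baselines_learner import PPO2Agent

    env = make_env()  # hypothetical environment factory

    # With policy=None the agent falls back to the default MlpPolicy.
    agent = PPO2Agent(input_shape=(48,), nb_actions=2, env=env)

    agent.fit(env=env, nb_steps=1000000, nb_max_episode_steps=200)
    agent.save_weights(file_name='test_weights.h5f', overwrite=True)
    agent.test_model(env=env, nb_episodes=50, visualize=False)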
backends.controller_base module¶
class backends.controller_base.ControllerBase(env, low_level_policies, start_node_alias)¶
Bases: backends.policy_base.PolicyBase
Abstract class for controllers.

can_transition()¶
Returns a boolean signifying whether we can transition. To be implemented in a subclass.

do_transition(observation)¶
Do a transition, assuming we can transition. To be implemented in a subclass.
Parameters: observation – final observation from episodic step

low_level_step_current_node()¶

set_controller_args(**kwargs)¶

set_current_node(node_alias)¶
Sets the current node which is being executed.
Parameters: node_alias – node alias of the node to be set

step_current_node(visualize_low_level_steps=False)¶
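Since can_transition() and do_transition() are left to subclasses, a concrete controller might look like the sketch below; the transition condition and the 'keeplane' node alias are hypothetical.

    from backends.controller_base import ControllerBase

    class AlwaysTransitionController(ControllerBase):
        """Hypothetical controller that transitions to a fixed node every step."""

        def can_transition(self):
            return True

        def do_transition(self, observation):
            # 'keeplane' is an assumed alias present in low_level_policies.
            self.set_current_node('keeplane')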
backends.kerasrl_learner module¶
class backends.kerasrl_learner.DDPGLearner(input_shape=(48, ), nb_actions=2, actor=None, critic=None, critic_action_input=None, memory=None, random_process=None, **kwargs)¶
Bases: backends.learner_base.LearnerBase

create_agent(actor, critic, critic_action_input, memory, random_process)¶
Creates a KerasRL DDPGAgent with the given components.
Parameters:
- actor – Keras Model of the actor, which takes an observation as input and outputs actions.
- critic – Keras Model of the critic, which takes the concatenation of observation and action and outputs a single value.
- critic_action_input – Keras Input that was used to create the action input of the critic model.
- memory – KerasRL Memory.
- random_process – KerasRL random process.
Returns: KerasRL DDPGAgent object

get_default_actor_model()¶
Creates the default actor model.
Returns: Keras Model object of the actor

get_default_critic_model()¶
Creates the default critic model.
Returns: Keras Model object of the critic

get_default_memory()¶
Creates the default memory.
Returns: KerasRL SequentialMemory object

get_default_randomprocess()¶
Creates the default random process.
Returns: KerasRL OrnsteinUhlenbeckProcess object

load_model(file_name='test_weights.h5f')¶
Load the weights of an agent.
Parameters: file_name – filename to be used when loading

predict(observation)¶
Perform a forward pass and return the next action chosen by the agent based on the current observation.
Parameters: observation – the current observation. Shape should be the same as self.input_shape.
Returns: The action taken by the agent given the observation.

save_model(file_name='test_weights.h5f', overwrite=True)¶
Save the weights of the agent. To be used after learning.
Parameters:
- file_name – filename to be used when saving
- overwrite – If True, overwrites an existing file

test_model(env, nb_episodes=50, callbacks=None, visualize=True, nb_max_episode_steps=200)¶
Test the agent on the environment.
Parameters:
- env – the environment instance. Should contain step(), reset() and optionally render()
- nb_episodes – Number of episodes to run
- visualize – If True, visualizes the test. Works only if render() is present in env
- nb_max_episode_steps – Maximum number of steps per episode

train(env, nb_steps=1000000, visualize=False, verbose=1, log_interval=10000, nb_max_episode_steps=200, model_checkpoints=False, checkpoint_interval=100000, tensorboard=False)¶
Train the learning agent on the environment.
Parameters:
- env – the environment instance. Should contain step() and reset() methods and optionally render()
- nb_steps – the total number of steps to train
- visualize – If True, visualizes the training. Works only if render() is present in env
- nb_max_episode_steps – Maximum number of steps per episode
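A usage sketch assuming default components; make_env() is a hypothetical factory providing the step()/reset() interface required above.

    from backends.kerasrl_learner import DDPGLearner

    env = make_env()  # hypothetical environment factory

    # With no actor/critic/memory/random_process supplied, the learner builds
    # its defaults (including an Ornstein-Uhlenbeck random process).
    learner = DDPGLearner(input_shape=(48,), nb_actions=2)

    learner.train(env, nb_steps=1000000, nb_max_episode_steps=200)
    learner.save_model(file_name='test_weights.h5f', overwrite=True)
    action = learner.predict(env.reset())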
class backends.kerasrl_learner.DQNAgentOverOptions(model, low_level_policies, policy=None, test_policy=None, enable_double_dqn=True, enable_dueling_network=False, dueling_type='avg', *args, **kwargs)¶
Bases: rl.agents.dqn.DQNAgent

forward(observation)¶
Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.
Arguments:
- observation (object) – The current observation from the environment.
Returns: The next action to be executed in the environment.

get_modified_q_values(observation)¶
class backends.kerasrl_learner.DQNLearner(input_shape=(48, ), nb_actions=5, low_level_policies=None, model=None, policy=None, memory=None, test_policy=None, **kwargs)¶
Bases: backends.learner_base.LearnerBase

create_agent(model, policy, memory, test_policy)¶
Creates a KerasRL DQNAgent with the given components.
Parameters:
- model – Keras Model that takes an observation as input and outputs values for the discrete actions.
- memory – KerasRL Memory.
Returns: KerasRL DQN object

get_default_memory()¶
Creates the default memory.
Returns: KerasRL SequentialMemory object

get_default_model()¶
Creates the default model.
Returns: Keras Model object

get_default_policy()¶

get_default_test_policy()¶

get_q_value(observation, action)¶

get_q_value_using_option_alias(observation, option_alias)¶

get_softq_value_using_option_alias(observation, option_alias)¶

load_model(file_name='test_weights.h5f')¶
Load the weights of an agent.
Parameters: file_name – filename to be used when loading

predict(observation)¶
Perform a forward pass and return the next action chosen by the agent based on the current observation.
Parameters: observation – the current observation. Shape should be the same as self.input_shape.
Returns: The action taken by the agent given the observation.

save_model(file_name='test_weights.h5f', overwrite=True)¶
Save the weights of the agent. To be used after learning.
Parameters:
- file_name – filename to be used when saving
- overwrite – If True, overwrites an existing file

test_model(env, nb_episodes=5, visualize=True, nb_max_episode_steps=400, success_reward_threshold=100)¶
Test the agent on the environment.
Parameters:
- env – the environment instance. Should contain step(), reset() and optionally render()
- nb_episodes – Number of episodes to run
- visualize – If True, visualizes the test. Works only if render() is present in env
- nb_max_episode_steps – Maximum number of steps per episode

train(env, nb_steps=1000000, visualize=False, verbose=1, log_interval=10000, nb_max_episode_steps=200, tensorboard=False, model_checkpoints=False, checkpoint_interval=10000)¶
Train the learning agent on the environment.
Parameters:
- env – the environment instance. Should contain step() and reset() methods and optionally render()
- nb_steps – the total number of steps to train
- visualize – If True, visualizes the training. Works only if render() is present in env
- nb_max_episode_steps – Maximum number of steps per episode
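A usage sketch for the option-level learner; make_env(), the option aliases and the low_level_policies dict are hypothetical placeholders.

    from backends.kerasrl_learner import DQNLearner

    env = make_env()  # hypothetical environment factory
    low_level_policies = {'keeplane': None, 'changelane': None}  # hypothetical options

    learner = DQNLearner(input_shape=(48,), nb_actions=5,
                         low_level_policies=low_level_policies)

    learner.train(env, nb_steps=1000000, nb_max_episode_steps=200)
    learner.save_model(file_name='test_weights.h5f')

    # Query the learned value of one option; 'keeplane' is an assumed alias.
    q = learner.get_q_value_using_option_alias(env.reset(), 'keeplane')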
class backends.kerasrl_learner.RestrictedEpsGreedyQPolicy(eps=0.1)¶
Bases: rl.policy.EpsGreedyQPolicy
Restricted epsilon-greedy policy. This policy ensures that it never chooses an action whose value is -inf.

select_action(q_values)¶
Return the selected action.
Arguments:
- q_values (np.ndarray) – List of the estimated Q-values for each action.
Returns: The selected action.
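A small sketch of the restriction in practice; the Q-values are made up, and the only property illustrated is the documented guarantee that an action with a -inf value is never chosen.

    import numpy as np
    from backends.kerasrl_learner import RestrictedEpsGreedyQPolicy

    policy = RestrictedEpsGreedyQPolicy(eps=0.1)

    # Action 1 is masked with -inf, so it is never selected, even when exploring.
    q_values = np.array([0.3, -np.inf, 1.2, 0.7])
    action = policy.select_action(q_values)
    assert action != 1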
class backends.kerasrl_learner.RestrictedGreedyQPolicy¶
Bases: rl.policy.GreedyQPolicy
Restricted greedy policy. This policy ensures that it never chooses an action whose value is -inf.

select_action(q_values)¶
Return the selected action.
Arguments:
- q_values (np.ndarray) – List of the estimated Q-values for each action.
Returns: The selected action.
backends.learner_base module¶
class backends.learner_base.LearnerBase(input_shape=(10, ), nb_actions=2, **kwargs)¶
Bases: backends.policy_base.PolicyBase
The abstract class from which each learning policy backend is derived.

load_model(file_name)¶
Load the weights of an agent.
Parameters: file_name – filename to be used when loading

predict(observation)¶
Perform a forward pass and return the next action chosen by the agent based on the current observation.
Parameters: observation – the current observation. Shape should be the same as self.input_shape.
Returns: The action taken by the agent given the observation.

save_model(file_name, overwrite=True)¶
Save the weights of the agent. To be used after learning.
Parameters:
- file_name – filename to be used when saving
- overwrite – If True, overwrites an existing file

test_model(env, nb_episodes=5, visualize=True, nb_max_episode_steps=200)¶
Test the agent on the environment.
Parameters:
- env – the environment instance. Should contain step(), reset() and optionally render()
- nb_episodes – Number of episodes to run
- visualize – If True, visualizes the test. Works only if render() is present in env
- nb_max_episode_steps – Maximum number of steps per episode

train(env, nb_steps=50000, visualize=False, nb_max_episode_steps=200)¶
Train the learning agent on the environment.
Parameters:
- env – the environment instance. Should contain step() and reset() methods and optionally render()
- nb_steps – the total number of steps to train
- visualize – If True, visualizes the training. Works only if render() is present in env
- nb_max_episode_steps – Maximum number of steps per episode
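A minimal sketch of what a new backend would implement, based only on the abstract interface above; the random-action behaviour is purely illustrative.

    import random
    from backends.learner_base import LearnerBase

    class RandomLearner(LearnerBase):
        """Hypothetical backend that ignores observations and acts randomly."""

        def __init__(self, input_shape=(10,), nb_actions=2, **kwargs):
            super().__init__(input_shape=input_shape, nb_actions=nb_actions, **kwargs)
            self._nb_actions = nb_actions

        def train(self, env, nb_steps=50000, visualize=False, nb_max_episode_steps=200):
            pass  # nothing to learn

        def predict(self, observation):
            return random.randrange(self._nb_actions)

        def save_model(self, file_name, overwrite=True):
            pass

        def load_model(self, file_name):
            pass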
backends.manual_policy module¶
class backends.manual_policy.ManualPolicy(env, low_level_policies, transition_adj, start_node_alias)¶
Bases: backends.controller_base.ControllerBase
Manual policy execution using nodes and edges.

can_transition()¶
Check if we can transition.
Returns True if we can, False if we cannot.

do_transition(observation)¶
Do a singular transition using the specified edges.
Parameters: observation – final observation from episodic step (not used)
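A usage sketch; the environment, the option policies and, in particular, the structure of transition_adj are hypothetical placeholders here.

    from backends.manual_policy import ManualPolicy

    env = make_env()                                             # hypothetical factory
    low_level_policies = {'keeplane': None, 'changelane': None}  # hypothetical options
    transition_adj = {'keeplane': 'changelane'}                  # hypothetical edge structure

    policy = ManualPolicy(env, low_level_policies, transition_adj,
                          start_node_alias='keeplane')

    if policy.can_transition():
        policy.do_transition(observation=None)  # observation is not used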
backends.mcts_controller module¶
class backends.mcts_controller.MCTSController(env, low_level_policies, start_node_alias)¶
Bases: backends.controller_base.ControllerBase
MCTS controller.

can_transition()¶
Returns a boolean signifying whether we can transition. To be implemented in a subclass.

change_low_level_references(env_copy)¶
Change references in the low-level policies by updating the environment with the copy of the environment.
Parameters: env_copy – reference to the copy of the environment

check_env(x)¶
Prints the object id of the environment. Debugging function.

do_transition()¶
Do a transition, assuming we can transition. To be implemented in a subclass.
Parameters: observation – final observation from episodic step

set_current_node(node_alias)¶
Sets the current node which is being executed.
Parameters: node_alias – node alias of the node to be set
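A sketch of a single high-level decision step with this controller; env, the option policies and the 'keeplane' alias are the same hypothetical placeholders as above.

    from backends.mcts_controller import MCTSController

    controller = MCTSController(env, low_level_policies,
                                start_node_alias='keeplane')

    # Plan and switch to the next option when a transition is possible.
    if controller.can_transition():
        controller.do_transition()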
backends.mcts_learner module¶
class backends.mcts_learner.MCTSLearner(env, low_level_policies, max_depth=10, debug=False, rollout_timeout=500)¶
Bases: backends.controller_base.ControllerBase
MCTS Logic.

backup(rollout_reward)¶
Reward backup strategy.
Parameters: rollout_reward – reward to back up

best_action(node, C)¶
Find the best option to execute from a given node. The constant C determines the coefficient of the uncertainty estimate.
Parameters:
- node – node from which the best action needs to be found
- C – coefficient of uncertainty term
Returns: the best possible option alias from the given node

def_policy()¶
Default policy, used for rollouts.

expand(node, not_visited)¶
Create a new node from the given node. Chooses an option from the not_visited list. Also moves to the newly created node.
Parameters:
- node – node to expand from
- not_visited – new possible edges or options

move(option_alias)¶
Move in the MCTS tree. This means moving in the tree, updating state information and stepping through option_alias.
Parameters: option_alias – edge or option to execute to reach a next node

option_step()¶
Step through the current option_alias.

reset()¶
Resets maneuvers and sets current node to root.

search(obs)¶
Perform a traversal from the root node.
Parameters: obs – current observation

set_current_node(node_alias)¶
Set current node so that option_step can execute it later.
Parameters: node_alias – option alias to execute next

tree_policy()¶
Policy that determines how to move through the MCTS tree. Terminates either when the environment reaches a terminal state or when we reach a leaf node.
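A rough sketch of how search() might be driven; the environment, the option policies and the number of traversals are hypothetical, and in the actual stack these calls are presumably issued by MCTSController.

    from backends.mcts_learner import MCTSLearner

    env = make_env()                                             # hypothetical factory
    low_level_policies = {'keeplane': None, 'changelane': None}  # hypothetical options

    learner = MCTSLearner(env, low_level_policies, max_depth=10)
    learner.reset()

    obs = env.reset()
    for _ in range(100):      # number of traversals per decision is arbitrary here
        learner.search(obs)

    # Commit to an option and step through it; 'keeplane' is an assumed alias.
    learner.set_current_node('keeplane')
    learner.option_step()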
class backends.mcts_learner.Node(node_num)¶
Bases: object
Represents a node in a tree.

is_terminal()¶
Check whether this node is a leaf node or not.
class backends.mcts_learner.Tree(max_depth)¶
Bases: object
Tree representation used for MCTS.

add_state(obs, dis_obs)¶
Associates observation and discrete observation to the current node. Useful to keep track of the last known observation(s).
Parameters:
- obs – observation to save to the current node’s cstate
- dis_obs – discrete observation to save to the current node’s state

move(option_alias)¶
Use the edge option_alias and move from the current node to a next node.
Parameters: option_alias – edge to move along

new_node(option_alias)¶
Creates a new node that is a child of the current node. The new node can be reached by the option_alias edge from the current node.
Parameters: option_alias – the edge between the current node and the new node

reconstruct(option_alias)¶
Use the option_alias from the root node and reposition the tree such that the new root node is the node reached by following option_alias from the current root node.
Parameters: option_alias – edge to follow from the root node
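A small sketch of the tree bookkeeping described above; the observations and the 'keeplane' alias are made-up placeholders.

    from backends.mcts_learner import Tree

    tree = Tree(max_depth=10)

    # Record the last known observations on the current (root) node, grow a
    # child along the 'keeplane' edge, then move the cursor onto that child.
    tree.add_state(obs=[0.0, 1.0], dis_obs=(0, 1))  # hypothetical observations
    tree.new_node('keeplane')
    tree.move('keeplane')

    # Once an option has actually been executed, make its node the new root.
    tree.reconstruct('keeplane')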
backends.policy_base module¶
class backends.policy_base.PolicyBase¶
Bases: object
Abstract policy base class from which every policy backend is derived.
backends.rl_controller module¶
class backends.rl_controller.RLController(env, low_level_policies, start_node_alias)¶
Bases: backends.controller_base.ControllerBase
RL controller using a trained policy.

can_transition()¶
Returns a boolean signifying whether we can transition. To be implemented in a subclass.

do_transition()¶
Do a transition, assuming we can transition. To be implemented in a subclass.
Parameters: observation – final observation from episodic step

set_current_node(node_alias)¶
Sets the current node which is being executed.
Parameters: node_alias – node alias of the node to be set

set_trained_policy(policy)¶
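A closing sketch tying the RL controller to a trained backend; trained_policy is a hypothetical object obtained from a trained learner, since this page does not specify its exact type.

    from backends.rl_controller import RLController

    controller = RLController(env, low_level_policies,
                              start_node_alias='keeplane')

    # trained_policy is a hypothetical trained policy object (e.g. produced by
    # one of the learners documented above).
    controller.set_trained_policy(trained_policy)

    if controller.can_transition():
        controller.do_transition()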