backends package

Submodules

backends.baselines_learner module

class backends.baselines_learner.PPO2Agent(input_shape, nb_actions, env, policy=None, tensorboard=False, log_path='./logs', **kwargs)

Bases: backends.learner_base.LearnerBase

create_agent(policy, tensorboard)

Creates a PPO agent.

Returns: stable_baselines PPO2 object

fit(env=None, nb_steps=1000000, visualize=False, nb_max_episode_steps=200)
forward(observation)
get_default_policy()

Creates the default policy.

Returns: stable_baselines Policy object. Default is MlpPolicy.

load_weights(file_name='test_weights.h5f')
save_weights(file_name='test_weights.h5f', overwrite=True)
set_environment(env)
test_model(env=None, nb_episodes=50, visualize=True, nb_max_episode_steps=200)

Test the agent on the environment.

Parameters:
  • env – the environment instance. Should contain step(), reset() and optionally, render()
  • nb_episodes – Number of episodes to run
  • visualize – If True, visualizes the test. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
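
A minimal usage sketch (env is assumed to exist and to expose step(), reset() and optionally render(); the shape, action count and file name are placeholders):

    from backends.baselines_learner import PPO2Agent

    agent = PPO2Agent(input_shape=(48,), nb_actions=5, env=env)  # env assumed to exist
    agent.fit(env=env, nb_steps=200000, nb_max_episode_steps=200)
    agent.save_weights(file_name='ppo2_weights.h5f', overwrite=True)
    agent.test_model(env=env, nb_episodes=50, visualize=False)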

backends.controller_base module

class backends.controller_base.ControllerBase(env, low_level_policies, start_node_alias)

Bases: backends.policy_base.PolicyBase

Abstract class for controllers.

can_transition()

Returns a boolean signifying whether we can transition.

To be implemented in subclass.

do_transition(observation)

Do a transition, assuming we can transition. To be implemented in subclass.

Parameters: observation – final observation from episodic step
low_level_step_current_node()
set_controller_args(**kwargs)
set_current_node(node_alias)

Sets the current node which is being executed.

Parameters: node_alias – alias of the node to be set
step_current_node(visualize_low_level_steps=False)
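
A minimal sketch of a concrete controller built on this abstract interface; the class name and transition logic are purely illustrative:

    from backends.controller_base import ControllerBase

    class AlwaysTransitionController(ControllerBase):
        """Toy controller that allows a transition after every episodic step."""

        def __init__(self, env, low_level_policies, start_node_alias):
            super().__init__(env, low_level_policies, start_node_alias)
            self._node_alias = start_node_alias  # illustrative bookkeeping only

        def can_transition(self):
            # Always allow a transition in this toy example.
            return True

        def do_transition(self, observation):
            # A real controller would choose the next node from the observation;
            # here we simply stay on the starting node.
            self.set_current_node(self._node_alias)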

backends.kerasrl_learner module

class backends.kerasrl_learner.DDPGLearner(input_shape=(48, ), nb_actions=2, actor=None, critic=None, critic_action_input=None, memory=None, random_process=None, **kwargs)

Bases: backends.learner_base.LearnerBase

create_agent(actor, critic, critic_action_input, memory, random_process)

Creates a KerasRL DDPGAgent with given components.

Parameters:
  • actor – Keras Model for the actor, which takes the observation as input and outputs actions.
  • critic – Keras Model for the critic, which takes the concatenation of observation and action and outputs a single value.
  • critic_action_input – Keras Input which was used in creating action input of the critic model.
  • memory – KerasRL Memory.
  • random_process – KerasRL random process.
Returns: KerasRL DDPGAgent object

get_default_actor_model()

Creates the default actor model.

Returns: Keras Model object of actor

get_default_critic_model()

Creates the default critic model.

Returns: Keras Model object of critic

get_default_memory()

Creates the default memory model.

Returns: KerasRL SequentialMemory object

get_default_randomprocess()

Creates the default random process model.

Returns: KerasRL OrnsteinUhlenbeckProcess object

load_model(file_name='test_weights.h5f')

Load the weights of an agent.

Parameters: file_name – filename to be used when loading
predict(observation)

Perform a forward pass and return the next action chosen by the agent based on the current observation.

Parameters: observation – the current observation. Shape should be the same as self.input_shape

Returns: The action taken by the agent for the given observation

save_model(file_name='test_weights.h5f', overwrite=True)

Save the weights of the agent. To be used after learning.

Parameters:
  • file_name – filename to be used when saving
  • overwrite – If True, overwrites existing file
test_model(env, nb_episodes=50, callbacks=None, visualize=True, nb_max_episode_steps=200)

Test the agent on the environment.

Parameters:
  • env – the environment instance. Should contain step(), reset() and optionally, render()
  • nb_episodes – Number of episodes to run
  • visualize – If True, visualizes the test. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
train(env, nb_steps=1000000, visualize=False, verbose=1, log_interval=10000, nb_max_episode_steps=200, model_checkpoints=False, checkpoint_interval=100000, tensorboard=False)

Train the learning agent on the environment.

Parameters:
  • env – the environment instance. Should contain step() and reset() methods and optionally render()
  • nb_steps – the total number of steps to train
  • visualize – If True, visualizes the training. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
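
A minimal usage sketch with the default actor, critic, memory and random process (env and observation are assumed to exist; the file name is a placeholder):

    from backends.kerasrl_learner import DDPGLearner

    learner = DDPGLearner(input_shape=(48,), nb_actions=2)
    learner.train(env, nb_steps=1000000, nb_max_episode_steps=200)
    learner.save_model(file_name='ddpg_weights.h5f', overwrite=True)
    learner.test_model(env, nb_episodes=50, visualize=False)
    action = learner.predict(observation)  # observation shaped like input_shape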
class backends.kerasrl_learner.DQNAgentOverOptions(model, low_level_policies, policy=None, test_policy=None, enable_double_dqn=True, enable_dueling_network=False, dueling_type='avg', *args, **kwargs)

Bases: rl.agents.dqn.DQNAgent

forward(observation)

Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.

# Argument
observation (object): The current observation from the environment.
# Returns
The next action to be executed in the environment.
get_modified_q_values(observation)
class backends.kerasrl_learner.DQNLearner(input_shape=(48, ), nb_actions=5, low_level_policies=None, model=None, policy=None, memory=None, test_policy=None, **kwargs)

Bases: backends.learner_base.LearnerBase

create_agent(model, policy, memory, test_policy)

Creates a KerasRL DQNAgent with the given components.

Parameters:
  • model – Keras Model that takes the observation as input and outputs discrete actions.
  • memory – KerasRL Memory.
Returns: KerasRL DQNAgent object

get_default_memory()

Creates the default memory model.

Returns: KerasRL SequentialMemory object

get_default_model()

Creates the default model.

Returns: Keras Model object

get_default_policy()
get_default_test_policy()
get_q_value(observation, action)
get_q_value_using_option_alias(observation, option_alias)
get_softq_value_using_option_alias(observation, option_alias)
load_model(file_name='test_weights.h5f')

Load the weights of an agent.

Parameters: file_name – filename to be used when loading
predict(observation)

Perform a forward pass and return the next action chosen by the agent based on the current observation.

Parameters: observation – the current observation. Shape should be the same as self.input_shape

Returns: The action taken by the agent for the given observation

save_model(file_name='test_weights.h5f', overwrite=True)

Save the weights of the agent. To be used after learning.

Parameters:
  • file_name – filename to be used when saving
  • overwrite – If True, overwrites existing file
test_model(env, nb_episodes=5, visualize=True, nb_max_episode_steps=400, success_reward_threshold=100)

Test the agent on the environment.

Parameters:
  • env – the environment instance. Should contain step(), reset() and optionally, render()
  • nb_episodes – Number of episodes to run
  • visualize – If True, visualizes the test. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
train(env, nb_steps=1000000, visualize=False, verbose=1, log_interval=10000, nb_max_episode_steps=200, tensorboard=False, model_checkpoints=False, checkpoint_interval=10000)

Train the learning agent on the environment.

Parameters:
  • env – the environment instance. Should contain step() and reset() methods and optionally render()
  • nb_steps – the total number of steps to train
  • visualize – If True, visualizes the training. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
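
A minimal usage sketch with the default model, policy and memory (env, low_level_policies and observation are assumed to exist; the file name is a placeholder):

    from backends.kerasrl_learner import DQNLearner

    learner = DQNLearner(input_shape=(48,), nb_actions=5,
                         low_level_policies=low_level_policies)
    learner.train(env, nb_steps=1000000, nb_max_episode_steps=200)
    learner.save_model(file_name='dqn_weights.h5f', overwrite=True)
    q_value = learner.get_q_value(observation, action=0)  # Q-value of one action
    learner.test_model(env, nb_episodes=5, visualize=False)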
class backends.kerasrl_learner.RestrictedEpsGreedyQPolicy(eps=0.1)

Bases: rl.policy.EpsGreedyQPolicy

Restricted epsilon-greedy policy.

Like the standard epsilon-greedy policy, but never chooses an action whose Q-value is -inf.

select_action(q_values)

Return the selected action

# Arguments
q_values (np.ndarray): List of the estimations of Q for each action
# Returns
Selected action
class backends.kerasrl_learner.RestrictedGreedyQPolicy

Bases: rl.policy.GreedyQPolicy

Restricted greedy policy.

Like the standard greedy policy, but never chooses an action whose Q-value is -inf.

select_action(q_values)

Return the selected action

# Arguments
q_values (np.ndarray): List of the estimations of Q for each action
# Returns
Selected action
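
A minimal sketch of the restriction both policies describe: actions whose Q-value is -inf are masked out before selection. This is an illustrative re-implementation of the idea, not necessarily the code used by these classes:

    import numpy as np

    def restricted_eps_greedy_action(q_values, eps=0.1):
        q_values = np.asarray(q_values, dtype=float)
        allowed = np.flatnonzero(q_values > -np.inf)  # indices of valid actions
        # At least one valid action is assumed to exist.
        if np.random.uniform() < eps:
            return int(np.random.choice(allowed))          # explore among valid actions
        return int(allowed[np.argmax(q_values[allowed])])  # exploit the best valid action

With eps=0 this reduces to the restricted greedy case.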

backends.learner_base module

class backends.learner_base.LearnerBase(input_shape=(10, ), nb_actions=2, **kwargs)

Bases: backends.policy_base.PolicyBase

The abstract class from which each learning policy backend is defined and inherited.

load_model(file_name)

Load the weights of an agent.

Parameters: file_name – filename to be used when loading
predict(observation)

Perform a forward pass and return the next action chosen by the agent based on the current observation.

Parameters: observation – the current observation. Shape should be the same as self.input_shape

Returns: The action taken by the agent for the given observation

save_model(file_name, overwrite=True)

Save the weights of the agent. To be used after learning.

Parameters:
  • file_name – filename to be used when saving
  • overwrite – If True, overwrites existing file
test_model(env, nb_episodes=5, visualize=True, nb_max_episode_steps=200)

Test the agent on the environment.

Parameters:
  • env – the environment instance. Should contain step(), reset() and optionally, render()
  • nb_episodes – Number of episodes to run
  • visualize – If True, visualizes the test. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
train(env, nb_steps=50000, visualize=False, nb_max_episode_steps=200)

Train the learning agent on the environment.

Parameters:
  • env – the environment instance. Should contain step() and reset() methods and optionally render()
  • nb_steps – the total number of steps to train
  • visualize – If True, visualizes the training. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode

backends.manual_policy module

class backends.manual_policy.ManualPolicy(env, low_level_policies, transition_adj, start_node_alias)

Bases: backends.controller_base.ControllerBase

Manual policy execution using nodes and edges.

can_transition()

Check if we can transition.

Returns True if we can, False if we cannot.

do_transition(observation)

Do a singular transition using the specified edges.

Parameters: observation – final observation from episodic step (not used)
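
An illustrative construction; the structure assumed for transition_adj (a mapping from a node alias to the alias of the next node) and the aliases, env and low_level_policies are placeholders:

    from backends.manual_policy import ManualPolicy

    transition_adj = {'wait': 'follow', 'follow': 'stop'}  # hypothetical edges
    policy = ManualPolicy(env, low_level_policies, transition_adj,
                          start_node_alias='wait')
    if policy.can_transition():
        policy.do_transition(observation=None)  # observation is not used here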

backends.mcts_controller module

class backends.mcts_controller.MCTSController(env, low_level_policies, start_node_alias)

Bases: backends.controller_base.ControllerBase

MCTS Controller.

can_transition()

Returns a boolean signifying whether we can transition.

change_low_level_references(env_copy)

Change references in low level policies by updating the environment with the copy of the environment.

Parameters: env_copy – reference to copy of the environment
check_env(x)

Prints the object id of the environment. Debugging function.

do_transition()

Do a transition, assuming we can transition.
set_current_node(node_alias)

Sets the current node which is being executed.

Parameters: node_alias – alias of the node to be set

backends.mcts_learner module

class backends.mcts_learner.MCTSLearner(env, low_level_policies, max_depth=10, debug=False, rollout_timeout=500)

Bases: backends.controller_base.ControllerBase

MCTS Logic.

backup(rollout_reward)

Reward backup strategy.

Parameters: rollout_reward – reward to back up
best_action(node, C)

Find the best option to execute from a given node. The constant C determines the coefficient of the uncertainty estimate.

Parameters:
  • node – node from which the best action needs to be found
  • C – coefficient of uncertainty term

Returns the best possible option alias from the given node.
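
The exact scoring formula is not shown in this documentation; a standard UCB1-style score it plausibly resembles, with C weighting the uncertainty term, is sketched below:

    import math

    def ucb_score(total_reward, visits, parent_visits, C):
        mean_value = total_reward / visits                             # exploitation term
        uncertainty = C * math.sqrt(math.log(parent_visits) / visits)  # exploration term
        return mean_value + uncertainty  # best_action picks the option maximizing a score like this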

def_policy()

Default policy, used for rollouts.

expand(node, not_visited)

Create a new node from the given node. Chooses an option from the not_visited list. Also moves to the newly created node.

Parameters:
  • node – node to expand from
  • not_visited – new possible edges or options
move(option_alias)

Move in the MCTS tree: update the current node, update the state information, and step through option_alias.

Parameters: option_alias – edge or option to execute to reach the next node
option_step()

Step through the current option_alias.

reset()

Resets maneuvers and sets current node to root.

search(obs)

Perform a traversal from the root node.

Parameters: obs – current observation
set_current_node(node_alias)

Set current node so that option_step can execute it later.

Parameters: node_alias – option alias to execute next
tree_policy()

Policy that determines how to move through the MCTS tree. Terminates either when environment reaches a terminal state, or we reach a leaf node.
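
For orientation, the methods above typically compose into a single MCTS iteration as sketched below (selection, expansion, rollout, backup). The return values assumed for tree_policy() are illustrative, not the actual signature:

    def mcts_iteration(learner):
        # Selection: descend the tree until a terminal state or a leaf is reached.
        leaf, not_visited = learner.tree_policy()  # assumed return values
        # Expansion: create a child for one of the options not yet tried.
        if not_visited:
            learner.expand(leaf, not_visited)
        # Rollout: estimate the value of the new node with the default policy.
        rollout_reward = learner.def_policy()
        # Backup: propagate the rollout reward up the visited path.
        learner.backup(rollout_reward)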

class backends.mcts_learner.Node(node_num)

Bases: object

Represents a node in a tree.

is_terminal()

Check whether this node is a leaf node or not.

class backends.mcts_learner.Tree(max_depth)

Bases: object

Tree representation used for MCTS.

add_state(obs, dis_obs)

Associates observation and discrete observation to the current node. Useful to keep track of the last known observation(s).

Parameters:
  • obs – observation to save to current node’s cstate
  • dis_obs – observation to save to current node’s state
move(option_alias)

Use the edge option_alias and move from the current node to the next node.

Parameters: option_alias – edge to move along
new_node(option_alias)

Creates a new node that is a child of the current node. The new node can be reached by the option_alias edge from the current node.

Parameters: option_alias – the edge between current node and new node
reconstruct(option_alias)

Use the option_alias from the root node and reposition the tree such that the new root node is the node reached by following option_alias from the current root node.

Parameters: option_alias – edge to follow from the root node
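
An illustrative sequence of Tree operations; the observation values and option aliases are placeholders, and the calling order used by MCTSLearner may differ:

    from backends.mcts_learner import Tree

    tree = Tree(max_depth=10)
    tree.add_state(obs=[0.0] * 48, dis_obs=(0, 0))  # annotate the root node
    tree.new_node('follow')     # child of the root, reachable via the 'follow' edge
    tree.move('follow')         # the current node is now that child
    tree.reconstruct('follow')  # re-root the tree at the node reached via 'follow'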

backends.policy_base module

class backends.policy_base.PolicyBase

Bases: object

Abstract policy base from which every policy backend is defined and inherited.

backends.rl_controller module

class backends.rl_controller.RLController(env, low_level_policies, start_node_alias)

Bases: backends.controller_base.ControllerBase

RL controller using a trained policy.

can_transition()

Returns a boolean signifying whether we can transition.

do_transition()

Do a transition, assuming we can transition.
set_current_node(node_alias)

Sets the current node which is being executed.

Parameters: node_alias – alias of the node to be set
set_trained_policy(policy)
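
A hypothetical wiring of a trained learner into an RLController; whether set_trained_policy expects the learner object itself or its underlying policy is an assumption made here, and env, low_level_policies, the alias and the file name are placeholders:

    from backends.kerasrl_learner import DQNLearner
    from backends.rl_controller import RLController

    learner = DQNLearner(input_shape=(48,), nb_actions=5,
                         low_level_policies=low_level_policies)
    learner.load_model('dqn_weights.h5f')

    controller = RLController(env, low_level_policies, start_node_alias='wait')
    controller.set_trained_policy(learner)
    if controller.can_transition():
        controller.do_transition()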

Module contents