backends package

Submodules

backends.baselines_learner module

class backends.baselines_learner.PPO2Agent(input_shape, nb_actions, env, policy=None, tensorboard=False, log_path='./logs', **kwargs)

Bases: backends.learner_base.LearnerBase

create_agent(policy, tensorboard)

Creates a PPO agent.

Returns: stable_baselines PPO2 object
fit(env=None, nb_steps=1000000, visualize=False, nb_max_episode_steps=200)
forward(observation)
get_default_policy()

Creates the default policy.

Returns: stable_baselines Policy object. Default is MlpPolicy.

load_weights(file_name='test_weights.h5f')
save_weights(file_name='test_weights.h5f', overwrite=True)
set_environment(env)
test_model(env=None, nb_episodes=50, visualize=True, nb_max_episode_steps=200)

Test the agent on the environment.

Parameters:
  • env – the environment instance. Should contain step(), reset() and optionally, render()
  • nb_episodes – Number of episodes to run
  • visualize – If True, visualizes the test. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
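
Illustrative usage sketch, assuming a Gym-style environment; make_env(), the shapes, step counts and file name below are placeholders rather than part of the documented API.

    from backends.baselines_learner import PPO2Agent

    env = make_env()  # placeholder: construct a Gym-style environment here

    agent = PPO2Agent(input_shape=(48,), nb_actions=2, env=env)

    # Train, save the weights, then evaluate without rendering.
    agent.fit(env=env, nb_steps=100000, visualize=False, nb_max_episode_steps=200)
    agent.save_weights(file_name='ppo2_weights.h5f', overwrite=True)
    agent.test_model(env=env, nb_episodes=10, visualize=False, nb_max_episode_steps=200)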

backends.controller_base module

class backends.controller_base.ControllerBase(env, low_level_policies, start_node_alias)

Bases: backends.policy_base.PolicyBase

Abstract class for controllers.

can_transition()

Returns boolean signifying whether we can transition. To be implemented in subclass.

do_transition(observation)

Do a transition, assuming we can transition. To be implemented in subclass.

Parameters: observation – final observation from episodic step
low_level_step_current_node()
set_controller_args(**kwargs)
set_current_node(node_alias)

Sets the current node which is being executed.

Parameters: node_alias – alias of the node to be set
step_current_node(visualize_low_level_steps=False)
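
Since can_transition() and do_transition() are meant to be implemented in subclasses, the following is a minimal illustrative subclass; the class name and the 'wait' node alias are hypothetical.

    from backends.controller_base import ControllerBase

    class FixedController(ControllerBase):
        """Toy controller that always allows a transition to one fixed node."""

        def can_transition(self):
            # Always report that a transition is possible.
            return True

        def do_transition(self, observation):
            # Ignore the observation and switch to a hard-coded node alias.
            self.set_current_node('wait')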

backends.kerasrl_learner module

class backends.kerasrl_learner.DDPGLearner(input_shape=(48, ), nb_actions=2, actor=None, critic=None, critic_action_input=None, memory=None, random_process=None, **kwargs)

Bases: backends.learner_base.LearnerBase

create_agent(actor, critic, critic_action_input, memory, random_process)

Creates a KerasRL DDPGAgent with given components.

Parameters:
  • actor – Keras Model of actor which takes observation as input and outputs actions.
  • critic – Keras Model of critic that takes concatenation of observation and action and outputs a single value.
  • critic_action_input – Keras Input which was used in creating action input of the critic model.
  • memory – KerasRL Memory.
  • random_process – KerasRL random process.
Returns: KerasRL DDPGAgent object

get_default_actor_model()

Creates the default actor model.

Returns: Keras Model object of actor

get_default_critic_model()

Creates the default critic model.

Returns: Keras Model object of critic

get_default_memory()

Creates the default memory model.

Returns: KerasRL SequentialMemory object

get_default_randomprocess()

Creates the default random process model.

Returns: KerasRL OrnsteinUhlenbeckProcess object

load_model(file_name='test_weights.h5f')

Load the weights of an agent.

Parameters: file_name – filename to be used when loading
predict(observation)

Perform a forward pass and return the next action chosen by the agent based on the current observation.

Parameters: observation – the current observation. Shape should be the same as self.input_shape

Returns: The action taken by the agent based on the given observation

save_model(file_name='test_weights.h5f', overwrite=True)

Save the weights of the agent. To be used after learning.

Parameters:
  • file_name – filename to be used when saving
  • overwrite – If True, overwrites existing file
test_model(env, nb_episodes=50, visualize=True, nb_max_episode_steps=200)

Test the agent on the environment.

Parameters:
  • env – the environment instance. Should contain step(), reset() and optionally, render()
  • nb_episodes – Number of episodes to run
  • visualize – If True, visualizes the test. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
train(env, nb_steps=1000000, visualize=False, verbose=1, log_interval=10000, nb_max_episode_steps=200, model_checkpoints=False, checkpoint_interval=100000, tensorboard=False)

Train the learning agent on the environment.

Parameters:
  • env – the environment instance. Should contain step() and reset() methods and optionally render()
  • nb_steps – the total number of steps to train
  • visualize – If True, visualizes the training. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
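
Illustrative end-to-end sketch for DDPGLearner, assuming that omitted components fall back to the get_default_* methods above; make_continuous_env() and the file name are placeholders.

    from backends.kerasrl_learner import DDPGLearner

    env = make_continuous_env()  # placeholder: any env exposing step() and reset()

    # With no actor/critic/memory/random_process given, the defaults are assumed to be used.
    learner = DDPGLearner(input_shape=(48,), nb_actions=2)

    learner.train(env, nb_steps=200000, visualize=False, verbose=1,
                  log_interval=10000, nb_max_episode_steps=200)
    learner.save_model(file_name='ddpg_weights.h5f', overwrite=True)
    learner.test_model(env, nb_episodes=10, visualize=False, nb_max_episode_steps=200)
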
class backends.kerasrl_learner.DQNAgentOverOptions(model, low_level_policies, policy=None, test_policy=None, enable_double_dqn=True, enable_dueling_network=False, dueling_type='avg', *args, **kwargs)

Bases: rl.agents.dqn.DQNAgent

forward(observation)

Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.

Parameters: observation – the current observation from the environment

Returns: The next action to be executed in the environment
get_modified_q_values(observation)
class backends.kerasrl_learner.DQNLearner(input_shape=(48, ), nb_actions=5, low_level_policies=None, model=None, policy=None, memory=None, **kwargs)

Bases: backends.learner_base.LearnerBase

create_agent(model, policy, memory)

Creates a KerasRL DQNAgent with given components.

Parameters:
  • model – Keras Model which takes observation as input and outputs discrete actions.
  • policy – KerasRL Policy object used to select actions.
  • memory – KerasRL Memory.
Returns: KerasRL DQNAgent object

get_default_memory()

Creates the default memory model.

Returns: KerasRL SequentialMemory object

get_default_model()

Creates the default model.

Returns: Keras Model object

get_default_policy()
get_q_value(observation, action)
get_q_value_using_option_alias(observation, option_alias)
get_softq_value_using_option_alias(observation, option_alias)
load_model(file_name='test_weights.h5f')

Load the weights of an agent.

Parameters: file_name – filename to be used when loading
predict(observation)

Perform a forward pass and return the next action chosen by the agent based on the current observation.

Parameters: observation – the current observation. Shape should be the same as self.input_shape

Returns: The action taken by the agent based on the given observation

save_model(file_name='test_weights.h5f', overwrite=True)

Save the weights of the agent. To be used after learning.

Parameters:
  • file_name – filename to be used when saving
  • overwrite – If True, overwrites existing file
test_model(env, nb_episodes=5, visualize=True, nb_max_episode_steps=400, success_reward_threshold=100)

Test the agent on the environment.

Parameters:
  • env – the environment instance. Should contain step(), reset() and optionally, render()
  • nb_episodes – Number of episodes to run
  • visualize – If True, visualizes the test. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
train(env, nb_steps=1000000, visualize=False, nb_max_episode_steps=200, tensorboard=False, model_checkpoints=False, checkpoint_interval=10000)

Train the learning agent on the environment.

Parameters:
  • env – the environment instance. Should contain step() and reset() methods and optionally render()
  • nb_steps – the total number of steps to train
  • visualize – If True, visualizes the training. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
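
Illustrative sketch for DQNLearner over options; make_option_env(), the option aliases and policies are placeholders for project-specific objects.

    from backends.kerasrl_learner import DQNLearner

    env = make_option_env()  # placeholder environment
    options = {'keep_lane': keep_lane_policy, 'stop': stop_policy}  # placeholder option policies

    learner = DQNLearner(input_shape=(48,), nb_actions=len(options),
                         low_level_policies=options)

    learner.train(env, nb_steps=100000, nb_max_episode_steps=200)
    learner.save_model(file_name='dqn_weights.h5f')

    obs = env.reset()
    # Q-value of one option alias at the current observation (alias is illustrative).
    print(learner.get_q_value_using_option_alias(obs, 'keep_lane'))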

backends.learner_base module

class backends.learner_base.LearnerBase(input_shape=(10, ), nb_actions=2, **kwargs)

Bases: backends.policy_base.PolicyBase

The abstract class from which each learning policy backend is defined and inherited.

load_model(file_name)

Load the weights of an agent.

Parameters: file_name – filename to be used when loading
predict(observation)

Perform a forward pass and return the next action chosen by the agent based on the current observation.

Parameters: observation – the current observation. Shape should be the same as self.input_shape

Returns: The action taken by the agent based on the given observation

save_model(file_name, overwrite=True)

Save the weights of the agent. To be used after learning.

Parameters:
  • file_name – filename to be used when saving
  • overwrite – If True, overwrites existing file
test_model(env, nb_episodes=5, visualize=True, nb_max_episode_steps=200)

Test the agent on the environment.

Parameters:
  • env – the environment instance. Should contain step(), reset() and optionally, render()
  • nb_episodes – Number of episodes to run
  • visualize – If True, visualizes the test. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
train(env, nb_steps=50000, visualize=False, nb_max_episode_steps=200)

Train the learning agent on the environment.

Parameters:
  • env – the environment instance. Should contain step() and reset() methods and optionally render()
  • nb_steps – the total number of steps to train
  • visualize – If True, visualizes the training. Works only if render() is present in env
  • nb_max_episode_steps – Maximum number of steps per episode
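
A new backend can be added by subclassing LearnerBase and overriding the methods documented above; the skeleton below only restates that interface, with bodies left to the concrete implementation.

    from backends.learner_base import LearnerBase

    class MyLearner(LearnerBase):

        def train(self, env, nb_steps=50000, visualize=False, nb_max_episode_steps=200):
            raise NotImplementedError

        def test_model(self, env, nb_episodes=5, visualize=True, nb_max_episode_steps=200):
            raise NotImplementedError

        def predict(self, observation):
            raise NotImplementedError

        def save_model(self, file_name, overwrite=True):
            raise NotImplementedError

        def load_model(self, file_name):
            raise NotImplementedError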

backends.manual_policy module

class backends.manual_policy.ManualPolicy(env, low_level_policies, transition_adj, start_node_alias)

Bases: backends.controller_base.ControllerBase

Manual policy execution using nodes and edges.

can_transition()

Check if we can transition.

Returns True if we can, False if we cannot.

do_transition(observation)

Do a singular transition using the specified edges.

Parameters: observation – final observation from episodic step (not used)
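
Illustrative construction of a ManualPolicy; the node aliases and the exact structure expected for transition_adj are assumptions, and env and low_level_policies stand for project-specific objects.

    from backends.manual_policy import ManualPolicy

    # Assumed adjacency format: node alias -> list of node aliases it may transition to.
    adj = {'follow': ['stop'], 'stop': []}

    policy = ManualPolicy(env, low_level_policies, transition_adj=adj,
                          start_node_alias='follow')

    policy.step_current_node()
    if policy.can_transition():
        policy.do_transition(None)  # the observation argument is not used by ManualPolicy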

backends.mcts_learner module

class backends.mcts_learner.MCTSLearner(env, low_level_policies, start_node_alias, max_depth=10)

Bases: backends.controller_base.ControllerBase

Monte Carlo Tree Search implementation using the UCB1 and progressive widening approach, as explained in Paxton et al. (2017).

M = None

Visitation count of a discrete observation with an option.

N = None

Visitation count of discrete observations.

TR = None

Total reward from a given discrete observation with an option.

adj = None

Adjacency list.

curr_node_alias = None

Stores the current node alias.

curr_node_num = None

Stores the current node's id.

do_transition(observation, visualize=False)

Do a transition using UCB metric, with the latest observation from the episodic step.

Parameters:
  • observation – final observation from episodic step
  • visualize – whether or not to visualize low level steps

Returns o_star using UCB metric

get_best_node(observation, use_ucb=False)
load_model(file_name='mcts.pickle')
nodes = None

Node properties.

save_model(file_name='mcts.pickle')
set_current_node(new_node_alias)

Sets the current node which is being executed.

Parameters: new_node_alias – alias of the node to be set
traverse(observation, visualize=False)

Do a complete traversal from root to leaf. Assumes the environment is reset and we are at the root node.

Parameters:
  • observation – observation from the environment
  • visualize – whether or not to visualize low level steps

Returns value of root node
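
Illustrative planning loop with MCTSLearner; env, low_level_policies, the start node alias and the iteration count are placeholders.

    from backends.mcts_learner import MCTSLearner

    mcts = MCTSLearner(env, low_level_policies, start_node_alias='root', max_depth=10)

    for _ in range(100):        # number of search iterations (illustrative)
        obs = env.reset()       # traverse() assumes the environment has been reset
        value = mcts.traverse(obs)  # one root-to-leaf traversal; returns the root value

    mcts.save_model(file_name='mcts.pickle')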

backends.online_mcts_controller module

class backends.online_mcts_controller.OnlineMCTSController(env, low_level_policies, start_node_alias)

Bases: backends.controller_base.ControllerBase

Online MCTS

can_transition()

Returns boolean signifying whether we can transition. To be implemented in subclass.

change_low_level_references(env_copy)
do_transition()

Do a transition, assuming we can transition. To be implemented in subclass.

Parameters: observation – final observation from episodic step
set_current_node(node_alias)

Sets the current node which is being executed.

Parameters: node_alias – alias of the node to be set

backends.policy_base module

class backends.policy_base.PolicyBase

Bases: object

Abstract policy base from which every policy backend is defined and inherited.

backends.rl_controller module

class backends.rl_controller.RLController(env, low_level_policies, start_node_alias)

Bases: backends.controller_base.ControllerBase

RL controller using a trained policy.

can_transition()

Returns boolean signifying whether we can transition. To be implemented in subclass.

do_transition()

Do a transition, assuming we can transition. To be implemented in subclass.

Parameters: observation – final observation from episodic step
set_current_node(node_alias)

Sets the current node which is being executed.

Parameters: node_alias – alias of the node to be set
set_trained_policy(policy)
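
Illustrative RLController setup; trained_learner stands for any trained LearnerBase backend (for example a DQNLearner after train()), and the other names are placeholders.

    from backends.rl_controller import RLController

    controller = RLController(env, low_level_policies, start_node_alias='root')
    controller.set_trained_policy(trained_learner)

    if controller.can_transition():
        controller.do_transition()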

Module contents