backends package
Submodules
backends.baselines_learner module

class backends.baselines_learner.PPO2Agent(input_shape, nb_actions, env, policy=None, tensorboard=False, log_path='./logs', **kwargs)
Bases: backends.learner_base.LearnerBase

create_agent(policy, tensorboard)
Creates a PPO agent.
Returns: stable_baselines PPO2 object

fit(env=None, nb_steps=1000000, visualize=False, nb_max_episode_steps=200)

forward(observation)

get_default_policy()
Creates the default policy.
Returns: stable_baselines Policy object. The default is MlpPolicy.

load_weights(file_name='test_weights.h5f')

save_weights(file_name='test_weights.h5f', overwrite=True)

set_environment(env)

test_model(env=None, nb_episodes=50, visualize=True, nb_max_episode_steps=200)
Test the agent on the environment.
Parameters:
- env – the environment instance. Should contain step(), reset() and, optionally, render()
- nb_episodes – number of episodes to run
- visualize – if True, visualizes the test. Works only if render() is present in env
- nb_max_episode_steps – maximum number of steps per episode
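Example (an illustrative sketch, not part of the generated docs): constructing a PPO2Agent around a gym-style environment and training it with the defaults. The environment name, shapes and filename below are assumptions chosen for illustration:

    import gym
    from backends.baselines_learner import PPO2Agent

    env = gym.make('CartPole-v1')                               # assumed placeholder environment
    agent = PPO2Agent(input_shape=(4,), nb_actions=2, env=env)  # default policy (MlpPolicy) is used
    agent.fit(env=env, nb_steps=100000)                         # train the underlying stable_baselines PPO2
    agent.save_weights('ppo2_weights.h5f')                      # illustrative filename
    agent.test_model(env=env, nb_episodes=10, visualize=False)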
backends.controller_base module

class backends.controller_base.ControllerBase(env, low_level_policies, start_node_alias)
Bases: backends.policy_base.PolicyBase
Abstract class for controllers.

can_transition()
Returns a boolean signifying whether we can transition. To be implemented in a subclass.

do_transition(observation)
Do a transition, assuming we can transition. To be implemented in a subclass.
Parameters: observation – final observation from the episodic step

low_level_step_current_node()

set_controller_args(**kwargs)

set_current_node(node_alias)
Sets the current node which is being executed.
Parameters: node_alias – alias of the node to be set

step_current_node(visualize_low_level_steps=False)
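A minimal subclass sketch of the contract ControllerBase defines (illustrative only; the controller name and the node alias are hypothetical, not part of the package):

    from backends.controller_base import ControllerBase

    class AlwaysTransitionController(ControllerBase):
        """Toy controller that requests a transition after every episodic step."""

        def can_transition(self):
            return True                              # always allow a transition

        def do_transition(self, observation):
            # choose the next node; 'default' is a hypothetical node alias
            self.set_current_node('default')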
backends.kerasrl_learner module

class backends.kerasrl_learner.DDPGLearner(input_shape=(48, ), nb_actions=2, actor=None, critic=None, critic_action_input=None, memory=None, random_process=None, **kwargs)
Bases: backends.learner_base.LearnerBase

create_agent(actor, critic, critic_action_input, memory, random_process)
Creates a KerasRL DDPGAgent with the given components.
Parameters:
- actor – Keras Model of the actor, which takes an observation as input and outputs actions.
- critic – Keras Model of the critic, which takes the concatenation of observation and action and outputs a single value.
- critic_action_input – Keras Input that was used to create the action input of the critic model.
- memory – KerasRL Memory.
- random_process – KerasRL random process.
Returns: KerasRL DDPGAgent object

get_default_actor_model()
Creates the default actor model.
Returns: Keras Model object of the actor

get_default_critic_model()
Creates the default critic model.
Returns: Keras Model object of the critic

get_default_memory()
Creates the default memory model.
Returns: KerasRL SequentialMemory object

get_default_randomprocess()
Creates the default random process model.
Returns: KerasRL OrnsteinUhlenbeckProcess object

load_model(file_name='test_weights.h5f')
Load the weights of an agent.
Parameters: file_name – filename to be used when loading

predict(observation)
Perform a forward pass and return the next action chosen by the agent based on the current observation.
Parameters: observation – the current observation. Shape should be the same as self.input_shape
Returns: the action taken by the agent for the given observation

save_model(file_name='test_weights.h5f', overwrite=True)
Save the weights of the agent. To be used after learning.
Parameters:
- file_name – filename to be used when saving
- overwrite – if True, overwrites an existing file

test_model(env, nb_episodes=50, visualize=True, nb_max_episode_steps=200)
Test the agent on the environment.
Parameters:
- env – the environment instance. Should contain step(), reset() and, optionally, render()
- nb_episodes – number of episodes to run
- visualize – if True, visualizes the test. Works only if render() is present in env
- nb_max_episode_steps – maximum number of steps per episode

train(env, nb_steps=1000000, visualize=False, verbose=1, log_interval=10000, nb_max_episode_steps=200, model_checkpoints=False, checkpoint_interval=100000, tensorboard=False)
Train the learning agent on the environment.
Parameters:
- env – the environment instance. Should contain step() and reset() methods and, optionally, render()
- nb_steps – the total number of steps to train
- visualize – if True, visualizes the training. Works only if render() is present in env
- nb_max_episode_steps – maximum number of steps per episode
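Example (an illustrative sketch): creating a DDPGLearner with the default actor, critic, memory and random process, then training it on a continuous-control environment. The environment name, shapes and filename are assumptions:

    import gym
    from backends.kerasrl_learner import DDPGLearner

    env = gym.make('Pendulum-v0')                           # assumed placeholder environment (v1 in newer gym)
    learner = DDPGLearner(input_shape=(3,), nb_actions=1)   # None arguments fall back to the defaults above
    learner.train(env, nb_steps=50000, nb_max_episode_steps=200)
    learner.save_model('ddpg_weights.h5f')                  # illustrative filename
    learner.test_model(env, nb_episodes=10, visualize=False)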
class backends.kerasrl_learner.DQNAgentOverOptions(model, low_level_policies, policy=None, test_policy=None, enable_double_dqn=True, enable_dueling_network=False, dueling_type='avg', *args, **kwargs)
Bases: rl.agents.dqn.DQNAgent

forward(observation)
Takes an observation from the environment and returns the next action to be taken. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.
Parameters: observation – the current observation from the environment
Returns: the next action to be executed in the environment

get_modified_q_values(observation)
class backends.kerasrl_learner.DQNLearner(input_shape=(48, ), nb_actions=5, low_level_policies=None, model=None, policy=None, memory=None, **kwargs)
Bases: backends.learner_base.LearnerBase

create_agent(model, policy, memory)
Creates a KerasRL DQNAgent with the given components.
Parameters:
- model – Keras Model that takes an observation as input and outputs discrete actions.
- memory – KerasRL Memory.
Returns: KerasRL DQNAgent object

get_default_memory()
Creates the default memory model.
Returns: KerasRL SequentialMemory object

get_default_model()
Creates the default model.
Returns: Keras Model object

get_default_policy()

get_q_value(observation, action)

get_q_value_using_option_alias(observation, option_alias)

get_softq_value_using_option_alias(observation, option_alias)

load_model(file_name='test_weights.h5f')
Load the weights of an agent.
Parameters: file_name – filename to be used when loading

predict(observation)
Perform a forward pass and return the next action chosen by the agent based on the current observation.
Parameters: observation – the current observation. Shape should be the same as self.input_shape
Returns: the action taken by the agent for the given observation

save_model(file_name='test_weights.h5f', overwrite=True)
Save the weights of the agent. To be used after learning.
Parameters:
- file_name – filename to be used when saving
- overwrite – if True, overwrites an existing file

test_model(env, nb_episodes=5, visualize=True, nb_max_episode_steps=400, success_reward_threshold=100)
Test the agent on the environment.
Parameters:
- env – the environment instance. Should contain step(), reset() and, optionally, render()
- nb_episodes – number of episodes to run
- visualize – if True, visualizes the test. Works only if render() is present in env
- nb_max_episode_steps – maximum number of steps per episode

train(env, nb_steps=1000000, visualize=False, nb_max_episode_steps=200, tensorboard=False, model_checkpoints=False, checkpoint_interval=10000)
Train the learning agent on the environment.
Parameters:
- env – the environment instance. Should contain step() and reset() methods and, optionally, render()
- nb_steps – the total number of steps to train
- visualize – if True, visualizes the training. Works only if render() is present in env
- nb_max_episode_steps – maximum number of steps per episode
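Example (an illustrative sketch): training a flat DQNLearner with the default model, policy and memory on a discrete-action environment. The environment name, shapes and filename are assumptions:

    import gym
    from backends.kerasrl_learner import DQNLearner

    env = gym.make('CartPole-v1')                          # assumed placeholder environment
    learner = DQNLearner(input_shape=(4,), nb_actions=2,   # defaults build model, policy and memory
                         low_level_policies=None)
    learner.train(env, nb_steps=20000, nb_max_episode_steps=200)
    learner.save_model('dqn_weights.h5f')                  # illustrative filename
    obs = env.reset()                                      # returns (obs, info) in newer gym versions
    action = learner.predict(obs)                          # next discrete action for this observation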
backends.learner_base module

class backends.learner_base.LearnerBase(input_shape=(10, ), nb_actions=2, **kwargs)
Bases: backends.policy_base.PolicyBase
The abstract class from which each learning policy backend is defined and inherited.

load_model(file_name)
Load the weights of an agent.
Parameters: file_name – filename to be used when loading

predict(observation)
Perform a forward pass and return the next action chosen by the agent based on the current observation.
Parameters: observation – the current observation. Shape should be the same as self.input_shape
Returns: the action taken by the agent for the given observation

save_model(file_name, overwrite=True)
Save the weights of the agent. To be used after learning.
Parameters:
- file_name – filename to be used when saving
- overwrite – if True, overwrites an existing file

test_model(env, nb_episodes=5, visualize=True, nb_max_episode_steps=200)
Test the agent on the environment.
Parameters:
- env – the environment instance. Should contain step(), reset() and, optionally, render()
- nb_episodes – number of episodes to run
- visualize – if True, visualizes the test. Works only if render() is present in env
- nb_max_episode_steps – maximum number of steps per episode

train(env, nb_steps=50000, visualize=False, nb_max_episode_steps=200)
Train the learning agent on the environment.
Parameters:
- env – the environment instance. Should contain step() and reset() methods and, optionally, render()
- nb_steps – the total number of steps to train
- visualize – if True, visualizes the training. Works only if render() is present in env
- nb_max_episode_steps – maximum number of steps per episode
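A minimal subclass sketch of the interface LearnerBase defines (illustrative only; the class name and method bodies are placeholders, not package code):

    import random
    from backends.learner_base import LearnerBase

    class RandomLearner(LearnerBase):
        """Toy backend that skips learning and acts uniformly at random."""

        def train(self, env, nb_steps=50000, visualize=False, nb_max_episode_steps=200):
            pass                                     # no learning in this toy backend

        def predict(self, observation):
            return random.randrange(2)               # 2 matches the nb_actions used at construction

        def save_model(self, file_name, overwrite=True):
            pass                                     # nothing to persist

        def load_model(self, file_name):
            pass

        def test_model(self, env, nb_episodes=5, visualize=True, nb_max_episode_steps=200):
            pass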
backends.manual_policy module

class backends.manual_policy.ManualPolicy(env, low_level_policies, transition_adj, start_node_alias)
Bases: backends.controller_base.ControllerBase
Manual policy execution using nodes and edges.

can_transition()
Check if we can transition.
Returns: True if we can, False if we cannot.

do_transition(observation)
Do a single transition using the specified edges.
Parameters: observation – final observation from the episodic step (not used)
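A construction sketch (illustrative only; env, low_level_policies and the adjacency structure are assumed to be defined elsewhere, and the node alias is hypothetical):

    from backends.manual_policy import ManualPolicy

    # env, low_level_policies and transition_adj are assumed to exist already;
    # transition_adj encodes the manually specified edges between node aliases.
    policy = ManualPolicy(env, low_level_policies, transition_adj,
                          start_node_alias='wait')   # 'wait' is a hypothetical node alias
    if policy.can_transition():
        policy.do_transition(observation=None)       # observation is not used by ManualPolicy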
backends.mcts_learner module

class backends.mcts_learner.MCTSLearner(env, low_level_policies, start_node_alias, max_depth=10)
Bases: backends.controller_base.ControllerBase
Monte Carlo Tree Search implementation using the UCB1 and progressive widening approach explained in Paxton et al. (2017).

M = None
Visitation count of a discrete observation with an option.

N = None
Visitation count of discrete observations.

TR = None
Total reward from a given discrete observation with an option.

adj = None
Adjacency list.

curr_node_alias = None
Stores the current node's alias.

curr_node_num = None
Stores the current node's id.

do_transition(observation, visualize=False)
Do a transition using the UCB metric, with the latest observation from the episodic step.
Parameters:
- observation – final observation from the episodic step
- visualize – whether or not to visualize low-level steps
Returns: o_star chosen using the UCB metric

get_best_node(observation, use_ucb=False)

load_model(file_name='mcts.pickle')

nodes = None
Node properties.

save_model(file_name='mcts.pickle')

set_current_node(new_node_alias)
Sets the current node which is being executed.
Parameters: new_node_alias – alias of the node to be set

traverse(observation, visualize=False)
Do a complete traversal from root to leaf. Assumes the environment is reset and we are at the root node.
Parameters:
- observation – observation from the environment
- visualize – whether or not to visualize low-level steps
Returns: value of the root node
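A search-loop sketch (illustrative only; env, low_level_policies and the node alias are assumed to be defined elsewhere, and the loop structure is an assumption rather than the package's training script):

    from backends.mcts_learner import MCTSLearner

    # env and low_level_policies are assumed to exist; 'root' is a hypothetical start node alias.
    mcts = MCTSLearner(env, low_level_policies, start_node_alias='root', max_depth=10)
    for episode in range(100):                      # number of search episodes is illustrative
        obs = env.reset()                           # traverse() assumes a freshly reset environment
        root_value = mcts.traverse(obs)             # one full traversal from root to leaf
    mcts.save_model('mcts.pickle')                  # default filename documented above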
backends.online_mcts_controller module

class backends.online_mcts_controller.OnlineMCTSController(env, low_level_policies, start_node_alias)
Bases: backends.controller_base.ControllerBase
Online MCTS.

can_transition()
Returns a boolean signifying whether we can transition. To be implemented in a subclass.

change_low_level_references(env_copy)

do_transition()
Do a transition, assuming we can transition. To be implemented in a subclass.
Parameters: observation – final observation from the episodic step

set_current_node(node_alias)
Sets the current node which is being executed.
Parameters: node_alias – alias of the node to be set
backends.policy_base module

class backends.policy_base.PolicyBase
Bases: object
Abstract policy base from which every policy backend is defined and inherited.
backends.rl_controller module

class backends.rl_controller.RLController(env, low_level_policies, start_node_alias)
Bases: backends.controller_base.ControllerBase
RL controller using a trained policy.

can_transition()
Returns a boolean signifying whether we can transition. To be implemented in a subclass.

do_transition()
Do a transition, assuming we can transition. To be implemented in a subclass.
Parameters: observation – final observation from the episodic step

set_current_node(node_alias)
Sets the current node which is being executed.
Parameters: node_alias – alias of the node to be set

set_trained_policy(policy)
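A usage sketch (illustrative only; env, low_level_policies, the node alias and the trained learner are assumed to be defined elsewhere, for example a DQNLearner trained as sketched above):

    from backends.rl_controller import RLController

    # env, low_level_policies and trained_learner are assumed to exist already.
    controller = RLController(env, low_level_policies, start_node_alias='wait')  # hypothetical alias
    controller.set_trained_policy(trained_learner)   # attach the trained high-level policy
    if controller.can_transition():
        controller.do_transition()                   # pick the next option with the trained policy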