Adaptive Discretization MB Agent

class or_suite.agents.rl.ada_mb.AdaptiveDiscretizationMB(epLen, scaling, alpha, split_threshold, inherit_flag, flag, state_dim, action_dim)[source]

Adaptive model-based Q-learning algorithm implemented for environments with continuous states and actions, using the metric induced by the l_inf norm.
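To make the metric concrete, here is a minimal sketch of the l_inf distance between two state-action points. The helper name `linf_distance` is illustrative only and is not part of or_suite.

```python
def linf_distance(x, y):
    """Return the l_inf (Chebyshev) distance between two equal-length points."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    # The l_inf norm is the largest absolute coordinate-wise difference.
    return max(abs(a - b) for a, b in zip(x, y))


# A state-action pair is treated as one point: state coordinates, then action.
p = (0.25, 0.5, 0.75)
q = (0.5, 0.5, 0.0)
print(linf_distance(p, q))
```

Under this metric a ball of radius r is an axis-aligned hypercube of side 2r, which is one reason hypercube partitions pair naturally with it.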

epLen: (int) number of steps per episode

scaling: (float) scaling parameter for confidence intervals

inherit_flag: (bool) boolean of whether to inherit estimates

dim: (int) dimension of R^d the state_action space is represented in

__init__(epLen, scaling, alpha, split_threshold, inherit_flag, flag, state_dim, action_dim)[source]

Parameters

epLen – number of steps per episode
numIters – total number of iterations
scaling – scaling parameter for UCB term
alpha – parameter to add a prior to the transition kernels
inherit_flag – boolean; whether child nodes inherit estimates when they are created
flag – boolean; full updates (True) or one-step updates (False)
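The roles of `scaling` and `alpha` can be illustrated with a small sketch. The formulas below are assumptions chosen for illustration (a count-based bonus of the form scaling / sqrt(n), and additive smoothing of transition counts). This page does not state the exact expressions the agent uses, and both helper names are hypothetical.

```python
import math


def ucb_bonus(scaling, num_visits):
    """Assumed bonus form: shrinks as a region is visited more often."""
    # Treat unvisited regions as visited once to avoid dividing by zero.
    return scaling / math.sqrt(max(num_visits, 1))


def smoothed_transition_probs(counts, alpha):
    """Assumed prior: add alpha pseudo-counts to every next-state region."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


print(ucb_bonus(scaling=0.5, num_visits=4))
print(smoothed_transition_probs([3, 1, 0], alpha=1.0))
```

With a positive `alpha` in this sketch, a next-state region that has never been observed still receives nonzero probability, which is the usual motivation for placing a prior on an estimated transition kernel.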

pick_action(state, timestep)[source]

Select action according to a greedy policy.

Parameters

state – int - current state
timestep – int - timestep within episode

Returns

action

Return type

int
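As a rough illustration of greedy selection over a partition, the sketch below looks at the regions whose state interval contains the current state, picks the one with the highest estimated value, and returns the midpoint of its action interval. The `Region` type, the one-dimensional setting, and the midpoint rule are assumptions for illustration, not the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical cell: 1-D state interval x 1-D action interval."""
    state_lo: float
    state_hi: float
    action_lo: float
    action_hi: float
    q_value: float


def greedy_action(regions, state):
    """Return the action midpoint of the best region containing `state`."""
    relevant = [r for r in regions if r.state_lo <= state <= r.state_hi]
    if not relevant:
        raise ValueError("no region covers the given state")
    # Greedy step: choose the region with the largest value estimate.
    best = max(relevant, key=lambda r: r.q_value)
    return (best.action_lo + best.action_hi) / 2


regions = [
    Region(0.0, 0.5, 0.0, 0.5, q_value=1.0),
    Region(0.0, 0.5, 0.5, 1.0, q_value=2.0),
    Region(0.5, 1.0, 0.0, 1.0, q_value=3.0),
]
print(greedy_action(regions, state=0.25))
```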

update_config(env, config)[source]: Update agent information based on the config file.

update_obs(obs, action, reward, newObs, timestep, info)[source]: Add observation to records.

update_policy(k)[source]: Update internal policy based upon records.
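The methods above suggest the following call order inside an episode loop. The sketch uses a stand-in `StubAgent` and a toy environment so that it runs without or_suite installed. Both are hypothetical, and calling `update_policy` at the start of each episode is an assumption; only the method names and argument order are taken from this page.

```python
import random


class StubAgent:
    """Hypothetical stand-in exposing the documented method names."""

    def __init__(self, epLen):
        self.epLen = epLen
        self.records = []

    def pick_action(self, state, timestep):
        # Placeholder policy: a real agent would act greedily on its estimates.
        return random.random()

    def update_obs(self, obs, action, reward, newObs, timestep, info):
        # Store the transition so a later policy update could use it.
        self.records.append((obs, action, reward, newObs, timestep, info))

    def update_policy(self, k):
        # Placeholder: a real agent would recompute its estimates here.
        pass


def toy_step(state, action):
    """Hypothetical environment: reward is higher when action is near state."""
    reward = 1.0 - abs(state - action)
    new_state = (state + action) / 2
    return new_state, reward, {}


def run(agent, num_episodes):
    for k in range(num_episodes):
        state = 0.5
        agent.update_policy(k)
        for h in range(agent.epLen):
            action = agent.pick_action(state, h)
            new_state, reward, info = toy_step(state, action)
            agent.update_obs(state, action, reward, new_state, h, info)
            state = new_state
    return agent


agent = run(StubAgent(epLen=3), num_episodes=2)
print(len(agent.records))
```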