Adaptive Discretization QL Agent

class or_suite.agents.rl.ada_ql.AdaptiveDiscretizationQL(epLen, scaling, inherit_flag, dim)

Adaptive Q-Learning algorithm implemented for environments with continuous states and actions, using the metric induced by the l_inf norm.
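As a rough illustration of the l_inf metric mentioned above, the following sketch computes the distance between two points as the largest coordinate-wise gap and checks whether a point lies in a ball of a given radius. The helper names linf_distance and in_ball are hypothetical and are not part of or_suite.

```python
def linf_distance(x, y):
    """Distance under the l_inf (max) norm: the largest coordinate-wise gap."""
    return max(abs(a - b) for a, b in zip(x, y))


def in_ball(point, center, radius):
    """True if `point` lies within `radius` of `center` in the l_inf metric.

    Under this metric a ball is an axis-aligned cube, which is what makes
    it convenient for partitioning a state-action space.
    """
    return linf_distance(point, center) <= radius


# A ball centred at (0.5, 0.5) with radius 0.25 covers [0.25, 0.75]^2.
print(in_ball([0.6, 0.7], [0.5, 0.5], 0.25))
print(in_ball([0.9, 0.5], [0.5, 0.5], 0.25))
```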

epLen: (int) number of steps per episode

scaling: (float) scaling parameter for confidence intervals

inherit_flag: (bool) boolean of whether to inherit estimates

dim: (int) dimension of R^d the state_action space is represented in

__init__(epLen, scaling, inherit_flag, dim): Initialize self. See help(type(self)) for accurate signature.

pick_action(state, timestep)

Select action according to a greedy policy.

Parameters

state – int - current state
timestep – int - timestep within episode

Returns

action

Return type

int
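To make the idea of a greedy policy over a discretization concrete, here is a minimal sketch for a one-dimensional state and action space. It assumes a flat list of balls, each storing a Q estimate; the Ball class and pick_greedy function are hypothetical names for illustration and do not reflect how or_suite stores its partition.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    state: float   # centre of the ball's state interval (hypothetical layout)
    action: float  # centre of the ball's action interval
    radius: float  # l_inf radius of the ball
    q: float       # current Q estimate for the ball


def pick_greedy(balls, state):
    """Return the action of the highest-Q ball whose state interval covers `state`."""
    relevant = [b for b in balls if abs(b.state - state) <= b.radius]
    best = max(relevant, key=lambda b: b.q)
    return best.action, best


# Four balls covering [0, 1] x [0, 1], each with radius 0.25.
balls = [
    Ball(0.25, 0.25, 0.25, 1.0),
    Ball(0.25, 0.75, 0.25, 2.5),
    Ball(0.75, 0.25, 0.25, 3.0),
    Ball(0.75, 0.75, 0.25, 0.5),
]
action, ball = pick_greedy(balls, state=0.1)
print(action)
```

In this sketch only the two balls whose state interval contains 0.1 are considered, so the ball with q = 3.0 is ignored even though it has the largest estimate overall.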

update_config(env, config): Update agent information based on the config file.

update_obs(obs, action, reward, newObs, timestep, info): Update the estimate of the Q function for the ball used in a given state.
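The sketch below shows one common form of an optimistic Q-learning update for a single ball, to give a sense of what updating the estimate can involve. The learning rate (epLen + 1) / (epLen + t) and the bonus scaling / sqrt(t) are assumptions borrowed from the optimistic Q-learning literature; the function update_q is hypothetical, and the actual update in or_suite may differ.

```python
import math


def update_q(q, num_visits, reward, next_value, ep_len, scaling):
    """One optimistic Q-learning step for a single ball (illustrative only).

    q          -- current Q estimate of the selected ball
    num_visits -- times the ball was selected before this step
    next_value -- value estimate at the next state and timestep
    ep_len     -- number of steps per episode (epLen)
    scaling    -- scaling parameter for the confidence bonus
    """
    t = num_visits + 1
    lr = (ep_len + 1) / (ep_len + t)   # assumed learning-rate schedule
    bonus = scaling / math.sqrt(t)     # assumed confidence bonus
    new_q = (1 - lr) * q + lr * (reward + next_value + bonus)
    return new_q, t


# First visit: the learning rate is 1, so the old estimate is fully replaced.
new_q, visits = update_q(q=5.0, num_visits=0, reward=1.0,
                         next_value=2.0, ep_len=5, scaling=0.5)
print(new_q, visits)
```

With this schedule the bonus shrinks as a ball is visited more often, so estimates start optimistic and tighten with experience.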

update_policy(k): Update internal policy based upon records.