Adaptive Discretization QL Agent

class or_suite.agents.rl.ada_ql.AdaptiveDiscretizationQL(epLen, scaling, inherit_flag, dim)

Adaptive Q-Learning algorithm implemented for environments with continuous states and actions, using the metric induced by the l_inf norm.
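To make the metric concrete, here is a minimal sketch of the l_inf distance and of ball membership under it. The helper names `linf_distance` and `in_ball` are hypothetical and are not part of or_suite; the sketch only illustrates that a ball in this metric is an axis-aligned hypercube.

```python
def linf_distance(p, q):
    """Distance between two points under the l_inf (max) norm."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    # The l_inf distance is the largest coordinate-wise absolute difference.
    return max(abs(a - b) for a, b in zip(p, q))


def in_ball(point, center, radius):
    """Return True if `point` lies in the closed l_inf ball around `center`.

    Under the l_inf metric such a ball is an axis-aligned hypercube with
    side length 2 * radius.
    """
    return linf_distance(point, center) <= radius


# A (state, action) pair and a ball centered at (0.5, 0.5).
print(linf_distance((0.25, 0.75), (0.5, 0.5)))   # 0.25
print(in_ball((0.25, 0.75), (0.5, 0.5), 0.25))   # True
```

Because membership reduces to a per-coordinate comparison, a partition of the state-action space into such balls can be refined by splitting boxes along each axis.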

epLen

(int) number of steps per episode

scaling

(float) scaling parameter for confidence intervals

inherit_flag

(bool) whether to inherit estimates

dim

(int) dimension d of the space R^d in which the state-action space is represented

__init__(epLen, scaling, inherit_flag, dim)

Initialize self. See help(type(self)) for accurate signature.

pick_action(state, timestep)

Select action according to a greedy policy.

Parameters
  • state – int - current state

  • timestep – int - timestep within episode

Returns

action

Return type

int
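As an illustration of greedy selection over a discretized space, the sketch below picks the action from the highest-valued region that contains the current state. The `Ball` dataclass and `greedy_action` function are hypothetical and are not or_suite classes; the sketch also assumes a one-dimensional state and action and returns the center of the chosen ball, which may differ from what the library does.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    """Hypothetical region of the state-action space (not an or_suite class)."""
    state_center: float
    action_center: float
    radius: float
    q_value: float


def greedy_action(balls, state):
    """Return the action at the center of the highest-Q ball containing `state`."""
    # Keep only the balls whose state coordinate covers the queried state.
    relevant = [b for b in balls if abs(state - b.state_center) <= b.radius]
    if not relevant:
        raise ValueError("no ball covers the given state")
    # Greedy step: choose the ball with the largest Q estimate.
    best = max(relevant, key=lambda b: b.q_value)
    return best.action_center


balls = [
    Ball(state_center=0.25, action_center=0.25, radius=0.25, q_value=3.0),
    Ball(state_center=0.25, action_center=0.75, radius=0.25, q_value=5.0),
    Ball(state_center=0.75, action_center=0.50, radius=0.25, q_value=9.0),
]
print(greedy_action(balls, state=0.1))  # 0.75
```

The third ball has the largest Q estimate overall but does not cover the state 0.1, so it is excluded before the maximum is taken.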

update_config(env, config)

Update agent information based on the config file.

update_obs(obs, action, reward, newObs, timestep, info)

Updates the estimate of the Q function for the ball used in a given state.
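For intuition, a per-ball update of this kind can be sketched as an optimistic Q-learning step. The function `q_update` below is hypothetical. The learning rate (H + 1) / (H + t) and the bonus scaling / sqrt(t) are a common choice in the adaptive Q-learning literature and are assumptions here, with H standing in for `epLen` and `scaling` for the confidence-interval parameter; the library's actual schedule may differ.

```python
import math


def q_update(q_old, num_visits, reward, next_value, ep_len, scaling):
    """One optimistic Q-learning update for a single ball.

    `num_visits` counts visits to the ball including the current one,
    so it must be at least 1.
    """
    if num_visits < 1:
        raise ValueError("num_visits must be at least 1")
    # Assumed learning rate: equals 1 on the first visit, then decays.
    lr = (ep_len + 1) / (ep_len + num_visits)
    # Assumed confidence bonus: shrinks as the ball is visited more often.
    bonus = scaling / math.sqrt(num_visits)
    # Blend the old estimate with the optimistic one-step target.
    return (1 - lr) * q_old + lr * (reward + next_value + bonus)


# First visit: the learning rate is 1, so the old estimate is overwritten.
print(q_update(q_old=5.0, num_visits=1, reward=1.0, next_value=2.0,
               ep_len=5, scaling=0.5))  # 3.5
```

With this schedule the first visit discards the optimistic initial value entirely, and later visits move the estimate more slowly while the bonus term decays.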

update_policy(k)

Update internal policy based upon records.