Discrete QL Agent

class or_suite.agents.rl.discrete_ql.DiscreteQl(action_space, observation_space, epLen, scaling)[source]

Q-Learning algorithm implemented for environments with discrete states and actions, using the metric induced by the l_inf norm

TODO: Documentation

epLen

(int) number of steps per episode

scaling

(float) scaling parameter for confidence intervals

action_space

(MultiDiscrete) the action space

state_space

(MultiDiscrete) the state space

action_size

(list) representing the size of the action space

state_size

(list) representing the size of the state space

matrix_dim

(tuple) a concatenation of epLen, state_size, and action_size used to create the estimate arrays of the appropriate size
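As a rough illustration of how matrix_dim could be assembled, the sketch below concatenates epLen with the sizes of a hypothetical state space and action space and uses the result as the shape of the estimate arrays. The concrete sizes and the initial Q-value are assumptions chosen for the example, not values taken from the library.

```python
import numpy as np

epLen = 5               # steps per episode (assumed value)
state_size = [4, 3]     # hypothetical size of each state dimension
action_size = [2]       # hypothetical size of each action dimension

# Concatenate epLen, state_size, and action_size into a single shape tuple.
matrix_dim = tuple(int(n) for n in np.concatenate(([epLen], state_size, action_size)))

# Estimate arrays indexed by (step, state..., action...).
# Initialising Q-values to epLen is an assumption (an optimistic start).
qVals = np.full(matrix_dim, float(epLen))
num_visits = np.zeros(matrix_dim)

print(matrix_dim)       # (5, 4, 3, 2)
```

With this layout, indexing an array with the step followed by the state components leaves a sub-array with one axis per action dimension.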

qVals

(list) The Q-value estimates for each (step, state, action) tuple

num_visits

(list) The number of times that each (step, state, action) tuple has been visited

__init__(action_space, observation_space, epLen, scaling)[source]

Initialize self. See help(type(self)) for accurate signature.

pick_action(state, step)[source]

Select action according to a greedy policy

Parameters
  • state – int - current state

  • step – int - timestep within episode

Returns

action

Return type

list
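A minimal sketch of greedy action selection over a Q-value array, assuming the array is indexed by step, then state, then action. The helper name greedy_action is hypothetical, and ties are broken here by taking the first maximiser, which may differ from what the agent itself does.

```python
import numpy as np

def greedy_action(qVals, state, step):
    """Return the action (a list of indices) with the largest Q-value estimate."""
    # Select the Q-values for this step and state; the remaining axes
    # correspond to the action dimensions.
    q_slice = qVals[(step,) + tuple(state)]
    # argmax works on the flattened slice, so map the flat index back
    # to one index per action dimension.
    flat_best = np.argmax(q_slice)
    return [int(i) for i in np.unravel_index(flat_best, q_slice.shape)]

# Example: epLen = 2, one state dimension of size 3, two action dimensions of size 2.
qVals = np.zeros((2, 3, 2, 2))
qVals[1, 2, 1, 0] = 7.0
best = greedy_action(qVals, state=[2], step=1)
print(best)   # [1, 0]
```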

update_config(env, config)[source]

Update agent information based on the config file

update_obs(obs, action, reward, newObs, timestep, info)[source]

Add observation to records

Parameters
  • obs – (list) The current state

  • action – (list) The action taken

  • reward – (int) The calculated reward

  • newObs – (list) The next observed state

  • timestep – (int) The current timestep
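The page does not spell out the update rule, so the following is only a sketch of one common optimistic Q-learning update that fits the parameters above. The function name q_update, the learning rate (epLen + 1) / (epLen + t), and the bonus scaling / sqrt(t) are assumptions borrowed from the general Q-learning literature, not a statement of what the library does.

```python
import numpy as np

def q_update(qVals, num_visits, obs, action, reward, newObs, timestep, epLen, scaling):
    """Record one transition and update the Q-value estimate in place."""
    idx = (timestep,) + tuple(obs) + tuple(action)
    num_visits[idx] += 1
    t = num_visits[idx]

    lr = (epLen + 1) / (epLen + t)        # assumed learning-rate schedule
    bonus = scaling * np.sqrt(1 / t)      # assumed confidence bonus

    if timestep == epLen - 1:
        next_value = 0.0                  # no future reward after the last step
    else:
        # Best estimated value at the next step from the new state.
        next_value = np.max(qVals[(timestep + 1,) + tuple(newObs)])
    next_value = min(epLen, next_value)   # clip to an assumed upper bound

    qVals[idx] = (1 - lr) * qVals[idx] + lr * (reward + next_value + bonus)

# Example: epLen = 2, one state dimension of size 3, one action dimension of size 2.
epLen = 2
qVals = np.full((epLen, 3, 2), float(epLen))
num_visits = np.zeros((epLen, 3, 2))
q_update(qVals, num_visits, obs=[0], action=[1], reward=1,
         newObs=[2], timestep=1, epLen=epLen, scaling=0.0)
print(qVals[1, 0, 1])   # 1.0
```

On the first visit the learning rate equals 1, so the initial estimate is fully replaced by the observed target.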

update_parameters(param)[source]

Update the scaling parameter.

Parameters
  • param – (float) The new scaling value to use

update_policy(k)[source]

Update internal policy based upon records