eNET QL Agent

class or_suite.agents.rl.enet_ql.eNetQL(action_net, state_net, epLen, scaling, state_action_dim)[source]

Uniform discretization Q-learning algorithm for environments with continuous states and actions, using the metric induced by the l_inf norm.

epLen

(int) number of steps per episode

scaling

(float) scaling parameter for confidence intervals

action_net

(list) discretization of the action space

state_net

(list) discretization of the state space

state_action_dim

(int) d_1 + d_2, the sum of the dimensions of the state space and the action space respectively
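As an illustration of what state_net and action_net might contain, the sketch below builds uniform nets on the unit interval and maps a continuous value to its nearest net point. It assumes each coordinate lies in [0, 1] and that net points sit at cell midpoints; neither assumption is stated in this reference. The helper names uniform_net and nearest_index are hypothetical and not part of or_suite. Under the l_inf metric, the nearest point of a product grid can be found one coordinate at a time, so the one-dimensional helper is enough.

```python
import numpy as np

def uniform_net(epsilon):
    """Midpoints of a uniform grid with cell width epsilon on [0, 1] (assumed range)."""
    num_cells = int(round(1.0 / epsilon))
    return (np.arange(num_cells) + 0.5) / num_cells

def nearest_index(net, value):
    """Index of the net point closest to a scalar value."""
    return int(np.argmin(np.abs(np.asarray(net) - value)))

# Hypothetical nets for a one-dimensional state and a one-dimensional action.
state_net = uniform_net(0.25)    # four points: 0.125, 0.375, 0.625, 0.875
action_net = uniform_net(0.5)    # two points: 0.25, 0.75

# A continuous state of 0.4 falls closest to the net point 0.375.
state_index = nearest_index(state_net, 0.4)
```

A finer epsilon gives a closer approximation of the continuous spaces at the cost of more (state, action) pairs to learn about.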

__init__(action_net, state_net, epLen, scaling, state_action_dim)[source]

Initialize the agent with the arguments described above.

get_num_arms()[source]

Returns the number of arms

pick_action(state, step)[source]

Select action according to a greedy policy

Parameters
  • state – int - current state

  • step – int - timestep within episode

Returns

action

Return type

int
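A greedy policy of this kind can be sketched as follows. This is not the library's implementation: the Q-table layout (step, state index, action index), the one-dimensional state, and the function name greedy_action are all assumptions made for illustration.

```python
import numpy as np

def greedy_action(q_vals, state_net, state, step):
    """Return the action index with the largest Q estimate.

    q_vals is a hypothetical array of shape
    (epLen, len(state_net), number of actions).
    """
    # Map the continuous state to its nearest net point.
    state_idx = int(np.argmin(np.abs(np.asarray(state_net) - state)))
    # Greedy choice: the action maximizing the estimate at this step and state.
    return int(np.argmax(q_vals[step, state_idx]))

state_net = np.array([0.25, 0.75])
q_vals = np.zeros((3, 2, 2))
q_vals[1, 1] = [0.2, 0.9]   # at step 1, state index 1, action 1 looks best

action_idx = greedy_action(q_vals, state_net, state=0.8, step=1)
```

Here np.argmax breaks ties by returning the first maximal index; an implementation could equally break ties at random.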

update_config(env, config)[source]

Update agent information based on the config file

update_obs(obs, action, reward, newObs, timestep, info)[source]

Add observation to records
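This reference does not spell out how an observation changes the agent's estimates. The sketch below shows one common optimistic Q-learning update, with a learning rate of (H + 1) / (H + t) and a bonus proportional to 1 / sqrt(t), where H is the episode length and t the visit count. Treat the rule itself, the array layout, and the name q_update as assumptions rather than a description of eNetQL's code.

```python
import numpy as np

def q_update(q_vals, num_visits, step, s, a, reward, s_next, ep_len, scaling):
    """One optimistic Q-learning update on discretized indices (assumed rule)."""
    num_visits[step, s, a] += 1
    t = num_visits[step, s, a]
    lr = (ep_len + 1) / (ep_len + t)     # learning rate decays with visits
    bonus = scaling / np.sqrt(t)         # confidence bonus shrinks with visits
    if step == ep_len - 1:
        v_next = 0.0                     # no value beyond the final step
    else:
        v_next = min(ep_len, q_vals[step + 1, s_next].max())
    q_vals[step, s, a] = (1 - lr) * q_vals[step, s, a] + lr * (reward + v_next + bonus)

ep_len = 2
q_vals = np.full((ep_len, 2, 2), float(ep_len))   # optimistic initialization
num_visits = np.zeros((ep_len, 2, 2))

q_update(q_vals, num_visits, step=1, s=0, a=1, reward=0.5,
         s_next=1, ep_len=ep_len, scaling=0.0)
```

In this sketch a larger scaling value inflates the estimates of rarely visited pairs, which encourages exploration under a greedy policy.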

update_policy(k)[source]

Update internal policy based upon records