eNET QL Agent

class or_suite.agents.rl.enet_ql.eNetQL(action_net, state_net, epLen, scaling, state_action_dim)

Uniform discretization Q-learning algorithm, implemented for environments with continuous states and actions, using the metric induced by the l_inf norm.

epLen: (int) number of steps per episode

scaling: (float) scaling parameter for confidence intervals

action_net: (list) of a discretization of action space

state_net: (list) of a discretization of the state space

state_action_dim: d_1 + d_2 dimensions of state and action space respectively

__init__(action_net, state_net, epLen, scaling, state_action_dim): Initialize self. See help(type(self)) for accurate signature.

get_num_arms(): Returns the number of arms

pick_action(state, step)

Select action according to a greedy policy

Parameters

state – int - current state
step – int - timestep within episode

Returns

action

Return type

int
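
A greedy policy over a discretized space can be sketched as follows: snap the state to its nearest net point, then take the action with the largest Q-estimate at that step. The function `greedy_action`, the array layout `q_vals[step, state_idx, action_idx]`, and the one-dimensional setting are assumptions made for illustration; the agent's internal representation may differ.

```python
import numpy as np


def greedy_action(q_vals, state_net, action_net, state, step):
    """Hypothetical sketch of greedy selection on a 1-D net."""
    # In one dimension the l_inf distance reduces to the absolute difference.
    state_idx = int(np.argmin(np.abs(state_net - state)))
    # np.argmax breaks ties by returning the first maximal index.
    action_idx = int(np.argmax(q_vals[step, state_idx]))
    return action_net[action_idx]


net = np.array([0.125, 0.375, 0.625, 0.875])
q_vals = np.zeros((2, len(net), len(net)))
q_vals[0, 1, 2] = 1.0  # make the third action best in the second state cell

chosen = greedy_action(q_vals, net, net, state=0.4, step=0)
print(chosen)
```

In this example the state 0.4 is closest to the net point 0.375, so the lookup uses the second state cell and selects the action at 0.625.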

update_config(env, config): Update agent information based on the config file

update_obs(obs, action, reward, newObs, timestep, info): Add observation to records

update_policy(k): Update internal policy based upon records
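
The update methods are documented only briefly, so the sketch below shows one common form of optimistic tabular Q-learning update, in which a scaling parameter controls the width of a confidence bonus. The learning rate (H + 1) / (H + t), the bonus scaling / sqrt(t), the function `q_update`, and the array layout are all assumptions for illustration; this page does not confirm that eNetQL uses these exact formulas.

```python
import numpy as np


def q_update(q_vals, num_visits, ep_len, scaling, step, s, a, reward, s_next):
    """Hypothetical optimistic Q-learning update for one observed transition."""
    num_visits[step, s, a] += 1
    t = num_visits[step, s, a]
    lr = (ep_len + 1) / (ep_len + t)  # assumed learning-rate schedule
    bonus = scaling / np.sqrt(t)      # assumed confidence bonus
    if step == ep_len - 1:
        v_next = 0.0                  # no future value at the final step
    else:
        v_next = min(ep_len, q_vals[step + 1, s_next].max())
    q_vals[step, s, a] = (1 - lr) * q_vals[step, s, a] + lr * (reward + v_next + bonus)
    return q_vals[step, s, a]


ep_len = 2
q_vals = np.full((ep_len, 4, 4), float(ep_len))  # optimistic initialization
num_visits = np.zeros((ep_len, 4, 4), dtype=int)

new_q = q_update(q_vals, num_visits, ep_len, scaling=0.5,
                 step=1, s=0, a=0, reward=1.0, s_next=0)
print(new_q)
```

On the first visit the learning rate is 1, so the optimistic initial value is replaced entirely by the observed reward plus the bonus.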