eNET MB Agent

class or_suite.agents.rl.enet_mb.eNetMB(action_net, state_net, epLen, scaling, state_action_dim, alpha, flag)[source]

Uniform discretization model-based algorithm implemented for environments with continuous states and actions, using the metric induced by the l_inf norm

epLen

(int) number of steps per episode

scaling

(float) scaling parameter for confidence intervals

action_net

(list) discretization of the action space

state_net

(list) discretization of the state space

state_action_dim

d_1 + d_2, where d_1 and d_2 are the dimensions of the state space and the action space respectively

alpha

(float) parameter for prior on transition kernel

flag

(bool) whether to do full step updates
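
As a rough illustration of what action_net and state_net might hold, the sketch below builds a uniform grid over [0, 1] and snaps a continuous point to the closest grid value in each coordinate. On a product grid this coordinate-wise choice is also the nearest net point under the l_inf metric. The helper names uniform_net and nearest_index, the unit interval, and the spacing epsilon are assumptions for illustration and are not part of or_suite.

```python
def uniform_net(epsilon):
    """Return midpoints of cells of width epsilon covering [0, 1] (hypothetical helper)."""
    num_cells = round(1 / epsilon)
    return [(i + 0.5) / num_cells for i in range(num_cells)]


def nearest_index(net, point):
    """For each coordinate of point, return the index of the closest net value."""
    return tuple(
        min(range(len(net)), key=lambda i: abs(net[i] - x))
        for x in point
    )


state_net = uniform_net(0.25)   # four cells per state coordinate
action_net = uniform_net(0.5)   # two cells per action coordinate

# A two-dimensional state and a one-dimensional action, both in [0, 1].
state_index = nearest_index(state_net, (0.1, 0.9))
action_index = nearest_index(action_net, (0.6,))
```

A finer epsilon gives a closer approximation of the continuous spaces, at the cost of more estimates to maintain: the number of cells grows as (1 / epsilon) raised to the power d_1 + d_2.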

__init__(action_net, state_net, epLen, scaling, state_action_dim, alpha, flag)[source]

Initialize self. See help(type(self)) for accurate signature.

get_num_arms()[source]

Returns the number of arms

pEst

Resets the agent by overwriting all of the estimates back to zero

pick_action(state, step)[source]

Select action according to a greedy policy

Parameters
  • state – int - current state

  • step – int - timestep within episode

Returns

action

Return type

int
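
A greedy policy of this kind can be sketched as follows, assuming the agent keeps a table of Q-value estimates indexed by step, discretized state, and discretized action. The table layout, the one-dimensional state and action spaces, and the function name greedy_action are assumptions for illustration; the actual eNetMB internals may differ.

```python
def greedy_action(q_values, state_net, action_net, state, step):
    """Return the net action with the largest estimated Q-value at (step, state).

    q_values[step][s][a] is an assumed layout: one estimate per step,
    state-net index s, and action-net index a.
    """
    # Snap the continuous state to the index of the closest net point.
    s = min(range(len(state_net)), key=lambda i: abs(state_net[i] - state))
    row = q_values[step][s]
    # Greedy choice: the first index attaining the maximum estimate.
    a = max(range(len(row)), key=lambda i: row[i])
    return action_net[a]


state_net = [0.25, 0.75]
action_net = [0.0, 0.5, 1.0]
q_values = [
    [[1.0, 3.0, 2.0], [0.5, 0.1, 0.9]],  # step 0
    [[2.0, 2.0, 1.0], [4.0, 0.0, 4.5]],  # step 1
]

chosen = greedy_action(q_values, state_net, action_net, state=0.8, step=1)
```

Ties are broken here by taking the first maximizing index; a real agent may break ties differently, for example at random.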

update_obs(obs, action, reward, newObs, timestep, info)[source]

Add observation to records
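
One common way for a model-based agent to record an observation is to keep visit counts, a running mean of rewards, and next-state counts for each (step, state, action) triple. The sketch below assumes that bookkeeping, using plain dictionaries and already-discretized indices. The class name Records and its fields are hypothetical and are not taken from eNetMB.

```python
from collections import defaultdict


class Records:
    """Hypothetical bookkeeping for a tabular model-based agent."""

    def __init__(self):
        self.visits = defaultdict(int)         # (step, s, a) -> visit count
        self.reward_mean = defaultdict(float)  # (step, s, a) -> mean reward
        self.next_counts = defaultdict(int)    # (step, s, a, s_next) -> count

    def add(self, s, a, reward, s_next, step):
        key = (step, s, a)
        self.visits[key] += 1
        n = self.visits[key]
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.reward_mean[key] += (reward - self.reward_mean[key]) / n
        self.next_counts[key + (s_next,)] += 1


records = Records()
records.add(s=0, a=1, reward=1.0, s_next=2, step=0)
records.add(s=0, a=1, reward=0.0, s_next=2, step=0)
```

Dividing next_counts by visits for a given (step, s, a) would then give an empirical estimate of the transition kernel.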

update_policy(k)[source]

Update internal policy based upon records
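
A model-based policy update is often a backward pass over the episode: starting from the last step, each Q estimate is set to the estimated mean reward plus the expected next-step value under the estimated transition kernel. The sketch below shows only that generic backward induction on a tiny tabular model. The confidence bonus controlled by scaling and the prior controlled by alpha are deliberately left out, and the function name and data layout are assumptions rather than eNetMB's implementation.

```python
def backward_induction(reward_mean, transition, ep_len):
    """Compute Q estimates by dynamic programming over steps ep_len-1, ..., 0.

    reward_mean[h][s][a] is an estimated mean reward and
    transition[h][s][a][s2] an estimated probability of moving to s2.
    Both layouts are assumed for illustration.
    """
    num_states = len(reward_mean[0])
    num_actions = len(reward_mean[0][0])
    next_value = [0.0] * num_states  # value after the final step is zero
    q_values = [None] * ep_len
    for h in reversed(range(ep_len)):
        q_values[h] = [
            [
                reward_mean[h][s][a]
                + sum(p * v for p, v in zip(transition[h][s][a], next_value))
                for a in range(num_actions)
            ]
            for s in range(num_states)
        ]
        # The greedy value of each state feeds the update one step earlier.
        next_value = [max(row) for row in q_values[h]]
    return q_values


# Two states, two actions, two steps; the same model is reused at each step.
rewards = [[[0.0, 1.0], [2.0, 0.0]]] * 2
kernel = [[[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 0.0]]]] * 2
q = backward_induction(rewards, kernel, ep_len=2)
```

An optimistic variant would add an exploration bonus to each estimate before taking the maximum, typically one that shrinks as the visit count of the (step, state, action) triple grows.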