Discrete MB Agent
- class or_suite.agents.rl.discrete_mb.DiscreteMB(action_space, state_space, epLen, scaling, alpha, flag)[source]
Uniform model-based algorithm implemented for MultiDiscrete environments and actions, using the metric induced by the l_inf norm.
- epLen
(int) number of steps per episode
- scaling
(float) scaling parameter for confidence intervals
- action_space
(MultiDiscrete) the action space
- state_space
(MultiDiscrete) the state space
- action_size
(list) the size of the action space
- state_size
(list) the size of the state space
- alpha
(float) parameter for prior on transition kernel
- flag
(bool) for whether to do full step updates or not
- matrix_dim
(tuple) a concatenation of epLen, state_size, and action_size used to create the estimate arrays of the appropriate size
- qVals
(list) The Q-value estimates for each step, state, action tuple
- num_visits
(list) The number of times that each step, state, action tuple has been visited
- vVals
(list) The value function values for every step, state pair
- rEst
(list) Estimates of the reward for a step, state, action tuple
- pEst
(list) Estimates of the number of times that each step, state, action, new_state tuple has been observed
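The attribute descriptions above imply a set of array shapes. The following is a minimal sketch of those shapes, not the library's actual constructor: the sizes are made up for illustration, and the initial values (Q-values and value estimates starting at epLen, pEst seeded with alpha) are assumptions rather than documented behavior.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
epLen = 3
state_size = [4, 2]   # e.g. a MultiDiscrete([4, 2]) state space
action_size = [5]     # e.g. a MultiDiscrete([5]) action space
alpha = 0.5           # prior parameter for the transition kernel

# matrix_dim concatenates epLen, state_size, and action_size.
matrix_dim = tuple([epLen] + state_size + action_size)

# One entry per (step, state, action) tuple.
# Assumption: optimistic initialization at the episode length.
qVals = np.full(matrix_dim, float(epLen))
num_visits = np.zeros(matrix_dim)
rEst = np.zeros(matrix_dim)

# One entry per (step, state) pair, so the action axes are dropped.
vVals = np.full((epLen, *state_size), float(epLen))

# One entry per (step, state, action, new_state) tuple.
# Assumption: every entry starts at the prior weight alpha.
pEst = np.full(matrix_dim + tuple(state_size), alpha)
```

With these sizes, matrix_dim is (3, 4, 2, 5) and pEst carries two extra trailing axes for the next state.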
- __init__(action_space, state_space, epLen, scaling, alpha, flag)[source]
Initialize self. See help(type(self)) for accurate signature.
- pick_action(state, step)[source]
Select action according to a greedy policy
- Parameters
state – (int) The current state
step – (int) The timestep within the episode
- Returns
action
- Return type
list
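As an illustration of greedy selection over a Q-value array indexed by step, state, and action, here is a minimal sketch. The helper name greedy_action and the random tie-breaking are assumptions made for this example; they are not taken from the DiscreteMB source.

```python
import numpy as np

def greedy_action(qVals, state, step):
    """Return an action maximizing qVals[step, state, ...] as a list.

    Hypothetical helper: ties are broken uniformly at random.
    """
    # Accept either a single int or a list of ints for the state.
    state_idx = tuple(int(s) for s in np.atleast_1d(state))
    # Q-values for this step and state; the remaining axes index actions.
    q_slice = qVals[(step, *state_idx)]
    # Every action index attaining the maximum, one row per maximizer.
    best = np.argwhere(q_slice == q_slice.max())
    choice = best[np.random.choice(len(best))]
    return [int(a) for a in choice]
```

Returning a list, one entry per action dimension, matches the documented return type.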
- update_obs(obs, action, reward, newObs, timestep, info)[source]
Add observation to records
- Parameters
obs – (list) The current state
action – (list) The action taken
reward – (int) The calculated reward
newObs – (list) The next observed state
timestep – (int) The current timestep