Grid Search Agent

class or_suite.agents.oil_discovery.grid_search.grid_searchAgent(epLen, dim=1)[source]

Agent that uses a bisection-method heuristic to find the location with the highest probability of discovering oil.

reset()[source]: resets bounds of agent to reflect upper and lower bounds of metric space

update_config(): (UNIMPLEMENTED)

update_obs(obs, action, reward, newObs, timestep, info)[source]: record reward of current midpoint or move bounds in direction of higher reward

pick_action(state, step)[source]: move agent to midpoint or perturb current dimension

epLen: (int) number of time steps to run the experiment for

dim: (int) dimension of metric space for agent and environment

upper: (float list list) matrix containing upper bounds of agent at each step in dimension

lower: (float list list) matrix contianing lower bounds of agent at each step in dimension

perturb_estimates: (float list list) matrix containing estimated rewards from perturbation in each dimension

midpoint_value: (float list) list containing midpoint of agent at each step

dim_index: (int list) list looping through various dimensions during perturbation

select_midpoint: (bool list) list recording whether to take midpoint or perturb at given step

__init__(epLen, dim=1)[source]

Parameters

epLen – (int) number of time steps to run the experiment for
dim – (int) dimension of metric space for agent and environment

pick_action(state, step)[source]: If upper and lower bounds are updated based on perturbed values, move agent to midpoint. Else, perturb dimension by factor equal to half the distance from each bound to midpoint.

update_obs(obs, action, reward, newObs, timestep, info)[source]: If no perturbations needed, update reward to be value at midpoint. Else, adjust upper or lower bound in the direction of higher reward as determined by the perturbation step. Agent loops across each dimension separately, and updates estimated midpoint after each loop.

update_policy(k)[source]

Update internal policy based upon records.

Not used, because a greedy algorithm does not have a policy.