Oil Discovery

An oil discovery environment, also defined over [0,1].

Here the agent interacts with the environment by picking a location to travel to, paying a cost of travel, and receiving a reward at the new location.
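The interaction described above can be sketched as a single transition in one dimension. This is a minimal illustration, not the library's implementation. It assumes the travel cost is `cost_param` times the distance moved and that the net reward is clipped to [0,1]. The helper names `oil_step` and `toy_oil_prob` are hypothetical.

```python
def toy_oil_prob(state, action, timestep):
    """Hypothetical reward surface: most oil near location 0.7."""
    return max(0.0, 1.0 - abs(action - 0.7))


def oil_step(state, action, timestep, oil_prob, cost_param):
    """One noiseless transition on [0, 1] (illustrative assumptions only)."""
    # Assumed cost model: proportional to the distance travelled.
    travel_cost = cost_param * abs(state - action)
    # Reward found at the new location, net of the travel cost.
    reward = oil_prob(state, action, timestep) - travel_cost
    # Assumption: keep the net reward inside [0, 1].
    reward = min(1.0, max(0.0, reward))
    # Without noise, the agent ends up exactly where it chose to go.
    new_state = action
    return new_state, reward


# Travel from location 0.0 to location 0.5 at the first timestep.
new_state, reward = oil_step(0.0, 0.5, 0, toy_oil_prob, cost_param=0.2)
```

Setting `cost_param` to 0, as in the default config, removes the travel penalty, so the reward depends only on the location chosen.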

class or_suite.envs.oil_discovery.oil_problem.OilEnvironment(config={'cost_param': 0, 'dim': 1, 'epLen': 5, 'noise_variance': <function <lambda>>, 'oil_prob': <function <lambda>>, 'starting_state': array([0.], dtype=float32)})[source]

An oil discovery problem on the metric space [0,1]^k for some power k.

The state space and the action space have the same dimension.

get_config()[source]

Returns the config dictionary used to initialize the environment.

render(mode)

(UNIMPLEMENTED) Renders the environment in the mode passed in; ‘human’ is the only mode currently supported.

close()

(UNIMPLEMENTED) Closes the window where the rendering is being drawn.

epLen

The (int) number of time steps to run the experiment for.

oil_prob

A function taking as input a state, action, and timestep, and outputting the reward for moving the agent to that location.

Type

lambda function

cost_param

The parameter regulating the cost of moving the agent from one location to another.

Type

float

noise_variance

A function taking as input a state, action, and timestep, and outputting the noise added when moving the agent.

Type

lambda function

starting_state

A list containing the starting location of the agent, one coordinate per dimension (the default is a float32 NumPy array).

action_space

(Gym.spaces Box) The location to move the agent to.

observation_space

(Gym.spaces Box) The location of the agent.

__init__(config={'cost_param': 0, 'dim': 1, 'epLen': 5, 'noise_variance': <function <lambda>>, 'oil_prob': <function <lambda>>, 'starting_state': array([0.], dtype=float32)})[source]

Initialize self. See help(type(self)) for accurate signature.
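A config dictionary with the keys shown in the signature might be assembled as follows. The sketch only builds the dictionary and does not import or_suite. The specific functions and values are illustrative assumptions, and a plain list stands in for the float32 NumPy array used by the default `starting_state`.

```python
import math


# Both callables take (state, action, timestep), as documented above.
def oil_prob(state, action, timestep):
    # Hypothetical reward surface peaked at location 0.75.
    # Treats the action as a scalar, which only makes sense for dim = 1.
    return math.exp(-((action - 0.75) ** 2))


def noise_variance(state, action, timestep):
    # Hypothetical choice: no noise is added to the agent's movement.
    return 0.0


config = {
    "epLen": 5,                # number of time steps to run the experiment for
    "dim": 1,                  # dimension k of the space [0,1]^k
    "starting_state": [0.0],   # stand-in for a float32 NumPy array
    "oil_prob": oil_prob,
    "cost_param": 0.1,         # illustrative travel-cost parameter
    "noise_variance": noise_variance,
}
```

With or_suite installed, such a dictionary would presumably be passed as `OilEnvironment(config=config)`, following the constructor signature above. The real environment may require `starting_state` to be a NumPy array.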

reset()[source]

Reset the environment to its original settings.

step(action)[source]

Move one step in the environment.

Parameters

action – The chosen action: the location to move the agent to.

Returns

reward: double; the reward.

newState: int; the new state.

done: 0/1; the flag for end of the episode.

Return type

double, int, 0/1
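An episode typically calls `reset()` once and then `step(action)` repeatedly until `done` is set. The loop below is a sketch run against a hypothetical stand-in class, `ToyOilEnv`, so that it is self-contained; it is not the or_suite implementation. It assumes `step` returns the classic Gym 4-tuple `(observation, reward, done, info)`. The Returns list above names the values but does not fix their order, so check the source before relying on it.

```python
class ToyOilEnv:
    """Hypothetical stand-in exposing reset() and step(action) on [0, 1]."""

    def __init__(self, ep_len=5, cost_param=0.0, start=0.0):
        self.ep_len = ep_len
        self.cost_param = cost_param
        self.start = start
        self.reset()

    def reset(self):
        # Return to the starting location and restart the clock.
        self.state = self.start
        self.timestep = 0
        return self.state

    def step(self, action):
        # Toy reward: oil concentrated near 0.7, minus an assumed linear travel cost.
        oil = max(0.0, 1.0 - abs(action - 0.7))
        reward = max(0.0, oil - self.cost_param * abs(self.state - action))
        self.state = action
        self.timestep += 1
        done = self.timestep >= self.ep_len
        return self.state, reward, done, {}


env = ToyOilEnv(ep_len=5)
state = env.reset()
total_reward, done, steps = 0.0, False, 0
while not done:
    action = 0.7  # a fixed policy: always travel to location 0.7
    state, reward, done, info = env.step(action)
    total_reward += reward
    steps += 1
```

The episode ends after `ep_len` steps, mirroring the role of `epLen` in the config.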