Ambulance Metric

Implementation of a basic RL environment for continuous spaces. Includes three test problems which were used in generating the figures.

An ambulance environment over [0,1]. An agent interacts with the environment by picking a location to station the ambulance. Then a patient arrives and the ambulance must go and serve the arrival, paying a travel cost.

class or_suite.envs.ambulance.ambulance_metric.AmbulanceEnvironment(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'epLen': 5, 'norm': 1, 'num_ambulance': 1, 'starting_state': array([0.], dtype=float32)})

A 1-dimensional reinforcement learning environment in the space X = [0, 1].

Ambulances are located anywhere in X = [0,1], and at the beginning of each iteration, the agent chooses where to station each ambulance (the action). A call arrives, and the nearest ambulance goes to the location of that call.
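The dispatch rule described above can be sketched in a few lines of plain Python. This is only an illustration of the "nearest ambulance responds" idea on [0,1], not the package's code; the helper name `dispatch` is hypothetical.

```python
def dispatch(stations, call):
    """Send the ambulance nearest to the call; return (new_locations, index).

    `stations` are the locations chosen by the agent (the action) and
    `call` is the arrival location; all values are assumed to lie in [0, 1].
    """
    # Index of the ambulance with the smallest distance to the call.
    nearest = min(range(len(stations)), key=lambda i: abs(stations[i] - call))
    new_locations = list(stations)
    # Only the responding ambulance moves; the others stay where they were stationed.
    new_locations[nearest] = call
    return new_locations, nearest


stations = [0.2, 0.7]  # where the agent stationed two ambulances
call = 0.6             # location of the incoming call
new_locations, nearest = dispatch(stations, call)
```

Here the second ambulance is closer to the call, so it ends the step at the call location while the first stays put.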

epLen: The (int) number of time steps to run the experiment for.

arrival_dist: A (lambda) arrival distribution for calls over the space [0,1]; takes an integer (step) and returns a float between 0 and 1.

alpha: A float controlling proportional difference in cost to move between calls and to respond to a call.

starting_state: A float list containing the starting locations for each ambulance.

num_ambulance: The (int) number of ambulances in the environment.

state: An int list representing the current state of the environment.

timestep: The (int) timestep the current episode is on.

viewer: The window (Pyglet window or None) where the environment rendering is being drawn.

most_recent_action: (float list or None) The most recent action chosen by the agent (used to render the environment).

action_space: (Gym.spaces Box) Actions must be the length of the number of ambulances, every entry is a float between 0 and 1.

observation_space: (Gym.spaces Box) The environment state must be the length of the number of ambulances, every entry is a float between 0 and 1.

__init__(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'epLen': 5, 'norm': 1, 'num_ambulance': 1, 'starting_state': array([0.], dtype=float32)})

Parameters

config – A (dict) dictionary containing the parameters required to set up a metric ambulance environment.
epLen – The (int) number of time steps to run the experiment for.
arrival_dist – A (lambda) arrival distribution for calls over the space [0,1]; takes an integer (step) and returns a float between 0 and 1.
alpha – A float controlling proportional difference in cost to move between calls and to respond to a call.
starting_state – A float list containing the starting locations for each ambulance.
num_ambulance – The (int) number of ambulances in the environment.
norm – The (int) norm used in the calculations.
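A config dictionary with the keys listed above might look like the following sketch for two ambulances. The values are illustrative only. The signature shows starting_state as a NumPy float32 array; a plain list is used here to keep the example dependency-free, so converting it before passing it to the real environment may be necessary. The helper `check_config` is hypothetical and not part of the package.

```python
import random

# Illustrative values; the keys follow the parameter list above.
config = {
    "epLen": 5,                                             # steps per episode
    "arrival_dist": lambda step: random.betavariate(5, 2),  # call location in [0, 1]
    "alpha": 0.25,                                          # cost weighting
    "starting_state": [0.0, 0.5],                           # one location per ambulance
    "num_ambulance": 2,
    "norm": 1,
}


def check_config(config):
    """Hypothetical sanity checks based only on the parameter descriptions."""
    problems = []
    if not (isinstance(config["epLen"], int) and config["epLen"] > 0):
        problems.append("epLen should be a positive int")
    if len(config["starting_state"]) != config["num_ambulance"]:
        problems.append("starting_state needs one entry per ambulance")
    if not all(0.0 <= x <= 1.0 for x in config["starting_state"]):
        problems.append("starting locations should lie in [0, 1]")
    if not callable(config["arrival_dist"]):
        problems.append("arrival_dist should be callable")
    return problems


problems = check_config(config)  # an empty list means no issue was found
```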

close(): Closes the rendering window.

render(mode='human'): Renders the environment using a pyglet window.

reset(): Reinitializes variables and returns the starting state.

reset_current_step(text, line_x1, line_x2, line_y): Used to render a textbox saying the current timestep.

step(action)

Move one step in the environment.

Parameters

action – A float list of locations in [0,1] the same length as the number of ambulances, where each entry i in the list corresponds to the chosen location for ambulance i.

Returns

reward: A float representing the reward based on the action chosen.

newState: A float list representing the state of the environment after the action and call arrival.

done: A bool flag indicating the end of the episode.

Return type

float, float list, bool
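To show how reset() and step(action) fit together over an episode, here is a minimal stand-in written against the interface as documented on this page. `ToyAmbulanceEnv` is hypothetical and is not the or_suite class. The reward formula is an assumption (a 1-norm travel cost, with alpha weighting the move to the stationed locations and 1 - alpha weighting the response to the call), and the return order follows the "Returns" list above, which may differ from the installed package.

```python
class ToyAmbulanceEnv:
    """Hypothetical stand-in mimicking the documented reset/step interface."""

    def __init__(self, starting_state, arrival_dist, alpha=0.25, ep_len=5):
        self.starting_state = list(starting_state)
        self.arrival_dist = arrival_dist
        self.alpha = alpha
        self.ep_len = ep_len
        self.reset()

    def reset(self):
        self.state = list(self.starting_state)
        self.timestep = 0
        return self.state

    def step(self, action):
        call = self.arrival_dist(self.timestep)
        nearest = min(range(len(action)), key=lambda i: abs(action[i] - call))
        # Assumed cost: travel to the stationed locations, then to the call.
        move_cost = sum(abs(a - s) for a, s in zip(action, self.state))
        response_cost = abs(action[nearest] - call)
        reward = -(self.alpha * move_cost + (1 - self.alpha) * response_cost)
        # The responding ambulance ends the step at the call location.
        self.state = list(action)
        self.state[nearest] = call
        self.timestep += 1
        done = self.timestep >= self.ep_len
        return reward, self.state, done


# One ambulance, calls always at 0.5, and an agent that always stations at 0.5.
env = ToyAmbulanceEnv(starting_state=[0.0], arrival_dist=lambda step: 0.5)
state = env.reset()
total_reward, done = 0.0, False
while not done:
    reward, state, done = env.step([0.5])
    total_reward += reward
```

In this toy run the only cost is the initial move from 0.0 to 0.5, weighted by alpha; every later step costs nothing because the ambulance is already at the call location.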