Ambulance Metric

Implementation of a basic RL environment for continuous spaces. Includes three test problems which were used in generating the figures.

An ambulance environment over [0,1]. An agent interacts with the environment by picking a location to station the ambulance. A patient then arrives, and the ambulance must travel to serve the arrival, paying a travel cost.

class or_suite.envs.ambulance.ambulance_metric.AmbulanceEnvironment(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'epLen': 5, 'norm': 1, 'num_ambulance': 1, 'starting_state': array([0.], dtype=float32)})

A 1-dimensional reinforcement learning environment in the space X = [0, 1].

Ambulances are located anywhere in X = [0,1], and at the beginning of each iteration, the agent chooses where to station each ambulance (the action). A call arrives, and the nearest ambulance goes to the location of that call.
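The dispatch rule described above can be sketched in a few lines of plain Python. This is only an illustration of the stated behaviour, not the library's code: the function name `dispatch_nearest` is hypothetical, and breaking ties in favour of the lowest index is an assumption.

```python
def dispatch_nearest(stations, call):
    """Send the ambulance closest to `call` there; leave the others in place.

    `stations` are the locations chosen by the agent (the action) and `call`
    is the arrival location, all in [0, 1]. Ties go to the lowest index
    (an assumption made for this sketch).
    """
    nearest = min(range(len(stations)), key=lambda i: abs(stations[i] - call))
    new_state = list(stations)   # copy so the caller's action is not mutated
    new_state[nearest] = call    # the responding ambulance ends up at the call
    return nearest, new_state


# Two ambulances stationed at 0.2 and 0.7; a call arrives at 0.6.
nearest, new_state = dispatch_nearest([0.2, 0.7], call=0.6)
```

Here the second ambulance (index 1) is closer to the call, so it moves to 0.6 while the first stays at 0.2.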

epLen

The (int) number of time steps to run the experiment for.

arrival_dist

A (lambda) arrival distribution for calls over the space [0,1]; takes an integer (step) and returns a float between 0 and 1.

alpha

A float controlling the relative weight of the cost of moving ambulances between calls versus the cost of responding to a call.

starting_state

A float list containing the starting locations for each ambulance.

num_ambulance

The (int) number of ambulances in the environment.

state

A float list representing the current state of the environment (the location of each ambulance in [0,1]).

timestep

The (int) timestep the current episode is on.

viewer

The window (Pyglet window or None) where the environment rendering is being drawn.

most_recent_action

(float list or None) The most recent action chosen by the agent (used to render the environment).

action_space

(Gym.spaces Box) Actions must have length equal to the number of ambulances; every entry is a float between 0 and 1.

observation_space

(Gym.spaces Box) The environment state must have length equal to the number of ambulances; every entry is a float between 0 and 1.
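The constraint shared by action_space and observation_space can be checked by hand, as in the minimal sketch below. The helper `is_valid_action` is hypothetical and not part of or_suite. With the real environment, Gym spaces provide a `contains` method that serves the same purpose.

```python
def is_valid_action(action, num_ambulance):
    """Return True if `action` has one entry per ambulance, each in [0, 1]."""
    return len(action) == num_ambulance and all(0.0 <= a <= 1.0 for a in action)


ok = is_valid_action([0.3, 0.8], num_ambulance=2)            # valid
out_of_range = is_valid_action([0.3, 1.2], num_ambulance=2)  # 1.2 is outside [0, 1]
wrong_length = is_valid_action([0.3], num_ambulance=2)       # one entry is missing
```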

__init__(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'epLen': 5, 'norm': 1, 'num_ambulance': 1, 'starting_state': array([0.], dtype=float32)})
Parameters
  • config – A (dict) dictionary containing the parameters required to set up a metric ambulance environment.

  • epLen – The (int) number of time steps to run the experiment for.

  • arrival_dist – A (lambda) arrival distribution for calls over the space [0,1]; takes an integer (step) and returns a float between 0 and 1.

  • alpha – A float controlling the relative weight of the cost of moving ambulances between calls versus the cost of responding to a call.

  • starting_state – A float list containing the starting locations for each ambulance.

  • num_ambulance – The (int) number of ambulances in the environment.

  • norm – The (int) norm used in the calculations.
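As an illustration of the parameters above, a config dictionary might look like the following sketch. The numeric values mirror the defaults shown in the signature. The uniform arrival distribution and the use of a plain list for starting_state (the default is a NumPy float32 array) are assumptions made to keep the example dependency-free.

```python
import random

# Seeded generator so the assumed arrival distribution is reproducible.
rng = random.Random(0)

config = {
    "epLen": 5,                                 # time steps per episode
    "arrival_dist": lambda step: rng.random(),  # assumed: uniform on [0, 1), ignores step
    "alpha": 0.25,                              # weight on movement versus response cost
    "starting_state": [0.0],                    # one starting location per ambulance
    "num_ambulance": 1,
    "norm": 1,
}

# Draw one sample call location for step 0.
sample_call = config["arrival_dist"](0)
```

The arrival_dist entry takes the integer step, so a time-varying distribution can be expressed by making the lambda depend on its argument.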

close()

Closes the rendering window.

render(mode='human')

Renders the environment using a pyglet window.

reset()

Reinitializes variables and returns the starting state.

reset_current_step(text, line_x1, line_x2, line_y)

Renders a text box showing the current timestep.

step(action)

Move one step in the environment.

Parameters
  • action – A float list of locations in [0,1] the same length as the number of ambulances, where each entry i in the list corresponds to the chosen location for ambulance i.

Returns
  • reward – A float representing the reward based on the action chosen.

  • newState – A float list representing the state of the environment after the action and call arrival.

  • done – A bool flag indicating the end of the episode.

Return type
  float, float list, bool
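To show how reset() and step() fit together over one episode, here is a self-contained sketch using a toy stand-in class. `ToyAmbulanceEnv` is hypothetical and is not the or_suite implementation. Its cost formula (alpha times the repositioning distance plus 1 - alpha times the response distance), its uniform arrivals, and its return order (taken from the Returns list above) are all assumptions. Gym environments conventionally return (observation, reward, done, info), so check the installed version before relying on the tuple order.

```python
import random


class ToyAmbulanceEnv:
    """Hypothetical stand-in for illustration; not the or_suite class."""

    def __init__(self, ep_len=5, alpha=0.25, starting_state=(0.0,), seed=0):
        self.ep_len = ep_len
        self.alpha = alpha
        self.starting_state = list(starting_state)
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Reinitialize the state and timestep, and return the starting state."""
        self.state = list(self.starting_state)
        self.timestep = 0
        return self.state

    def step(self, action):
        call = self.rng.random()  # assumed: uniform arrival on [0, 1)
        nearest = min(range(len(action)), key=lambda i: abs(action[i] - call))
        # Assumed cost: alpha weights repositioning, (1 - alpha) weights the response.
        move = sum(abs(a - s) for a, s in zip(action, self.state))
        respond = abs(action[nearest] - call)
        reward = -(self.alpha * move + (1 - self.alpha) * respond)
        self.state = list(action)
        self.state[nearest] = call  # the responding ambulance ends at the call
        self.timestep += 1
        done = self.timestep >= self.ep_len
        return reward, self.state, done


env = ToyAmbulanceEnv()
state = env.reset()
total_reward, done, steps = 0.0, False, 0
while not done:
    action = [0.5] * len(state)  # fixed policy: always station at the midpoint
    reward, state, done = env.step(action)
    total_reward += reward
    steps += 1
```

Rewards are the negatives of costs, so the accumulated total is never positive. A learning agent would replace the fixed midpoint policy with one that adapts to the observed arrivals.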