Ambulance Metric
Implementation of a basic RL environment for continuous spaces. Includes three test problems which were used in generating the figures.
An ambulance environment over [0,1]. An agent interacts with the environment by picking a location at which to station the ambulance. A patient then arrives, and the ambulance must go and serve the arrival, paying a cost of travel.
- class or_suite.envs.ambulance.ambulance_metric.AmbulanceEnvironment(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'epLen': 5, 'norm': 1, 'num_ambulance': 1, 'starting_state': array([0.], dtype=float32)})
A 1-dimensional reinforcement learning environment in the space X = [0, 1].
Ambulances can be located anywhere in X = [0, 1]. At the beginning of each iteration, the agent chooses where to station each ambulance (the action). A call then arrives, and the nearest ambulance travels to the location of that call.
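The dispatch rule described above can be sketched in plain Python. The `dispatch` helper is hypothetical and not part of or_suite; it assumes that "nearest" is measured from each ambulance's newly chosen station, that ties go to the lowest index, and that the responding ambulance ends the step at the call location.

```python
from typing import List, Tuple


def dispatch(action: List[float], arrival: float) -> Tuple[int, List[float]]:
    """Hypothetical helper: return the responding ambulance and the post-call state."""
    # Distance from each stationed ambulance to the call on [0, 1]
    distances = [abs(loc - arrival) for loc in action]
    # Nearest ambulance; min() keeps the first index on ties (like argmin)
    nearest = min(range(len(action)), key=lambda i: distances[i])
    new_state = list(action)
    new_state[nearest] = arrival  # the responding ambulance ends up at the call
    return nearest, new_state


# Two ambulances stationed at 0.2 and 0.7; a call arrives at 0.6
idx, state = dispatch([0.2, 0.7], 0.6)
print(idx, state)  # 1 [0.2, 0.6]
```

Only the responding ambulance changes position; the others stay at the stations the agent chose.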
- epLen
The (int) number of time steps to run the experiment for.
- arrival_dist
A (lambda) arrival distribution for calls over the space [0,1]; takes an integer (step) and returns a float between 0 and 1.
- alpha
A float controlling the relative weight of the cost of moving ambulances between calls versus the cost of responding to a call.
- starting_state
A float list containing the starting locations for each ambulance.
- num_ambulance
The (int) number of ambulances in the environment.
- state
A float list representing the current state of the environment (the location of each ambulance).
- timestep
The (int) timestep the current episode is on.
- viewer
The window (Pyglet window or None) where the environment rendering is being drawn.
- most_recent_action
(float list or None) The most recent action chosen by the agent (used to render the environment).
- action_space
(gym.spaces.Box) Actions must have one entry per ambulance, and every entry is a float between 0 and 1.
- observation_space
(gym.spaces.Box) The environment state must have one entry per ambulance, and every entry is a float between 0 and 1.
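The trade-off that `alpha` controls can be illustrated with a small cost function. This sketch *assumes* the reward is the negative of an alpha-weighted sum of two distances, both measured in the `norm`-norm: the distance ambulances move from their previous locations to their new stations, and the distance the responding ambulance travels to the call. The helpers `p_norm` and `reward` are hypothetical names, not or_suite API.

```python
from typing import List


def p_norm(vec: List[float], p: int) -> float:
    """l_p norm of a vector (p = 1 gives the sum of absolute values)."""
    return sum(abs(x) ** p for x in vec) ** (1.0 / p)


def reward(old_state: List[float], action: List[float], new_state: List[float],
           alpha: float, p: int = 1) -> float:
    """ASSUMED reward: -(alpha * repositioning cost + (1 - alpha) * response cost)."""
    # Distance travelled from the previous locations to the chosen stations
    move_cost = p_norm([a - s for a, s in zip(action, old_state)], p)
    # Distance travelled by the responding ambulance from its station to the call
    response_cost = p_norm([a - n for a, n in zip(action, new_state)], p)
    return -(alpha * move_cost + (1 - alpha) * response_cost)


# One ambulance starting at 0.0, a call at 1.0, alpha = 0.25, l1 norm
print(reward([0.0], [0.0], [1.0], alpha=0.25))  # stay put: -0.75
print(reward([0.0], [0.5], [1.0], alpha=0.25))  # station at 0.5: -0.5
```

With a small `alpha`, repositioning is cheap relative to responding, so moving toward likely call locations pays off; as `alpha` approaches 1, staying put becomes more attractive.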
- __init__(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'epLen': 5, 'norm': 1, 'num_ambulance': 1, 'starting_state': array([0.], dtype=float32)})
- Parameters
config – A (dict) dictionary containing the parameters required to set up a metric ambulance environment, under the following keys:
epLen – The (int) number of time steps to run the experiment for.
arrival_dist – A (lambda) arrival distribution for calls over the space [0,1]; takes an integer (step) and returns a float between 0 and 1.
alpha – A float controlling proportional difference in cost to move between calls and to respond to a call.
starting_state – A float list containing the starting locations for each ambulance.
num_ambulance – The (int) number of ambulances in the environment.
norm – The (int) order of the norm used in the distance calculations.
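A config for two ambulances might look like the following. The values are illustrative, and `check_config` is a hypothetical helper that only mirrors the parameter types documented above; it is not something or_suite provides.

```python
import random
from typing import Any, Dict

# Keys taken from the __init__ parameter list above
REQUIRED_KEYS = {"epLen", "arrival_dist", "alpha", "starting_state", "num_ambulance", "norm"}


def check_config(config: Dict[str, Any]) -> None:
    """Hypothetical sanity checks based on the documented parameter types."""
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    if len(config["starting_state"]) != config["num_ambulance"]:
        raise ValueError("starting_state needs one location per ambulance")
    if not all(0.0 <= x <= 1.0 for x in config["starting_state"]):
        raise ValueError("starting locations must lie in [0, 1]")
    # arrival_dist takes an integer step and should return a float in [0, 1]
    for step in range(config["epLen"]):
        if not 0.0 <= config["arrival_dist"](step) <= 1.0:
            raise ValueError("arrival_dist must return values in [0, 1]")


config = {
    "epLen": 5,
    "alpha": 0.25,
    "norm": 1,
    "num_ambulance": 2,
    "starting_state": [0.0, 0.0],
    # Calls arrive uniformly over [0, 1]; this example ignores the step argument
    "arrival_dist": lambda step: random.uniform(0.0, 1.0),
}
check_config(config)  # passes silently
```

The documented default passes `starting_state` as `array([0.], dtype=float32)`, so converting the list to a float32 NumPy array before calling `AmbulanceEnvironment(config)` is likely the safer choice.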
- reset_current_step(text, line_x1, line_x2, line_y)
Used to render a textbox saying the current timestep.
- step(action)
Move one step in the environment.
- Parameters
action – A float list of locations in [0,1] with one entry per ambulance, where entry i is the chosen location for ambulance i.
- Returns
reward: A float representing the reward based on the action chosen.
newState: A float list representing the state of the environment after the action and call arrival.
done: A bool flag indicating the end of the episode.
- Return type
float, float list, bool
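The step contract above can be exercised with a plain-Python stand-in. `StepSketch` is hypothetical and not the or_suite implementation. It assumes an l1 norm, the alpha-weighted cost form, and that `done` becomes True on the `epLen`-th step. It returns values in the documented order (reward, newState, done).

```python
from typing import Callable, List, Tuple


class StepSketch:
    """Hypothetical stand-in for the documented step() contract (l1 norm only)."""

    def __init__(self, epLen: int, alpha: float, starting_state: List[float],
                 arrival_dist: Callable[[int], float]) -> None:
        self.epLen = epLen
        self.alpha = alpha
        self.state = list(starting_state)
        self.arrival_dist = arrival_dist
        self.timestep = 0

    def step(self, action: List[float]) -> Tuple[float, List[float], bool]:
        # One location in [0, 1] per ambulance, as the action_space requires
        if len(action) != len(self.state) or not all(0.0 <= a <= 1.0 for a in action):
            raise ValueError("action needs one location in [0, 1] per ambulance")
        arrival = self.arrival_dist(self.timestep)
        # The ambulance nearest to the call responds and ends up at the call
        nearest = min(range(len(action)), key=lambda i: abs(action[i] - arrival))
        new_state = list(action)
        new_state[nearest] = arrival
        # ASSUMED l1 cost: repositioning weighted by alpha, response by 1 - alpha
        move = sum(abs(a - s) for a, s in zip(action, self.state))
        respond = abs(action[nearest] - arrival)
        reward = -(self.alpha * move + (1 - self.alpha) * respond)
        self.state = new_state
        self.timestep += 1
        # ASSUMED: the episode ends once epLen steps have been taken
        done = self.timestep >= self.epLen
        return reward, list(self.state), done


# Every call arrives at 1.0; the agent always stations its one ambulance at 0.5
env = StepSketch(epLen=3, alpha=0.25, starting_state=[0.0], arrival_dist=lambda t: 1.0)
total, done = 0.0, False
while not done:
    reward, state, done = env.step([0.5])  # documented order: reward, newState, done
    total += reward
print(total)  # -1.5
```

Each step costs 0.5 here: the ambulance moves 0.5 back to its station and then travels 0.5 to the call.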