Ambulance Graph

Implementation of an RL environment in a discrete graph space.

An ambulance environment over a simple graph. The agent interacts with the environment by selecting locations for the ambulances on the graph. A patient then arrives, and an ambulance must go and serve the arrival, paying a cost to travel.

class or_suite.envs.ambulance.ambulance_graph.AmbulanceGraphEnvironment(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'edges': [(0, 4, {'travel_time': 7}), (0, 1, {'travel_time': 1}), (1, 2, {'travel_time': 3}), (2, 3, {'travel_time': 5}), (1, 3, {'travel_time': 1}), (1, 4, {'travel_time': 17}), (3, 4, {'travel_time': 3})], 'epLen': 5, 'from_data': False, 'num_ambulance': 2, 'starting_state': [1, 2]})

A graph with node set V and edge set E; each node represents a location where an ambulance could be stationed or a call could come in. The edges are undirected, and each edge has a weight representing the distance between its two endpoints. The nearest ambulance to a call is determined by computing the shortest path from each ambulance to the call and choosing the ambulance with the minimum path length. Calls arrive independently according to a prespecified probability distribution that can change over time.
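The nearest-ambulance rule described above can be sketched without any dependencies. The sketch below uses the default edges from the class signature and a plain Dijkstra search from the call node (the graph is undirected, so distances from the call equal distances to it). The helper names build_adjacency, shortest_lengths_from, and nearest_ambulance are hypothetical and not part of or_suite, and breaking ties by lowest ambulance index is an assumption.

```python
import heapq

# Default edges from the class signature: (node_a, node_b, {'travel_time': weight}).
EDGES = [
    (0, 4, {"travel_time": 7}), (0, 1, {"travel_time": 1}),
    (1, 2, {"travel_time": 3}), (2, 3, {"travel_time": 5}),
    (1, 3, {"travel_time": 1}), (1, 4, {"travel_time": 17}),
    (3, 4, {"travel_time": 3}),
]


def build_adjacency(edges):
    """Turn the undirected edge list into an adjacency dict (hypothetical helper)."""
    adj = {}
    for u, v, attrs in edges:
        w = attrs["travel_time"]
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def shortest_lengths_from(adj, source):
    """Dijkstra: shortest travel time from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def nearest_ambulance(adj, ambulance_nodes, call_node):
    """Return (ambulance index, path length) for the closest ambulance.

    Ties go to the lowest index (an assumption of this sketch).
    """
    dist = shortest_lengths_from(adj, call_node)
    idx = min(range(len(ambulance_nodes)), key=lambda i: dist[ambulance_nodes[i]])
    return idx, dist[ambulance_nodes[idx]]


adj = build_adjacency(EDGES)
# Ambulances at the default starting_state [1, 2]; a call arrives at node 4.
print(nearest_ambulance(adj, [1, 2], call_node=4))  # (0, 4)
```

Ambulance 0 (at node 1) responds because the route 1-3-4 costs 4, which beats both the direct 1-4 edge (17) and the best route from node 2 (7).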

epLen

The (int) number of time steps to run the experiment for.

arrival_dist

A lambda arrival distribution for calls over the observation space; takes an integer (step) and returns an integer that corresponds to a node in the observation space.

alpha

A float controlling the relative weight of the cost of moving ambulances between calls versus the cost of responding to a call.

from_data

A bool indicator for whether the arrivals will be read from data or randomly generated.

arrival_data

An int list of arrivals, only used if from_data is True; each arrival corresponds to a node in the observation space.

episode_num

The (int) current episode number, increments every time the environment is reset.

graph

A networkx Graph representing the observation space.

num_nodes

The (int) number of nodes in the graph.

state

An int list representing the current state of the environment.

timestep

The (int) timestep the current episode is on.

lengths

A symmetric float matrix containing the distance between each pair of nodes.

starting_state

An int list containing the starting locations for each ambulance.

num_ambulance

The (int) number of ambulances in the environment.

action_space

(gym.spaces MultiDiscrete) Actions must have length equal to the number of ambulances; every entry is an int corresponding to a node in the graph.

observation_space

(gym.spaces MultiDiscrete) The environment state must have length equal to the number of ambulances; every entry is an int corresponding to a node in the graph.

__init__(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'edges': [(0, 4, {'travel_time': 7}), (0, 1, {'travel_time': 1}), (1, 2, {'travel_time': 3}), (2, 3, {'travel_time': 5}), (1, 3, {'travel_time': 1}), (1, 4, {'travel_time': 17}), (3, 4, {'travel_time': 3})], 'epLen': 5, 'from_data': False, 'num_ambulance': 2, 'starting_state': [1, 2]})
Parameters
  • config – A dictionary (dict) containing the parameters required to set up a graph ambulance environment.

  • epLen – The (int) number of time steps to run the experiment for.

  • arrival_dist – A (lambda) arrival distribution for calls over the observation space; takes an integer (step) and returns an integer that corresponds to a node in the observation space.

  • alpha – A float controlling the relative weight of the cost of moving ambulances between calls versus the cost of responding to a call.

  • from_data – A bool indicator for whether the arrivals will be read from data or randomly generated.

  • data – An int list of arrivals, only needed if from_data is True; each arrival corresponds to a node in the observation space.

  • edges – A tuple list where each tuple corresponds to an edge in the graph. The tuples are of the form (int1, int2, {'travel_time': int3}). int1 and int2 are the two endpoints of the edge, and int3 is the time it takes to travel from one endpoint to the other.

  • starting_state – An int list containing the starting locations for each ambulance.

  • num_ambulance – The (int) number of ambulances in the environment.
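As a rough illustration of the parameters above, the following sketch assembles a config dictionary with a step-dependent arrival rule. It follows this page's description of arrival_dist (an integer step in, a node index out); the installed library may expect a different signature, so treat that as an assumption. The function time_varying_arrival and the constant NUM_NODES are hypothetical names introduced for the example.

```python
import random

NUM_NODES = 5  # nodes 0..4 appear in the default edge list
_rng = random.Random(0)  # seeded so the sketch is reproducible


def time_varying_arrival(step):
    """Hypothetical arrival rule: calls cluster on nodes 0-1 early, then spread out."""
    if step < 2:
        return _rng.choice([0, 1])
    return _rng.randrange(NUM_NODES)


config = {
    "epLen": 5,
    "arrival_dist": time_varying_arrival,
    "alpha": 0.25,
    "from_data": False,
    "edges": [
        (0, 4, {"travel_time": 7}), (0, 1, {"travel_time": 1}),
        (1, 2, {"travel_time": 3}), (2, 3, {"travel_time": 5}),
        (1, 3, {"travel_time": 1}), (1, 4, {"travel_time": 17}),
        (3, 4, {"travel_time": 3}),
    ],
    "starting_state": [1, 2],
    "num_ambulance": 2,
}

# Basic consistency checks before handing the dict to the environment.
assert len(config["starting_state"]) == config["num_ambulance"]
nodes_in_edges = {n for u, v, _ in config["edges"] for n in (u, v)}
assert all(s in nodes_in_edges for s in config["starting_state"])

# Sample one call per time step to confirm every arrival is a valid node.
calls = [config["arrival_dist"](t) for t in range(config["epLen"])]
print(calls)
```

Per the signature on this page, such a dictionary would be passed as AmbulanceGraphEnvironment(config=config); the sketch stops short of constructing the environment so that it runs without or_suite installed.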

close()

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

find_lengths(graph, num_nodes)

Given a graph, find_lengths calculates the pairwise shortest distance between all nodes and stores the result in a (symmetric) matrix.
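To make the pairwise computation concrete, here is a minimal dependency-free sketch using the Floyd-Warshall algorithm on the default edge list. It is not the library's implementation: the real method takes a networkx Graph, whereas the hypothetical pairwise_lengths below takes the raw edge list and returns a nested list.

```python
def pairwise_lengths(edges, num_nodes):
    """Floyd-Warshall all-pairs shortest travel times (hypothetical helper)."""
    inf = float("inf")
    lengths = [[0.0 if i == j else inf for j in range(num_nodes)]
               for i in range(num_nodes)]
    # Seed the matrix with direct edges; the graph is undirected, so mirror them.
    for u, v, attrs in edges:
        w = float(attrs["travel_time"])
        if w < lengths[u][v]:
            lengths[u][v] = lengths[v][u] = w
    # Allow each node k in turn to act as an intermediate stop.
    for k in range(num_nodes):
        for i in range(num_nodes):
            for j in range(num_nodes):
                via_k = lengths[i][k] + lengths[k][j]
                if via_k < lengths[i][j]:
                    lengths[i][j] = via_k
    return lengths


edges = [
    (0, 4, {"travel_time": 7}), (0, 1, {"travel_time": 1}),
    (1, 2, {"travel_time": 3}), (2, 3, {"travel_time": 5}),
    (1, 3, {"travel_time": 1}), (1, 4, {"travel_time": 17}),
    (3, 4, {"travel_time": 3}),
]
lengths = pairwise_lengths(edges, num_nodes=5)
print(lengths[1])  # [1.0, 0.0, 3.0, 1.0, 4.0]
```

Note that the entry for nodes 1 and 4 is 4.0 rather than the direct edge weight of 17, because the route 1-3-4 is shorter.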

render(mode='console')

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note

Make sure that your class’s metadata 'render.modes' key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.

Parameters

mode (str) – the mode to render with

Example:

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

reset()

Reinitializes variables and returns the starting state.

step(action)

Move one step in the environment.

Parameters

action – An int list of nodes the same length as the number of ambulances, where each entry i in the list corresponds to the chosen location for ambulance i.

Returns

reward: A float representing the reward based on the action chosen.

newState: An int list representing the state of the environment after the action and call arrival.

done: A bool flag indicating the end of the episode.

Return type

float, list of int, bool
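This page does not spell out how the reward is computed. The sketch below shows one plausible reading of alpha, stated here as an assumption: the cost is alpha times the total distance ambulances travel to reach the chosen locations, plus (1 - alpha) times the distance the nearest ambulance travels to the call, and the reward is the negative of that cost. It also assumes the responding ambulance ends the step at the call node. The function sketch_step is hypothetical, and LENGTHS is the shortest-path matrix worked out by hand for the default edge list.

```python
# Shortest-path lengths for the default edge list (row i, column j).
LENGTHS = [
    [0, 1, 4, 2, 5],
    [1, 0, 3, 1, 4],
    [4, 3, 0, 4, 7],
    [2, 1, 4, 0, 3],
    [5, 4, 7, 3, 0],
]


def sketch_step(state, action, call_node, alpha, lengths):
    """One step under the ASSUMED cost model described in the lead-in."""
    # Cost of repositioning every ambulance from its old node to its chosen node.
    move_cost = sum(lengths[old][new] for old, new in zip(state, action))
    # The ambulance closest to the call responds (ties go to the lowest index).
    responder = min(range(len(action)), key=lambda i: lengths[action[i]][call_node])
    response_cost = lengths[action[responder]][call_node]
    reward = -(alpha * move_cost + (1 - alpha) * response_cost)
    # Assumption: the responder finishes the step at the call location.
    new_state = list(action)
    new_state[responder] = call_node
    return reward, new_state


reward, new_state = sketch_step([1, 2], [3, 2], call_node=4, alpha=0.25, lengths=LENGTHS)
print(reward, new_state)  # -2.5 [4, 2]
```

Here moving ambulance 0 from node 1 to node 3 costs 1, responding from node 3 to node 4 costs 3, and the weighted total is 0.25 * 1 + 0.75 * 3 = 2.5.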