Ambulance Graph

Implementation of an RL environment in a discrete graph space.

An ambulance environment over a simple graph. An agent interacts with the environment by selecting locations for the ambulances on the graph. Afterwards a patient arrives and the nearest ambulance must go and serve the arrival, paying a cost to travel.

class or_suite.envs.ambulance.ambulance_graph.AmbulanceGraphEnvironment(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'edges': [(0, 4, {'travel_time': 7}), (0, 1, {'travel_time': 1}), (1, 2, {'travel_time': 3}), (2, 3, {'travel_time': 5}), (1, 3, {'travel_time': 1}), (1, 4, {'travel_time': 17}), (3, 4, {'travel_time': 3})], 'epLen': 5, 'from_data': False, 'num_ambulance': 2, 'starting_state': [1, 2]})[source]

A graph of nodes V with edges between the nodes E; each node represents a location where an ambulance could be stationed or a call could come in. The edges between nodes are undirected and have a weight representing the distance between those two nodes. The nearest ambulance to a call is determined by computing the shortest path from each ambulance to the call, and choosing the ambulance with the minimum length path. The calls arrive according to a prespecified iid probability distribution that can change over time.
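The nearest-ambulance rule described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the environment stores a networkx Graph, whereas this sketch runs Dijkstra's algorithm directly on the default edge list, and the names EDGES, shortest_distances and nearest_ambulance are hypothetical. It assumes the graph is connected.

```python
import heapq

# Default edges from the config above, written as (node, node, travel_time).
EDGES = [(0, 4, 7), (0, 1, 1), (1, 2, 3), (2, 3, 5), (1, 3, 1), (1, 4, 17), (3, 4, 3)]

def shortest_distances(edges, source):
    """Dijkstra's algorithm on an undirected weighted graph (hypothetical helper)."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, w in adjacency.get(node, []):
            candidate = d + w
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

def nearest_ambulance(edges, ambulance_nodes, call_node):
    """Return (index, distance) of the ambulance with the shortest path to the call."""
    dist = shortest_distances(edges, call_node)  # edges are undirected, so this is symmetric
    index = min(range(len(ambulance_nodes)), key=lambda i: dist[ambulance_nodes[i]])
    return index, dist[ambulance_nodes[index]]
```

With ambulances at nodes [1, 2] and a call at node 4, the ambulance at node 1 is selected: its shortest route 1-3-4 has length 4, while the direct edge 1-4 costs 17.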

epLen: The int number of time steps to run the experiment for.

arrival_dist: A lambda arrival distribution for calls over the observation space; takes an integer (step) and returns an integer that corresponds to a node in the observation space.

alpha: A float controlling proportional difference in cost to move between calls and to respond to a call.

from_data: A bool indicator for whether the arrivals will be read from data or randomly generated.

arrival_data: An int list only used if from_data is True, this is a list of arrivals, where each arrival corresponds to a node in the observation space.

episode_num: The (int) current episode number, increments every time the environment is reset.

graph: A networkx Graph representing the observation space.

num_nodes: The (int) number of nodes in the graph.

state: An int list representing the current state of the environment.

timestep: The (int) timestep the current episode is on.

lengths: A symmetric float matrix containing the distance between each pair of nodes.

starting_state: An int list containing the starting locations for each ambulance.

num_ambulance: The (int) number of ambulances in the environment.

action_space: (Gym.spaces MultiDiscrete) Actions must be the length of the number of ambulances, every entry is an int corresponding to a node in the graph.

observation_space: (Gym.spaces MultiDiscrete) The environment state must be the length of the number of ambulances, every entry is an int corresponding to a node in the graph.

__init__(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'edges': [(0, 4, {'travel_time': 7}), (0, 1, {'travel_time': 1}), (1, 2, {'travel_time': 3}), (2, 3, {'travel_time': 5}), (1, 3, {'travel_time': 1}), (1, 4, {'travel_time': 17}), (3, 4, {'travel_time': 3})], 'epLen': 5, 'from_data': False, 'num_ambulance': 2, 'starting_state': [1, 2]})[source]

Parameters

config – A dictionary (dict) containing the parameters required to set up a graph ambulance environment.
epLen – The (int) number of time steps to run the experiment for.
arrival_dist – A (lambda) arrival distribution for calls over the observation space; takes an integer (step) and returns an integer that corresponds to a node in the observation space.
alpha – A float controlling the proportional difference between the cost to move between calls and the cost to respond to a call.
from_data – A bool indicator for whether the arrivals will be read from data or randomly generated.
data – An int list of arrivals, needed only if from_data is True; each arrival corresponds to a node in the observation space.
edges – A tuple list where each tuple corresponds to an edge in the graph. The tuples are of the form (int1, int2, {'travel_time': int3}). int1 and int2 are the two endpoints of the edge, and int3 is the time it takes to travel from one endpoint to the other.
starting_state – An int list containing the starting locations for each ambulance.
num_ambulance – The (int) number of ambulances in the environment.
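A configuration dictionary following the parameter list above might be assembled as in the sketch below. The function uniform_arrival is hypothetical and follows the arrival_dist signature as documented here (step in, node out); the installed version of the package may expect a different signature, so treat this as an assumption to check against the source.

```python
import random

def uniform_arrival(step, num_nodes=5):
    """Hypothetical arrival distribution: ignores the step, draws a node uniformly."""
    return random.randrange(num_nodes)

config = {
    'epLen': 5,
    'arrival_dist': uniform_arrival,
    'alpha': 0.25,
    'from_data': False,
    'edges': [(0, 4, {'travel_time': 7}), (0, 1, {'travel_time': 1}),
              (1, 2, {'travel_time': 3}), (2, 3, {'travel_time': 5}),
              (1, 3, {'travel_time': 1}), (1, 4, {'travel_time': 17}),
              (3, 4, {'travel_time': 3})],
    'starting_state': [1, 2],
    'num_ambulance': 2,
}

# Sanity checks: one starting node per ambulance, and every starting node
# must appear as an endpoint of some edge.
nodes = {n for u, v, _ in config['edges'] for n in (u, v)}
assert len(config['starting_state']) == config['num_ambulance']
assert all(s in nodes for s in config['starting_state'])
```

The dictionary would then be passed as the config argument of AmbulanceGraphEnvironment, as in the signature shown above.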

close()[source]

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

find_lengths(graph, num_nodes)[source]: Given a graph, find_lengths calculates the pairwise shortest distances between all nodes and stores them in a (symmetric) matrix.
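The kind of matrix find_lengths produces can be illustrated with the Floyd-Warshall algorithm. This is a sketch under assumptions: the library works on a networkx Graph and its actual method is not shown on this page, while the hypothetical all_pairs_lengths below takes a plain (node, node, travel_time) edge list.

```python
def all_pairs_lengths(edges, num_nodes):
    """All-pairs shortest distances via Floyd-Warshall (hypothetical helper)."""
    inf = float("inf")
    lengths = [[0.0 if i == j else inf for j in range(num_nodes)]
               for i in range(num_nodes)]
    for u, v, w in edges:
        # Undirected edge: fill both directions, keeping the cheaper duplicate.
        lengths[u][v] = min(lengths[u][v], float(w))
        lengths[v][u] = min(lengths[v][u], float(w))
    for k in range(num_nodes):          # allow node k as an intermediate stop
        for i in range(num_nodes):
            for j in range(num_nodes):
                via_k = lengths[i][k] + lengths[k][j]
                if via_k < lengths[i][j]:
                    lengths[i][j] = via_k
    return lengths

DEFAULT_EDGES = [(0, 4, 7), (0, 1, 1), (1, 2, 3), (2, 3, 5),
                 (1, 3, 1), (1, 4, 17), (3, 4, 3)]
lengths = all_pairs_lengths(DEFAULT_EDGES, 5)
```

For the default edges, lengths[1][4] is 4.0 rather than 17.0, because the route 1-3-4 is shorter than the direct edge.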

render(mode='console')[source]

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note

Make sure that your class's metadata 'render.modes' key includes the list of supported modes. It's recommended to call super() in implementations to use the functionality of this method.

Parameters: mode (str) – the mode to render with

Example:

    import numpy as np
    from gym import Env

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

reset()[source]: Reinitializes variables and returns the starting state.

step(action)[source]

Move one step in the environment.

Parameters

action – An int list of nodes the same length as the number of ambulances, where each entry i in the list corresponds to the chosen location for ambulance i.

Returns

reward: A float representing the reward based on the action chosen.

newState: An int list representing the state of the environment after the action and call arrival.

done: A bool flag indicating the end of the episode.

Return type

float, int list, bool
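One step can be sketched as follows. The cost formula is an assumption: this page only says that alpha controls the proportional difference between the two costs, and the sketch weights the repositioning cost by alpha and the response cost by 1 - alpha. The function step_sketch and the hard-coded LENGTHS matrix (shortest distances for the default edges) are hypothetical, and the done flag is left out.

```python
# Shortest-path distances between the five nodes of the default graph.
LENGTHS = [
    [0, 1, 4, 2, 5],
    [1, 0, 3, 1, 4],
    [4, 3, 0, 4, 7],
    [2, 1, 4, 0, 3],
    [5, 4, 7, 3, 0],
]

def step_sketch(state, action, call_node, alpha, lengths=LENGTHS):
    """Return (reward, new_state) for one call arrival (hypothetical sketch)."""
    # Cost of moving each ambulance from its current node to its chosen node.
    move_cost = sum(lengths[s][a] for s, a in zip(state, action))
    # The ambulance closest to the call responds.
    responder = min(range(len(action)), key=lambda i: lengths[action[i]][call_node])
    response_cost = lengths[action[responder]][call_node]
    # ASSUMPTION: alpha weights repositioning, (1 - alpha) weights the response.
    reward = -(alpha * move_cost + (1 - alpha) * response_cost)
    # The responding ambulance ends the step at the call location.
    new_state = list(action)
    new_state[responder] = call_node
    return reward, new_state

reward, new_state = step_sketch(state=[1, 2], action=[3, 2], call_node=4, alpha=0.25)
```

Here the first ambulance moves from node 1 to node 3 (cost 1) and then answers the call at node 4 (cost 3), so under the assumed weighting the reward is -(0.25 * 1 + 0.75 * 3) = -2.5 and the new state is [4, 2].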