Ambulance Graph
Implementation of an RL environment in a discrete graph space.
An ambulance environment over a simple graph. An agent interacts with the environment by selecting locations for the ambulances on the graph. A patient then arrives, and an ambulance must go and serve the arrival, paying a cost to travel.
- class or_suite.envs.ambulance.ambulance_graph.AmbulanceGraphEnvironment(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'edges': [(0, 4, {'travel_time': 7}), (0, 1, {'travel_time': 1}), (1, 2, {'travel_time': 3}), (2, 3, {'travel_time': 5}), (1, 3, {'travel_time': 1}), (1, 4, {'travel_time': 17}), (3, 4, {'travel_time': 3})], 'epLen': 5, 'from_data': False, 'num_ambulance': 2, 'starting_state': [1, 2]})[source]
A graph of nodes V with edges E between the nodes; each node represents a location where an ambulance could be stationed or a call could come in. The edges between nodes are undirected and have a weight representing the distance between those two nodes. The nearest ambulance to a call is determined by computing the shortest path from each ambulance to the call and choosing the ambulance with the minimum-length path. Calls arrive independently according to a prespecified probability distribution that can change over time.
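The nearest-ambulance rule described above can be sketched as follows. This is a minimal illustration using only the standard library, not the environment's own code: the helper names (build_adjacency, shortest_distances, nearest_ambulance) are hypothetical, and the edge list is the default one from the class signature.

```python
import heapq

# Default edge list from the class signature: (node, node, {'travel_time': weight}).
EDGES = [
    (0, 4, {'travel_time': 7}), (0, 1, {'travel_time': 1}),
    (1, 2, {'travel_time': 3}), (2, 3, {'travel_time': 5}),
    (1, 3, {'travel_time': 1}), (1, 4, {'travel_time': 17}),
    (3, 4, {'travel_time': 3}),
]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: [(neighbor, weight), ...]}."""
    adj = {}
    for u, v, attrs in edges:
        adj.setdefault(u, []).append((v, attrs['travel_time']))
        adj.setdefault(v, []).append((u, attrs['travel_time']))
    return adj


def shortest_distances(adj, source):
    """Dijkstra: shortest travel time from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float('inf')):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in adj[node]:
            candidate = d + weight
            if candidate < dist.get(neighbor, float('inf')):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def nearest_ambulance(adj, ambulance_nodes, call_node):
    """Return (ambulance index, distance) for the ambulance closest to the call."""
    # The graph is undirected, so distances from the call equal distances to it.
    dist = shortest_distances(adj, call_node)
    idx = min(range(len(ambulance_nodes)), key=lambda i: dist[ambulance_nodes[i]])
    return idx, dist[ambulance_nodes[idx]]


adj = build_adjacency(EDGES)
# Ambulances at the default starting_state [1, 2], call arriving at node 4.
print(nearest_ambulance(adj, [1, 2], 4))
```

With ambulances at nodes 1 and 2 and a call at node 4, the ambulance at node 1 is chosen: the path 1-3-4 has length 4, which beats the direct edge 1-4 of length 17.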
- epLen
The int number of time steps to run the experiment for.
- arrival_dist
A lambda arrival distribution for calls over the observation space; takes an integer (step) and returns an integer that corresponds to a node in the observation space.
- alpha
A float controlling the relative weight of the cost to move ambulances between calls versus the cost to respond to a call.
- from_data
A bool indicator for whether the arrivals will be read from data or randomly generated.
- arrival_data
An int list used only if from_data is True; it is a list of arrivals, where each arrival corresponds to a node in the observation space.
- episode_num
The (int) current episode number, increments every time the environment is reset.
- graph
A networkx Graph representing the observation space.
- num_nodes
The (int) number of nodes in the graph.
- state
An int list representing the current state of the environment.
- timestep
The (int) timestep the current episode is on.
- lengths
A symmetric float matrix containing the distance between each pair of nodes.
- starting_state
An int list containing the starting locations for each ambulance.
- num_ambulance
The (int) number of ambulances in the environment.
- action_space
(Gym.spaces MultiDiscrete) An action must have one entry per ambulance; every entry is an int corresponding to a node in the graph.
- observation_space
(Gym.spaces MultiDiscrete) The environment state must have one entry per ambulance; every entry is an int corresponding to a node in the graph.
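As an illustration of the arrival_dist attribute, the sketch below builds a callable that takes the step and returns a node index, as described above. The factory name make_arrival_dist, the seed argument, and the particular time dependence (even steps favour node 0, odd steps are uniform) are assumptions made for this example only.

```python
import random

NUM_NODES = 5  # assumed: matches the default five-node graph


def make_arrival_dist(num_nodes, seed=None):
    """Hypothetical factory returning a step -> node arrival function."""
    rng = random.Random(seed)

    def arrival_dist(step):
        # Illustrative time dependence: the distribution changes with the step.
        if step % 2 == 0:
            weights = [num_nodes] + [1] * (num_nodes - 1)  # node 0 is favoured
        else:
            weights = [1] * num_nodes  # uniform over all nodes
        # random.choices returns a list of length k; take its only element.
        return rng.choices(range(num_nodes), weights=weights, k=1)[0]

    return arrival_dist


arrival_dist = make_arrival_dist(NUM_NODES, seed=0)
arrivals = [arrival_dist(step) for step in range(5)]  # one call per time step
```

Every value in arrivals is an int in the range 0 to NUM_NODES - 1, so it can be used directly as a node in the observation space.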
- __init__(config={'alpha': 0.25, 'arrival_dist': <function <lambda>>, 'edges': [(0, 4, {'travel_time': 7}), (0, 1, {'travel_time': 1}), (1, 2, {'travel_time': 3}), (2, 3, {'travel_time': 5}), (1, 3, {'travel_time': 1}), (1, 4, {'travel_time': 17}), (3, 4, {'travel_time': 3})], 'epLen': 5, 'from_data': False, 'num_ambulance': 2, 'starting_state': [1, 2]})[source]
- Parameters
config – A dictionary (dict) containing the parameters required to set up a graph ambulance environment.
epLen – The (int) number of time steps to run the experiment for.
arrival_dist – A (lambda) arrival distribution for calls over the observation space; takes an integer (step) and returns an integer that corresponds to a node in the observation space.
alpha – A float controlling the relative weight of the cost to move ambulances between calls versus the cost to respond to a call.
from_data – A bool indicator for whether the arrivals will be read from data or randomly generated.
data – An int list needed only if from_data is True; it is a list of arrivals, where each arrival corresponds to a node in the observation space.
edges – A tuple list where each tuple corresponds to an edge in the graph. The tuples are of the form (int1, int2, {'travel_time': int3}). int1 and int2 are the two endpoints of the edge, and int3 is the time it takes to travel from one endpoint to the other.
starting_state – An int list containing the starting locations for each ambulance.
num_ambulance – The (int) number of ambulances in the environment.
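A config dictionary using the keys listed above might look like the sketch below, here with from_data set to True. The validate_config helper is hypothetical and is not part of the package; it only checks that the entries agree with one another. The arrival_dist placeholder is also an assumption, since the parameters above do not say whether it is consulted when arrivals are read from data.

```python
def validate_config(config):
    """Hypothetical helper: return a list of problems found in a config dict."""
    problems = []
    nodes = set()
    for u, v, attrs in config['edges']:
        nodes.update((u, v))
        if 'travel_time' not in attrs:
            problems.append(f"edge ({u}, {v}) has no 'travel_time'")
    if len(config['starting_state']) != config['num_ambulance']:
        problems.append("starting_state length differs from num_ambulance")
    if any(loc not in nodes for loc in config['starting_state']):
        problems.append("starting_state contains a node that is not in the graph")
    if config['from_data']:
        data = config.get('data', [])
        if not data:
            problems.append("from_data is True but 'data' is missing or empty")
        if any(node not in nodes for node in data):
            problems.append("data contains a node that is not in the graph")
    return problems


config = {
    'epLen': 5,
    'alpha': 0.25,
    'arrival_dist': lambda step: 0,   # placeholder (assumption, see above)
    'from_data': True,
    'data': [4, 0, 2, 2, 3],          # made-up arrivals, one node per call
    'edges': [(0, 4, {'travel_time': 7}), (0, 1, {'travel_time': 1}),
              (1, 2, {'travel_time': 3}), (2, 3, {'travel_time': 5}),
              (1, 3, {'travel_time': 1}), (1, 4, {'travel_time': 17}),
              (3, 4, {'travel_time': 3})],
    'starting_state': [1, 2],
    'num_ambulance': 2,
}

print(validate_config(config))  # an empty list means no problems were found

# With the package installed, the dictionary would be passed to the constructor:
# from or_suite.envs.ambulance.ambulance_graph import AmbulanceGraphEnvironment
# env = AmbulanceGraphEnvironment(config)
```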
- close()[source]
Override close in your subclass to perform any necessary cleanup.
Environments will automatically close() themselves when garbage collected or when the program exits.
- find_lengths(graph, num_nodes)[source]
Given a graph, find_lengths calculates the pairwise shortest distance between all the nodes and stores the result in a (symmetric) matrix.
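One way to compute such a matrix is the Floyd-Warshall algorithm, sketched below in plain Python. This is an illustration of the idea rather than the method's actual implementation: the function name pairwise_lengths is hypothetical, and it takes the edge list directly instead of a networkx Graph.

```python
def pairwise_lengths(edges, num_nodes):
    """Floyd-Warshall: matrix of shortest travel times between all node pairs."""
    inf = float('inf')
    # Start with 0 on the diagonal and "unreachable" everywhere else.
    lengths = [[0.0 if i == j else inf for j in range(num_nodes)]
               for i in range(num_nodes)]
    # Undirected edges: fill both directions, keeping the lighter parallel edge.
    for u, v, attrs in edges:
        w = float(attrs['travel_time'])
        lengths[u][v] = min(lengths[u][v], w)
        lengths[v][u] = min(lengths[v][u], w)
    # Allow paths through intermediate node k, one k at a time.
    for k in range(num_nodes):
        for i in range(num_nodes):
            for j in range(num_nodes):
                via_k = lengths[i][k] + lengths[k][j]
                if via_k < lengths[i][j]:
                    lengths[i][j] = via_k
    return lengths


edges = [(0, 4, {'travel_time': 7}), (0, 1, {'travel_time': 1}),
         (1, 2, {'travel_time': 3}), (2, 3, {'travel_time': 5}),
         (1, 3, {'travel_time': 1}), (1, 4, {'travel_time': 17}),
         (3, 4, {'travel_time': 3})]
lengths = pairwise_lengths(edges, 5)
for row in lengths:
    print(row)
```

On the default graph the entry for nodes 1 and 4 is 4.0 rather than the direct edge weight 17, because the route 1-3-4 is shorter.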
- render(mode='console')[source]
Renders the environment.
The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
Note
Make sure that your class's metadata 'render.modes' key includes the list of supported modes. It's recommended to call super() in implementations to use the functionality of this method.
- Parameters
mode (str) – the mode to render with
Example:
    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception
- step(action)[source]
Move one step in the environment.
- Parameters
action – An int list of nodes the same length as the number of ambulances, where each entry i in the list corresponds to the chosen location for ambulance i.
- Returns
reward: A float representing the reward based on the action chosen.
newState: An int list representing the state of the environment after the action and call arrival.
done: A bool flag indicating the end of the episode.
- Return type
float, int list, bool
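The transition described above can be sketched as a pure function. This is an illustration under stated assumptions, not the method's source: sketch_step is a hypothetical name, the reward is assumed to be the negative of alpha times the total repositioning distance plus (1 - alpha) times the response distance, the responding ambulance is assumed to end the step at the call location, and the episode is assumed to end after epLen steps. LENGTHS is the pairwise shortest-distance matrix for the default graph.

```python
# Pairwise shortest travel times for the default five-node graph.
LENGTHS = [
    [0, 1, 4, 2, 5],
    [1, 0, 3, 1, 4],
    [4, 3, 0, 4, 7],
    [2, 1, 4, 0, 3],
    [5, 4, 7, 3, 0],
]


def sketch_step(state, action, arrival, lengths, alpha, timestep, ep_len):
    """Hypothetical one-step transition; returns (reward, new_state, done)."""
    # Cost of moving every ambulance from its current node to its chosen node.
    move_cost = sum(lengths[s][a] for s, a in zip(state, action))
    # The ambulance whose chosen node is closest to the call responds.
    closest = min(range(len(action)), key=lambda i: lengths[action[i]][arrival])
    response_cost = lengths[action[closest]][arrival]
    # Assumed cost structure: alpha weighs repositioning against responding.
    reward = -(alpha * move_cost + (1 - alpha) * response_cost)
    # Assumed: the responding ambulance ends the step at the call location.
    new_state = list(action)
    new_state[closest] = arrival
    done = timestep + 1 >= ep_len  # assumed end-of-episode rule
    return reward, new_state, done


reward, new_state, done = sketch_step(
    state=[1, 2], action=[3, 0], arrival=4,
    lengths=LENGTHS, alpha=0.25, timestep=0, ep_len=5)
print(reward, new_state, done)
```

In this example the repositioning distance is 1 + 4 = 5 and the response distance from node 3 to node 4 is 3, so the reward is -(0.25 * 5 + 0.75 * 3) = -3.5 and the new state is [4, 0].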