Resource Allocation
Sequential Resource Allocation Problem for n locations with K commodities.
A ResourceAllocationEnvironment in which the agent iterates through locations and receives a reward equal to the Nash Social Welfare of the resources it allocates, provided the allocation stays within budget.
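Nash Social Welfare is commonly written in log form. Each person of type θ contributes log u(x_θ, θ), so the score of a location is the sum over types of the type count times that log-utility. The sketch below assumes this form, assumes each row of the allocation is the bundle given to every person of that type, and uses a hypothetical linear utility. The environment's exact reward may differ, for example through normalization by the number of people or through the way infeasible allocations are penalized.

```python
import numpy as np

def log_nash_social_welfare(allocation, type_counts, weight_matrix, utility):
    """Log Nash Social Welfare of one location's allocation (illustrative sketch).

    allocation:    (num_types, K) array; row i is the bundle for each person of type i
    type_counts:   (num_types,) array; number of people of each type at the location
    weight_matrix: (num_types, K) array; row i is the need vector of type i
    utility:       callable u(x, theta) returning a positive number
    """
    utilities = np.array([utility(allocation[i], weight_matrix[i])
                          for i in range(len(type_counts))])
    # Every person of type i contributes log u(x_i, theta_i); sum over all people.
    return float(np.sum(np.asarray(type_counts) * np.log(utilities)))


weight_matrix = np.array([[1.0, 2.0], [0.3, 9.0], [1.0, 1.0]])
allocation = np.ones((3, 2))          # one unit of each commodity per person
type_counts = np.array([1, 1, 1])     # one person of each type
linear_utility = lambda x, theta: np.dot(x, theta)  # hypothetical utility

reward = log_nash_social_welfare(allocation, type_counts, weight_matrix, linear_utility)
# utilities are 3.0, 9.3 and 2.0, so reward = log(3.0 * 9.3 * 2.0) = log(55.8)
```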
- class or_suite.envs.resource_allocation.resource_allocation.ResourceAllocationEnvironment(config={'K': 2, 'MAX_VAL': 1000, 'from_data': False, 'init_budget': <function <lambda>>, 'num_rounds': 10, 'type_dist': <function <lambda>>, 'utility_function': <function <lambda>>, 'weight_matrix': array([[1. , 2. ], [0.3, 9. ], [1. , 1. ]])})
Custom environment that follows the Gym interface.
This is a simple resource allocation environment modeling fair online allocation.
- Methods:
get_config(): Returns the config dictionary used to initialize the environment.
reset(): Resets the environment to its original starting state and the timestep to 0.
step(action): Takes an allocation as the action, subtracts it from the budget, calculates the reward, and updates the action space.
render(mode): (UNIMPLEMENTED) Renders the environment in the mode passed in; ‘human’ is the only mode currently supported.
close(): (UNIMPLEMENTED) Closes the window where the rendering is being drawn.
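The budget bookkeeping in step(action) can be sketched as below. This assumes that each row of the allocation is the bundle given to every person of one type, so the total consumed per commodity is the type counts times the allocation matrix. The helper name apply_allocation is hypothetical and is not part of or_suite.

```python
import numpy as np

def apply_allocation(budget, type_counts, allocation):
    """Subtract one location's allocation from the remaining budget (sketch).

    Assumes allocation[i] is the bundle given to *each* person of type i, so the
    total amount of each commodity consumed is type_counts @ allocation.
    Returns (new_budget, feasible).
    """
    consumed = np.asarray(type_counts) @ np.asarray(allocation)  # shape (K,)
    new_budget = np.asarray(budget, dtype=float) - consumed
    feasible = bool(np.all(new_budget >= 0))  # allocation must stay within budget
    return new_budget, feasible


budget = np.array([10.0, 10.0])
counts = np.array([1, 2, 1])
ok_budget, ok = apply_allocation(budget, counts, np.ones((3, 2)))        # uses 4 of each
bad_budget, bad = apply_allocation(budget, counts, 3 * np.ones((3, 2)))  # uses 12 of each
```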
- weight_matrix
Weights predefining the commodity needs of each type; each row is a type vector.
- Type
list
- num_types
Number of types
- Type
int
- num_commodities
Number of commodities
- Type
int
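Every row of weight_matrix is a type vector with one entry per commodity. num_types and num_commodities therefore presumably correspond to the matrix's row and column counts. The sketch below reads them off the default matrix from the class signature. It is an illustration only, not the environment's own initialization code.

```python
import numpy as np

# Default weight matrix from the class signature: one row per type, one column per commodity.
weight_matrix = np.array([[1.0, 2.0],
                          [0.3, 9.0],
                          [1.0, 1.0]])

num_types, num_commodities = weight_matrix.shape  # 3 types, 2 commodities (K = 2)
needs_of_type_1 = weight_matrix[1]                # need vector of the second type
```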
- epLen
Number of locations (also the length of an episode).
- Type
int
- budget
Amount of each commodity the principal begins with.
- Type
int
- type_dist
Function determining the number of people of each type at a location.
- Type
lambda function
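The documentation only describes type_dist as a lambda. The sketch below assumes that it takes a location index and returns one head count per type. The Poisson means and the +1 offset are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical type distribution: for location index i, draw the number of people
# of each of 3 types from Poisson distributions with means 1, 2 and 3, then add 1
# so that every type is present at least once.
type_dist = lambda i: 1 + rng.poisson(lam=(1, 2, 3), size=3)

counts_at_first_location = type_dist(0)
```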
- utility_function
Utility function: given an allocation x and a type theta, u(x, theta) measures how good the fit is.
- Type
lambda function
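One simple choice of u(x, theta) is linear: the allocation is valued by the type's weight vector. This is a hypothetical example of a utility_function, not necessarily the default used by the environment.

```python
import numpy as np

# Hypothetical linear utility: allocation x is valued by the type's weight vector theta.
utility_function = lambda x, theta: np.dot(x, theta)

theta = np.array([0.3, 9.0])   # second row of the default weight matrix
x = np.array([2.0, 1.0])       # 2 units of commodity 1, 1 unit of commodity 2
value = utility_function(x, theta)  # 0.3*2 + 9.0*1 = 9.6
```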
- starting_state
Initial budget concatenated with the initial type distribution (a tuple represented as a single concatenated array).
- Type
np.array
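The starting state stores the budget and the first location's type counts in one flat array. The sketch below assumes K = 2 commodities and three types, with hypothetical values, and shows what the concatenation looks like.

```python
import numpy as np

init_budget = np.array([10.0, 10.0])    # hypothetical budget for K = 2 commodities
first_types = np.array([1, 3, 2])        # hypothetical type counts at the first location

# The state is one flat array: K budget entries followed by num_types type counts.
starting_state = np.concatenate([init_budget, first_types])
```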
- timestep
Index of the current step within an episode.
- Type
int
- action_space
(gym.spaces.Box) The action space represents the K x n allocation matrix.
- observation_space
(gym.spaces.Box) The first K entries of the observation are the remaining budget; the remaining entries hold the number of people of each type at each location.
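Because the budget occupies the first K entries, an agent can split a flat observation back into its two parts by slicing. The helper split_observation below is hypothetical and the observation values are made up.

```python
import numpy as np

K = 2  # number of commodities

def split_observation(obs, K):
    """Split a flat observation into (remaining_budget, type_counts)."""
    obs = np.asarray(obs, dtype=float)
    return obs[:K], obs[K:]

budget, counts = split_observation([6.0, 4.5, 2, 1, 4], K)
```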
- __init__(config={'K': 2, 'MAX_VAL': 1000, 'from_data': False, 'init_budget': <function <lambda>>, 'num_rounds': 10, 'type_dist': <function <lambda>>, 'utility_function': <function <lambda>>, 'weight_matrix': array([[1. , 2. ], [0.3, 9. ], [1. , 1. ]])})
Initializes the ResourceAllocationEnvironment with the given configuration.
- Parameters
config – A dictionary containing the initial configuration of the resource allocation environment.
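A config dictionary can mirror the keys of the default shown in the signature. The rendered signature only shows init_budget, type_dist and utility_function as `<function <lambda>>`, so the lambdas below are hypothetical stand-ins. The fixed values (K, MAX_VAL, from_data, num_rounds, weight_matrix) are copied from the default.

```python
import numpy as np

config = {
    'K': 2,
    'MAX_VAL': 1000,
    'from_data': False,
    'init_budget': lambda: 10 * np.ones(2),                                # hypothetical
    'num_rounds': 10,
    'type_dist': lambda i: 1 + np.random.poisson(lam=(1, 2, 3), size=3),  # hypothetical
    'utility_function': lambda x, theta: np.dot(x, theta),                # hypothetical
    'weight_matrix': np.array([[1.0, 2.0], [0.3, 9.0], [1.0, 1.0]]),
}

# With or_suite installed, the environment would then be built from this dictionary:
# from or_suite.envs.resource_allocation.resource_allocation import ResourceAllocationEnvironment
# env = ResourceAllocationEnvironment(config)
```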
- close()
Override close in your subclass to perform any necessary cleanup.
Environments will automatically close() themselves when garbage collected or when the program exits.
- render(mode='console')
Renders the environment.
The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
Note
Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.
- Parameters
mode (str) – the mode to render with
Example:
```python
import numpy as np
from gym import Env

class MyEnv(Env):
    metadata = {'render.modes': ['human', 'rgb_array']}

    def render(self, mode='human'):
        if mode == 'rgb_array':
            return np.array(...)  # return RGB frame suitable for video
        elif mode == 'human':
            ...  # pop up a window and render
        else:
            super(MyEnv, self).render(mode=mode)  # just raise an exception
```
- step(action)
Move one step in the environment.
- Parameters
action – A matrix; the chosen action (each row specifies how much to allocate at the previous location).
- Returns
reward (double): the reward.
newState (int): the new state.
done (bool): the flag for the end of the episode.
info (dict): any additional information.
- Return type
double, int, 0/1, dict
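A simple action for step(action) spreads the remaining budget evenly over the remaining locations and gives every person at the current location the same bundle. Both the layout and the policy are assumptions. The layout used here has one row per type and one column per commodity, with each row being the bundle for every person of that type, and the actual action space may be transposed. The equal-share policy is a hypothetical baseline, not a method from or_suite.

```python
import numpy as np

def equal_share_action(budget, type_counts, rounds_left):
    """Hypothetical baseline action: give every person at this location the same
    bundle, sized so that the budget is spread evenly over the remaining rounds.

    Returns a (num_types, K) matrix; row i is the bundle for each person of type i.
    """
    budget = np.asarray(budget, dtype=float)
    per_round = budget / rounds_left                # budget reserved for this location
    per_person = per_round / np.sum(type_counts)    # same bundle for everyone here
    return np.tile(per_person, (len(type_counts), 1))


counts = np.array([1, 2, 2])
action = equal_share_action([10.0, 20.0], counts, rounds_left=2)
# Each of the 5 people receives [1.0, 2.0]; in total the location uses [5.0, 10.0].
```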