Inventory Control with Lead Times and Multiple Suppliers

class or_suite.envs.inventory_control_multiple_suppliers.multiple_suppliers_env.DualSourcingEnvironment(config)

An environment with a variable number of suppliers, each with its own lead time and cost.

lead_times

The array of ints representing the lead times of each supplier.

supplier_costs

The array of ints representing the costs of each supplier.

hold_cost

The int holding cost.

backorder_cost

The int backorder cost.

epLen

The int number of time steps to run the experiment for.

max_order

The maximum value (int) that can be ordered from each supplier.

max_inventory

The maximum value (int) that can be held in inventory.

timestep

The (int) timestep the current episode is on.

starting_state

An int list containing enough indices for the sum of all the lead times, plus an additional index for the initial on-hand inventory.

action_space

(Gym.spaces MultiDiscrete) An action must have one entry per supplier. Each entry is an int corresponding to the order size.

observation_space

(Gym.spaces MultiDiscrete) The environment state must have a length equal to the sum of all lead times plus one. Each entry corresponds to the order that will soon be placed with a supplier. The last index is the current on-hand inventory.
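The length rule above can be checked with a small helper. This is only a sketch: split_state is a hypothetical name, not part of or_suite, and it assumes the per-supplier entries are stored contiguously in supplier order, which the description above does not state explicitly.

```python
def split_state(state, lead_times):
    """Split a flat state into per-supplier entries and on-hand inventory.

    Assumption (not confirmed by the documentation): the entries for each
    supplier are stored contiguously, in the same order as lead_times.
    """
    expected_length = sum(lead_times) + 1
    if len(state) != expected_length:
        raise ValueError(
            f"state has length {len(state)}, expected {expected_length}"
        )

    per_supplier = []
    start = 0
    for lead_time in lead_times:
        per_supplier.append(list(state[start:start + lead_time]))
        start += lead_time

    on_hand = state[-1]  # the last index is the on-hand inventory
    return per_supplier, on_hand


# Two suppliers with lead times 2 and 1 give a state of length 2 + 1 + 1 = 4.
orders, inventory = split_state([3, 0, 4, 10], lead_times=[2, 1])
```

Here orders holds one list per supplier and inventory is the final entry of the state.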

neg_inventory

A bool that says whether the on-hand inventory can be negative or not.

__init__(config)
Parameters

config – A dictionary containing the following parameters required to set up the environment:

  • lead_times: An array of ints representing the lead times of each supplier.

  • supplier_costs: An array of ints representing the costs of each supplier.

  • demand_dist: The random number sampled from the given distribution to be used to calculate the demand.

  • hold_cost: The int holding cost.

  • backorder_cost: The int backorder cost.

  • epLen: The episode length.

  • max_order: The maximum value (int) that can be ordered from each supplier.

  • max_inventory: The maximum value (int) that can be held in inventory.

  • starting_state: An int list containing enough indices for the sum of all the lead times, plus an additional index for the initial on-hand inventory.

  • neg_inventory: A bool that says whether the on-hand inventory can be negative or not.
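As an illustration of the parameters above, a configuration dictionary might look like the following sketch. All numeric values are made up for the example and are not the library defaults. Passing demand_dist as a callable that takes the timestep is an assumption, and the library may expect NumPy arrays rather than plain lists.

```python
import random

lead_times = [5, 1]          # illustrative: one slow and one fast supplier
supplier_costs = [100, 105]  # illustrative per-unit costs

config = {
    "lead_times": lead_times,
    "supplier_costs": supplier_costs,
    # Assumption: a callable that takes the timestep and returns an int demand.
    "demand_dist": lambda timestep: random.randint(0, 20),
    "hold_cost": 1,
    "backorder_cost": 19,
    "epLen": 500,
    "max_order": 20,
    "max_inventory": 1000,
    # One index per unit of lead time, plus one for the on-hand inventory.
    "starting_state": [0] * (sum(lead_times) + 1),
    "neg_inventory": True,
}
```

With these lead times, starting_state has 5 + 1 + 1 = 7 entries, which matches the length rule for the environment state.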

render(mode='human')

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note

Make sure that your class's metadata 'render.modes' key includes the list of supported modes. It's recommended to call super() in implementations to use the functionality of this method.

Parameters

mode (str) – the mode to render with

Example:

    import numpy as np
    from gym import Env

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

reset()

Reinitializes variables and returns the starting state.

reward(state)
Reward is calculated in three components:
  • First component corresponds to the cost for ordering amounts from each supplier

  • Second component corresponds to paying a holding cost for extra inventory after demand arrives

  • Third component corresponds to a back order cost for unmet demand
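The three components can be sketched as a standalone function. This is not the library's implementation: sketch_reward and its arguments are hypothetical, and the sketch assumes the reward is the negative of the total cost, that each supplier cost is charged per unit ordered, and that a negative inventory level represents unmet demand.

```python
def sketch_reward(order_amounts, supplier_costs, inventory_after_demand,
                  hold_cost, backorder_cost):
    """Illustrative three-part cost, returned as a negative reward.

    Assumptions: costs are per unit, and a negative
    inventory_after_demand means that many units of demand went unmet.
    """
    # Component 1: cost of the amounts ordered from each supplier.
    ordering = sum(cost * amount
                   for cost, amount in zip(supplier_costs, order_amounts))
    # Component 2: holding cost on inventory left after demand arrives.
    holding = hold_cost * max(inventory_after_demand, 0)
    # Component 3: backorder cost on unmet demand.
    backorder = backorder_cost * max(-inventory_after_demand, 0)
    return -(ordering + holding + backorder)


# Ordering 2 and 1 units at costs 100 and 105, with 3 units left over.
surplus_case = sketch_reward([2, 1], [100, 105], 3,
                             hold_cost=1, backorder_cost=19)
# Ordering nothing, with 2 units of demand unmet.
shortage_case = sketch_reward([0, 0], [100, 105], -2,
                              hold_cost=1, backorder_cost=19)
```

Only one of the holding and backorder terms is nonzero for any given inventory level, since inventory cannot be both in surplus and short at the same time.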

seed(seed=None)

Sets the numpy seed to the given value.

Parameters

seed – The int representing the numpy seed.

step(action)

Move one step in the environment.

Parameters

action – An int list of the amount to order from each supplier.

Returns

reward: A float representing the reward based on the action chosen.

newState: An int list representing the new state of the environment after the action.

done: A bool flag indicating the end of the episode.

info: A dictionary containing extra information about the step. This dictionary contains the int value of the demand during the previous step.

Return type

float, list of int, bool, dict
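A minimal episode loop built on reset() and step() might look like the sketch below. run_episode and policy are hypothetical names. The sketch assumes step() returns its values in the usual Gym order (state, reward, done, info); the Returns list above names the reward first, so the unpacking order should be checked against the source before relying on it.

```python
def run_episode(env, policy):
    """Run one episode and return the total reward and the info dicts.

    Assumption: env.step(action) returns (state, reward, done, info),
    following the usual Gym convention.
    """
    state = env.reset()
    total_reward = 0.0
    infos = []
    done = False
    while not done:
        action = policy(state)  # one order amount per supplier
        state, reward, done, info = env.step(action)
        total_reward += reward
        infos.append(info)      # each info dict carries the demand value
    return total_reward, infos
```

Here env would be an instance of DualSourcingEnvironment and policy any function that maps a state to a list of order amounts, for example one that always orders a fixed quantity from each supplier.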