Agent based modelling for continuously varying supply chains
Abstract: Problem definition: Supply chains are constantly evolving networks, and reinforcement learning is increasingly proposed as a way to provide optimal control of them. Academic/practical: However, learning in continuously varying environments remains a challenge in the reinforcement learning literature. Methodology: This paper therefore investigates whether agents can control varying supply chain problems, transferring learning between environments that require different strategies while avoiding catastrophic forgetting of tasks that have not been seen for some time. To evaluate this, two state-of-the-art Reinforcement Learning (RL) algorithms are compared: an actor-critic learner, Proximal Policy Optimisation (PPO), and Recurrent Proximal Policy Optimisation (RPPO), which is PPO with a Long Short-Term Memory (LSTM) layer and is gaining popularity in online learning environments. Results: First, these methods are compared on six sets of environments with varying degrees of stochasticity. The results show that the leaner strategies adopted in Batch environments differ from those adopted in Stochastic environments with varying products. The methods are also compared on various continuous supply chain scenarios, where the PPO agents are shown to adapt through continuous learning when the tasks are similar but perform more erratically when switching between extreme tasks. The RPPO, with its ability to remember histories, overcomes this to some extent and adopts a more realistic strategy. Managerial implications: Our results provide a new perspective on the continuously varying supply chain: the cooperation and coordination of agents are crucial for improving overall performance in uncertain, semi-continuous, non-stationary supply chain environments, without the need for retraining as the demand changes.
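To make the setting concrete, the following is a minimal sketch of a non-stationary inventory task in which the demand regime alternates between a deterministic "batch" pattern and a "stochastic" pattern. It is an illustration only and not the environment used in the paper: the class `VaryingInventoryEnv`, the helper `run_base_stock`, the cost parameters, the demand distributions and the switching interval are all hypothetical assumptions.

```python
import random


class VaryingInventoryEnv:
    """Toy single-product inventory task whose demand regime changes over time.

    Hypothetical illustration; not the environment used in the paper.
    """

    def __init__(self, switch_every=50, holding_cost=1.0, shortage_cost=5.0, seed=0):
        self.switch_every = switch_every    # steps between regime changes (assumed)
        self.holding_cost = holding_cost    # cost per unit left in stock (assumed)
        self.shortage_cost = shortage_cost  # cost per unit of unmet demand (assumed)
        self.rng = random.Random(seed)
        self.stock = 0
        self.t = 0

    def regime(self):
        # Alternate between a deterministic "batch" task and a "stochastic" task.
        return "batch" if (self.t // self.switch_every) % 2 == 0 else "stochastic"

    def demand(self):
        if self.regime() == "batch":
            return 10                    # fixed demand every step
        return self.rng.randint(0, 20)   # uniform random demand with the same mean

    def step(self, order):
        """Receive `order` units, serve demand, return (observation, reward)."""
        if order < 0:
            raise ValueError("order must be non-negative")
        self.stock += order
        d = self.demand()
        sold = min(self.stock, d)
        unmet = d - sold
        self.stock -= sold
        reward = -(self.holding_cost * self.stock + self.shortage_cost * unmet)
        self.t += 1
        # The observation reports the regime that applies to the next step.
        return (self.stock, self.regime()), reward


def run_base_stock(level, steps=200, seed=0):
    """Total reward of a fixed order-up-to policy across regime switches."""
    env = VaryingInventoryEnv(seed=seed)
    total = 0.0
    for _ in range(steps):
        order = max(0, level - env.stock)  # top stock back up to `level`
        _, reward = env.step(order)
        total += reward
    return total
```

In this sketch a fixed order-up-to level that is lean enough to incur no cost in the batch regime incurs holding and shortage costs once demand becomes stochastic, which is the kind of strategy mismatch between tasks that the abstract describes. A learning agent would take the place of the fixed policy in `run_base_stock`.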