
Agent based modelling for continuously varying supply chains

Published 24 Dec 2023 in eess.SY, cs.AI, cs.LG, and cs.SY (arXiv:2312.15502v1)

Abstract: Problem definition: Supply chains are constantly evolving networks, and reinforcement learning is increasingly proposed as a way to provide optimal control of them. Academic/practical relevance: However, learning in continuously varying environments remains a challenge in the reinforcement learning literature. Methodology: This paper therefore asks whether agents can control varying supply chain problems, transferring learning between environments that require different strategies and avoiding catastrophic forgetting of tasks that have not been seen for some time. To evaluate this, two state-of-the-art Reinforcement Learning (RL) algorithms are compared: an actor-critic learner, Proximal Policy Optimisation (PPO), and Recurrent Proximal Policy Optimisation (RPPO), PPO with a Long Short-Term Memory (LSTM) layer, which is gaining popularity in online learning settings. Results: The methods are first compared on six sets of environments with varying degrees of stochasticity. The results show that the leaner strategies adopted in Batch environments differ from those adopted in Stochastic environments with varying products. The methods are then compared on various continuous supply chain scenarios, where the PPO agents adapt through continuous learning when the tasks are similar, but show more volatile performance when switching between the extreme tasks. The RPPO, with its ability to remember histories, overcomes this to some extent and adopts a more realistic strategy. Managerial implications: Our results provide a new perspective on the continuously varying supply chain: cooperation and coordination of agents are crucial for improving overall performance in uncertain and semi-continuous non-stationary supply chain environments, without the need to retrain as demand changes.
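To make the notion of a continuously varying environment concrete, the following is a minimal sketch of a non-stationary single-echelon inventory task whose demand distribution drifts between a lean and a stochastic regime. All class names, costs, and regime parameters here are illustrative assumptions for exposition, not the authors' benchmark environments.

```python
import random


class NonStationaryInventoryEnv:
    """Toy inventory environment whose mean demand alternates over time,
    illustrating the kind of continuously varying task the paper studies.
    All parameters are illustrative assumptions, not the paper's setup."""

    def __init__(self, capacity=50, holding_cost=1.0, stockout_cost=5.0, seed=0):
        self.capacity = capacity
        self.holding_cost = holding_cost
        self.stockout_cost = stockout_cost
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.inventory = self.capacity // 2
        self.t = 0
        return self.inventory

    def mean_demand(self):
        # Demand drifts between a "lean" and a "stochastic" regime
        # every 100 steps, so a fixed policy cannot stay optimal.
        return 5 if (self.t // 100) % 2 == 0 else 15

    def step(self, order_qty):
        self.t += 1
        # Replenish up to capacity, then serve random demand.
        self.inventory = min(self.inventory + order_qty, self.capacity)
        demand = self.rng.randint(0, 2 * self.mean_demand())
        sold = min(demand, self.inventory)
        self.inventory -= sold
        unmet = demand - sold
        # Cost-based reward: penalise held stock and unmet demand.
        reward = -(self.holding_cost * self.inventory + self.stockout_cost * unmet)
        return self.inventory, reward


# Roll out a fixed base-stock policy (order up to 20 units each period).
# A learning agent would replace this rule, ideally adapting as the
# demand regime shifts instead of being retrained from scratch.
env = NonStationaryInventoryEnv()
state = env.reset()
total_reward = 0.0
for _ in range(200):
    state, r = env.step(max(0, 20 - state))
    total_reward += r
```

Under this sketch, a stationary base-stock rule tuned for the lean regime over-stocks or under-serves once the regime shifts, which is the kind of task change the PPO and RPPO agents are evaluated on.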
