Thesis defense of Nathalie Vanvuchelen, 1 September 2022 (KU Leuven)

On Thursday, 1 September 2022, Nathalie Vanvuchelen will defend her PhD thesis "Deep reinforcement learning for inventory control". The thesis was supervised by Robert Boute. The ceremony will take place at KU Leuven, in the Agora learning centre, room Emma Vorlat, at 16:30.


In recent years, excitement has been building about new technologies such as machine learning and artificial intelligence. Facilitated by the introduction of neural networks, the increased availability of data and major advances in computational power, several breakthroughs in machine learning have been achieved. Deep reinforcement learning (DRL), a subbranch of machine learning that focuses on sequential decision-making problems, has been described as a general-purpose technology that can solve a variety of problems without requiring extensive domain knowledge or making restrictive assumptions. After successful applications in gaming and robotics, we explore how inventory control may benefit from DRL, since the solution to many inventory control problems remains unknown despite decades of research. Constructing and training a DRL algorithm requires significant effort; we therefore develop a roadmap that sheds light on the key design choices of DRL algorithms. We also formulate several research directions that we believe may advance the application of DRL in inventory control. We next investigate whether DRL can develop well-performing policies for a two-item capacitated joint replenishment problem. We find that the resulting policies approach the optimal policy where it is tractable to compute, and outperform state-of-the-art heuristics when the items are dissimilar in terms of costs and demand. To date, however, most applications of DRL are limited to problem settings with a small number of possible actions. To stimulate the application of DRL in operations, it is crucial that DRL algorithms are also capable of learning in environments with a large number of actions, e.g., multi-item or multi-location problems where several decisions need to be made simultaneously. We therefore propose a new neural network architecture that effectively deals with large action spaces. Finally, we validate this approach on a transshipment problem in a real-life Zambian supply chain for malaria medicines.
Reflecting on the research, we find that DRL is especially useful in settings where tailored solutions are lacking.