Production and inventory managers make decisions that are crucial to the customer satisfaction and financial performance of their companies. The best decisions are not always clear, because the underlying business processes are highly complex. Many theoretical models exist to support decision makers, but they are typically built on assumptions that do not always hold in practice. In this thesis, we aimed to find a solution approach with less restrictive assumptions, one that works well in problems with scarce resources and uncertain, changing demand. Such problems often suffer from the curse of dimensionality, due to a combinatorial explosion in the number of potential decisions.
This thesis takes advantage of developments in a type of Artificial Intelligence: Deep Reinforcement Learning (DRL). To apply DRL in practice, we need to overcome the challenge of a combinatorial action space. We explored different DRL algorithms and proposed adaptations to handle the exploding action space.
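To make the combinatorial explosion concrete, the following sketch counts the joint actions when each product independently chooses an order quantity; the specific numbers are hypothetical and only illustrate the growth rate.

```python
def joint_action_space_size(num_products: int, levels_per_product: int) -> int:
    """Each product selects one of `levels_per_product` order quantities,
    so the joint action space contains levels_per_product ** num_products
    combinations -- exponential in the number of products."""
    return levels_per_product ** num_products

# With 10 products and 20 candidate order quantities each, a flat
# discrete-action DRL agent would face 20**10 (over 10 trillion) joint actions.
print(joint_action_space_size(10, 20))
```

This is why naive one-action-per-combination encodings break down, and why the adaptations in the following chapters are needed.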
In Chapter 2, we focused on the stochastic capacitated lot sizing problem, where multiple products must be scheduled on a production resource with limited capacity and the products face uncertain (but stationary) demand. We used the Proximal Policy Optimization (PPO) algorithm with discrete actions to find solutions to this complex problem, showing that it can outperform other solution methods. Additionally, we opened the black box of the DRL algorithm so that the resulting decisions can be understood by decision makers. Despite these promising results, we also highlighted the challenge of scaling this approach to larger and more complex problems.
Chapter 3 focused on scaling the PPO algorithm to larger problem settings in a multi-echelon inventory optimization problem, where we explored various supply chain network structures. By employing continuous actions and mapping them to discrete replenishment orders, we demonstrated that we could solve larger problems than before. However, this mapping approach required additional modeling assumptions to ensure that the resulting decisions were feasible in practice.
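A minimal sketch of one such continuous-to-discrete mapping, assuming the policy outputs one value in [-1, 1] per product that is rescaled and rounded to an integer order quantity (the function name, bounds, and rounding rule are illustrative, not the exact mapping used in Chapter 3):

```python
import numpy as np

def continuous_to_discrete_orders(action: np.ndarray, max_order: int) -> np.ndarray:
    """Map continuous policy outputs in [-1, 1] (one per product) to integer
    replenishment quantities in {0, ..., max_order}.

    Illustrative sketch: clip to the valid range, rescale linearly to
    [0, max_order], then round to the nearest integer.
    """
    action = np.clip(action, -1.0, 1.0)          # keep out-of-range outputs feasible
    scaled = (action + 1.0) / 2.0 * max_order    # rescale [-1, 1] -> [0, max_order]
    return np.rint(scaled).astype(int)           # round to discrete order quantities

# Three products, policy outputs -1.0, 0.0, and 0.37:
orders = continuous_to_discrete_orders(np.array([-1.0, 0.0, 0.37]), max_order=10)
print(orders)  # [0 5 7]
```

The advantage is that the policy network's output dimension grows linearly with the number of products rather than exponentially; the cost, as noted above, is that extra assumptions (such as independent per-product bounds) are needed to keep the mapped decisions feasible.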
To enable the study of DRL in problems with uncertain and changing (non-stationary) demand, we introduced a novel Demand Generation Process (DGP) in Chapter 4. Building on our findings from previous chapters, Chapter 5 revisited the capacitated lot sizing problem, this time incorporating non-stationary demand. We employed the Deep Controlled Learning (DCL) algorithm and decomposed the complex integrated decision-making process into a sequence of sub-decisions. This decomposition improved the scalability of the algorithm and outperformed our earlier approach from Chapter 2.
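For intuition, a toy non-stationary demand stream can be generated as Poisson demand whose mean drifts over time; this is purely illustrative and is not the DGP introduced in Chapter 4 (the parameters and drift form here are assumptions).

```python
import numpy as np

def toy_nonstationary_demand(num_periods: int, base_rate: float = 10.0,
                             trend: float = 0.2, seed: int = 0) -> np.ndarray:
    """Illustrative non-stationary demand: Poisson draws whose mean
    increases linearly each period. Not the Chapter 4 DGP."""
    rng = np.random.default_rng(seed)
    rates = base_rate + trend * np.arange(num_periods)  # time-varying mean demand
    return rng.poisson(rates)

demand = toy_nonstationary_demand(52)  # one year of weekly demand
```

A policy trained only on stationary demand has no mechanism to track such drift, which motivates both the DGP for benchmarking and the DCL-based approach of Chapter 5.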
Overall, our research has demonstrated the potential of DRL methods in solving production and inventory management problems, offering effective solutions to address the challenges of resource constraints and uncertain and changing demand. Our findings provide valuable insights for practitioners and researchers seeking innovative approaches to tackle such real-world problems.