Abstract
This paper explores the application of multi-agent reinforcement learning with the Proximal Policy Optimization (PPO) algorithm to resolving deadlocks in material flow systems with Automated Guided Vehicles (AGVs). A multi-agent strategy is implemented that optimizes the dynamics and interactions of multiple AGVs in real time. Integrating the Population Based Training (PBT) algorithm from Ray enables continuous adaptation and improvement of the learning process. The reward system is also modified to enhance the model's efficiency and effectiveness. The proposed approach is evaluated in a material flow simulation of a real industrial use case. The results show a significant reduction in collisions and an increase in throughput within the system. This study highlights the potential of multi-agent reinforcement learning, and specifically of the PPO algorithm, to enhance the performance and efficiency of material flow systems with AGVs.