Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control


Deep multi-agent reinforcement learning (MARL) holds the promise of automating many real-world cooperative robotic manipulation and transportation tasks. Nevertheless, decentralised cooperative robotic control has received less attention from the deep reinforcement learning community than single-agent robotics and multi-agent games with discrete actions. To address this gap, this paper introduces Multi-Agent Mujoco, an easily extensible multi-agent benchmark suite for robotic control in continuous action spaces. The benchmark tasks are diverse and admit easily configurable partially observable settings. Inspired by the success of single-agent continuous value-based algorithms in robotic control, we also introduce COMIX, a novel extension of a common discrete-action multi-agent Q-learning algorithm. We show that COMIX significantly outperforms state-of-the-art MADDPG on a partially observable variant of a popular particle environment and matches or surpasses it on Multi-Agent Mujoco. This new benchmark suite and method let us pose an interesting question: what is the key to performance in such settings, the use of value-based methods instead of policy gradients, or the factorisation of the joint Q-function? To answer this question, we propose a second new method, FacMADDPG, which factors MADDPG's critic. Experimental results on Multi-Agent Mujoco suggest that factorisation is the key to performance.
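The question of whether factorising the joint Q-function matters can be made concrete with a small sketch. It assumes a QMIX-style monotonic mixer (non-negative mixing weights), under which each agent maximising its own utility also maximises the joint value, so greedy action selection can stay decentralised. For the per-agent maximisation over continuous actions it assumes the cross-entropy method (CEM) as one possible choice. The quadratic utilities `agent_q`, the `TARGETS` array, and all function names and hyperparameters are hypothetical stand-ins for learned networks, not the paper's implementation.

```python
import numpy as np

# Hypothetical per-agent utilities: toy quadratics standing in for learned
# agent networks Q_a(o_a, u_a). Each peaks at a known target action.
TARGETS = [np.array([0.5, -0.2]), np.array([-0.3, 0.8])]


def agent_q(agent_id, action):
    """Toy per-agent utility, maximised at TARGETS[agent_id]."""
    return -float(np.sum((action - TARGETS[agent_id]) ** 2))


def mixed_q(per_agent_qs, weights, bias=0.0):
    """Monotonic mixer (assumed QMIX-style): abs() keeps every weight
    non-negative, so dQ_tot/dQ_a >= 0 for each agent a."""
    w = np.abs(np.asarray(weights, dtype=float))
    return float(np.dot(w, per_agent_qs) + bias)


def cem_argmax(q_fn, dim, iters=20, pop=64, elites=6, low=-1.0, high=1.0, seed=0):
    """Cross-entropy method: approximately maximise q_fn over a box of
    continuous actions by refitting a Gaussian to the best samples."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = np.clip(rng.normal(mean, std, size=(pop, dim)), low, high)
        scores = np.array([q_fn(s) for s in samples])
        elite = samples[np.argsort(scores)[-elites:]]  # highest-scoring samples
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean


# Decentralised greedy selection: each agent maximises only its own Q_a;
# monotonic mixing means no search over the joint action space is needed.
actions = [cem_argmax(lambda u, a=a: agent_q(a, u), dim=2) for a in range(2)]
q_tot = mixed_q([agent_q(a, u) for a, u in enumerate(actions)], weights=[0.7, -1.3])
print([np.round(u, 2) for u in actions], round(q_tot, 4))
```

Because the mixer is monotonic, raising any single agent's utility can only raise Q_tot; this is what lets per-agent maximisation replace a search over the joint continuous action space, whose cost would otherwise grow with the number of agents.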

arXiv preprint