Cooperative AI

Perfectly Secure Steganography Using Minimum Entropy Coupling

We propose a perfectly secure steganography algorithm for arbitrary covertext distributions.

Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier

Amortized Rejection Sampling in Universal Probabilistic Programming

In this paper we develop a new and efficient amortized importance sampling estimator for rejection sampling.

Frank Wood, Saeid Naderiparizi, Adam Ścibior, Andreas Munk, Mehrdad Ghadiri, Atılım Güneş Baydin, Bradley Gram-Hansen, Christian Schroeder de Witt, Robert Zinkov, Philip HS Torr, Tom Rainforth, Yee Whye Teh

Communicating via Markov Decision Processes

We consider the problem of communicating exogenous information by means of Markov decision process trajectories.

Samuel Sokota, Christian Schroeder de Witt, Maximilian Igl, Luisa M Zintgraf, Philip Torr, Martin Strohmeier, Zico Kolter, Shimon Whiteson, Jakob Foerster

Generalized Beliefs for Cooperative AI

We propose a belief learning paradigm that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time.

Christopher Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster

Mirror learning: A unifying framework of policy optimisation

We introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees for a large class of algorithms, including TRPO and PPO.

Jakub Grudzien, Christian Schroeder de Witt, Jakob Foerster

Model-Free Opponent Shaping

We propose Model-Free Opponent Shaping (M-FOS), which lets an agent shape the learning process of its opponents without requiring long-horizon gradients.

Christopher Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

We propose QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.

Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob N Foerster, Shimon Whiteson

Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning

Our approach, Randomized Entity-wise Factorization for Imagined Learning (REFIL), answers the question: “What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?”

Shariq Iqbal, Christian Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha

Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning

We propose a novel framework and algorithm for cheap talk discovery (CTD) and cheap talk utilization (CTU).

Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson

Discovered Policy Optimisation

In this paper we explore the Mirror Learning space by meta-learning a “drift” function.

Chris Lu, Jakub Kuba, Alistair Letcher, Luke Metz, Christian Schroeder de Witt, Jakob Foerster
