Code

Paper: “Deep Variational Reinforcement Learning for POMDPs” – Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, Shimon Whiteson

Code: https://github.com/maximilianigl/DVRL

 

Paper: “DiCE: The Infinitely Differentiable Monte-Carlo Estimator” – Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric P. Xing, Shimon Whiteson

Code: https://github.com/alshedivat/lola

TensorFlow example: https://goo.gl/xkkGxN (https://drive.google.com/drive/folders/1qjuLTdRbM5CoyNGEyaCJdFKJ9UEwhU28)

Available in Pyro: https://github.com/uber/pyro

Available in TensorFlow Probability: https://github.com/tensorflow/probability/blob/master/tensorflow_probability/python/monte_carlo.py

Paper: “Learning with Opponent-Learning Awareness” – Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch

Code: https://github.com/alshedivat/lola

 

Paper: “TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning” – Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson [https://arxiv.org/abs/1710.11417]

Code: https://github.com/oxwhirl/treeqn

 

Paper: “Learning to Communicate with Deep Multi-Agent Reinforcement Learning” – Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

Code: https://github.com/iassael/learning-to-communicate

 

Community reimplementations (untested by WhiRL)

 

Paper: “QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning” – Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

Code: https://github.com/activatedgeek/qmix

Paper: “DiCE: The Infinitely Differentiable Monte-Carlo Estimator” – Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric P. Xing, Shimon Whiteson

Code: https://github.com/alexis-jacq/LOLA_DiCE