Research in WhiRL focuses on reinforcement learning, deep learning, and related areas. A few of the most prominent topics are listed below:
Research Highlights

Meta-Learning
Reinforcement learning algorithms are typically trained from scratch, starting with a random behaviour policy. This approach, however, often requires millions of environment interactions before the agent learns to perform seemingly simple tasks such as playing a game of Pac-Man. One way to inject prior knowledge and thus accelerate learning is via meta-learning, or learning to learn. We are developing algorithms that allow an agent to use knowledge and skills obtained in related tasks to learn faster and to quickly infer which task it should solve.
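The idea of transferring an initialisation across related tasks can be illustrated with a toy sketch. The code below is not a WhiRL algorithm; it is a Reptile-style meta-update on one-parameter regression tasks, with all task parameters made up, showing how an initialisation learned across related tasks lets a new task be solved in fewer gradient steps than starting from scratch.

```python
# Illustrative sketch (hypothetical tasks, not the lab's method):
# meta-learning an initialisation across related one-parameter tasks.
import random

random.seed(0)

def inner_steps(theta, target, lr=0.1, k=5):
    """Run k gradient steps on the loss (theta - target)**2."""
    for _ in range(k):
        theta -= lr * 2 * (theta - target)
    return theta

# Related tasks: their optimal parameters cluster around 3.0.
tasks = [3.0 + random.uniform(-0.5, 0.5) for _ in range(50)]

# Meta-training (Reptile-style): nudge the shared initialisation
# toward each task's adapted solution.
meta_theta, meta_lr = 0.0, 0.5
for target in tasks:
    adapted = inner_steps(meta_theta, target)
    meta_theta += meta_lr * (adapted - meta_theta)

# On a new related task, the meta-learned init adapts in fewer steps
# than a from-scratch init does.
new_task = 3.2
err_scratch = abs(inner_steps(0.0, new_task, k=3) - new_task)
err_meta = abs(inner_steps(meta_theta, new_task, k=3) - new_task)
```

After meta-training, `meta_theta` sits near the cluster of task optima, so three adaptation steps suffice where a random start would need many more.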

Deep Multi-Agent Reinforcement Learning
We are developing new algorithms that enable teams of cooperating agents to learn control policies for solving complex tasks, including techniques for learning to communicate and stabilising multi-agent experience replay.
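One difficulty with experience replay in multi-agent settings is that stored transitions become stale as the other agents' policies change. A minimal sketch of one published remedy, tagging each transition with a "fingerprint" of the training state so the value function can condition on how off-policy the experience is, follows; the buffer structure and field names here are illustrative assumptions, not the lab's implementation.

```python
# Sketch (assumed structure, not the lab's code): a replay buffer whose
# transitions carry a fingerprint of when they were collected.
import random
from collections import deque

class FingerprintedReplay:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, iteration):
        # Appending the training iteration to the observation lets the
        # learner disambiguate experience gathered while other agents'
        # policies were different.
        self.buffer.append(((obs, iteration), action, reward,
                            (next_obs, iteration)))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = FingerprintedReplay()
for t in range(100):
    buf.add(obs=[0.0, 1.0], action=0, reward=1.0,
            next_obs=[1.0, 0.0], iteration=t)
batch = buf.sample(32)
```

In practice the fingerprint can be any low-dimensional summary of the training state (e.g. the iteration count or the exploration rate) concatenated to each agent's observation.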

Learning from Demonstration
We are developing inverse reinforcement learning methods that can learn from failed demonstrations and exploit sample-based motion planners. Learning from demonstration is used extensively in the TERESA Project.

Robust Reinforcement Learning
We are developing new reinforcement learning methods that are robust to significant rare events, i.e., events with low probability that nonetheless significantly affect expected performance. For example, certain rare wind conditions may increase the risk of crashing a helicopter. Since crashes are so catastrophic, avoiding them is key to maximising expected performance, even though the wind conditions contributing to a crash occur only rarely. We have developed a method that uses Bayesian optimisation and quadrature to efficiently optimise policies in such settings. We have also developed an off-environment reinforcement learning method that enables policy gradient methods to work in such settings as well.
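The way a low-probability event can dominate expected performance is easy to see numerically. The figures below are hypothetical and chosen only for illustration: a policy that ignores a rare catastrophic outcome can have a lower expected return than one that sacrifices a little normal-case performance to avoid it.

```python
# Hypothetical numbers (not from the source): expected return when a rare
# wind condition can cause a catastrophic helicopter crash.
p_rare = 0.001      # probability of the rare wind condition
r_normal = 100.0    # return under normal conditions
r_crash = -1e6      # catastrophic return if the crash occurs

# Policy A ignores the rare event; Policy B gives up a little
# normal-case return to survive it.
expected_a = (1 - p_rare) * r_normal + p_rare * r_crash
expected_b = 95.0   # robust policy: slightly worse normally, never crashes

print(expected_a)   # ≈ -900.1: the rare crash dominates the expectation
print(expected_b)   # 95.0
```

Because the crash term `p_rare * r_crash` outweighs everything else, a Monte Carlo estimate of Policy A's return that happens to sample no crashes would be wildly optimistic, which is why such settings call for methods that explicitly account for significant rare events.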

Automatic Lip Reading
We introduced LipNet, an algorithm that enables end-to-end sentence-level lip reading and outperforms human lip readers and previously known algorithms on the GRID corpus. See also the LipNet video.

Active Perception
We are developing decision-theoretic methods for helping perception systems, such as multi-camera tracking systems, to make efficient use of scarce resources such as computation and bandwidth. By exploiting submodularity, we can efficiently determine which subset of cameras to use, or which subset of pixel boxes in an image to process, so as to maximise metrics such as information gain and expected coverage. We have developed active perception methods with PAC guarantees and methods for active perception POMDPs.
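The submodularity mentioned above is what makes subset selection tractable: for monotone submodular objectives such as coverage, greedily adding the element with the largest marginal gain achieves at least a (1 − 1/e) fraction of the optimal value. A minimal sketch follows; the camera names and coverage sets are hypothetical, and this is the textbook greedy rule rather than the lab's specific method.

```python
# Sketch (hypothetical cameras and regions): greedy subset selection for a
# monotone submodular coverage objective.
def greedy_select(cameras, coverage, budget):
    """Pick up to `budget` cameras by marginal coverage gain.

    cameras:  iterable of camera ids
    coverage: dict mapping camera id -> set of regions it observes
    """
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(
            (c for c in cameras if c not in chosen),
            key=lambda c: len(coverage[c] - covered),
            default=None,
        )
        if best is None or not (coverage[best] - covered):
            break  # no remaining camera adds new coverage
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered

cams = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1, 6}}
subset, regions = greedy_select(cams, cams, budget=2)
print(subset, regions)  # ['A', 'C'] {1, 2, 3, 4, 5, 6}
```

The same marginal-gain rule applies whether the elements are cameras, image regions to process, or bandwidth allocations, which is why submodularity is so useful for budgeted active perception.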
See also a comprehensive list of our publications.