- Arxiv: New paper on “Deep Reinforcement Learning for Swarm Systems” plus videos and code
**Abstract:** Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions, and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and a locally observable setup. For the local setup, we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies.
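The mean embedding idea from the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 3×3 grid of RBF centers, the bandwidth, and the example neighbor positions are all assumptions chosen for illustration. Each observed agent is mapped to radial basis function features, and the features are averaged, so the result does not depend on the order of the agents and keeps the same length for any swarm size.

```python
import math

def rbf_features(x, centers, bandwidth=1.0):
    """Map one agent observation x (a tuple) to one RBF activation per center."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * bandwidth ** 2))
        for c in centers
    ]

def mean_embedding(observations, centers, bandwidth=1.0):
    """Empirical mean embedding: average the feature vectors of all observed agents."""
    if not observations:
        # No neighbors observed: fall back to an all-zero vector (an assumption).
        return [0.0] * len(centers)
    feats = [rbf_features(x, centers, bandwidth) for x in observations]
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]

# Hypothetical setup: 2D relative positions of neighbors, 3x3 grid of RBF centers.
centers = [(cx, cy) for cx in (-1.0, 0.0, 1.0) for cy in (-1.0, 0.0, 1.0)]
neighbors = [(0.2, -0.4), (0.9, 0.8), (-0.7, 0.1)]

embedding = mean_embedding(neighbors, centers)
# Reordering the agents leaves the embedding unchanged (up to rounding).
shuffled = mean_embedding(list(reversed(neighbors)), centers)
# Adding agents changes the values but not the input size seen by the policy.
larger = mean_embedding(neighbors + [(0.0, 0.0), (0.5, 0.5)], centers)
```

In this sketch, `embedding` would serve as the fixed-size policy input. Replacing `rbf_features` with a histogram indicator or a learned network would correspond to the other feature spaces the abstract mentions.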

- M. Hüttenrauch, A. Šošić, and G. Neumann, “Deep Reinforcement Learning for Swarm Systems,” arXiv:1807.06613, p. 26, 2018.

`@article{huettenrauch2018deep, author = {H{\"u}ttenrauch, Maximilian and \v{S}o\v{s}i\'{c}, Adrian and Neumann, Gerhard}, year = {2018}, month = {07}, pages = {26}, title = {Deep Reinforcement Learning for Swarm Systems}, url = {https://arxiv.org/abs/1807.06613} }`

**Videos:** Enlarge to full-screen for best visibility.

Rendezvous:

Pursuit Evasion:

Pursuit Evasion with Multiple Evaders:

Learning Progress:

**Code:** The code base for our work can be found in the following git repository:

- Arxiv: Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning
Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.
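A histogram-based protocol of the kind described above might look like the following minimal sketch. The binning over distance and bearing, the function name, the communication range, and the bin counts are assumptions made for illustration, not the protocol definition from the paper. Each agent counts the neighbors inside its communication range per (distance, bearing) cell, which yields a fixed-size observation regardless of how many neighbors are present.

```python
import math

def neighborhood_histogram(agent, others, comm_range=1.0, n_dist_bins=3, n_angle_bins=4):
    """Count neighbors within comm_range in a (distance bin, bearing bin) grid."""
    hist = [[0] * n_angle_bins for _ in range(n_dist_bins)]
    ax, ay = agent
    for ox, oy in others:
        dx, dy = ox - ax, oy - ay
        dist = math.hypot(dx, dy)
        if dist >= comm_range:
            continue  # outside the local communication range, not observed
        # Bearing in [0, 2*pi), measured in the world frame (an assumption).
        bearing = math.atan2(dy, dx) % (2.0 * math.pi)
        # The min() clamps guard against rounding at the upper bin edges.
        d_bin = min(int(dist / comm_range * n_dist_bins), n_dist_bins - 1)
        a_bin = min(int(bearing / (2.0 * math.pi) * n_angle_bins), n_angle_bins - 1)
        hist[d_bin][a_bin] += 1
    return hist

# Hypothetical positions: three agents in range, one far away.
agent = (0.0, 0.0)
others = [(0.1, 0.05), (-0.2, 0.5), (-0.9, -0.1), (2.0, 2.0)]
hist = neighborhood_histogram(agent, others)
total_in_range = sum(sum(row) for row in hist)
```

Flattening `hist` would give a vector that a policy network could consume directly. Task-specific quantities, such as the shortest distance to a target, would need additional channels that this sketch does not model.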

- M. Hüttenrauch, A. Šošić, and G. Neumann, “Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning,” arXiv:1709.07224, p. 8, 2017.


`@article{swarm1, author = {Hüttenrauch, Maximilian and Šošić, Adrian and Neumann, Gerhard}, year = {2017}, month = {09}, pages = {8}, title = {Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning}, url = {https://arxiv.org/abs/1709.07224}, abstract = {Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.} }`
