Videos

  • Arxiv: Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning

    Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.


    • M. Hüttenrauch, A. Šošić, and G. Neumann, “Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning,” arXiv preprint arXiv:1709.07224, 2017.
      [BibTeX] [Abstract] [Download PDF]

      @article{swarm1,
      author = {Hüttenrauch, Maximilian and Šošić, Adrian and Neumann, Gerhard},
      year = {2017},
      month = {09},
      pages = {8},
      title = {Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning},
      url = {https://arxiv.org/abs/1709.07224},
      abstract = {Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.}
      }
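
    The histogram-based protocols described above can be pictured as each agent compressing its variable-sized local neighborhood into a fixed-length observation that a policy network can consume. The Python sketch below is a minimal illustration of that idea under assumed bin counts, sensing radius, and function names; it is not the paper's implementation.

      import numpy as np

      def neighborhood_histogram(agent_pos, agent_heading, all_pos,
                                 radius=2.0, n_dist_bins=8, n_angle_bins=8):
          """Fixed-size histogram observation of an agent's local neighborhood (illustrative)."""
          offsets = all_pos - agent_pos                       # vectors to the other agents
          dists = np.linalg.norm(offsets, axis=1)
          mask = (dists > 0.0) & (dists < radius)             # drop self and out-of-range agents
          bearings = np.arctan2(offsets[mask, 1], offsets[mask, 0]) - agent_heading

          dist_hist, _ = np.histogram(dists[mask], bins=n_dist_bins, range=(0.0, radius))
          angle_hist, _ = np.histogram(np.mod(bearings, 2 * np.pi),
                                       bins=n_angle_bins, range=(0.0, 2 * np.pi))
          n = max(mask.sum(), 1)                              # normalise so counts are comparable
          return np.concatenate([dist_hist / n, angle_hist / n])

    Because the histogram has a fixed length and is invariant to the ordering of neighbors, the same decentralized policy, trained in the paper with an adaptation of Trust Region Policy Optimization, can cope with varying numbers of nearby agents; task-specific channels such as distance and direction to a target could be appended to this vector.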

  • RAL 2017: Guiding trajectory optimization by demonstrated distributions

    Trajectory optimization is an essential tool for motion planning under multiple constraints of robotic manipulators. Optimization-based methods can explicitly optimize a trajectory by leveraging prior knowledge of the system and have been used in various applications such as collision avoidance. However, these methods often require a hand-coded cost function in order to achieve the desired behavior. Specifying such a cost function for a complex desired behavior, e.g., disentangling a rope, is a nontrivial task that is often even infeasible. Learning from demonstration (LfD) methods offer an alternative way to program robot motion. LfD methods are less dependent on analytical models and instead learn the behavior of experts implicitly from the demonstrated trajectories. However, the problem of adapting the demonstrations to new situations, e.g., avoiding newly introduced obstacles, has not been fully investigated in the literature. In this paper, we present a motion planning framework that combines the advantages of optimization-based and demonstration-based methods. We learn a distribution of trajectories demonstrated by human experts and use it to guide the trajectory optimization process. The resulting trajectory maintains the demonstrated behaviors, which are essential to performing the task successfully, while adapting the trajectory to avoid obstacles. In simulated experiments and with a real robotic system, we verify that our approach optimizes the trajectory to avoid obstacles and encodes the demonstrated behavior in the resulting trajectory.

    • T. Osa, A. M. Ghalamzan Esfahani, R. Stolkin, R. Lioutikov, J. Peters, and G. Neumann, “Guiding trajectory optimization by demonstrated distributions,” IEEE Robotics and Automation Letters (RA-L), vol. 2, iss. 2, pp. 819-826, 2017.
      [BibTeX] [Abstract] [Download PDF]

      @article{lirolem26731,
      volume = {2},
      publisher = {IEEE},
      journal = {IEEE Robotics and Automation Letters (RA-L)},
      month = {January},
      pages = {819--826},
      number = {2},
      author = {Takayuki Osa and Amir M. Ghalamzan Esfahani and Rustam Stolkin and Rudolf Lioutikov and Jan Peters and Gerhard Neumann},
      title = {Guiding trajectory optimization by demonstrated distributions},
      year = {2017},
      url = {http://eprints.lincoln.ac.uk/26731/},
      abstract = {Trajectory optimization is an essential tool for motion
      planning under multiple constraints of robotic manipulators.
      Optimization-based methods can explicitly optimize a trajectory
      by leveraging prior knowledge of the system and have been used
      in various applications such as collision avoidance. However, these
      methods often require a hand-coded cost function in order to
      achieve the desired behavior. Specifying such cost function for
      a complex desired behavior, e.g., disentangling a rope, is a nontrivial
      task that is often even infeasible. Learning from demonstration
      (LfD) methods offer an alternative way to program robot
      motion. LfD methods are less dependent on analytical models
      and instead learn the behavior of experts implicitly from the
      demonstrated trajectories. However, the problem of adapting the
      demonstrations to new situations, e.g., avoiding newly introduced
      obstacles, has not been fully investigated in the literature. In this
      paper, we present a motion planning framework that combines
      the advantages of optimization-based and demonstration-based
      methods. We learn a distribution of trajectories demonstrated by
      human experts and use it to guide the trajectory optimization
      process. The resulting trajectory maintains the demonstrated
      behaviors, which are essential to performing the task successfully,
      while adapting the trajectory to avoid obstacles. In simulated
      experiments and with a real robotic system, we verify that our
      approach optimizes the trajectory to avoid obstacles and encodes
      the demonstrated behavior in the resulting trajectory},
      }
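
    The guidance mechanism described above can be sketched as a trajectory cost that combines the negative log-likelihood under a distribution fitted to the demonstrations with a clearance term for a newly introduced obstacle. The toy gradient-descent sketch below assumes a diagonal Gaussian over 2-D waypoints and a single circular obstacle; the names, weights, and update rule are illustrative assumptions, not the paper's formulation.

      import numpy as np

      def optimize_guided_trajectory(demos, obstacle, n_iters=200, lr=1e-2,
                                     w_demo=1.0, w_obs=5.0, margin=0.3):
          """Toy trajectory optimization guided by a demonstrated distribution.

          demos: array of shape (n_demos, T, 2) with demonstrated 2-D trajectories.
          obstacle: (x, y) centre of a circular obstacle to be avoided.
          """
          obstacle = np.asarray(obstacle, dtype=float)
          mu = demos.mean(axis=0)                              # (T, 2) mean trajectory
          var = demos.var(axis=0) + 1e-6                       # per-waypoint diagonal variance
          traj = mu.copy()                                     # initialise at the demonstrated mean

          for _ in range(n_iters):
              grad_demo = (traj - mu) / var                    # gradient of the Gaussian NLL
              diff = traj - obstacle
              dist = np.linalg.norm(diff, axis=1, keepdims=True)
              # Hinge-style obstacle cost: push waypoints that lie inside the margin outwards
              grad_obs = np.where(dist < margin, -diff / np.maximum(dist, 1e-6), 0.0)
              traj -= lr * (w_demo * grad_demo + w_obs * grad_obs)
          return traj

    The point of the sketch is only the interplay of the two terms: the demonstration term keeps the solution close to the behavior shown by the experts, while the obstacle term deforms the trajectory where the new scene requires it. The paper's trajectory distribution and optimizer are richer than this diagonal Gaussian and plain gradient descent.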

    Video:

  • ISER 2016: Experiments with hierarchical reinforcement learning of multiple grasping policies

    Robotic grasping has attracted considerable interest, but it still remains a challenging task. The data-driven approach is a promising solution to the robotic grasping problem; this approach leverages a grasp dataset and generalizes grasps for various objects. However, these methods often depend on the quality of the given datasets, which are not trivial to obtain with sufficient quality. Although reinforcement learning approaches have been recently used to achieve autonomous collection of grasp datasets, the existing algorithms are often limited to specific grasp types. In this paper, we present a framework for hierarchical reinforcement learning of grasping policies. In our framework, the lower-level hierarchy learns multiple grasp types, and the upper-level hierarchy learns a policy to select from the learned grasp types according to a point cloud of a new object. Through experiments, we validate that our approach learns grasping by constructing the grasp dataset autonomously. The experimental results show that our approach learns multiple grasping policies and generalizes the learned grasps by using local point cloud information.

    • T. Osa, J. Peters, and G. Neumann, “Experiments with hierarchical reinforcement learning of multiple grasping policies,” in Proceedings of the International Symposium on Experimental Robotics (ISER), 2016.
      [BibTeX] [Abstract] [Download PDF]

      @inproceedings{lirolem26735,
      booktitle = {Proceedings of the International Symposium on Experimental Robotics (ISER)},
      month = {April},
      author = {T. Osa and J. Peters and G. Neumann},
      year = {2016},
      title = {Experiments with hierarchical reinforcement learning of multiple grasping policies},
      abstract = {Robotic grasping has attracted considerable interest, but it
      still remains a challenging task. The data-driven approach is a promising
      solution to the robotic grasping problem; this approach leverages a
      grasp dataset and generalizes grasps for various objects. However, these
      methods often depend on the quality of the given datasets, which are not
      trivial to obtain with sufficient quality. Although reinforcement learning
      approaches have been recently used to achieve autonomous collection
      of grasp datasets, the existing algorithms are often limited to specific
      grasp types. In this paper, we present a framework for hierarchical reinforcement
      learning of grasping policies. In our framework, the lower-level
      hierarchy learns multiple grasp types, and the upper-level hierarchy
      learns a policy to select from the learned grasp types according to a point
      cloud of a new object. Through experiments, we validate that our approach
      learns grasping by constructing the grasp dataset autonomously.
      The experimental results show that our approach learns multiple grasping
      policies and generalizes the learned grasps by using local point cloud
      information.},
      url = {http://eprints.lincoln.ac.uk/26735/},
      }
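
    The two-level structure described above can be illustrated as an upper-level policy that scores the available grasp types from a point-cloud feature vector and lower-level policies that produce grasp parameters for the selected type. The class below is a purely illustrative sketch with linear, randomly initialised policies; the actual policy representations, features, and learning procedure in the paper differ.

      import numpy as np

      class HierarchicalGraspPolicy:
          """Two-level grasp policy sketch: select a grasp type, then compute its parameters."""

          def __init__(self, grasp_types, feature_dim, param_dim, rng=None):
              self.rng = rng or np.random.default_rng()
              # Upper level: one linear scorer per grasp type
              self.upper = {g: self.rng.normal(scale=0.1, size=feature_dim) for g in grasp_types}
              # Lower level: one linear parameter policy per grasp type (e.g., a wrist pose)
              self.lower = {g: self.rng.normal(scale=0.1, size=(param_dim, feature_dim))
                            for g in grasp_types}

          def select_grasp_type(self, features):
              scores = np.array([w @ features for w in self.upper.values()])
              probs = np.exp(scores - scores.max())
              probs /= probs.sum()                             # softmax over grasp types
              return self.rng.choice(list(self.upper), p=probs)

          def act(self, features):
              grasp_type = self.select_grasp_type(features)
              params = self.lower[grasp_type] @ features       # lower-level policy for that type
              return grasp_type, params

    In such a setup the lower-level policies could be trained first, each on its own grasp type, and the grasps they generate could then serve as the dataset from which the upper-level selection policy is learned, which mirrors the autonomous dataset construction described in the abstract.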

    Video: