Research

Research Fields

Robots that have to operate in real-world environments need to perform a huge variety of skills with a high level of dexterity. Preprogramming these skills for unpredictable environments is infeasible. We investigate computational learning algorithms that allow artificial agents to autonomously learn new skills from interaction with the environment, humans or other agents. We believe that such autonomously learning agents will have a great impact on many areas of everyday life, including service robots that help in the household or with care of the elderly, manufacturing, agricultural robotics, and the disposal of dangerous material such as nuclear waste.

An autonomously learning agent has to acquire a rich set of different behaviours to achieve a variety of goals. The agent has to learn autonomously how to explore its environment and determine which features are important for making a decision. It has to identify relevant behaviours and determine when to learn new ones. Furthermore, the robot needs to learn which goals are relevant and how to re-use behaviours in order to achieve new goals. It needs to be easily teachable by lay users and able to collaborate with them. Moreover, in many applications, several robotic agents need to be coordinated.

Our research concentrates on several sub-fields of machine learning, reflected in the selected papers below.

Selected Papers

  • arXiv: New paper on “Deep Reinforcement Learning for Swarm Systems”, plus videos and code

    Abstract:

    Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents facilitating the development of more complex collective strategies.

    • M. Hüttenrauch, A. Šošić, and G. Neumann, “Deep Reinforcement Learning for Swarm Systems,” arXiv:1807.06613, 2018.
      [BibTeX] [Download PDF]
      @article{huettenrauch2018deep,
      author = {H{\"u}ttenrauch, Maximilian and \v{S}o\v{s}i\'{c}, Adrian and Neumann, Gerhard},
      year = {2018},
      month = {07},
      pages = {26},
      title = {Deep Reinforcement Learning for Swarm Systems},
      url = {https://arxiv.org/abs/1807.06613},
      }

    Videos:

    The embedded videos (enlarge to full-screen for best visibility) cover: Rendezvous, Pursuit Evasion, Pursuit Evasion with Multiple Evaders, and the Learning Progress.

    Code:

    The code base for our work can be found in the following git repository:

    https://github.com/LCAS/deep_rl_for_swarms
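
    Independent of the released code base above, the following minimal Python/NumPy sketch illustrates the core idea of the mean-embedding state representation: each neighbour observation is mapped to a feature vector (here radial basis functions) and the features are averaged, yielding a fixed-size, permutation-invariant policy input for any number of neighbours. Function names, feature centres, bandwidth and dimensions are illustrative assumptions, not the repository's API.

    # Minimal sketch (not the released implementation): encoding a variable-sized
    # set of neighbour observations as an empirical mean embedding with RBF features.
    import numpy as np

    def rbf_features(x, centres, bandwidth):
        """Evaluate radial basis functions for a single observation x (shape: [d])."""
        sq_dists = np.sum((centres - x) ** 2, axis=1)          # [n_centres]
        return np.exp(-sq_dists / (2.0 * bandwidth ** 2))      # [n_centres]

    def mean_embedding(neighbour_obs, centres, bandwidth):
        """Average the feature vectors of all neighbours.

        The result has a fixed size regardless of how many neighbours are visible,
        so the same policy network can be used for any swarm size.
        """
        if len(neighbour_obs) == 0:
            return np.zeros(len(centres))                       # no neighbours in range
        feats = np.stack([rbf_features(o, centres, bandwidth) for o in neighbour_obs])
        return feats.mean(axis=0)                               # permutation invariant

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        centres = rng.uniform(-1, 1, size=(32, 2))              # hypothetical feature centres
        # relative (dx, dy) positions of visible neighbours; the count may vary per agent
        neighbours = rng.uniform(-1, 1, size=(7, 2))
        phi = mean_embedding(neighbours, centres, bandwidth=0.3)
        print(phi.shape)                                        # (32,) -- fixed-size policy input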

  • arXiv: Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning

    Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.

     

    • M. Hüttenrauch, A. Šošić, and G. Neumann, “Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning,” arXiv:1709.07224, 2017.
      [BibTeX] [Abstract] [Download PDF]

      Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.

      @article{swarm1,
      author = {Hüttenrauch, Maximilian and Šošić, Adrian and Neumann, Gerhard},
      year = {2017},
      month = {09},
      pages = {8},
      title = {Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning},
      url = {https://arxiv.org/abs/1709.07224},
      abstract = {Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.}
      }
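
    To make the histogram idea more concrete, the sketch below bins the relative positions of visible neighbours over bearing and distance into a fixed-size vector. The bin counts, sensing range and function names are illustrative assumptions rather than the exact protocol used in the paper.

    # Illustrative sketch (assumptions, not the paper's exact protocol): encoding the
    # local neighbourhood of an agent as a histogram over relative bearing and distance.
    import numpy as np

    def neighbourhood_histogram(rel_positions, n_bearing_bins=8, n_dist_bins=4, max_range=1.0):
        """Count visible neighbours in (bearing, distance) bins.

        rel_positions: array [n_neighbours, 2] of neighbour positions relative to the agent.
        Returns a flattened, fixed-size histogram that can be broadcast to neighbours
        or fed into a decentralised policy.
        """
        hist = np.zeros((n_bearing_bins, n_dist_bins))
        for dx, dy in rel_positions:
            dist = np.hypot(dx, dy)
            if dist > max_range:
                continue                                        # outside the sensing range
            bearing = np.arctan2(dy, dx)                        # in [-pi, pi]
            b = int((bearing + np.pi) / (2 * np.pi) * n_bearing_bins) % n_bearing_bins
            d = min(int(dist / max_range * n_dist_bins), n_dist_bins - 1)
            hist[b, d] += 1.0
        return hist.flatten()

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        rel = rng.uniform(-1.5, 1.5, size=(10, 2))
        print(neighbourhood_histogram(rel))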

  • IROS 2017: Hybrid control trajectory optimization under uncertainty

    Trajectory optimization is a fundamental problem in robotics. While optimization of continuous control trajectories is well developed, many applications require both discrete and continuous, i.e. hybrid controls. Finding an optimal sequence of hybrid controls is challenging due to the exponential explosion of discrete control combinations. Our method, based on Differential Dynamic Programming (DDP), circumvents this problem by incorporating discrete actions inside DDP: we first optimize continuous mixtures of discrete actions, and, subsequently force the mixtures into fully discrete actions. Moreover, we show how our approach can be extended to partially observable Markov decision processes (POMDPs) for trajectory planning under uncertainty. We validate the approach in a car driving problem where the robot has to switch discrete gears and in a box pushing application where the robot can switch the side of the box to push. The pose and the friction parameters of the pushed box are initially unknown and only indirectly observable.

    • J. Pajarinen, V. Kyrki, M. Koval, S. Srinivasa, J. Peters, and G. Neumann, “Hybrid control trajectory optimization under uncertainty,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.
      [BibTeX] [Abstract] [Download PDF]

      Trajectory optimization is a fundamental problem in robotics. While optimization of continuous control trajectories is well developed, many applications require both discrete and continuous, i.e. hybrid controls. Finding an optimal sequence of hybrid controls is challenging due to the exponential explosion of discrete control combinations. Our method, based on Differential Dynamic Programming (DDP), circumvents this problem by incorporating discrete actions inside DDP: we first optimize continuous mixtures of discrete actions, and, subsequently force the mixtures into fully discrete actions. Moreover, we show how our approach can be extended to partially observable Markov decision processes (POMDPs) for trajectory planning under uncertainty. We validate the approach in a car driving problem where the robot has to switch discrete gears and in a box pushing application where the robot can switch the side of the box to push. The pose and the friction parameters of the pushed box are initially unknown and only indirectly observable.

      @inproceedings{lirolem28257,
      month = {September},
      booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
      year = {2017},
      title = {Hybrid control trajectory optimization under uncertainty},
      author = {J. Pajarinen and V. Kyrki and M. Koval and S. Srinivasa and J. Peters and G. Neumann},
      abstract = {Trajectory optimization is a fundamental problem in robotics. While optimization of continuous control trajectories is well developed, many applications require both discrete and continuous, i.e. hybrid controls. Finding an optimal sequence of hybrid controls is challenging due to the exponential explosion of discrete control combinations. Our method, based on Differential Dynamic Programming (DDP), circumvents this problem by incorporating discrete actions inside DDP: we first optimize continuous mixtures of discrete actions, and, subsequently force the mixtures into fully discrete actions. Moreover, we show how our approach can be extended to partially observable Markov decision processes (POMDPs) for trajectory planning under uncertainty. We validate the approach in a car driving problem where the robot has to switch discrete gears and in a box pushing application where the robot can switch the side of the box to push. The pose and the friction parameters of the pushed box are initially unknown and only indirectly observable.},
      url = {http://eprints.lincoln.ac.uk/28257/}
      }
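
    The following toy sketch illustrates the relaxation idea in isolation (it does not implement DDP or the POMDP extension): discrete choices are replaced by continuous mixture weights, the expected cost is optimised over those weights, and the result is then forced back into a single discrete action. The costs and step size are made up for illustration.

    # Toy sketch of the relaxation idea described above (not the paper's DDP machinery):
    # discrete choices are replaced by continuous mixture weights, optimised, and then
    # forced back to a single discrete action.
    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    # hypothetical per-step costs of three discrete actions (e.g. gears) in a toy problem
    action_costs = np.array([3.0, 1.2, 2.5])

    logits = np.zeros(3)                      # unconstrained parameters of the mixture
    lr = 0.5
    for _ in range(200):
        w = softmax(logits)
        # expected cost J = sum_i w_i c_i; gradient w.r.t. logits via the softmax Jacobian
        grad_logits = w * (action_costs - np.dot(w, action_costs))
        logits -= lr * grad_logits

    w = softmax(logits)
    discrete_action = int(np.argmax(w))       # "forcing" the mixture into a discrete action
    print("mixture weights:", np.round(w, 3), "-> chosen action:", discrete_action)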

  • IJCAI 2017: Contextual CMA-ES

    Many stochastic search algorithms are designed to optimize a fixed objective function to learn a task, i.e., if the objective function changes slightly, for example, due to a change in the situation or context of the task, relearning is required to adapt to the new context. For instance, if we want to learn a kicking movement for a soccer robot, we have to relearn the movement for different ball locations. Such relearning is undesired as it is highly inefficient and many applications require a fast adaptation to a new context/situation. Therefore, we investigate contextual stochastic search algorithms that can learn multiple, similar tasks simultaneously. Current contextual stochastic search methods are based on policy search algorithms and suffer from premature convergence and the need for parameter tuning. In this paper, we extend the well-known CMA-ES algorithm to the contextual setting and illustrate its performance on several contextual tasks. Our new algorithm, called contextual CMA-ES, leverages contextual learning while preserving all the features of standard CMA-ES such as stability, avoidance of premature convergence, step size control and a minimal amount of parameter tuning.

    • A. Abdolmaleki, B. Price, N. Lau, P. Reis, and G. Neumann, “Contextual CMA-ES,” in International Joint Conference on Artificial Intelligence (IJCAI), 2017.
      [BibTeX] [Abstract] [Download PDF]

      Many stochastic search algorithms are designed to optimize a fixed objective function to learn a task, i.e., if the objective function changes slightly, for example, due to a change in the situation or context of the task, relearning is required to adapt to the new context. For instance, if we want to learn a kicking movement for a soccer robot, we have to relearn the movement for different ball locations. Such relearning is undesired as it is highly inefficient and many applications require a fast adaptation to a new context/situation. Therefore, we investigate contextual stochastic search algorithms that can learn multiple, similar tasks simultaneously. Current contextual stochastic search methods are based on policy search algorithms and suffer from premature convergence and the need for parameter tuning. In this paper, we extend the well known CMA-ES algorithm to the contextual setting and illustrate its performance on several contextual tasks. Our new algorithm, called contextual CMAES, leverages from contextual learning while it preserves all the features of standard CMA-ES such as stability, avoidance of premature convergence, step size control and a minimal amount of parameter tuning.

      @inproceedings{lirolem28141,
      author = {A. Abdolmaleki and B. Price and N. Lau and P. Reis and G. Neumann},
      title = {Contextual CMA-ES},
      booktitle = {International Joint Conference on Artificial Intelligence (IJCAI)},
      year = {2017},
      month = {August},
      url = {http://eprints.lincoln.ac.uk/28141/},
      abstract = {Many stochastic search algorithms are designed to optimize a fixed objective function to learn a task, i.e., if the objective function changes slightly, for example, due to a change in the situation or context of the task, relearning is required to adapt to the new context. For instance, if we want to learn a kicking movement for a soccer robot, we have to relearn the movement for different ball locations. Such relearning is undesired as it is highly inefficient and many applications require a fast adaptation to a new context/situation. Therefore, we investigate contextual stochastic search algorithms
      that can learn multiple, similar tasks simultaneously. Current contextual stochastic search methods are based on policy search algorithms and suffer from premature convergence and the need for parameter tuning. In this paper, we extend the well known CMA-ES algorithm to the contextual setting and illustrate its performance on several contextual
      tasks. Our new algorithm, called contextual CMAES, leverages from contextual learning while it preserves all the features of standard CMA-ES such as stability, avoidance of premature convergence, step size control and a minimal amount of parameter tuning.}
      }
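
    As a rough, simplified illustration of contextual stochastic search (not the actual contextual CMA-ES update, which additionally includes step-size control and safeguards against premature convergence), the sketch below makes the mean of the Gaussian search distribution a linear function of the context and refits it by reward-weighted regression. The toy objective, temperature and dimensions are assumptions.

    # Simplified sketch of contextual stochastic search (not the full contextual CMA-ES
    # update): the mean of the Gaussian search distribution is a linear function of the
    # context and is refitted by reward-weighted linear regression.
    import numpy as np

    rng = np.random.default_rng(0)
    dim_theta, dim_context, n_samples = 2, 1, 50

    def reward(theta, context):
        # toy objective: the optimal parameter vector depends linearly on the context
        target = np.array([2.0 * context[0], -1.0 * context[0]])
        return -np.sum((theta - target) ** 2)

    W = np.zeros((dim_theta, dim_context + 1))       # linear context-to-mean mapping (with bias)
    cov = np.eye(dim_theta)

    for it in range(50):
        contexts = rng.uniform(-1, 1, size=(n_samples, dim_context))
        feats = np.hstack([contexts, np.ones((n_samples, 1))])        # append bias feature
        thetas = feats @ W.T + rng.multivariate_normal(np.zeros(dim_theta), cov, n_samples)
        rewards = np.array([reward(t, c) for t, c in zip(thetas, contexts)])
        # softmax weighting of samples (temperature fixed here for simplicity)
        w = np.exp((rewards - rewards.max()) / 1.0)
        w /= w.sum()
        # weighted least squares for the new context-dependent mean
        Fw = feats * w[:, None]
        W = np.linalg.solve(feats.T @ Fw + 1e-6 * np.eye(dim_context + 1), Fw.T @ thetas).T
        # weighted covariance around the context-dependent mean
        diff = thetas - feats @ W.T
        cov = diff.T @ (diff * w[:, None]) + 1e-6 * np.eye(dim_theta)

    print("learned mapping:\n", np.round(W, 2))       # should be close to [[2, 0], [-1, 0]]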

     

     

  • JMLR 2017: Non-parametric Policy Search with Limited Information Loss.

    Learning complex control policies from non-linear and redundant sensory input is an important challenge for reinforcement learning algorithms. Non-parametric methods that approximate value functions or transition models can address this problem, by adapting to the complexity of the dataset. Yet, many current non-parametric approaches rely on unstable greedy maximization of approximate value functions, which might lead to poor convergence or oscillations in the policy update. A more robust policy update can be obtained by limiting the information loss between successive state-action distributions. In this paper, we develop a policy search algorithm with policy updates that are both robust and non-parametric. Our method can learn non-parametric control policies for infinite horizon continuous Markov decision processes with non-linear and redundant sensory representations. We investigate how we can use approximations of the kernel function to reduce the time requirements of the demanding non-parametric computations. In our experiments, we show the strong performance of the proposed method, and how it can be approximated efficiently. Finally, we show that our algorithm can learn a real-robot underpowered swing-up task directly from image data.

    • H. van Hoof, G. Neumann, and J. Peters, “Non-parametric policy search with limited information loss,” Journal of Machine Learning Research, vol. 18, iss. 73, pp. 1-46, 2018.
      [BibTeX] [Abstract] [Download PDF]

      Learning complex control policies from non-linear and redundant sensory input is an important challenge for reinforcement learning algorithms. Non-parametric methods that approximate value functions or transition models can address this problem, by adapting to the complexity of the dataset. Yet, many current non-parametric approaches rely on unstable greedy maximization of approximate value functions, which might lead to poor convergence or oscillations in the policy update. A more robust policy update can be obtained by limiting the information loss between successive state-action distributions. In this paper, we develop a policy search algorithm with policy updates that are both robust and non-parametric. Our method can learn non-parametric control policies for infinite horizon continuous Markov decision processes with non-linear and redundant sensory representations. We investigate how we can use approximations of the kernel function to reduce the time requirements of the demanding non-parametric computations. In our experiments, we show the strong performance of the proposed method, and how it can be approximated efficiently. Finally, we show that our algorithm can learn a real-robot underpowered swing-up task directly from image data.

      @article{lirolem28020,
      author = {Herke van Hoof and Gerhard Neumann and Jan Peters},
      title = {Non-parametric policy search with limited information loss},
      publisher = {Journal of Machine Learning Research},
      journal = {Journal of Machine Learning Research},
      pages = {1--46},
      year = {2018},
      volume = {18},
      month = {December},
      number = {73},
      abstract = {Learning complex control policies from non-linear and redundant sensory input is an important
      challenge for reinforcement learning algorithms. Non-parametric methods that
      approximate value functions or transition models can address this problem, by adapting
      to the complexity of the dataset. Yet, many current non-parametric approaches rely on
      unstable greedy maximization of approximate value functions, which might lead to poor
      convergence or oscillations in the policy update. A more robust policy update can be obtained
      by limiting the information loss between successive state-action distributions. In this
      paper, we develop a policy search algorithm with policy updates that are both robust and
      non-parametric. Our method can learn non-parametric control policies for infinite horizon
      continuous Markov decision processes with non-linear and redundant sensory representations.
      We investigate how we can use approximations of the kernel function to reduce the
      time requirements of the demanding non-parametric computations. In our experiments, we
      show the strong performance of the proposed method, and how it can be approximated efficiently. Finally, we show that our algorithm can learn a real-robot underpowered swing-up
      task directly from image data.},
      url = {http://eprints.lincoln.ac.uk/28020/}
      }
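
    The information-loss bound can be illustrated with a small, parametric stand-in: sample weights are chosen so that the KL divergence between the weighted and the uniform sample distribution stays below a bound epsilon, with the temperature obtained by minimising the corresponding dual. This episodic, REPS-style sketch omits the paper's non-parametric, kernel-based machinery; epsilon and the toy returns are assumptions.

    # Simplified, parametric sketch of the information-loss bound behind the paper:
    # weights are computed such that the KL between the weighted and the uniform sample
    # distribution stays below epsilon (the paper's non-parametric kernel machinery is
    # not reproduced here).
    import numpy as np
    from scipy.optimize import minimize_scalar

    def kl_bounded_weights(returns, epsilon=0.5):
        """Compute softmax weights whose implied KL to the uniform distribution is <= epsilon."""
        returns = returns - returns.max()                  # numerical stability

        def dual(log_eta):
            eta = np.exp(log_eta)                          # optimise over log(eta) to keep eta > 0
            return eta * epsilon + eta * np.log(np.mean(np.exp(returns / eta)))

        res = minimize_scalar(dual, bounds=(-5.0, 5.0), method="bounded")
        eta = np.exp(res.x)
        w = np.exp(returns / eta)
        return w / w.sum()

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        returns = rng.normal(size=100)
        w = kl_bounded_weights(returns, epsilon=0.5)
        kl = np.sum(w * np.log(w * len(w) + 1e-12))        # KL(weighted || uniform)
        print("max weight:", w.max(), "KL to uniform:", kl)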

  • IJRR 2017: Learning Movement Primitive Libraries through Probabilistic Segmentation

    Movement primitives are a well established approach for encoding and executing movements. While the primitives themselves have been extensively researched, the concept of movement primitive libraries has not received similar attention. Libraries of movement primitives represent the skill set of an agent. Primitives can be queried and sequenced in order to solve specific tasks. The goal of this work is to segment unlabeled demonstrations into a representative set of primitives. Our proposed method differs from current approaches by taking advantage of the often neglected, mutual dependencies between the segments contained in the demonstrations and the primitives to be encoded. By exploiting this mutual dependency, we show that we can improve both the segmentation and the movement primitive library. Based on probabilistic inference our novel approach segments the demonstrations while learning a probabilistic representation of movement primitives. We demonstrate our method on two real robot applications. First, the robot segments sequences of different letters into a library, explaining the observed trajectories. Second, the robot segments demonstrations of a chair assembly task into a movement primitive library. The library is subsequently used to assemble the chair in an order not present in the demonstrations.

    • R. Lioutikov, G. Neumann, G. Maeda, and J. Peters, “Learning movement primitive libraries through probabilistic segmentation,” International Journal of Robotics Research (IJRR), vol. 36, iss. 8, pp. 879-894, 2017.
      [BibTeX] [Abstract] [Download PDF]

      Movement primitives are a well established approach for encoding and executing movements. While the primitives themselves have been extensively researched, the concept of movement primitive libraries has not received similar attention. Libraries of movement primitives represent the skill set of an agent. Primitives can be queried and sequenced in order to solve specific tasks. The goal of this work is to segment unlabeled demonstrations into a representative set of primitives. Our proposed method differs from current approaches by taking advantage of the often neglected, mutual dependencies between the segments contained in the demonstrations and the primitives to be encoded. By exploiting this mutual dependency, we show that we can improve both the segmentation and the movement primitive library. Based on probabilistic inference our novel approach segments the demonstrations while learning a probabilistic representation of movement primitives. We demonstrate our method on two real robot applications. First, the robot segments sequences of different letters into a library, explaining the observed trajectories. Second, the robot segments demonstrations of a chair assembly task into a movement primitive library. The library is subsequently used to assemble the chair in an order not present in the demonstrations.

      @article{lirolem28021,
      author = {Rudolf Lioutikov and Gerhard Neumann and Guilherme Maeda and Jan Peters},
      title = {Learning movement primitive libraries through probabilistic segmentation},
      publisher = {SAGE},
      journal = {International Journal of Robotics Research (IJRR)},
      pages = {879--894},
      year = {2017},
      volume = {36},
      month = {July},
      number = {8},
      url = {http://eprints.lincoln.ac.uk/28021/},
      abstract = {Movement primitives are a well established approach for encoding and executing movements. While the primitives
      themselves have been extensively researched, the concept of movement primitive libraries has not received similar
      attention. Libraries of movement primitives represent the skill set of an agent. Primitives can be queried and sequenced
      in order to solve specific tasks. The goal of this work is to segment unlabeled demonstrations into a representative
      set of primitives. Our proposed method differs from current approaches by taking advantage of the often neglected,
      mutual dependencies between the segments contained in the demonstrations and the primitives to be encoded. By
      exploiting this mutual dependency, we show that we can improve both the segmentation and the movement primitive
      library. Based on probabilistic inference our novel approach segments the demonstrations while learning a probabilistic
      representation of movement primitives. We demonstrate our method on two real robot applications. First, the robot
      segments sequences of different letters into a library, explaining the observed trajectories. Second, the robot segments
      demonstrations of a chair assembly task into a movement primitive library. The library is subsequently used to assemble the chair in an order not present in the demonstrations.}
      }
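
    One ingredient of such a library can be sketched compactly: each primitive is a Gaussian over basis-function weights, and a candidate segment is scored by the likelihood of its fitted weights under each primitive. The sketch below shows only this scoring step with made-up one-dimensional demonstrations; the paper's joint probabilistic segmentation inference is not reproduced.

    # Rough sketch of one ingredient of the approach: representing a primitive as a
    # Gaussian over basis-function weights and scoring how well a candidate segment is
    # explained by each primitive in the library.
    import numpy as np
    from scipy.stats import multivariate_normal

    def basis(n_basis, n_steps):
        """Normalised Gaussian basis functions over the phase of a segment."""
        phase = np.linspace(0, 1, n_steps)
        centres = np.linspace(0, 1, n_basis)
        Phi = np.exp(-0.5 * ((phase[:, None] - centres[None, :]) / 0.1) ** 2)
        return Phi / Phi.sum(axis=1, keepdims=True)

    def fit_weights(segment, n_basis=8):
        """Ridge-regress a 1-D trajectory segment onto the basis functions."""
        Phi = basis(n_basis, len(segment))
        return np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(n_basis), Phi.T @ segment)

    def segment_log_likelihoods(segment, library):
        """Log-likelihood of the segment's weight vector under each primitive in the library."""
        w = fit_weights(segment)
        return [multivariate_normal(mean=mu, cov=cov).logpdf(w) for mu, cov in library]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # hypothetical library of two primitives, each a Gaussian over 8 basis weights
        demos_a = [np.sin(np.linspace(0, np.pi, 50)) + 0.05 * rng.normal(size=50) for _ in range(20)]
        demos_b = [np.linspace(0, 1, 50) + 0.05 * rng.normal(size=50) for _ in range(20)]
        library = []
        for demos in (demos_a, demos_b):
            W = np.stack([fit_weights(d) for d in demos])
            library.append((W.mean(axis=0), np.cov(W.T) + 1e-6 * np.eye(W.shape[1])))
        test = np.sin(np.linspace(0, np.pi, 50))
        print(segment_log_likelihoods(test, library))      # first primitive should score higher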

  • ICML 2017: Local Bayesian Optimization

    Bayesian optimization is renowned for its sample efficiency but its application to higher dimensional tasks is impeded by its focus on global optimization. To scale to higher dimensional problems, we leverage the sample efficiency of Bayesian optimization in a local context. The optimization of the acquisition function is restricted to the vicinity of a Gaussian search distribution which is moved towards high value areas of the objective. The proposed information-theoretic update of the search distribution results in a Bayesian interpretation of local stochastic search: the search distribution encodes prior knowledge on the optimum’s location and is weighted at each iteration by the likelihood of this location’s optimality. We demonstrate the effectiveness of our algorithm on several benchmark objective functions as well as a continuous robotic task in which an informative prior is obtained by imitation learning.

    • R. Akrour, D. Sorokin, J. Peters, and G. Neumann, “Local Bayesian optimization of motor skills,” in International Conference on Machine Learning (ICML), 2017.
      [BibTeX] [Abstract] [Download PDF]

      Bayesian optimization is renowned for its sample efficiency but its application to higher dimensional tasks is impeded by its focus on global optimization. To scale to higher dimensional problems, we leverage the sample efficiency of Bayesian optimization in a local context. The optimization of the acquisition function is restricted to the vicinity of a Gaussian search distribution which is moved towards high value areas of the objective. The proposed information-theoretic update of the search distribution results in a Bayesian interpretation of local stochastic search: the search distribution encodes prior knowledge on the optimum’s location and is weighted at each iteration by the likelihood of this location’s optimality. We demonstrate the effectiveness of our algorithm on several benchmark objective functions as well as a continuous robotic task in which an informative prior is obtained by imitation learning.

      @inproceedings{lirolem27902,
      author = {R. Akrour and D. Sorokin and J. Peters and G. Neumann},
      booktitle = {International Conference on Machine Learning (ICML)},
      year = {2017},
      title = {Local Bayesian optimization of motor skills},
      month = {August},
      abstract = {Bayesian optimization is renowned for its sample
      efficiency but its application to higher dimensional
      tasks is impeded by its focus on global
      optimization. To scale to higher dimensional
      problems, we leverage the sample efficiency of
      Bayesian optimization in a local context. The
      optimization of the acquisition function is restricted
      to the vicinity of a Gaussian search distribution
      which is moved towards high value areas
      of the objective. The proposed information-theoretic
      update of the search distribution results
      in a Bayesian interpretation of local stochastic
      search: the search distribution encodes prior
      knowledge on the optimum’s location and is
      weighted at each iteration by the likelihood of
      this location’s optimality. We demonstrate the
      effectiveness of our algorithm on several benchmark
      objective functions as well as a continuous
      robotic task in which an informative prior is obtained
      by imitation learning.},
      url = {http://eprints.lincoln.ac.uk/27902/},
      }
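
    A heavily simplified sketch of the local Bayesian optimisation loop is given below: a Gaussian process surrogate is fitted to the evaluated samples, and the acquisition function is optimised only over candidates drawn from a local Gaussian search distribution, which is then moved and shrunk. The re-centring and shrinking heuristics here replace the paper's information-theoretic update and are assumptions, as are the toy objective and hyperparameters.

    # Simplified sketch of local Bayesian optimisation (the paper's information-theoretic
    # update of the search distribution is replaced here by simply re-centring a Gaussian
    # on the best candidate; sklearn's GP is used as the surrogate model).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    def objective(x):                                     # toy 2-D objective to be maximised
        return -np.sum((x - np.array([0.7, -0.3])) ** 2)

    rng = np.random.default_rng(0)
    mean, cov = np.zeros(2), 0.25 * np.eye(2)             # local Gaussian search distribution
    X = rng.multivariate_normal(mean, cov, size=5)
    y = np.array([objective(x) for x in X])

    for it in range(15):
        gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
        gp.fit(X, y)
        # acquisition (UCB) optimised only over candidates drawn from the local Gaussian
        cand = rng.multivariate_normal(mean, cov, size=200)
        mu, std = gp.predict(cand, return_std=True)
        best = cand[np.argmax(mu + 1.0 * std)]
        X = np.vstack([X, best])
        y = np.append(y, objective(best))
        mean = 0.5 * mean + 0.5 * best                    # move the search distribution
        cov *= 0.9                                        # shrink it (crude step-size control)

    print("best found:", X[np.argmax(y)], "value:", y.max())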

  • AURO 2017: Using Probabilistic Movement Primitives in Robotics

    Movement Primitives are a well-established paradigm for modular movement representation and generation. They provide a data-driven representation of movements and support generalization to novel situations, temporal modulation, sequencing of primitives and controllers for executing the primitive on physical systems. However, while many MP frameworks exhibit some of these properties, there is a need for a unified framework that implements all of them in a principled way. In this paper, we show that this goal can be achieved by using a probabilistic representation. Our approach models trajectory distributions learned from stochastic movements. Probabilistic operations, such as conditioning, can be used to achieve generalization to novel situations or to combine and blend movements in a principled way. We derive a stochastic feedback controller that reproduces the encoded variability of the movement and the coupling of the degrees of freedom of the robot. We evaluate and compare our approach on several simulated and real robot scenarios.

    • A. Paraschos, C. Daniel, J. Peters, and G. Neumann, “Using probabilistic movement primitives in robotics,” Autonomous Robots, vol. 42, iss. 3, pp. 529-551, 2018.
      [BibTeX] [Abstract] [Download PDF]

      Movement Primitives are a well-established paradigm for modular movement representation and generation. They provide a data-driven representation of movements and support generalization to novel situations, temporal modulation, sequencing of primitives and controllers for executing the primitive on physical systems. However, while many MP frameworks exhibit some of these properties, there is a need for a unified framework that implements all of them in a principled way. In this paper, we show that this goal can be achieved by using a probabilistic representation. Our approach models trajectory distributions learned from stochastic movements. Probabilistic operations, such as conditioning can be used to achieve generalization to novel situations or to combine and blend movements in a principled way. We derive a stochastic feedback controller that reproduces the encoded variability of the movement and the coupling of the degrees of freedom of the robot. We evaluate and compare our approach on several simulated and real robot scenarios.

      @article{lirolem27883,
      number = {3},
      month = {March},
      volume = {42},
      year = {2018},
      pages = {529--551},
      journal = {Autonomous Robots},
      publisher = {Springer Verlag},
      title = {Using probabilistic movement primitives in robotics},
      author = {Alexandros Paraschos and Christian Daniel and Jan Peters and Gerhard Neumann},
      url = {http://eprints.lincoln.ac.uk/27883/},
      abstract = {Movement Primitives are a well-established
      paradigm for modular movement representation and
      generation. They provide a data-driven representation
      of movements and support generalization to novel situations,
      temporal modulation, sequencing of primitives
      and controllers for executing the primitive on physical
      systems. However, while many MP frameworks exhibit
      some of these properties, there is a need for a unified framework that implements all of them in a principled
      way. In this paper, we show that this goal can be
      achieved by using a probabilistic representation. Our
      approach models trajectory distributions learned from
      stochastic movements. Probabilistic operations, such as
      conditioning can be used to achieve generalization to
      novel situations or to combine and blend movements in
      a principled way. We derive a stochastic feedback controller
      that reproduces the encoded variability of the
      movement and the coupling of the degrees of freedom
      of the robot. We evaluate and compare our approach
      on several simulated and real robot scenarios.}
      }

    This paper is the extended journal version of:

    • A. Paraschos, G. Neumann, and J. Peters, “A probabilistic approach to robot trajectory generation,” in 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2013, pp. 477-483.
      [BibTeX] [Abstract] [Download PDF]

      Motor Primitives (MPs) are a promising approach for the data-driven acquisition as well as for the modular and re-usable generation of movements. However, a modular control architecture with MPs is only effective if the MPs support co-activation as well as continuously blending the activation from one MP to the next. In addition, we need efficient mechanisms to adapt a MP to the current situation. Common approaches to movement primitives lack such capabilities or their implementation is based on heuristics. We present a probabilistic movement primitive approach that overcomes the limitations of existing approaches. We encode a primitive as a probability distribution over trajectories. The representation as distribution has several beneficial properties. It allows encoding a time-varying variance profile. Most importantly, it allows performing new operations – a product of distributions for the co-activation of MPs, and conditioning for generalizing the MP to different desired targets. We derive a feedback controller that reproduces a given trajectory distribution in closed form. We compare our approach to the existing state of the art and present real robot results for learning from demonstration.

      @inproceedings{lirolem25693,
      title = {A probabilistic approach to robot trajectory generation},
      author = {A. Paraschos and G. Neumann and J. Peters},
      publisher = {IEEE},
      year = {2013},
      booktitle = {13th IEEE-RAS International Conference on Humanoid Robots (Humanoids)},
      pages = {477--483},
      month = {October},
      url = {http://eprints.lincoln.ac.uk/25693/},
      abstract = {Motor Primitives (MPs) are a promising approach for the data-driven acquisition as well as for the modular and re-usable generation of movements. However, a modular control architecture with MPs is only effective if the MPs support co-activation as well as continuously blending the activation from one MP to the next. In addition, we need efficient mechanisms to adapt a MP to the current situation. Common approaches to movement primitives lack such capabilities or their implementation is based on heuristics. We present a probabilistic movement primitive approach that overcomes the limitations of existing approaches. We encode a primitive as a probability distribution over trajectories. The representation as distribution has several beneficial properties. It allows encoding a time-varying variance profile. Most importantly, it allows performing new operations - a product of distributions for the co-activation of MPs conditioning for generalizing the MP to different desired targets. We derive a feedback controller that reproduces a given trajectory distribution in closed form. We compare our approach to the existing state-of-the art and present real robot results for learning from demonstration.}
      }

    • A. Paraschos, C. Daniel, J. Peters, and G. Neumann, “Probabilistic movement primitives,” in Advances in Neural Information Processing Systems, (NIPS), 2013.
      [BibTeX] [Abstract] [Download PDF]

      Movement Primitives (MP) are a well-established approach for representing modular and re-usable robot movement generators. Many state-of-the-art robot learning successes are based on MPs, due to their compact representation of the inherently continuous and high dimensional robot movements. A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. To this effect, a MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios.

      @inproceedings{lirolem25785,
      journal = {Advances in Neural Information Processing Systems},
      month = {December},
      year = {2013},
      booktitle = {Advances in Neural Information Processing Systems, (NIPS)},
      title = {Probabilistic movement primitives},
      author = {A. Paraschos and C. Daniel and J. Peters and G. Neumann},
      abstract = {Movement Primitives (MP) are a well-established approach for representing modular
      and re-usable robot movement generators. Many state-of-the-art robot learning
      successes are based MPs, due to their compact representation of the inherently
      continuous and high dimensional robot movements. A major goal in robot learning
      is to combine multiple MPs as building blocks in a modular control architecture
      to solve complex tasks. To this effect, a MP representation has to allow for
      blending between motions, adapting to altered task variables, and co-activating
      multiple MPs in parallel. We present a probabilistic formulation of the MP concept
      that maintains a distribution over trajectories. Our probabilistic approach
      allows for the derivation of new operations which are essential for implementing
      all aforementioned properties in one framework. In order to use such a trajectory
      distribution for robot movement control, we analytically derive a stochastic feedback
      controller which reproduces the given trajectory distribution. We evaluate
      and compare our approach to existing methods on several simulated as well as
      real robot scenarios.},
      url = {http://eprints.lincoln.ac.uk/25785/},
      }
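
    The conditioning operation mentioned in the abstracts above can be sketched in a few lines: a ProMP is a Gaussian over basis-function weights, and conditioning on a desired via-point reduces to standard Gaussian conditioning. The basis layout, observation noise and toy demonstrations below are illustrative assumptions.

    # Minimal sketch of the conditioning operation: a ProMP is a Gaussian over
    # basis-function weights, and conditioning on a via-point y* at phase t* is
    # standard Gaussian conditioning.
    import numpy as np

    def basis(t, n_basis=10):
        """Row of normalised Gaussian basis functions at phase t in [0, 1]."""
        centres = np.linspace(0, 1, n_basis)
        phi = np.exp(-0.5 * ((t - centres) / 0.08) ** 2)
        return phi / phi.sum()

    def condition(mu_w, Sigma_w, t_star, y_star, sigma_y=1e-4):
        """Condition the weight distribution on observing y_star at phase t_star."""
        phi = basis(t_star)                                       # [n_basis]
        S = phi @ Sigma_w @ phi + sigma_y                         # scalar innovation variance
        K = Sigma_w @ phi / S                                     # Kalman-style gain [n_basis]
        mu_new = mu_w + K * (y_star - phi @ mu_w)
        Sigma_new = Sigma_w - np.outer(K, phi @ Sigma_w)
        return mu_new, Sigma_new

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # hypothetical primitive learned from noisy sine demonstrations
        ts = np.linspace(0, 1, 100)
        Phi = np.stack([basis(t) for t in ts])                    # [100, n_basis]
        demos = np.stack([np.sin(np.pi * ts) + 0.05 * rng.normal(size=100) for _ in range(30)])
        W = np.stack([np.linalg.lstsq(Phi, d, rcond=None)[0] for d in demos])
        mu_w, Sigma_w = W.mean(axis=0), np.cov(W.T) + 1e-6 * np.eye(W.shape[1])
        mu_c, _ = condition(mu_w, Sigma_w, t_star=0.5, y_star=1.5)
        print("mean at t=0.5 before/after conditioning:",
              basis(0.5) @ mu_w, basis(0.5) @ mu_c)               # after should be close to 1.5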

     

  • GECCO 2017: Deriving and Improving CMA-ES with Information-Geometric Trust Regions

    CMA-ES is one of the most popular stochastic search algorithms. It performs favourably in many tasks without the need for extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance updates that incorporate the current samples as well as the evolution path, and its invariance properties. Its update rules are composed of well-established heuristics where the theoretical foundations of some of these rules are also well understood. In this paper we fully derive all CMA-ES update rules within the framework of expectation-maximisation-based stochastic search algorithms using information-geometric trust regions. We show that the use of the trust region results in similar updates to CMA-ES for the mean and the covariance matrix while it allows for the derivation of an improved update rule for the step-size. Our new algorithm, Trust-Region Covariance Matrix Adaptation Evolution Strategy (TR-CMA-ES), is fully derived from first order optimization principles and performs favourably compared to the standard CMA-ES algorithm.

    • A. Abdolmaleki, B. Price, N. Lau, L. P. Reis, and G. Neumann, “Deriving and improving CMA-ES with Information geometric trust regions,” in The Genetic and Evolutionary Computation Conference (GECCO 2017), 2017.
      [BibTeX] [Abstract] [Download PDF]

      CMA-ES is one of the most popular stochastic search algorithms. It performs favourably in many tasks without the need of extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance updates that incorporates the current samples as well as the evolution path and its invariance properties. Its update rules are composed of well established heuristics where the theoretical foundations of some of these rules are also well understood. In this paper we will fully derive all CMA-ES update rules within the framework of expectation-maximisation-based stochastic search algorithms using information-geometric trust regions. We show that the use of the trust region results in similar updates to CMA-ES for the mean and the covariance matrix while it allows for the derivation of an improved update rule for the step-size. Our new algorithm, Trust-Region Covariance Matrix Adaptation Evolution Strategy (TR-CMA-ES) is fully derived from first order optimization principles and performs favourably in compare to standard CMA-ES algorithm.

      @inproceedings{lirolem27056,
      author = {Abbas Abdolmaleki and Bob Price and Nuno Lau and Luis Paulo Reis and Gerhard Neumann},
      booktitle = {The Genetic and Evolutionary Computation Conference (GECCO 2017)},
      year = {2017},
      title = {Deriving and improving CMA-ES with Information geometric trust regions},
      month = {July},
      url = {http://eprints.lincoln.ac.uk/27056/},
      abstract = {CMA-ES is one of the most popular stochastic search algorithms.
      It performs favourably in many tasks without the need of extensive
      parameter tuning. The algorithm has many beneficial properties,
      including automatic step-size adaptation, efficient covariance updates
      that incorporates the current samples as well as the evolution
      path and its invariance properties. Its update rules are composed
      of well established heuristics where the theoretical foundations of
      some of these rules are also well understood. In this paper we
      will fully derive all CMA-ES update rules within the framework of
      expectation-maximisation-based stochastic search algorithms using
      information-geometric trust regions. We show that the use of the trust
      region results in similar updates to CMA-ES for the mean and the
      covariance matrix while it allows for the derivation of an improved
      update rule for the step-size. Our new algorithm, Trust-Region Covariance
      Matrix Adaptation Evolution Strategy (TR-CMA-ES) is
      fully derived from first order optimization principles and performs
      favourably in compare to standard CMA-ES algorithm.}
      }
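
    The expectation-maximisation view underlying the derivation can be illustrated with a bare-bones weighted maximum-likelihood update of the Gaussian search distribution (rank-based weights on the better half of the population). The sketch deliberately omits the evolution path and the trust-region step-size update that constitute the paper's actual contribution; the objective and population settings are assumptions.

    # Bare-bones sketch of the EM view: the mean and covariance of the search distribution
    # are refitted by weighted maximum likelihood on the sampled candidates.
    import numpy as np

    def sphere(x):                                        # toy objective to be minimised
        return np.sum((x - 1.0) ** 2)

    rng = np.random.default_rng(0)
    dim, pop = 5, 40
    mean, cov = np.zeros(dim), np.eye(dim)

    for gen in range(100):
        samples = rng.multivariate_normal(mean, cov, size=pop)
        costs = np.array([sphere(s) for s in samples])
        # rank-based weights on the better half of the population, as in CMA-ES
        order = np.argsort(costs)
        mu = pop // 2
        w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
        w /= w.sum()
        elite = samples[order[:mu]]
        # weighted maximum-likelihood (M-step) update of the search distribution
        new_mean = w @ elite
        diff = elite - mean                               # deviations from the *old* mean
        cov = diff.T @ (diff * w[:, None]) + 1e-8 * np.eye(dim)
        mean = new_mean

    print("found optimum approx:", np.round(mean, 3))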

  • ICAPS 2017: State-regularized policy search for linearized dynamical systems

    Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of feedback controllers by taking advantage of local approximations of model dynamics and cost functions. Stability of the policy update is a major issue for these methods, rendering them hard to apply for highly nonlinear systems. Recent approaches combine classical Stochastic Optimal Control methods with information-theoretic bounds to control the step-size of the policy update and could even be used to train nonlinear deep control policies. These methods bound the relative entropy between the new and the old policy to ensure a stable policy update. However, despite the bound in policy space, the state distributions of two consecutive policies can still differ significantly, rendering the used local approximate models invalid. To alleviate this issue we propose enforcing a relative entropy constraint not only on the policy update, but also on the update of the state distribution, around which the dynamics and cost are being approximated. We present a derivation of the closed-form policy update and show that our approach outperforms related methods on two nonlinear and highly dynamic simulated systems.

    • H. Abdulsamad, O. Arenz, J. Peters, and G. Neumann, “State-regularized policy search for linearized dynamical systems,” in Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), 2017.
      [BibTeX] [Abstract] [Download PDF]

      Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of feedback controllers by taking advantage of local approximations of model dynamics and cost functions. Stability of the policy update is a major issue for these methods, rendering them hard to apply for highly nonlinear systems. Recent approaches combine classical Stochastic Optimal Control methods with information-theoretic bounds to control the step-size of the policy update and could even be used to train nonlinear deep control policies. These methods bound the relative entropy between the new and the old policy to ensure a stable policy update. However, despite the bound in policy space, the state distributions of two consecutive policies can still differ significantly, rendering the used local approximate models invalid. To alleviate this issue we propose enforcing a relative entropy constraint not only on the policy update, but also on the update of the state distribution, around which the dynamics and cost are being approximated. We present a derivation of the closed-form policy update and show that our approach outperforms related methods on two nonlinear and highly dynamic simulated systems.

      @inproceedings{lirolem27055,
      month = {June},
      author = {Hany Abdulsamad and Oleg Arenz and Jan Peters and Gerhard Neumann},
      title = {State-regularized policy search for linearized dynamical systems},
      booktitle = {Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS)},
      year = {2017},
      abstract = {Trajectory-Centric Reinforcement Learning and Trajectory
      Optimization methods optimize a sequence of feedback controllers
      by taking advantage of local approximations of
      model dynamics and cost functions. Stability of the policy update
      is a major issue for these methods, rendering them hard
      to apply for highly nonlinear systems. Recent approaches
      combine classical Stochastic Optimal Control methods with
      information-theoretic bounds to control the step-size of the
      policy update and could even be used to train nonlinear deep
      control policies. These methods bound the relative entropy
      between the new and the old policy to ensure a stable policy
      update. However, despite the bound in policy space, the
      state distributions of two consecutive policies can still differ
      significantly, rendering the used local approximate models invalid.
      To alleviate this issue we propose enforcing a relative
      entropy constraint not only on the policy update, but also on
      the update of the state distribution, around which the dynamics
      and cost are being approximated. We present a derivation
      of the closed-form policy update and show that our approach
      outperforms related methods on two nonlinear and highly dynamic
      simulated systems.},
      url = {http://eprints.lincoln.ac.uk/27055/}
      }
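
    The effect of the additional state-distribution constraint can be illustrated conceptually: a candidate policy update is scaled back until both the policy KL and the KL between the induced state distributions stay below their bounds. In the sketch below, the dynamics are replaced by a toy linear-Gaussian stand-in and the paper's closed-form update is not implemented; all quantities are illustrative assumptions.

    # Conceptual sketch of the double relative-entropy constraint described above.
    import numpy as np

    def gauss_kl(mu0, cov0, mu1, cov1):
        """KL( N(mu0, cov0) || N(mu1, cov1) ) for multivariate Gaussians."""
        d = len(mu0)
        cov1_inv = np.linalg.inv(cov1)
        diff = mu1 - mu0
        return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - d
                      + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

    def state_distribution(policy_mean, A=np.array([[1.0, 0.3], [0.0, 1.0]])):
        """Toy stand-in: the state distribution is a fixed linear-Gaussian image of the policy mean."""
        return A @ policy_mean, A @ (0.1 * np.eye(2)) @ A.T

    old_policy = np.zeros(2)
    proposed_step = np.array([2.0, -1.5])            # hypothetical unconstrained policy improvement
    eps_policy, eps_state = 0.1, 0.1
    policy_cov = 0.2 * np.eye(2)

    alpha = 1.0
    while alpha > 1e-6:
        new_policy = old_policy + alpha * proposed_step
        kl_pi = gauss_kl(new_policy, policy_cov, old_policy, policy_cov)
        mu_s_new, cov_s_new = state_distribution(new_policy)
        mu_s_old, cov_s_old = state_distribution(old_policy)
        kl_state = gauss_kl(mu_s_new, cov_s_new, mu_s_old, cov_s_old)
        if kl_pi <= eps_policy and kl_state <= eps_state:
            break
        alpha *= 0.5                                  # shrink the step until both bounds hold

    print(f"accepted step scale: {alpha:.4f}, policy KL: {kl_pi:.3f}, state KL: {kl_state:.3f}")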