Research

Research Fields

Robots that have to operate in real-world environments need to perform a huge variety of skills with a high level of dexterity. Preprogramming these skills for unpredictable environments appears infeasible. We investigate computational learning algorithms that allow artificial agents to autonomously learn new skills from interaction with the environment, humans, or other agents. We believe that such autonomously learning agents will have a great impact on many areas of everyday life, including service robots that help in the household or with care for the elderly, manufacturing, agricultural robotics, and the disposal of dangerous material such as nuclear waste.

An autonomously learning agent has to acquire a rich set of behaviours to achieve a variety of goals. It has to learn autonomously how to explore its environment and which features are important for making decisions. It has to identify relevant behaviours and determine when to learn new ones. Furthermore, the robot needs to learn which goals are relevant and how to re-use behaviours to achieve new goals. It needs to be easily teachable by non-expert humans and to collaborate with them. Moreover, in many applications, several robotic agents need to be coordinated.

Our research concentrates on several sub-fields of machine learning, illustrated by the selected papers below.

Selected Papers

  • Artificial Intelligence Journal: Model-based contextual policy search for data-efficient generalization of robot skills

    In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to learn such upper-level policies is to use policy search. However, the majority of current contextual policy search approaches are model-free and require a high number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the amount of robot experiments; however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies. They rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that is able to generalize lower-level controllers and is data-efficient. Our approach is based on learned probabilistic forward models and information theoretic policy search. Unlike current algorithms, our method does not require any assumption on the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude in comparison to existing methods, while learning high quality policies. A minimal code sketch of the hierarchical policy structure is given after the reference below.

    • A. Kupcsik, M. P. Deisenroth, J. Peters, A. P. Loh, P. Vadakkepat, and G. Neumann, “Model-based contextual policy search for data-efficient generalization of robot skills,” Artificial Intelligence, vol. 247, pp. 415-439, 2017.
      [BibTeX] [Abstract] [Download PDF]

      @article{lirolem25774,
      year = {2017},
      publisher = {Elsevier},
      title = {Model-based contextual policy search for data-efficient generalization of robot skills},
      pages = {415--439},
      author = {A. Kupcsik and M. P. Deisenroth and J. Peters and A. P. Loh and P. Vadakkepat and G. Neumann},
      volume = {247},
      month = {June},
      journal = {Artificial Intelligence},
      url = {http://eprints.lincoln.ac.uk/25774/},
      abstract = {In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to learn such upper-level policies is to use policy search. However, the majority of current contextual policy search approaches are model-free and require a high number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the amount of robot experiments, however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies. They rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that is able to generalize lower-level controllers, and is data-efficient. Our approach is based on learned probabilistic forward models and information theoretic policy search. Unlike current algorithms, our method does not require any assumption on the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude in comparison to existing methods, while learning high quality policies.}
      }
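
    The following is a minimal, illustrative sketch of the hierarchical structure described above, not the paper's algorithm: an upper-level policy pi(theta | s) maps a context s to the parameters theta of a lower-level controller and is improved from episodic rewards. The toy task, the reward function and the simple reward-weighted update are assumptions made only for illustration; the paper instead learns probabilistic forward models and uses an information-theoretic policy update.

    import numpy as np

    rng = np.random.default_rng(0)
    context_dim, param_dim = 2, 2

    # Upper-level policy pi(theta | s): theta ~ N(W s + b, Sigma).
    W = np.zeros((param_dim, context_dim))
    b = np.zeros(param_dim)
    Sigma = np.eye(param_dim)

    def rollout(theta, s):
        # Stand-in for executing the lower-level controller (e.g. a movement
        # primitive parametrized by theta) on the robot in context s; here the
        # toy "task" is to output theta close to a context-dependent target.
        target = np.array([2.0 * s[0], -1.0 * s[1]])
        return -np.sum((theta - target) ** 2)

    for it in range(50):
        S = rng.uniform(-1.0, 1.0, size=(30, context_dim))        # sampled contexts
        noise = rng.multivariate_normal(np.zeros(param_dim), Sigma, size=30)
        Theta = S @ W.T + b + noise                               # theta ~ pi(. | s)
        R = np.array([rollout(th, s) for th, s in zip(Theta, S)])

        # Simple reward-weighted regression update of the upper-level policy
        # (illustrative only; the paper bounds the policy update with an
        # information-theoretic constraint instead).
        w = np.exp(5.0 * (R - R.max()) / (R.max() - R.min() + 1e-8))
        X = np.hstack([S, np.ones((30, 1))])                      # features [s, 1]
        A = np.linalg.solve(X.T * w @ X + 1e-6 * np.eye(context_dim + 1),
                            X.T * w @ Theta)
        W, b = A[:-1].T, A[-1]
        residuals = Theta - (S @ W.T + b)
        Sigma = np.cov(residuals.T, aweights=w) + 1e-6 * np.eye(param_dim)

    print("learned context-to-parameter mapping:", W, b)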

  • AAAI 2016: Model-free Preference-based Reinforcement Learning

    Specifying a numeric reward function for reinforcement learning typically requires a lot of hand-tuning from a human expert. In contrast, preference-based reinforcement learning (PBRL) utilizes only pairwise comparisons between trajectories as a feedback signal, which are often more intuitive to specify. Currently available approaches to PBRL for control problems with continuous state/action spaces require a known or estimated model, which is often not available and hard to learn. In this paper, we integrate preference-based estimation of the reward function into a model-free reinforcement learning (RL) algorithm, resulting in a model-free PBRL algorithm. Our new algorithm is based on Relative Entropy Policy Search (REPS), enabling us to utilize stochastic policies and to directly control the greediness of the policy update. REPS decreases exploration of the policy slowly by limiting the relative entropy of the policy update, which ensures that the algorithm is provided with a versatile set of trajectories, and consequently with informative preferences. The preference-based estimation is computed using a sample-based Bayesian method, which can also estimate the uncertainty of the utility. Additionally, we compare to a linearly solvable approximation based on inverse RL. We show that both approaches perform favourably compared to the current state-of-the-art. The overall result is an algorithm that can learn non-parametric continuous action policies from a small number of preferences. A small code sketch of preference-based utility estimation and the REPS-style update follows the reference below.

    • C. Wirth, J. Furnkranz, and G. Neumann, “Model-free preference-based reinforcement learning,” in Thirtieth AAAI Conference on Artificial Intelligence, 2016, pp. 2222-2228.
      [BibTeX] [Abstract] [Download PDF]

      @inproceedings{lirolem25746,
      month = {February},
      title = {Model-free preference-based reinforcement learning},
      journal = {30th AAAI Conference on Artificial Intelligence, AAAI 2016},
      pages = {2222--2228},
      author = {C. Wirth and J. Furnkranz and G. Neumann},
      year = {2016},
      booktitle = {Thirtieth AAAI Conference on Artificial Intelligence},
      abstract = {Specifying a numeric reward function for reinforcement learning typically requires a lot of hand-tuning from a human expert. In contrast, preference-based reinforcement learning (PBRL) utilizes only pairwise comparisons between trajectories as a feedback signal, which are often more intuitive to specify. Currently available approaches to PBRL for control problems with continuous state/action spaces require a known or estimated model, which is often not available and hard to learn. In this paper, we integrate preference-based estimation of the reward function into a model-free reinforcement learning (RL) algorithm, resulting in a model-free PBRL algorithm. Our new algorithm is based on Relative Entropy Policy Search (REPS), enabling us to utilize stochastic policies and to directly control the greediness of the policy update. REPS decreases exploration of the policy slowly by limiting the relative entropy of the policy update, which ensures that the algorithm is provided with a versatile set of trajectories, and consequently with informative preferences. The preference-based estimation is computed using a sample-based Bayesian method, which can also estimate the uncertainty of the utility. Additionally, we also compare to a linear solvable approximation, based on inverse RL. We show that both approaches perform favourably to the current state-of-the-art. The overall result is an algorithm that can learn non-parametric continuous action policies from a small number of preferences.},
      url = {http://eprints.lincoln.ac.uk/25746/}
      }
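
    The following is a small, self-contained sketch of two ingredients described above, under simplifying assumptions and not the paper's full algorithm: (1) estimating a linear utility function from pairwise trajectory preferences with a Bradley-Terry style likelihood, and (2) computing episodic REPS-style sample weights subject to a bound on the relative entropy of the policy update. The trajectory features, hyper-parameters and toy data are purely illustrative.

    import numpy as np
    from scipy.optimize import minimize

    def fit_utility(features, prefs, iters=200, lr=0.1):
        # Utility u(traj) = w . phi(traj), fitted from preferences (i, j)
        # meaning "trajectory i was preferred over trajectory j"
        # (Bradley-Terry likelihood, plain gradient ascent).
        w = np.zeros(features.shape[1])
        for _ in range(iters):
            grad = np.zeros_like(w)
            for i, j in prefs:
                d = features[i] - features[j]
                p = 1.0 / (1.0 + np.exp(-w @ d))      # P(i preferred over j)
                grad += (1.0 - p) * d                 # gradient of the log-likelihood
            w += lr * grad / max(len(prefs), 1)
        return w

    def reps_weights(returns, epsilon=0.5):
        # Episodic REPS: maximise expected utility subject to a bound epsilon on
        # the relative entropy of the policy update, solved through the dual on eta.
        R = returns - returns.max()
        def dual(eta):
            eta = max(eta[0], 1e-6)
            return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))
        eta = max(minimize(dual, x0=[1.0], method="Nelder-Mead").x[0], 1e-6)
        w = np.exp(R / eta)
        return w / w.sum()

    # Toy usage: preferences generated from a hidden "true" utility.
    rng = np.random.default_rng(0)
    features = rng.standard_normal((40, 3))
    true_w = np.array([1.0, -0.5, 0.2])
    pairs = zip(rng.integers(0, 40, 60), rng.integers(0, 40, 60))
    prefs = [(i, j) if features[i] @ true_w >= features[j] @ true_w else (j, i)
             for i, j in pairs]
    w_hat = fit_utility(features, prefs)
    weights = reps_weights(features @ w_hat)          # weights for the policy update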

  • NIPS 2013: Probabilistic Movement Primitives

    Movement Primitives (MP) are a well-established approach for representing modular and re-usable robot movement generators. Many state-of-the-art robot learning successes are based on MPs, due to their compact representation of the inherently continuous and high-dimensional robot movements. A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. To this end, an MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios. A minimal code sketch of the trajectory-distribution view is given after the reference below.

    • A. Paraschos, C. Daniel, J. Peters, and G. Neumann, “Probabilistic movement primitives,” in Advances in Neural Information Processing Systems, (NIPS), 2013.
      [BibTeX] [Abstract] [Download PDF]

      @inproceedings{lirolem25785,
      title = {Probabilistic movement primitives},
      month = {December},
      journal = {Advances in Neural Information Processing Systems},
      author = {A. Paraschos and C. Daniel and J. Peters and G. Neumann},
      year = {2013},
      booktitle = {Advances in Neural Information Processing Systems, (NIPS)},
      abstract = {Movement Primitives (MP) are a well-established approach for representing modular
      and re-usable robot movement generators. Many state-of-the-art robot learning
      successes are based MPs, due to their compact representation of the inherently
      continuous and high dimensional robot movements. A major goal in robot learning
      is to combine multiple MPs as building blocks in a modular control architecture
      to solve complex tasks. To this effect, a MP representation has to allow for
      blending between motions, adapting to altered task variables, and co-activating
      multiple MPs in parallel. We present a probabilistic formulation of the MP concept
      that maintains a distribution over trajectories. Our probabilistic approach
      allows for the derivation of new operations which are essential for implementing
      all aforementioned properties in one framework. In order to use such a trajectory
      distribution for robot movement control, we analytically derive a stochastic feedback
      controller which reproduces the given trajectory distribution. We evaluate
      and compare our approach to existing methods on several simulated as well as
      real robot scenarios.},
      url = {http://eprints.lincoln.ac.uk/25785/}
      }
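
    The following is a minimal sketch of the trajectory-distribution idea described above, under simplifying assumptions: a trajectory is modelled as y_t = Phi_t w with a Gaussian distribution over the weight vector w learned from demonstrations, and adapting the primitive to a via-point reduces to Gaussian conditioning. The basis functions, noise levels and toy demonstrations are purely illustrative; the stochastic feedback controller derived in the paper is not shown.

    import numpy as np

    T, n_basis = 100, 10
    t = np.linspace(0.0, 1.0, T)
    centers = np.linspace(0.0, 1.0, n_basis)
    Phi = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.1) ** 2)
    Phi /= Phi.sum(axis=1, keepdims=True)              # normalized RBF basis (T x n_basis)

    # Toy "demonstrations": noisy sine-shaped trajectories.
    rng = np.random.default_rng(0)
    demos = [np.sin(np.pi * t) + 0.05 * rng.standard_normal(T) for _ in range(20)]

    # One weight vector per demonstration (ridge regression), then a Gaussian over the weights.
    W = np.array([np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(n_basis), Phi.T @ y)
                  for y in demos])
    mu_w, Sigma_w = W.mean(axis=0), np.cov(W.T)

    # Adapt the primitive to pass through y* = 1.2 at t = 0.5 (via-point):
    # standard Gaussian conditioning on the observation y* = phi_t . w + noise.
    phi = Phi[T // 2]
    sigma_y = 1e-4
    gain = Sigma_w @ phi / (phi @ Sigma_w @ phi + sigma_y)
    mu_cond = mu_w + gain * (1.2 - phi @ mu_w)
    Sigma_cond = Sigma_w - np.outer(gain, phi @ Sigma_w)

    mean_traj = Phi @ mu_cond
    print("conditioned mean at the via-point:", mean_traj[T // 2])   # close to 1.2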


  • NIPS 2015: Model-Based Relative Entropy Stochastic Search

    Stochastic search algorithms are general black-box optimizers. Due to their ease of use and their generality, they have recently also gained a lot of attention in operations research, machine learning and policy search. Yet, these algorithms require a lot of evaluations of the objective, scale poorly with the problem dimension, are affected by highly noisy objective functions and may converge prematurely. To alleviate these problems, we introduce a new surrogate-based stochastic search approach. We learn simple, quadratic surrogate models of the objective function. As the quality of such a quadratic approximation is limited and the algorithm can be misled by an inaccurate optimum introduced by the surrogate, we do not greedily exploit the learned models. Instead, we use information theoretic constraints to bound the 'distance' between the new and old data distribution while maximizing the objective function. Additionally, the new method is able to sustain the exploration of the search distribution to avoid premature convergence. We compare our method with state-of-the-art black-box optimization methods on standard uni-modal and multi-modal optimization functions, on simulated planar robot tasks and a complex robot ball throwing task. The proposed method considerably outperforms the existing approaches. A schematic code sketch of the surrogate-plus-KL-bound idea follows the reference below.

    • A. Abdolmaleki, R. Lioutikov, N. Lau, L. P. Reis, J. Peters, and G. Neumann, “Model-based relative entropy stochastic search,” in Advances in Neural Information Processing Systems (NIPS), 2016, pp. 153-154.
      [BibTeX] [Abstract] [Download PDF]

      @inproceedings{lirolem25741,
      title = {Model-based relative entropy stochastic search},
      journal = {GECCO 2016 Companion - Proceedings of the 2016 Genetic and Evolutionary Computation Conference},
      pages = {153--154},
      year = {2016},
      author = {A. Abdolmaleki and R. Lioutikov and N. Lau and L. Paulo Reis and J. Peters and G. Neumann},
      booktitle = {Advances in Neural Information Processing Systems (NIPS)},
      abstract = {Stochastic search algorithms are general black-box optimizers. Due to their ease
      of use and their generality, they have recently also gained a lot of attention in operations
      research, machine learning and policy search. Yet, these algorithms require
      a lot of evaluations of the objective, scale poorly with the problem dimension, are
      affected by highly noisy objective functions and may converge prematurely. To
      alleviate these problems, we introduce a new surrogate-based stochastic search
      approach. We learn simple, quadratic surrogate models of the objective function.
      As the quality of such a quadratic approximation is limited, we do not greedily exploit
      the learned models. The algorithm can be misled by an inaccurate optimum
      introduced by the surrogate. Instead, we use information theoretic constraints to
      bound the 'distance' between the new and old data distribution while maximizing
      the objective function. Additionally the new method is able to sustain the exploration
      of the search distribution to avoid premature convergence. We compare our
      method with state of art black-box optimization methods on standard uni-modal
      and multi-modal optimization functions, on simulated planar robot tasks and a
      complex robot ball throwing task. The proposed method considerably outperforms
      the existing approaches.},
      url = {http://eprints.lincoln.ac.uk/25741/}
      }
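
    The following is a schematic sketch of the surrogate-plus-KL-bound idea described above, not the paper's exact MORE update: a quadratic surrogate of the objective is fitted from samples and a Gaussian search distribution is updated in closed form, with a temperature eta increased until the KL divergence to the old search distribution falls below a bound. The paper additionally bounds the entropy loss of the search distribution and solves a proper dual problem; the toy objective and the crude search over eta below are assumptions made only for illustration.

    import numpy as np

    def objective(x):
        # Toy black-box objective (assumed): maximum near x = (1, 1).
        return -np.sum((x - 1.0) ** 2) + 0.1 * np.sum(np.sin(5.0 * x))

    def fit_quadratic(X, y):
        # Least-squares fit of R(x) ~ x^T M x + x^T a + c with symmetric M,
        # returned as (A, a) so that R(x) ~ -0.5 x^T A x + x^T a + c.
        d = X.shape[1]
        feats = np.hstack([np.ones((len(X), 1)), X,
                           np.stack([np.outer(x, x).ravel() for x in X])])
        coef, *_ = np.linalg.lstsq(feats, y, rcond=None)
        a = coef[1:1 + d]
        M = 0.5 * (coef[1 + d:].reshape(d, d) + coef[1 + d:].reshape(d, d).T)
        return -2.0 * M, a

    def kl_gauss(m_new, S_new, m_old, S_old):
        # KL divergence between two Gaussians, KL(new || old).
        S_old_inv = np.linalg.inv(S_old)
        diff = m_old - m_new
        return 0.5 * (np.trace(S_old_inv @ S_new) + diff @ S_old_inv @ diff
                      - len(m_old) + np.log(np.linalg.det(S_old) / np.linalg.det(S_new)))

    d, n_samples, epsilon = 2, 64, 0.5
    m, Sigma = np.zeros(d), 2.0 * np.eye(d)
    rng = np.random.default_rng(0)

    for it in range(30):
        X = rng.multivariate_normal(m, Sigma, size=n_samples)
        y = np.array([objective(x) for x in X])
        A, a = fit_quadratic(X, y)

        # Closed-form Gaussian update towards the surrogate optimum; eta trades
        # off the surrogate against staying close to the old search distribution
        # and is increased until the KL bound is satisfied.
        Lam = np.linalg.inv(Sigma)
        m_new, S_new, eta = m, Sigma, 1.0
        while eta < 1e6:
            Lam_cand = Lam + A / eta
            if np.all(np.linalg.eigvalsh(Lam_cand) > 1e-9):
                S_cand = np.linalg.inv(Lam_cand)
                m_cand = S_cand @ (Lam @ m + a / eta)
                if kl_gauss(m_cand, S_cand, m, Sigma) <= epsilon:
                    m_new, S_new = m_cand, S_cand
                    break
            eta *= 2.0
        m, Sigma = m_new, S_new

    print("search distribution mean after optimisation:", m)    # close to (1, 1)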