Gerhard Neumann

Gerhard is a Professor of Robotics & Autonomous Systems in the College of Science at the University of Lincoln and a member of the Lincoln Centre for Autonomous Systems (L-CAS). Before coming to Lincoln, he was an Assistant Professor at TU Darmstadt from September 2014 to October 2016 and head of the Computational Learning for Autonomous Systems (CLAS) group. Before that, he was a post-doc and group leader in the Intelligent Autonomous Systems (IAS) group, also in Darmstadt, under the guidance of Prof. Jan Peters. Gerhard obtained his Ph.D. under the supervision of Prof. Wolfgang Maass at Graz University of Technology.

Gerhard has authored more than 50 peer-reviewed papers, many of them in top-ranked machine learning and robotics journals and conferences such as NIPS, ICML, ICRA, IROS, JMLR, Machine Learning and AURO. In Darmstadt, he is the principal investigator of the EU H2020 project RoMaNS and has also acquired DFG funding. He has organized several workshops and serves on the senior program committee of several conferences.

Key References:

  • A. Abdolmaleki, B. Price, N. Lau, L. P. Reis, and G. Neumann, “Deriving and improving CMA-ES with Information geometric trust regions,” in The Genetic and Evolutionary Computation Conference (GECCO 2017), 2017.
    [BibTeX] [Abstract] [Download PDF]

    CMA-ES is one of the most popular stochastic search algorithms. It performs favourably in many tasks without the need of extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance updates that incorporate the current samples as well as the evolution path, and its invariance properties. Its update rules are composed of well-established heuristics, and the theoretical foundations of some of these rules are also well understood. In this paper we fully derive all CMA-ES update rules within the framework of expectation-maximisation-based stochastic search algorithms using information-geometric trust regions. We show that the use of the trust region results in similar updates to CMA-ES for the mean and the covariance matrix, while it allows for the derivation of an improved update rule for the step-size. Our new algorithm, Trust-Region Covariance Matrix Adaptation Evolution Strategy (TR-CMA-ES), is fully derived from first-order optimization principles and performs favourably compared to the standard CMA-ES algorithm. [An illustrative sketch of the trust-region update follows the BibTeX entry below.]

    @inproceedings{lirolem27056,
    author = {Abbas Abdolmaleki and Bob Price and Nuno Lau and Luis Paulo Reis and Gerhard Neumann},
    year = {2017},
    month = {July},
    title = {Deriving and improving CMA-ES with Information geometric trust regions},
    booktitle = {The Genetic and Evolutionary Computation Conference (GECCO 2017)},
    url = {http://eprints.lincoln.ac.uk/27056/},
    abstract = {CMA-ES is one of the most popular stochastic search algorithms.
    It performs favourably in many tasks without the need of extensive
    parameter tuning. The algorithm has many beneficial properties,
    including automatic step-size adaptation, efficient covariance updates
    that incorporates the current samples as well as the evolution
    path and its invariance properties. Its update rules are composed
    of well established heuristics where the theoretical foundations of
    some of these rules are also well understood. In this paper we
    will fully derive all CMA-ES update rules within the framework of
    expectation-maximisation-based stochastic search algorithms using
    information-geometric trust regions. We show that the use of the trust
    region results in similar updates to CMA-ES for the mean and the
    covariance matrix while it allows for the derivation of an improved
    update rule for the step-size. Our new algorithm, Trust-Region Covariance
    Matrix Adaptation Evolution Strategy (TR-CMA-ES) is
    fully derived from first order optimization principles and performs
    favourably in compare to standard CMA-ES algorithm.}
    }
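
    As a rough, illustrative sketch of the information-geometric trust-region idea described in the abstract (the notation below is mine, not taken from the paper): at each iteration the Gaussian search distribution is updated by maximising expected fitness subject to a KL bound on how far the distribution may move,

    \begin{align*}
    \pi_{t+1} = \arg\max_{\pi}\; \mathbb{E}_{x \sim \pi}\big[f(x)\big]
    \quad \text{s.t.} \quad \mathrm{KL}\big(\pi \,\|\, \pi_t\big) \le \epsilon, \qquad \pi = \mathcal{N}(\mu, \sigma^{2}\Sigma),
    \end{align*}

    where f is the fitness function and \epsilon bounds the step in distribution space. Per the abstract, solving such a constrained problem in an expectation-maximisation fashion recovers CMA-ES-like updates for the mean and covariance and yields an improved update rule for the step-size.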

  • A. Abdolmaleki, R. Lioutikov, N. Lau, L. P. Reis, J. Peters, and G. Neumann, “Model-based relative entropy stochastic search,” in Advances in Neural Information Processing Systems (NIPS), 2016, pp. 153-154.
    [BibTeX] [Abstract] [Download PDF]

    Stochastic search algorithms are general black-box optimizers. Due to their ease of use and their generality, they have recently also gained a lot of attention in operations research, machine learning and policy search. Yet, these algorithms require a lot of evaluations of the objective, scale poorly with the problem dimension, are affected by highly noisy objective functions and may converge prematurely. To alleviate these problems, we introduce a new surrogate-based stochastic search approach. We learn simple, quadratic surrogate models of the objective function. As the quality of such a quadratic approximation is limited, we do not greedily exploit the learned models, since the algorithm can be misled by an inaccurate optimum introduced by the surrogate. Instead, we use information-theoretic constraints to bound the distance between the new and old data distribution while maximizing the objective function. Additionally, the new method is able to sustain the exploration of the search distribution to avoid premature convergence. We compare our method with state-of-the-art black-box optimization methods on standard uni-modal and multi-modal optimization functions, on simulated planar robot tasks and a complex robot ball-throwing task. The proposed method considerably outperforms the existing approaches. [An illustrative sketch of the constrained update follows the BibTeX entry below.]

    @inproceedings{lirolem25741,
    author = {A. Abdolmaleki and R. Lioutikov and N. Lau and L. Paulo Reis and J. Peters and G. Neumann},
    pages = {153--154},
    booktitle = {Advances in Neural Information Processing Systems (NIPS)},
    year = {2016},
    title = {Model-based relative entropy stochastic search},
    url = {http://eprints.lincoln.ac.uk/25741/},
    abstract = {Stochastic search algorithms are general black-box optimizers. Due to their ease
    of use and their generality, they have recently also gained a lot of attention in operations
    research, machine learning and policy search. Yet, these algorithms require
    a lot of evaluations of the objective, scale poorly with the problem dimension, are
    affected by highly noisy objective functions and may converge prematurely. To
    alleviate these problems, we introduce a new surrogate-based stochastic search
    approach. We learn simple, quadratic surrogate models of the objective function.
    As the quality of such a quadratic approximation is limited, we do not greedily exploit
    the learned models. The algorithm can be misled by an inaccurate optimum
    introduced by the surrogate. Instead, we use information theoretic constraints to
    bound the distance between the new and old data distribution while maximizing
    the objective function. Additionally the new method is able to sustain the exploration
    of the search distribution to avoid premature convergence. We compare our
    method with state of art black-box optimization methods on standard uni-modal
    and multi-modal optimization functions, on simulated planar robot tasks and a
    complex robot ball throwing task. The proposed method considerably outperforms
    the existing approaches.}
    }
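
    A hedged sketch of the surrogate-based, information-theoretically constrained update the abstract describes (the symbols are my own shorthand): fit a quadratic surrogate \hat{f}(x) \approx x^\top A x + a^\top x + a_0 to the current samples and solve

    \begin{align*}
    \pi_{t+1} = \arg\max_{\pi}\; \mathbb{E}_{x \sim \pi}\big[\hat{f}(x)\big]
    \quad \text{s.t.} \quad \mathrm{KL}\big(\pi \,\|\, \pi_t\big) \le \epsilon, \qquad H(\pi) \ge \beta,
    \end{align*}

    where the KL bound keeps the new search distribution close to the old one, so that an inaccurate optimum of the surrogate cannot mislead the update, and the entropy lower bound H(\pi) \ge \beta sustains exploration to counteract premature convergence.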

  • R. Akrour, A. Abdolmaleki, H. Abdulsamad, and G. Neumann, “Model-free trajectory optimization for reinforcement learning,” in Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 4342-4352.
    [BibTeX] [Abstract] [Download PDF]

    Many of the recent trajectory optimization algorithms alternate between local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local, quadratic and time-dependent Q-function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, demonstrating improved performance in comparison to related trajectory optimization algorithms that linearize the dynamics. [An illustrative sketch of the closed-form update follows the BibTeX entry below.]

    @inproceedings{lirolem25747,
    author = {R. Akrour and A. Abdolmaleki and H. Abdulsamad and G. Neumann},
    pages = {4342--4352},
    booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
    title = {Model-free trajectory optimization for reinforcement learning},
    year = {2016},
    month = {June},
    volume = {6},
    url = {http://eprints.lincoln.ac.uk/25747/},
    abstract = {Many of the recent Trajectory Optimization algorithms
    alternate between local approximation
    of the dynamics and conservative policy update.
    However, linearly approximating the dynamics
    in order to derive the new policy can bias the update
    and prevent convergence to the optimal policy.
    In this article, we propose a new model-free
    algorithm that backpropagates a local quadratic
    time-dependent Q-Function, allowing the derivation
    of the policy update in closed form. Our policy
    update ensures exact KL-constraint satisfaction
    without simplifying assumptions on the system
    dynamics demonstrating improved performance
    in comparison to related Trajectory Optimization
    algorithms linearizing the dynamics.},
    }
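
    To make the closed-form, KL-constrained policy update mentioned in the abstract concrete (this is an illustrative reconstruction in my own notation, not the paper's exact derivation): with a time-dependent Gaussian policy \pi_t(a \mid s) = \mathcal{N}(a \mid K_t s + k_t, \Sigma_t) and a local quadratic, time-dependent Q-function \hat{Q}_t(s, a), each update solves

    \begin{align*}
    \max_{\pi_t}\; \mathbb{E}_{s,\, a \sim \pi_t}\big[\hat{Q}_t(s, a)\big]
    \quad \text{s.t.} \quad \mathbb{E}_{s}\Big[\mathrm{KL}\big(\pi_t(\cdot \mid s) \,\|\, \pi_t^{\mathrm{old}}(\cdot \mid s)\big)\Big] \le \epsilon,
    \end{align*}

    which admits a Gaussian solution in closed form because \hat{Q}_t is quadratic in the state and action; no linearisation of the dynamics is required.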

  • C. Daniel, H. van Hoof, J. Peters, and G. Neumann, “Probabilistic inference for determining options in reinforcement learning,” Machine Learning, vol. 104, iss. 2-3, pp. 337-357, 2016.
    [BibTeX] [Abstract] [Download PDF]

    Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process (SMDP) setting and the option framework, we propose a model which aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm which infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks. [A small illustrative sketch of option-based execution follows the BibTeX entry below.]

    @article{lirolem25739,
    author = {C. Daniel and H. van Hoof and J. Peters and G. Neumann},
    pages = {337--357},
    number = {2-3},
    year = {2016},
    title = {Probabilistic inference for determining options in reinforcement learning},
    volume = {104},
    journal = {Machine Learning},
    publisher = {Springer},
    month = {September},
    abstract = {Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi Markov decision process setting (SMDP) and the option framework, we propose a model which aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm which infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks.},
    url = {http://eprints.lincoln.ac.uk/25739/}
    }
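
    As a small illustration of the option framework the abstract builds on (the names Option, run_episode, gating and the env.reset/env.step interface are assumptions made for this sketch, not code from the paper), a hedged Python sketch of SMDP-style execution with sub-policies and termination probabilities:

    import random

    class Option:
        """One option: a sub-policy plus a state-dependent termination probability."""
        def __init__(self, sub_policy, termination_prob):
            self.sub_policy = sub_policy              # maps state -> action
            self.termination_prob = termination_prob  # maps state -> prob. of terminating

    def run_episode(env, gating, horizon=100):
        """Run a hierarchical policy: the gating policy picks an option, its
        sub-policy acts until the option terminates, then the gating picks again."""
        state = env.reset()
        option = gating(state)                        # high-level choice of sub-policy
        total_reward = 0.0
        for _ in range(horizon):
            action = option.sub_policy(state)
            state, reward, done = env.step(action)    # assumed Gym-like interface
            total_reward += reward
            if done:
                break
            if random.random() < option.termination_prob(state):
                option = gating(state)                # re-select an option on termination
        return total_reward

    In the paper, the sub-policies and their initiation and termination probabilities are all inferred from data; the sketch only illustrates how such learned components interact at execution time.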

  • C. Daniel, G. Neumann, O. Kroemer, and J. Peters, “Hierarchical relative entropy policy search,” Journal of Machine Learning Research, vol. 17, pp. 1-50, 2016.
    [BibTeX] [Abstract] [Download PDF]

    Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that are strongly structured. Such task structures can be exploited by incorporating hierarchical policies that consist of gating networks and sub-policies. However, this concept has only been partially explored for real-world settings, and complete methods, derived from first principles, are needed. Real-world settings are challenging due to large and continuous state-action spaces that are prohibitive for exhaustive sampling methods. We define the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy that is composed of a high-level gating policy to select the low-level sub-policies for execution by the agent. In order to efficiently share experience with all sub-policies, also called inter-policy learning, we treat these sub-policies as latent variables, which allows for distribution of the update information between the sub-policies. We present three different variants of our algorithm, designed to be suitable for a wide variety of real-world robot learning tasks, and evaluate our algorithms in two real robot learning scenarios as well as several simulations and comparisons. [An illustrative decomposition of the hierarchical policy follows the BibTeX entry below.]

    @article{lirolem25743,
    volume = {17},
    journal = {Journal of Machine Learning Research},
    month = {June},
    publisher = {Massachusetts Institute of Technology Press (MIT Press) / Microtome Publishing},
    author = {C. Daniel and G. Neumann and O. Kroemer and J. Peters},
    pages = {1--50},
    year = {2016},
    title = {Hierarchical relative entropy policy search},
    url = {http://eprints.lincoln.ac.uk/25743/},
    abstract = {Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that
    are strongly structured. Such task structures can be exploited by incorporating hierarchical policies
    that consist of gating networks and sub-policies. However, this concept has only been partially explored
    for real world settings and complete methods, derived from first principles, are needed. Real
    world settings are challenging due to large and continuous state-action spaces that are prohibitive
    for exhaustive sampling methods. We define the problem of learning sub-policies in continuous
    state action spaces as finding a hierarchical policy that is composed of a high-level gating policy to
    select the low-level sub-policies for execution by the agent. In order to efficiently share experience
    with all sub-policies, also called inter-policy learning, we treat these sub-policies as latent variables
    which allows for distribution of the update information between the sub-policies. We present three
    different variants of our algorithm, designed to be suitable for a wide variety of real world robot
    learning tasks and evaluate our algorithms in two real robot learning scenarios as well as several
    simulations and comparisons.}
    }
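
    The hierarchical policy structure described in the abstract can be sketched (in my own notation) as a gating policy over sub-policies that are treated as latent variables:

    \begin{align*}
    \pi(a \mid s) \;=\; \sum_{o} \pi(o \mid s)\, \pi(a \mid s, o),
    \end{align*}

    where o indexes the sub-policies, \pi(o \mid s) is the high-level gating policy and \pi(a \mid s, o) are the low-level sub-policies. Treating o as a latent variable lets every sample contribute to the update of all sub-policies via its responsibilities p(o \mid s, a), which is the inter-policy learning mentioned in the abstract.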

  • A. Kupcsik, M. P. Deisenroth, J. Peters, A. P. Loh, P. Vadakkepat, and G. Neumann, “Model-based contextual policy search for data-efficient generalization of robot skills,” Artificial Intelligence, vol. 247, pp. 415-439, 2017.
    [BibTeX] [Abstract] [Download PDF]

    In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to learning such upper-level policies is to use policy search. However, the majority of current contextual policy search approaches are model-free and require a high number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the number of robot experiments; however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies: they rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that is able to generalize lower-level controllers and is data-efficient. Our approach is based on learned probabilistic forward models and information-theoretic policy search. Unlike current algorithms, our method does not require any assumption on the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude in comparison to existing methods, while learning high-quality policies. [An illustrative formulation of the contextual policy search objective follows the BibTeX entry below.]

    @article{lirolem25774,
    volume = {247},
    journal = {Artificial Intelligence},
    publisher = {Elsevier},
    month = {June},
    pages = {415--439},
    author = {A. Kupcsik and M. P. Deisenroth and J. Peters and A. P. Loh and P. Vadakkepat and G. Neumann},
    year = {2017},
    title = {Model-based contextual policy search for data-efficient generalization of robot skills},
    abstract = {In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to learn such upper-level policies is to use policy search. However, the majority of current contextual policy search approaches are model-free and require a high number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the amount of robot experiments, however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies. They rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that is able to generalize lower-level controllers, and is data-efficient. Our approach is based on learned probabilistic forward models and information theoretic policy search. Unlike current algorithms, our method does not require any assumption on the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude in comparison to existing methods, while learning high quality policies.},
    url = {http://eprints.lincoln.ac.uk/25774/}
    }
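
    A compact, illustrative way to write the hierarchical setup from the abstract (notation is mine): the upper-level policy \pi(\theta \mid c) selects parameters \theta of the lower-level controller for each context c, and is optimised against a learned probabilistic forward model instead of the real robot,

    \begin{align*}
    \max_{\pi}\; \mathbb{E}_{c \sim \mu(c)}\, \mathbb{E}_{\theta \sim \pi(\cdot \mid c)}\big[\hat{R}(c, \theta)\big],
    \end{align*}

    where \mu(c) is the context distribution and \hat{R}(c, \theta) is the expected return of executing the controller with parameters \theta in context c, as predicted by rollouts of the learned forward model. Predicting returns with the model rather than the robot is what yields the reported data-efficiency.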

  • A. Paraschos, C. Daniel, J. Peters, and G. Neumann, “Probabilistic movement primitives,” in Advances in Neural Information Processing Systems (NIPS), 2013.
    [BibTeX] [Abstract] [Download PDF]

    Movement Primitives (MPs) are a well-established approach for representing modular and re-usable robot movement generators. Many state-of-the-art robot learning successes are based on MPs, due to their compact representation of the inherently continuous and high-dimensional robot movements. A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. To this effect, an MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios. [An illustrative sketch of the trajectory distribution follows the BibTeX entry below.]

    @inproceedings{lirolem25785,
    booktitle = {Advances in Neural Information Processing Systems (NIPS)},
    title = {Probabilistic movement primitives},
    year = {2013},
    month = {December},
    author = {A. Paraschos and C. Daniel and J. Peters and G. Neumann},
    abstract = {Movement Primitives (MP) are a well-established approach for representing modular
    and re-usable robot movement generators. Many state-of-the-art robot learning
    successes are based MPs, due to their compact representation of the inherently
    continuous and high dimensional robot movements. A major goal in robot learning
    is to combine multiple MPs as building blocks in a modular control architecture
    to solve complex tasks. To this effect, a MP representation has to allow for
    blending between motions, adapting to altered task variables, and co-activating
    multiple MPs in parallel. We present a probabilistic formulation of the MP concept
    that maintains a distribution over trajectories. Our probabilistic approach
    allows for the derivation of new operations which are essential for implementing
    all aforementioned properties in one framework. In order to use such a trajectory
    distribution for robot movement control, we analytically derive a stochastic feedback
    controller which reproduces the given trajectory distribution. We evaluate
    and compare our approach to existing methods on several simulated as well as
    real robot scenarios.},
    url = {http://eprints.lincoln.ac.uk/25785/}
    }
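
    A minimal sketch of the probabilistic movement primitive representation described in the abstract (standard ProMP notation, slightly simplified): a trajectory is represented by a weight vector w over basis functions, and a Gaussian distribution over w induces a distribution over trajectories,

    \begin{align*}
    y_t = \Phi_t\, w + \epsilon_y, \qquad w \sim \mathcal{N}(\mu_w, \Sigma_w)
    \;\;\Rightarrow\;\;
    p(y_t) = \mathcal{N}\big(\Phi_t \mu_w,\; \Phi_t \Sigma_w \Phi_t^{\top} + \Sigma_y\big),
    \end{align*}

    where \Phi_t holds the basis functions evaluated at time t and \epsilon_y \sim \mathcal{N}(0, \Sigma_y) is observation noise. Blending, conditioning on via-points and co-activation then become Gaussian operations on this trajectory distribution, and the stochastic feedback controller mentioned in the abstract is derived analytically so that the controlled robot reproduces the distribution.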