Black-box Optimization

  • IJCAI 2017: Contextual CMA-ES

    Many stochastic search algorithms are designed to optimize a single, fixed objective function; if the objective changes even slightly, for example due to a change in the situation or context of the task, relearning is required to adapt to the new context. For instance, if we want to learn a kicking movement for a soccer robot, we have to relearn the movement for every new ball location. Such relearning is undesirable, as it is highly inefficient, and many applications require fast adaptation to a new context or situation. We therefore investigate contextual stochastic search algorithms that can learn multiple similar tasks simultaneously. Current contextual stochastic search methods are based on policy search algorithms and suffer from premature convergence and the need for parameter tuning. In this paper, we extend the well-known CMA-ES algorithm to the contextual setting and illustrate its performance on several contextual tasks. Our new algorithm, called contextual CMA-ES, benefits from contextual learning while preserving all the features of standard CMA-ES, such as stability, avoidance of premature convergence, step-size control, and a minimal amount of parameter tuning. A toy sketch of the context-dependent search distribution follows the BibTeX entry below.

    • A. Abdolmaleki, B. Price, N. Lau, P. Reis, and G. Neumann, “Contextual CMA-ES,” in International Joint Conference on Artificial Intelligence (IJCAI), 2017.

      @inproceedings{lirolem28141,
      month = {August},
      year = {2017},
      title = {Contextual CMA-ES},
      author = {A. Abdolmaleki and B. Price and N. Lau and P. Reis and G. Neumann},
      booktitle = {International Joint Conference on Artificial Intelligence (IJCAI)},
      url = {http://eprints.lincoln.ac.uk/28141/},
      abstract = {Many stochastic search algorithms are designed to optimize a single, fixed objective function; if the objective changes even slightly, for example due to a change in the situation or context of the task, relearning is required to adapt to the new context. For instance, if we want to learn a kicking movement for a soccer robot, we have to relearn the movement for every new ball location. Such relearning is undesirable, as it is highly inefficient, and many applications require fast adaptation to a new context or situation. We therefore investigate contextual stochastic search algorithms that can learn multiple similar tasks simultaneously. Current contextual stochastic search methods are based on policy search algorithms and suffer from premature convergence and the need for parameter tuning. In this paper, we extend the well-known CMA-ES algorithm to the contextual setting and illustrate its performance on several contextual tasks. Our new algorithm, called contextual CMA-ES, benefits from contextual learning while preserving all the features of standard CMA-ES, such as stability, avoidance of premature convergence, step-size control, and a minimal amount of parameter tuning.}
      }
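
    The sketch below illustrates the central contextual idea in plain NumPy: the mean of the Gaussian search distribution is a linear function of the context, fitted by weighted linear regression with CMA-ES-style rank-based weights. It is only a toy sketch under stated assumptions, not the contextual CMA-ES update from the paper (no evolution path, no step-size control); the names contextual_search_sketch, f(s, x), and context_sampler are illustrative placeholders.

    import numpy as np

    def contextual_search_sketch(f, context_dim, param_dim, iterations=100,
                                 pop_size=20, context_sampler=None, seed=0):
        """Toy contextual stochastic search: the Gaussian mean is a linear
        function of the context, W @ [s; 1], with a shared covariance C.
        Only the contextual-mean idea is illustrated, not the actual
        contextual CMA-ES updates (evolution path, step-size control, ...)."""
        rng = np.random.default_rng(seed)
        if context_sampler is None:
            context_sampler = lambda: rng.uniform(-1.0, 1.0, context_dim)
        W = np.zeros((param_dim, context_dim + 1))    # linear context-to-mean map
        C = np.eye(param_dim)                         # shared covariance
        for _ in range(iterations):
            S = np.array([context_sampler() for _ in range(pop_size)])
            Phi = np.hstack([S, np.ones((pop_size, 1))])           # context features with bias
            noise = rng.multivariate_normal(np.zeros(param_dim), C, pop_size)
            X = Phi @ W.T + noise                                   # sample parameters per context
            rewards = np.array([f(s, x) for s, x in zip(S, X)])
            order = np.argsort(-rewards)                            # rank-based selection weights
            w = np.log(pop_size / 2 + 0.5) - np.log(np.arange(1, pop_size + 1))
            w = np.clip(w, 0.0, None); w /= w.sum()
            weights = np.empty(pop_size); weights[order] = w
            D = np.diag(weights)
            # weighted regression for the context-dependent mean
            W = np.linalg.solve(Phi.T @ D @ Phi + 1e-6 * np.eye(context_dim + 1),
                                Phi.T @ D @ X).T
            # weighted covariance of the residuals (no evolution path, no step-size)
            resid = X - Phi @ W.T
            C = resid.T @ D @ resid + 1e-6 * np.eye(param_dim)
        return W, C

    For instance, a family of quadratic tasks whose optimum moves with the context can be written as f = lambda s, x: -np.sum((x - s) ** 2) with context_dim equal to param_dim.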


  • ICML 2017: Local Bayesian Optimization

    Bayesian optimization is renowned for its sample efficiency, but its application to higher-dimensional tasks is impeded by its focus on global optimization. To scale to higher-dimensional problems, we leverage the sample efficiency of Bayesian optimization in a local context. The optimization of the acquisition function is restricted to the vicinity of a Gaussian search distribution, which is moved towards high-value areas of the objective. The proposed information-theoretic update of the search distribution results in a Bayesian interpretation of local stochastic search: the search distribution encodes prior knowledge about the optimum's location and is weighted at each iteration by the likelihood of that location being optimal. We demonstrate the effectiveness of our algorithm on several benchmark objective functions as well as on a continuous robotic task in which an informative prior is obtained by imitation learning. A simplified sketch of the local acquisition-optimization loop follows the BibTeX entry below.

    • R. Akrour, D. Sorokin, J. Peters, and G. Neumann, “Local Bayesian optimization of motor skills,” in International Conference on Machine Learning (ICML), 2017.

      @inproceedings{lirolem27902,
      year = {2017},
      month = {August},
      booktitle = {International Conference on Machine Learning (ICML)},
      title = {Local Bayesian optimization of motor skills},
      author = {R. Akrour and D. Sorokin and J. Peters and G. Neumann},
      url = {http://eprints.lincoln.ac.uk/27902/},
      abstract = {Bayesian optimization is renowned for its sample efficiency, but its application to higher-dimensional tasks is impeded by its focus on global optimization. To scale to higher-dimensional problems, we leverage the sample efficiency of Bayesian optimization in a local context. The optimization of the acquisition function is restricted to the vicinity of a Gaussian search distribution, which is moved towards high-value areas of the objective. The proposed information-theoretic update of the search distribution results in a Bayesian interpretation of local stochastic search: the search distribution encodes prior knowledge about the optimum's location and is weighted at each iteration by the likelihood of that location being optimal. We demonstrate the effectiveness of our algorithm on several benchmark objective functions as well as on a continuous robotic task in which an informative prior is obtained by imitation learning.}
      }
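
    A minimal sketch of the local Bayesian optimization loop, assuming NumPy and scikit-learn (GaussianProcessRegressor) are available. Candidates for the acquisition function are drawn only from the vicinity of a Gaussian search distribution, which is then shifted towards the best observed points; the paper's information-theoretic update of the search distribution is replaced here by a crude elite-based step, and the function name local_bo_sketch and its parameters are hypothetical.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    def local_bo_sketch(f, dim, iterations=50, candidates=500, elite_frac=0.2, seed=0):
        """Bayesian optimization restricted to a local Gaussian search distribution.
        The GP acquisition (UCB) is evaluated only on locally sampled candidates;
        the search distribution then tracks the best points seen so far."""
        rng = np.random.default_rng(seed)
        mean, cov = np.zeros(dim), np.eye(dim)
        X, y = [], []
        for _ in range(iterations):
            cand = rng.multivariate_normal(mean, cov, candidates)   # local candidates only
            if len(X) >= 2:
                gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(),
                                              normalize_y=True)
                gp.fit(np.array(X), np.array(y))
                mu, std = gp.predict(cand, return_std=True)
                x_next = cand[np.argmax(mu + 1.0 * std)]            # UCB acquisition
            else:
                x_next = cand[0]
            X.append(x_next); y.append(f(x_next))
            if len(y) >= 4:
                # move the search distribution towards high-value areas (elite points)
                k = max(2, int(elite_frac * len(y)))
                elite = np.array(X)[np.argsort(y)[-k:]]
                mean = 0.5 * mean + 0.5 * elite.mean(axis=0)
                cov = 0.5 * cov + 0.5 * (np.cov(elite, rowvar=False) + 1e-6 * np.eye(dim))
        return X[int(np.argmax(y))], max(y)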

  • GECCO 2017: Deriving and Improving CMA-ES with Information-Geometric Trust Regions

    CMA-ES is one of the most popular stochastic search algorithms. It performs favourably in many tasks without the need for extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance updates that incorporate the current samples as well as the evolution path, and its invariance properties. Its update rules are composed of well-established heuristics, and the theoretical foundations of some of these rules are also well understood. In this paper, we fully derive all CMA-ES update rules within the framework of expectation-maximisation-based stochastic search algorithms using information-geometric trust regions. We show that the use of the trust region results in similar updates to CMA-ES for the mean and the covariance matrix, while it allows for the derivation of an improved update rule for the step-size. Our new algorithm, Trust-Region Covariance Matrix Adaptation Evolution Strategy (TR-CMA-ES), is fully derived from first-order optimization principles and performs favourably compared to the standard CMA-ES algorithm. A toy KL-bounded update of a Gaussian search distribution, illustrating the trust-region idea, follows the BibTeX entry below.

    • A. Abdolmaleki, B. Price, N. Lau, L. P. Reis, and G. Neumann, “Deriving and improving CMA-ES with Information geometric trust regions,” in The Genetic and Evolutionary Computation Conference (GECCO 2017), 2017.

      @inproceedings{lirolem27056,
      author = {Abbas Abdolmaleki and Bob Price and Nuno Lau and Luis Paulo Reis and Gerhard Neumann},
      title = {Deriving and improving CMA-ES with Information geometric trust regions},
      booktitle = {The Genetic and Evolutionary Computation Conference (GECCO 2017)},
      month = {July},
      year = {2017},
      url = {http://eprints.lincoln.ac.uk/27056/},
      abstract = {CMA-ES is one of the most popular stochastic search algorithms. It performs favourably in many tasks without the need for extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance updates that incorporate the current samples as well as the evolution path, and its invariance properties. Its update rules are composed of well-established heuristics, and the theoretical foundations of some of these rules are also well understood. In this paper, we fully derive all CMA-ES update rules within the framework of expectation-maximisation-based stochastic search algorithms using information-geometric trust regions. We show that the use of the trust region results in similar updates to CMA-ES for the mean and the covariance matrix, while it allows for the derivation of an improved update rule for the step-size. Our new algorithm, Trust-Region Covariance Matrix Adaptation Evolution Strategy (TR-CMA-ES), is fully derived from first-order optimization principles and performs favourably compared to the standard CMA-ES algorithm.}
      }
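
    The toy update below mimics the information-geometric trust-region idea under stated assumptions: a rank-weighted maximum-likelihood Gaussian is estimated from the samples, and the step towards it is chosen by bisection so that the KL divergence to the old search distribution stays below a bound epsilon. This is not the TR-CMA-ES derivation from the paper (in particular, there is no separate step-size update); the function names and constants are illustrative.

    import numpy as np

    def kl_gauss(m1, S1, m0, S0):
        """KL( N(m1, S1) || N(m0, S0) ) between multivariate Gaussians."""
        d = len(m1)
        S0_inv = np.linalg.inv(S0)
        diff = m0 - m1
        return 0.5 * (np.trace(S0_inv @ S1) + diff @ S0_inv @ diff - d
                      + np.log(np.linalg.det(S0) / np.linalg.det(S1)))

    def trust_region_gaussian_step(samples, rewards, mean, cov, epsilon=0.5):
        """One KL-bounded update of a Gaussian search distribution."""
        n, d = samples.shape
        order = np.argsort(-rewards)                       # rank-based weights, CMA-ES style
        w = np.log(n / 2 + 0.5) - np.log(np.arange(1, n + 1))
        w = np.clip(w, 0.0, None); w /= w.sum()
        weights = np.empty(n); weights[order] = w
        target_mean = weights @ samples                    # weighted ML estimates
        centered = samples - target_mean
        target_cov = centered.T @ np.diag(weights) @ centered + 1e-8 * np.eye(d)
        lo, hi = 0.0, 1.0                                  # bisect the interpolation factor
        for _ in range(30):
            a = 0.5 * (lo + hi)
            m = (1 - a) * mean + a * target_mean
            S = (1 - a) * cov + a * target_cov
            lo, hi = (a, hi) if kl_gauss(m, S, mean, cov) <= epsilon else (lo, a)
        a = lo
        return (1 - a) * mean + a * target_mean, (1 - a) * cov + a * target_cov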

  • ICRA 2017: Empowered Skills

    Robot Reinforcement Learning (RL) algorithms return a policy that maximizes a global cumulative reward signal but typically do not create diverse behaviors. Hence, the policy will typically capture only a single solution of a task. However, many motor tasks have a large variety of solutions, and knowledge about these solutions can have several advantages. For example, in an adversarial setting such as robot table tennis, a lack of diversity renders the behavior predictable and hence easy for the opponent to counter. In an interactive setting such as learning from human feedback, an emphasis on diversity gives the human more opportunity for guiding the robot and prevents it from getting stuck in local optima of the task. In order to increase the diversity of the learned behaviors, we leverage prior work on intrinsic motivation and empowerment. We derive a new intrinsic motivation signal by enriching the description of a task with an outcome space, representing interesting aspects of the sensorimotor stream. For example, in table tennis, the outcome space could be given by the return position and return ball speed. The intrinsic motivation is then given by the diversity of future outcomes, a concept also known as empowerment. We derive a new policy search algorithm that maximizes a trade-off between the extrinsic reward and this intrinsic motivation criterion. Experiments on a planar reaching task and simulated robot table tennis demonstrate that our algorithm can learn a diverse set of behaviors within the area of interest of the tasks. A toy search loop illustrating the reward/diversity trade-off follows the BibTeX entry below.

    • A. Gabriel, R. Akrour, J. Peters, and G. Neumann, “Empowered skills,” in International Conference on Robotics and Automation (ICRA), 2017.

      @inproceedings{lirolem26736,
      month = {May},
      year = {2017},
      title = {Empowered skills},
      author = {A. Gabriel and R. Akrour and J. Peters and G. Neumann},
      booktitle = {International Conference on Robotics and Automation (ICRA)},
      url = {http://eprints.lincoln.ac.uk/26736/},
      abstract = {Robot Reinforcement Learning (RL) algorithms return a policy that maximizes a global cumulative reward signal but typically do not create diverse behaviors. Hence, the policy will typically capture only a single solution of a task. However, many motor tasks have a large variety of solutions, and knowledge about these solutions can have several advantages. For example, in an adversarial setting such as robot table tennis, a lack of diversity renders the behavior predictable and hence easy for the opponent to counter. In an interactive setting such as learning from human feedback, an emphasis on diversity gives the human more opportunity for guiding the robot and prevents it from getting stuck in local optima of the task. In order to increase the diversity of the learned behaviors, we leverage prior work on intrinsic motivation and empowerment. We derive a new intrinsic motivation signal by enriching the description of a task with an outcome space, representing interesting aspects of the sensorimotor stream. For example, in table tennis, the outcome space could be given by the return position and return ball speed. The intrinsic motivation is then given by the diversity of future outcomes, a concept also known as empowerment. We derive a new policy search algorithm that maximizes a trade-off between the extrinsic reward and this intrinsic motivation criterion. Experiments on a planar reaching task and simulated robot table tennis demonstrate that our algorithm can learn a diverse set of behaviors within the area of interest of the tasks.}
      }
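
    The sketch below shows one simple way to trade off extrinsic reward against outcome diversity in an episodic search loop. The assumed interface rollout(theta) -> (reward, outcome) and the nearest-neighbour novelty bonus are illustrative stand-ins for the empowerment-based intrinsic motivation and the policy search algorithm derived in the paper.

    import numpy as np

    def empowered_search_sketch(rollout, param_dim, iterations=100, pop_size=30,
                                beta=0.1, seed=0):
        """Toy episodic search maximizing extrinsic reward plus a diversity bonus.
        `rollout(theta)` is assumed to return (reward, outcome), where the outcome
        is a low-dimensional description of the behaviour (e.g. return position
        and return ball speed in table tennis)."""
        rng = np.random.default_rng(seed)
        mean, cov = np.zeros(param_dim), np.eye(param_dim)
        for _ in range(iterations):
            thetas = rng.multivariate_normal(mean, cov, pop_size)
            results = [rollout(th) for th in thetas]
            rewards = np.array([r for r, _ in results])
            outcomes = np.array([np.atleast_1d(o) for _, o in results])
            # novelty bonus: distance to the nearest other outcome in the population
            dists = np.linalg.norm(outcomes[:, None, :] - outcomes[None, :, :], axis=-1)
            np.fill_diagonal(dists, np.inf)
            fitness = rewards + beta * dists.min(axis=1)    # extrinsic + intrinsic trade-off
            # soft-max weighted update of the search distribution
            w = np.exp((fitness - fitness.max()) / (fitness.std() + 1e-8))
            w /= w.sum()
            mean = w @ thetas
            centered = thetas - mean
            cov = centered.T @ np.diag(w) @ centered + 1e-6 * np.eye(param_dim)
        return mean, cov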

  • ICRA 2017: Layered Direct Policy Search for Learning Hierarchical Skills

    Solutions to real-world robotic tasks often require complex behaviors in high-dimensional continuous state and action spaces. Reinforcement Learning (RL) aims to learn such behaviors but often fails due to a lack of scalability. To address this issue, Hierarchical RL (HRL) algorithms leverage hierarchical policies to exploit the structure of a task. However, many HRL algorithms rely on task-specific knowledge such as a set of predefined sub-policies or sub-goals. In this paper, we propose a new HRL algorithm based on information-theoretic principles to autonomously uncover a diverse set of sub-policies and their activation policies. Moreover, the learning process mirrors the policy's structure and is thus also hierarchical, consisting of a set of independent optimization problems. The hierarchical structure of the learning process allows us to control the learning rate of the sub-policies and the gating individually, and to add specific information-theoretic constraints to each layer to ensure the diversification of the sub-policies. We evaluate our algorithm on two high-dimensional continuous tasks and experimentally demonstrate its ability to autonomously discover a rich set of sub-policies. A toy two-layer search loop illustrating the gating/sub-policy structure follows the BibTeX entry below.

    • F. End, R. Akrour, J. Peters, and G. Neumann, “Layered direct policy search for learning hierarchical skills,” in International Conference on Robotics and Automation (ICRA), 2017.

      @inproceedings{lirolem26737,
      month = {May},
      year = {2017},
      title = {Layered direct policy search for learning hierarchical skills},
      author = {F. End and R. Akrour and J. Peters and G. Neumann},
      booktitle = {International Conference on Robotics and Automation (ICRA)},
      url = {http://eprints.lincoln.ac.uk/26737/},
      abstract = {Solutions to real-world robotic tasks often require complex behaviors in high-dimensional continuous state and action spaces. Reinforcement Learning (RL) aims to learn such behaviors but often fails due to a lack of scalability. To address this issue, Hierarchical RL (HRL) algorithms leverage hierarchical policies to exploit the structure of a task. However, many HRL algorithms rely on task-specific knowledge such as a set of predefined sub-policies or sub-goals. In this paper, we propose a new HRL algorithm based on information-theoretic principles to autonomously uncover a diverse set of sub-policies and their activation policies. Moreover, the learning process mirrors the policy's structure and is thus also hierarchical, consisting of a set of independent optimization problems. The hierarchical structure of the learning process allows us to control the learning rate of the sub-policies and the gating individually, and to add specific information-theoretic constraints to each layer to ensure the diversification of the sub-policies. We evaluate our algorithm on two high-dimensional continuous tasks and experimentally demonstrate its ability to autonomously discover a rich set of sub-policies.}
      }
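
    A toy two-layer search loop in the spirit of the paper: a gating distribution selects one of several Gaussian sub-policies, each layer is updated from its own samples, and a softmax temperature on the gating keeps several sub-policies alive. The information-theoretic constraints of the actual algorithm are not implemented; the function name, the temperature heuristic, and all parameters are assumptions made for illustration.

    import numpy as np

    def layered_search_sketch(f, param_dim, n_options=3, iterations=100,
                              pop_size=40, gate_temperature=0.5, seed=0):
        """Hierarchical episodic search: an upper-level gating over sub-policies
        and lower-level Gaussian sub-policies, each updated independently."""
        rng = np.random.default_rng(seed)
        gate = np.full(n_options, 1.0 / n_options)
        means = rng.normal(0.0, 1.0, (n_options, param_dim))
        covs = np.stack([np.eye(param_dim)] * n_options)
        for _ in range(iterations):
            opts = rng.choice(n_options, size=pop_size, p=gate)
            thetas = np.array([rng.multivariate_normal(means[o], covs[o]) for o in opts])
            rewards = np.array([f(th) for th in thetas])
            # lower layer: each sub-policy is updated only from its own samples
            for o in range(n_options):
                idx = np.where(opts == o)[0]
                if len(idx) < 2:
                    continue
                r = rewards[idx]
                w = np.exp((r - r.max()) / (r.std() + 1e-8)); w /= w.sum()
                means[o] = w @ thetas[idx]
                c = thetas[idx] - means[o]
                covs[o] = c.T @ np.diag(w) @ c + 1e-6 * np.eye(param_dim)
            # upper layer: gating follows per-option returns through a softmax whose
            # temperature keeps it from collapsing onto a single sub-policy
            avg = np.array([rewards[opts == o].mean() if np.any(opts == o) else -np.inf
                            for o in range(n_options)])
            gate = np.exp((avg - avg.max()) / max(gate_temperature, 1e-8))
            gate /= gate.sum()
        return gate, means, covs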

  • NIPS 2015: Model-Based Relative Entropy Stochastic Search

    Stochastic search algorithms are general black-box optimizers. Due to their ease of use and their generality, they have recently gained a lot of attention in operations research, machine learning, and policy search. Yet, these algorithms require many evaluations of the objective, scale poorly with the problem dimension, are affected by highly noisy objective functions, and may converge prematurely. To alleviate these problems, we introduce a new surrogate-based stochastic search approach. We learn simple, quadratic surrogate models of the objective function. As the quality of such a quadratic approximation is limited and the algorithm can be misled by an inaccurate optimum introduced by the surrogate, we do not greedily exploit the learned models. Instead, we use information-theoretic constraints to bound the 'distance' between the new and old data distributions while maximizing the objective function. Additionally, the new method is able to sustain the exploration of the search distribution to avoid premature convergence. We compare our method with state-of-the-art black-box optimization methods on standard uni-modal and multi-modal optimization functions, on simulated planar robot tasks, and on a complex robot ball-throwing task. The proposed method considerably outperforms the existing approaches. A sketch of a single surrogate-based, KL-regularized update follows the BibTeX entry below.

    • A. Abdolmaleki, R. Lioutikov, N. Lau, L. P. Reis, J. Peters, and G. Neumann, “Model-based relative entropy stochastic search,” in Advances in Neural Information Processing Systems (NIPS), 2016, pp. 153-154.

      @inproceedings{lirolem25741,
      pages = {153--154},
      year = {2016},
      author = {A. Abdolmaleki and R. Lioutikov and N. Lau and L. Paulo Reis and J. Peters and G. Neumann},
      title = {Model-based relative entropy stochastic search},
      booktitle = {Advances in Neural Information Processing Systems (NIPS)},
      abstract = {Stochastic search algorithms are general black-box optimizers. Due to their ease of use and their generality, they have recently gained a lot of attention in operations research, machine learning, and policy search. Yet, these algorithms require many evaluations of the objective, scale poorly with the problem dimension, are affected by highly noisy objective functions, and may converge prematurely. To alleviate these problems, we introduce a new surrogate-based stochastic search approach. We learn simple, quadratic surrogate models of the objective function. As the quality of such a quadratic approximation is limited and the algorithm can be misled by an inaccurate optimum introduced by the surrogate, we do not greedily exploit the learned models. Instead, we use information-theoretic constraints to bound the 'distance' between the new and old data distributions while maximizing the objective function. Additionally, the new method is able to sustain the exploration of the search distribution to avoid premature convergence. We compare our method with state-of-the-art black-box optimization methods on standard uni-modal and multi-modal optimization functions, on simulated planar robot tasks, and on a complex robot ball-throwing task. The proposed method considerably outperforms the existing approaches.},
      url = {http://eprints.lincoln.ac.uk/25741/}
      }
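
    The sketch below shows the structure of a single MORE-style update under stated assumptions: a quadratic surrogate of the objective is fitted by ridge regression on the current samples and combined with the old Gaussian search distribution in natural-parameter form. In the paper, the temperature eta and an additional entropy multiplier are obtained by optimizing a dual function so that the KL and entropy constraints hold exactly; here eta is simply fixed (and must be large enough to keep the new precision matrix positive definite), so this only illustrates the shape of the update.

    import numpy as np

    def more_step_sketch(samples, rewards, mean, cov, eta=10.0):
        """One simplified surrogate-based, KL-regularized update of a Gaussian."""
        n, d = samples.shape
        # quadratic features: bias, linear terms, upper-triangular quadratic terms
        iu = np.triu_indices(d)
        feats = np.hstack([np.ones((n, 1)), samples,
                           np.array([np.outer(x, x)[iu] for x in samples])])
        # ridge regression for the surrogate R(x) ~ -0.5 x'Ax + a'x + const
        beta = np.linalg.solve(feats.T @ feats + 1e-6 * np.eye(feats.shape[1]),
                               feats.T @ rewards)
        a = beta[1:d + 1]
        upper = np.zeros((d, d)); upper[iu] = beta[d + 1:]
        A = -(upper + upper.T)
        # natural parameters of the old search distribution
        Q = np.linalg.inv(cov)
        q = Q @ mean
        # KL-regularized combination of old distribution and surrogate (entropy term omitted)
        Q_new = (eta * Q + A) / eta
        q_new = (eta * q + a) / eta
        cov_new = np.linalg.inv(Q_new)
        return cov_new @ q_new, cov_new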