Hierarchical Reinforcement Learning

  • ISER 2016: Experiments with hierarchical reinforcement learning of multiple grasping policies

    Robotic grasping has attracted considerable interest, but it still remains a challenging task. The data-driven approach is a promising solution to the robotic grasping problem; this approach leverages a grasp dataset and generalizes grasps for various objects. However, these methods often depend on the quality of the given datasets, which are not trivial to obtain with sufficient quality. Although reinforcement learning approaches have been recently used to achieve autonomous collection of grasp datasets, the existing algorithms are often limited to specific grasp types. In this paper, we present a framework for hierarchical reinforcement learning of grasping policies. In our framework, the lower-level hierarchy learns multiple grasp types, and the upper-level hierarchy learns a policy to select from the learned grasp types according to a point cloud of a new object. Through experiments, we validate that our approach learns grasping by constructing the grasp dataset autonomously. The experimental results show that our approach learns multiple grasping policies and generalizes the learned grasps by using local point cloud information.

    • T. Osa, J. Peters, and G. Neumann, “Experiments with hierarchical reinforcement learning of multiple grasping policies,” in Proceedings of the International Symposium on Experimental Robotics (ISER), 2016.

      @inproceedings{lirolem26735,
      year = {2016},
      title = {Experiments with hierarchical reinforcement learning of multiple grasping policies},
      month = {April},
      booktitle = {Proceedings of the International Symposium on Experimental Robotics (ISER)},
      author = {T. Osa and J. Peters and G. Neumann},
      abstract = {Robotic grasping has attracted considerable interest, but it still remains a challenging task. The data-driven approach is a promising solution to the robotic grasping problem; this approach leverages a grasp dataset and generalizes grasps for various objects. However, these methods often depend on the quality of the given datasets, which are not trivial to obtain with sufficient quality. Although reinforcement learning approaches have been recently used to achieve autonomous collection of grasp datasets, the existing algorithms are often limited to specific grasp types. In this paper, we present a framework for hierarchical reinforcement learning of grasping policies. In our framework, the lower-level hierarchy learns multiple grasp types, and the upper-level hierarchy learns a policy to select from the learned grasp types according to a point cloud of a new object. Through experiments, we validate that our approach learns grasping by constructing the grasp dataset autonomously. The experimental results show that our approach learns multiple grasping policies and generalizes the learned grasps by using local point cloud information.},
      url = {http://eprints.lincoln.ac.uk/26735/}
      }

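    A minimal Python sketch may help make the two-level structure described in the abstract concrete: lower-level policies for individual grasp types and an upper-level policy that selects a grasp type from local point-cloud features. This is an illustrative reading of the abstract, not the authors' implementation; the class names, the linear softmax selector, the reward-weighted update, and all dimensionalities are assumptions.

      # Illustrative sketch only -- not the authors' code. It mirrors the
      # two-level hierarchy from the abstract: lower-level grasp-type policies
      # and an upper-level selector conditioned on point-cloud features.
      import numpy as np

      class GraspTypePolicy:
          """Lower-level policy: a Gaussian over grasp parameters (hypothetical)."""
          def __init__(self, n_params):
              self.mean = np.zeros(n_params)
              self.cov = np.eye(n_params)

          def sample(self, rng):
              return rng.multivariate_normal(self.mean, self.cov)

          def update(self, params, weights):
              # Reward-weighted maximum-likelihood update (a common choice in
              # episodic policy search; the paper may use a different rule).
              w = weights / np.sum(weights)
              self.mean = w @ params
              diff = params - self.mean
              self.cov = (diff.T * w) @ diff + 1e-6 * np.eye(len(self.mean))

      class GraspTypeSelector:
          """Upper-level policy: softmax over grasp types scored from local
          point-cloud features (the linear scoring is an assumption)."""
          def __init__(self, n_types, n_features):
              self.W = np.zeros((n_types, n_features))

          def select(self, features, rng):
              scores = self.W @ features
              probs = np.exp(scores - scores.max())
              probs /= probs.sum()
              return rng.choice(len(probs), p=probs)

      # Usage sketch: pick a grasp type for a new object, then sample grasp parameters.
      rng = np.random.default_rng(0)
      lower = [GraspTypePolicy(n_params=6) for _ in range(3)]   # e.g. 3 grasp types
      upper = GraspTypeSelector(n_types=3, n_features=32)       # 32-d point-cloud features
      features = rng.standard_normal(32)                        # placeholder features
      grasp_type = upper.select(features, rng)
      grasp_params = lower[grasp_type].sample(rng)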

  • ICRA 2017: Layered Direct Policy Search for Learning Hierarchical Skills

    Solutions to real world robotic tasks often require complex behaviors in high dimensional continuous state and action spaces. Reinforcement Learning (RL) is aimed at learning such behaviors but often fails for lack of scalability. To address this issue, Hierarchical RL (HRL) algorithms leverage hierarchical policies to exploit the structure of a task. However, many HRL algorithms rely on task specific knowledge such as a set of predefined sub-policies or sub-goals. In this paper we propose a new HRL algorithm based on information theoretic principles to autonomously uncover a diverse set of sub-policies and their activation policies. Moreover, the learning process mirrors the policy's structure and is thus also hierarchical, consisting of a set of independent optimization problems. The hierarchical structure of the learning process allows us to control the learning rate of the sub-policies and the gating individually and add specific information theoretic constraints to each layer to ensure the diversification of the sub-policies. We evaluate our algorithm on two high dimensional continuous tasks and experimentally demonstrate its ability to autonomously discover a rich set of sub-policies.

    • F. End, R. Akrour, J. Peters, and G. Neumann, “Layered direct policy search for learning hierarchical skills,” in International Conference on Robotics and Automation (ICRA), 2017.

      @inproceedings{lirolem26737,
      year = {2017},
      title = {Layered direct policy search for learning hierarchical skills},
      month = {May},
      booktitle = {International Conference on Robotics and Automation (ICRA)},
      author = {F. End and R. Akrour and J. Peters and G. Neumann},
      url = {http://eprints.lincoln.ac.uk/26737/},
      abstract = {Solutions to real world robotic tasks often require complex behaviors in high dimensional continuous state and action spaces. Reinforcement Learning (RL) is aimed at learning such behaviors but often fails for lack of scalability. To address this issue, Hierarchical RL (HRL) algorithms leverage hierarchical policies to exploit the structure of a task. However, many HRL algorithms rely on task specific knowledge such as a set of predefined sub-policies or sub-goals. In this paper we propose a new HRL algorithm based on information theoretic principles to autonomously uncover a diverse set of sub-policies and their activation policies. Moreover, the learning process mirrors the policy's structure and is thus also hierarchical, consisting of a set of independent optimization problems. The hierarchical structure of the learning process allows us to control the learning rate of the sub-policies and the gating individually and add specific information theoretic constraints to each layer to ensure the diversification of the sub-policies. We evaluate our algorithm on two high dimensional continuous tasks and experimentally demonstrate its ability to autonomously discover a rich set of sub-policies.}
      }
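
    The core structural idea of the abstract, a gating distribution over sub-policies in which each layer is updated by its own optimization problem, can be sketched as follows. The sketch is illustrative only: it replaces the paper's information-theoretic (KL/entropy) constraints with a crude entropy-like regularizer on the gating, and the reward-weighted sub-policy update is an assumption rather than the paper's derived update.

      # Illustrative sketch only -- shows the layered structure, not the
      # paper's constrained optimization. `samples[i]` is an (N_i, d) array of
      # parameter vectors drawn from sub-policy i; `rewards[i]` is the (N_i,)
      # array of returns they obtained (both hypothetical placeholders).
      import numpy as np

      def layered_update(gating, samples, rewards, entropy_bonus=0.1):
          n_options = len(samples)

          # Lower layer: update each sub-policy independently from its own
          # data with a reward-weighted maximum-likelihood step (assumption).
          new_means, new_stds = [], []
          for i in range(n_options):
              w = np.exp(rewards[i] - rewards[i].max())
              w /= w.sum()
              mean = w @ samples[i]
              std = np.sqrt(w @ (samples[i] - mean) ** 2) + 1e-6
              new_means.append(mean)
              new_stds.append(std)

          # Upper layer: update the gating from each sub-policy's average
          # reward. Shrinking the old log-probabilities toward uniform
          # (exponent < 1) is a crude stand-in for the paper's constraint
          # that keeps the sub-policies diversified.
          avg_reward = np.array([r.mean() for r in rewards])
          logits = (1.0 - entropy_bonus) * np.log(gating + 1e-12) + avg_reward
          new_gating = np.exp(logits - logits.max())
          new_gating /= new_gating.sum()
          return new_gating, new_means, new_stds

      # Usage sketch: two sub-policies over a 3-d parameter space.
      rng = np.random.default_rng(0)
      samples = [rng.standard_normal((20, 3)) for _ in range(2)]
      rewards = [rng.standard_normal(20) for _ in range(2)]
      gating, means, stds = layered_update(np.full(2, 0.5), samples, rewards)

    Because the two layers are treated as separate optimization problems, their step sizes (here, the reward scaling and the entropy term) can be tuned independently, which is the property the abstract highlights.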

  • Machine Learning Journal 2016: Probabilistic inference for determining options in reinforcement learning

    Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process setting (SMDP) and the option framework, we propose a model which aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm which infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks.

    • C. Daniel, H. van Hoof, J. Peters, and G. Neumann, “Probabilistic inference for determining options in reinforcement learning,” Machine Learning, vol. 104, iss. 2-3, pp. 337-357, 2016.

      @article{lirolem25739,
      title = {Probabilistic inference for determining options in reinforcement learning},
      month = {September},
      author = {C. Daniel and H. van Hoof and J. Peters and G. Neumann},
      pages = {337--357},
      publisher = {Springer},
      number = {2-3},
      volume = {104},
      year = {2016},
      journal = {Machine Learning},
      url = {http://eprints.lincoln.ac.uk/25739/},
      abstract = {Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process setting (SMDP) and the option framework, we propose a model which aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm which infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks.}
      }
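
    The option framework that this model builds on can be illustrated with its execution loop: a high-level policy picks an option, the option's sub-policy runs until its termination probability fires, and control then returns to the high-level policy. The sketch below shows only these execution semantics, not the paper's probabilistic inference of the options' components; the toy environment, the uniform high-level policy, and all names are hypothetical placeholders (initiation probabilities, which the paper also infers, are omitted for brevity).

      # Illustrative sketch of option-based (SMDP) execution -- not the
      # paper's inference algorithm.
      import numpy as np

      class Option:
          def __init__(self, policy, termination):
              self.policy = policy            # maps state -> action
              self.termination = termination  # maps state -> prob. of terminating

      class ToyChainEnv:
          """Tiny placeholder environment so the sketch runs end to end."""
          def __init__(self, length=10):
              self.length, self.pos = length, 0
          def reset(self):
              self.pos = 0
              return self.pos
          def step(self, action):
              self.pos = int(np.clip(self.pos + action, 0, self.length - 1))
              done = self.pos == self.length - 1
              return self.pos, float(done), done

      def run_episode(env, options, high_level_policy, rng, max_steps=200):
          """Roll out one SMDP episode with the given options."""
          state = env.reset()
          option = high_level_policy(state, rng)      # pick the initial option
          trajectory = []
          for _ in range(max_steps):
              action = options[option].policy(state)
              next_state, reward, done = env.step(action)
              trajectory.append((state, option, action, reward))
              if done:
                  break
              # If the active option terminates, the high-level policy
              # selects the next option.
              if rng.random() < options[option].termination(next_state):
                  option = high_level_policy(next_state, rng)
              state = next_state
          return trajectory

      # Usage sketch: two hand-coded options and a uniform high-level policy.
      rng = np.random.default_rng(0)
      options = [Option(policy=lambda s: +1, termination=lambda s: 0.1),   # "move right"
                 Option(policy=lambda s: -1, termination=lambda s: 0.5)]   # "move left"
      high_level = lambda state, rng: int(rng.integers(len(options)))
      trajectory = run_episode(ToyChainEnv(), options, high_level, rng)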

  • JMLR 2016: Hierarchical Relative Entropy Policy Search

    Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that are strongly structured. Such task structures can be exploited by incorporating hierarchical policies that consist of gating networks and sub-policies. However, this concept has only been partially explored for real world settings and complete methods, derived from first principles, are needed. Real world settings are challenging due to large and continuous state-action spaces that are prohibitive for exhaustive sampling methods. We define the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy that is composed of a high-level gating policy to select the low-level sub-policies for execution by the agent. In order to efficiently share experience with all sub-policies, also called inter-policy learning, we treat these sub-policies as latent variables which allows for distribution of the update information between the sub-policies. We present three different variants of our algorithm, designed to be suitable for a wide variety of real world robot learning tasks and evaluate our algorithms in two real robot learning scenarios as well as several simulations and comparisons.

    • C. Daniel, G. Neumann, O. Kroemer, and J. Peters, “Hierarchical relative entropy policy search,” Journal of Machine Learning Research, vol. 17, pp. 1-50, 2016.

      @article{lirolem25743,
      year = {2016},
      volume = {17},
      journal = {Journal of Machine Learning Research},
      title = {Hierarchical relative entropy policy search},
      month = {June},
      publisher = {Massachusetts Institute of Technology Press (MIT Press) / Microtome Publishing},
      pages = {1--50},
      author = {C. Daniel and G. Neumann and O. Kroemer and J. Peters},
      url = {http://eprints.lincoln.ac.uk/25743/},
      abstract = {Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that are strongly structured. Such task structures can be exploited by incorporating hierarchical policies that consist of gating networks and sub-policies. However, this concept has only been partially explored for real world settings and complete methods, derived from first principles, are needed. Real world settings are challenging due to large and continuous state-action spaces that are prohibitive for exhaustive sampling methods. We define the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy that is composed of a high-level gating policy to select the low-level sub-policies for execution by the agent. In order to efficiently share experience with all sub-policies, also called inter-policy learning, we treat these sub-policies as latent variables which allows for distribution of the update information between the sub-policies. We present three different variants of our algorithm, designed to be suitable for a wide variety of real world robot learning tasks and evaluate our algorithms in two real robot learning scenarios as well as several simulations and comparisons.}
      }
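
    The "sub-policies as latent variables" idea in the abstract can be sketched as an EM-style computation: every sample is softly assigned to every sub-policy through a responsibility, so a single sample contributes update information to all sub-policies rather than only to the one that generated it. The sketch below is an illustrative reading, not the paper's update equations; the exponentiated-reward weighting, the state-independent gating, and the 1-d Gaussian sub-policies are simplifying assumptions.

      # Illustrative sketch only -- EM-style responsibilities for a mixture of
      # Gaussian sub-policies; not the paper's relative-entropy-bounded update.
      import numpy as np

      def responsibilities(actions, gating, means, stds):
          """p(o | a), proportional to gating[o] * N(a | means[o], stds[o]).
          State-dependence is dropped to keep the sketch small."""
          a = np.asarray(actions)[:, None]                     # (N, 1)
          log_joint = (-0.5 * ((a - means) / stds) ** 2
                       - np.log(stds) + np.log(gating))        # (N, n_options)
          log_joint -= log_joint.max(axis=1, keepdims=True)
          resp = np.exp(log_joint)
          return resp / resp.sum(axis=1, keepdims=True)

      def weighted_update(actions, rewards, resp):
          """Update every sub-policy from all samples, weighted by its
          responsibility times a reward weight (the exponentiated-return
          weighting is an assumption made for this sketch)."""
          a = np.asarray(actions)[:, None]                     # (N, 1)
          w = np.exp(rewards - rewards.max())[:, None] * resp  # (N, n_options)
          w /= w.sum(axis=0, keepdims=True)
          means = (w * a).sum(axis=0)
          stds = np.sqrt((w * (a - means) ** 2).sum(axis=0)) + 1e-6
          gating = resp.mean(axis=0)                           # new gating weights
          return gating, means, stds

      # Usage sketch: 1-d actions, two sub-policies, toy objective preferring a ~ 1.
      rng = np.random.default_rng(0)
      actions = rng.standard_normal(50)
      rewards = -np.abs(actions - 1.0)
      gating, means, stds = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
      resp = responsibilities(actions, gating, means, stds)
      gating, means, stds = weighted_update(actions, rewards, resp)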