ICRA 2017: Layered Direct Policy Search for Learning Hierarchical Skills

Solutions to real-world robotic tasks often require complex behaviors in high-dimensional continuous state and action spaces. Reinforcement Learning (RL) aims to learn such behaviors but often fails due to a lack of scalability. To address this issue, Hierarchical RL (HRL) algorithms leverage hierarchical policies to exploit the structure of a task. However, many HRL algorithms rely on task-specific knowledge such as a set of predefined sub-policies or sub-goals. In this paper, we propose a new HRL algorithm based on information-theoretic principles that autonomously uncovers a diverse set of sub-policies and their activation policies. Moreover, the learning process mirrors the policy's structure and is thus also hierarchical, consisting of a set of independent optimization problems. The hierarchical structure of the learning process allows us to control the learning rates of the sub-policies and of the gating individually and to add specific information-theoretic constraints to each layer to ensure the diversification of the sub-policies. We evaluate our algorithm on two high-dimensional continuous tasks and experimentally demonstrate its ability to autonomously discover a rich set of sub-policies.
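The abstract describes the layered learning process only in outline. As a rough, hypothetical illustration of how a hierarchical policy can be learned as a set of independent optimization problems with separately controlled step sizes, the Python sketch below assumes an episodic, context-free setting: a categorical gating over options and one scalar Gaussian sub-policy per option, with each layer updated by a REPS-style KL-bounded reweighting under its own bound (`eps_gating`, `eps_sub`). The function names, the toy reward, and the choice of REPS-style updates are illustrative assumptions, not the paper's exact formulation, and the sub-policy diversification constraints are omitted.

```python
# Hypothetical sketch: one KL-bounded (REPS-style) update per layer.
# This is NOT the paper's exact algorithm; it only illustrates the layered idea.
import math
import random


def reps_weights(returns, epsilon):
    """Sample weights w_i proportional to exp(R_i / eta).

    eta minimizes the dual g(eta) = eta*epsilon + eta*log(mean(exp(R_i/eta))),
    which makes KL(w || uniform) equal to epsilon at the optimum.
    """
    r_max = max(returns)
    adv = [r - r_max for r in returns]  # shift so every exponent is <= 0

    def dual(log_eta):
        eta = math.exp(log_eta)
        mean_exp = sum(math.exp(a / eta) for a in adv) / len(adv)
        return eta * epsilon + eta * math.log(mean_exp)  # constant r_max dropped

    # The dual is convex in eta, hence unimodal in log(eta): ternary search.
    lo, hi = -20.0, 20.0
    for _ in range(100):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if dual(m1) < dual(m2):
            hi = m2
        else:
            lo = m1
    eta = math.exp(0.5 * (lo + hi))
    w = [math.exp(a / eta) for a in adv]
    total = sum(w)
    return [wi / total for wi in w]


def layered_update(gating, subs, samples, eps_gating, eps_sub):
    """One layered update; each layer solves its own KL-bounded problem.

    gating:  list of K option probabilities (hypothetical context-free gating)
    subs:    list of K (mean, std) Gaussians over a scalar parameter theta
    samples: list of (option, theta, return) tuples from the current policy
    """
    # Upper layer: reweight all samples under the gating's own bound.
    w = reps_weights([r for _, _, r in samples], eps_gating)
    new_gating = [0.0] * len(gating)
    for (o, _, _), wi in zip(samples, w):
        new_gating[o] += wi  # weighted maximum-likelihood categorical fit

    # Lower layer: each sub-policy is fit only on its own samples.
    new_subs = []
    for o, (mu, sigma) in enumerate(subs):
        own = [(th, r) for oi, th, r in samples if oi == o]
        if len(own) < 2:
            new_subs.append((mu, sigma))  # too little data: keep it unchanged
            continue
        wo = reps_weights([r for _, r in own], eps_sub)
        mu_new = sum(wi * th for wi, (th, _) in zip(wo, own))
        var = sum(wi * (th - mu_new) ** 2 for wi, (th, _) in zip(wo, own))
        new_subs.append((mu_new, max(math.sqrt(var), 1e-3)))  # std floor
    return new_gating, new_subs


def reward(theta):
    # Hypothetical toy task with two equally good solutions, theta = -2 and +2.
    return -min((theta - 2.0) ** 2, (theta + 2.0) ** 2)


def sample(gating, subs, n, rng):
    # Draw an option from the gating, then a parameter from that sub-policy.
    out = []
    for _ in range(n):
        o = rng.choices(range(len(gating)), weights=gating)[0]
        mu, sigma = subs[o]
        theta = rng.gauss(mu, sigma)
        out.append((o, theta, reward(theta)))
    return out


rng = random.Random(0)
gating, subs = [0.5, 0.5], [(-1.0, 1.0), (1.0, 1.0)]
for _ in range(15):
    batch = sample(gating, subs, 200, rng)
    gating, subs = layered_update(gating, subs, batch, eps_gating=0.1, eps_sub=0.5)
print(gating, subs)  # each sub-policy should settle near a different optimum
```

Because each layer has its own KL bound, a small `eps_gating` can keep the activation policy conservative while the sub-policies move faster. In this toy run the two sub-policies end up at different optima only because they start on opposite sides of zero; nothing in the sketch enforces diversity, which is the role the abstract assigns to the per-layer information-theoretic constraints.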

  • F. End, R. Akrour, J. Peters, and G. Neumann, “Layered direct policy search for learning hierarchical skills,” in International Conference on Robotics and Automation (ICRA), 2017.

    @inproceedings{lirolem26737,
    year = {2017},
    title = {Layered direct policy search for learning hierarchical skills},
    month = {May},
    author = {F. End and R. Akrour and J. Peters and G. Neumann},
    booktitle = {International Conference on Robotics and Automation (ICRA)},
    url = {http://eprints.lincoln.ac.uk/26737/},
    abstract = {Solutions to real-world robotic tasks often require
    complex behaviors in high-dimensional continuous state and
    action spaces. Reinforcement Learning (RL) aims to learn
    such behaviors but often fails due to a lack of scalability. To
    address this issue, Hierarchical RL (HRL) algorithms leverage
    hierarchical policies to exploit the structure of a task. However,
    many HRL algorithms rely on task-specific knowledge such
    as a set of predefined sub-policies or sub-goals. In this paper,
    we propose a new HRL algorithm based on information-theoretic
    principles that autonomously uncovers a diverse set
    of sub-policies and their activation policies. Moreover, the
    learning process mirrors the policy's structure and is thus also
    hierarchical, consisting of a set of independent optimization
    problems. The hierarchical structure of the learning process
    allows us to control the learning rates of the sub-policies and
    of the gating individually and to add specific information-theoretic
    constraints to each layer to ensure the diversification of the sub-policies.
    We evaluate our algorithm on two high-dimensional
    continuous tasks and experimentally demonstrate its ability to
    autonomously discover a rich set of sub-policies.}
    }