ICRA 2017: Empowered Skills

Robot Reinforcement Learning (RL) algorithms return a policy that maximizes a global cumulative reward signal but typically do not create diverse behaviors. Hence, the policy will typically capture only a single solution of a task. However, many motor tasks have a large variety of solutions, and knowing these solutions has several advantages. For example, in an adversarial setting such as robot table tennis, the lack of diversity renders the behavior predictable and hence easy for the opponent to counter. In an interactive setting such as learning from human feedback, an emphasis on diversity gives the human more opportunity to guide the robot and to keep it from getting stuck in local optima of the task. To increase the diversity of the learned behaviors, we leverage prior work on intrinsic motivation and empowerment. We derive a new intrinsic motivation signal by enriching the description of a task with an outcome space that represents interesting aspects of a sensorimotor stream. For example, in table tennis, the outcome space could be given by the return position and return ball speed. The intrinsic motivation is then given by the diversity of future outcomes, a concept also known as empowerment. We derive a new policy search algorithm that maximizes a trade-off between the extrinsic reward and this intrinsic motivation criterion. Experiments on a planar reaching task and simulated robot table tennis demonstrate that our algorithm can learn a diverse set of behaviors within the area of interest of the tasks.
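The trade-off described in the abstract can be illustrated with a small toy problem. The sketch below is not the paper's algorithm: it assumes a discrete set of actions, a deterministic action-to-outcome map, and uses the Shannon entropy of the outcome distribution as a simple stand-in for the diversity term. The names `objective`, `best_policy`, `beta`, and the table tennis strokes are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def outcome_distribution(policy, outcome_of):
    """Marginal distribution over outcomes induced by a policy over actions."""
    p_out = defaultdict(float)
    for action, prob in policy.items():
        p_out[outcome_of[action]] += prob
    return dict(p_out)


def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def objective(policy, reward, outcome_of, beta):
    """Expected extrinsic reward plus beta times the outcome entropy."""
    expected_reward = sum(p * reward[a] for a, p in policy.items())
    diversity = entropy(outcome_distribution(policy, outcome_of))
    return expected_reward + beta * diversity


def best_policy(reward, outcome_of, beta):
    """Maximizer of `objective` for this toy setting (assumption: outcomes
    are deterministic, so only the best action per outcome matters and the
    outcome probabilities are a softmax of those rewards with temperature beta).
    """
    # Keep the highest-reward action for every outcome.
    best_action = {}
    for action, outcome in outcome_of.items():
        current = best_action.get(outcome)
        if current is None or reward[action] > reward[current]:
            best_action[outcome] = action
    candidates = list(best_action.values())

    if beta <= 0.0:
        # No diversity term: all mass goes to the single best action.
        top = max(candidates, key=lambda a: reward[a])
        return {a: (1.0 if a == top else 0.0) for a in reward}

    # Numerically stable softmax over the per-outcome best rewards.
    r_max = max(reward[a] for a in candidates)
    weights = {a: math.exp((reward[a] - r_max) / beta) for a in candidates}
    z = sum(weights.values())
    return {a: weights.get(a, 0.0) / z for a in reward}


# Hypothetical table tennis strokes, their return side, and extrinsic rewards.
outcome_of = {"forehand_left": "left", "forehand_right": "right", "backhand_left": "left"}
reward = {"forehand_left": 1.0, "forehand_right": 0.9, "backhand_left": 0.8}

for beta in (0.0, 0.5):
    policy = best_policy(reward, outcome_of, beta)
    print(beta, policy, objective(policy, reward, outcome_of, beta))
```

With `beta = 0` the policy collapses onto one stroke, which mirrors the single-solution behavior the abstract attributes to standard RL. A positive `beta` spreads probability over distinct outcomes while still preferring the better stroke for each outcome.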

  • A. Gabriel, R. Akrour, J. Peters, and G. Neumann, “Empowered skills,” in International Conference on Robotics and Automation (ICRA), 2017.

    @inproceedings{lirolem26736,
    title = {Empowered skills},
    month = {May},
    author = {A. Gabriel and R. Akrour and J. Peters and G. Neumann},
    year = {2017},
    booktitle = {International Conference on Robotics and Automation (ICRA)},
    abstract = {Robot Reinforcement Learning (RL) algorithms
    return a policy that maximizes a global cumulative reward
    signal but typically do not create diverse behaviors. Hence, the
    policy will typically only capture a single solution of a task.
    However, many motor tasks have a large variety of solutions,
    and knowing these solutions has several
    advantages. For example, in an adversarial setting such as
    robot table tennis, the lack of diversity renders the behavior
    predictable and hence easy for the opponent to counter. In an
    interactive setting such as learning from human feedback, an
    emphasis on diversity gives the human more opportunity to
    guide the robot and to keep it from getting stuck in local
    optima of the task. To increase the diversity of the learned
    behaviors, we leverage prior work on intrinsic motivation and
    empowerment. We derive a new intrinsic motivation signal by
    enriching the description of a task with an outcome space,
    representing interesting aspects of a sensorimotor stream. For
    example, in table tennis, the outcome space could be given
    by the return position and return ball speed. The intrinsic
    motivation is now given by the diversity of future outcomes,
    a concept also known as empowerment. We derive a new
    policy search algorithm that maximizes a trade-off between
    the extrinsic reward and this intrinsic motivation criterion.
    Experiments on a planar reaching task and simulated robot
    table tennis demonstrate that our algorithm can learn a diverse
    set of behaviors within the area of interest of the tasks.},
    url = {http://eprints.lincoln.ac.uk/26736/},
    }