Learning Representations

Autonomous robots need to perceive the world through high-dimensional sensors and build an internal representation of the environment from their sensory data. These representations can subsequently be used for decision making, state estimation, and long-term prediction.

Papers:

  • JMLR 2017: Non-parametric Policy Search with Limited Information Loss.

    Learning complex control policies from non-linear and redundant sensory input is an important challenge for reinforcement learning algorithms. Non-parametric methods that approximate value functions or transition models can address this problem by adapting to the complexity of the dataset. Yet, many current non-parametric approaches rely on unstable greedy maximization of approximate value functions, which might lead to poor convergence or oscillations in the policy update. A more robust policy update can be obtained by limiting the information loss between successive state-action distributions. In this paper, we develop a policy search algorithm with policy updates that are both robust and non-parametric. Our method can learn non-parametric control policies for infinite horizon continuous Markov decision processes with non-linear and redundant sensory representations.
    We investigate how we can use approximations of the kernel function to reduce the time requirements of the demanding non-parametric computations. In our experiments, we show the strong performance of the proposed method, and how it can be approximated efficiently. Finally, we show that our algorithm can learn a real-robot underpowered swing-up task directly from image data.

    • H. van Hoof, G. Neumann, and J. Peters, “Non-parametric policy search with limited information loss,” Journal of Machine Learning Research, vol. 18, iss. 73, pp. 1-46, 2018.
      [BibTeX] [Abstract] [Download PDF]

      @article{lirolem28020,
      publisher = {Journal of Machine Learning Research},
      year = {2018},
      number = {73},
      journal = {Journal of Machine Learning Research},
      month = {December},
      pages = {1--46},
      author = {Herke van Hoof and Gerhard Neumann and Jan Peters},
      volume = {18},
      title = {Non-parametric policy search with limited information loss},
      abstract = {Learning complex control policies from non-linear and redundant sensory input is an important challenge for reinforcement learning algorithms. Non-parametric methods that approximate value functions or transition models can address this problem by adapting to the complexity of the dataset. Yet, many current non-parametric approaches rely on unstable greedy maximization of approximate value functions, which might lead to poor convergence or oscillations in the policy update. A more robust policy update can be obtained by limiting the information loss between successive state-action distributions. In this paper, we develop a policy search algorithm with policy updates that are both robust and non-parametric. Our method can learn non-parametric control policies for infinite horizon continuous Markov decision processes with non-linear and redundant sensory representations. We investigate how we can use approximations of the kernel function to reduce the time requirements of the demanding non-parametric computations. In our experiments, we show the strong performance of the proposed method, and how it can be approximated efficiently. Finally, we show that our algorithm can learn a real-robot underpowered swing-up task directly from image data.},
      url = {http://eprints.lincoln.ac.uk/28020/}
      }
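
    The abstract mentions approximating the kernel function to reduce the cost of the non-parametric computations. One standard way to do this (a minimal sketch of the general idea, not necessarily the paper's exact scheme) is the random Fourier feature approximation of an RBF kernel, which replaces the N{\texttimes}N Gram-matrix computation with an explicit finite-dimensional feature map:

    ```python
    import numpy as np

    def make_rff_map(d, n_features, gamma, seed=0):
        # The RBF kernel exp(-gamma * ||x - y||^2) is the Fourier transform of a
        # Gaussian N(0, 2*gamma*I) over frequencies, so sampling frequencies from
        # that Gaussian gives an unbiased kernel approximation (random Fourier
        # features). W and b must be sampled once and shared across all inputs.
        rng = np.random.default_rng(seed)
        W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

        def phi(X):
            return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

        return phi

    # Compare the exact RBF Gram matrix with its random-feature approximation.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))
    gamma = 0.5
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K_exact = np.exp(-gamma * sq_dists)

    phi = make_rff_map(d=3, n_features=5000, gamma=gamma)
    Z = phi(X)
    K_approx = Z @ Z.T            # inner products of features approximate the kernel
    err = np.abs(K_exact - K_approx).max()
    ```

    With the feature map in hand, value-function or policy computations that would touch an N{\texttimes}N kernel matrix can instead work with the N{\texttimes}D feature matrix, which is linear in the number of samples.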

  • AAAI 2017: The kernel Kalman rule: efficient nonparametric inference with recursive least squares.

    Nonparametric inference techniques provide promising tools for probabilistic reasoning in high-dimensional nonlinear systems. Most of these techniques embed distributions into reproducing kernel Hilbert spaces (RKHS) and rely on the kernel Bayes’ rule (KBR) to manipulate the embeddings. However, the computational demands of the KBR scale poorly with the number of samples and the KBR often suffers from numerical instabilities. In this paper, we present the kernel Kalman rule (KKR) as an alternative to the KBR. The derivation of the KKR is based on recursive least squares, inspired by the derivation of the Kalman innovation update. We apply the KKR to filtering tasks where we use RKHS embeddings to represent the belief state, resulting in the kernel Kalman filter (KKF). We show on a nonlinear state estimation task with high dimensional observations that our approach provides a significantly improved estimation accuracy while the computational demands are significantly decreased.

    • G. H. W. Gebhardt, A. Kupcsik, and G. Neumann, “The kernel Kalman rule: efficient nonparametric inference with recursive least squares,” in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
      [BibTeX] [Abstract] [Download PDF]

      @inproceedings{lirolem26739,
      booktitle = {Thirty-First AAAI Conference on Artificial Intelligence},
      year = {2017},
      publisher = {AAAI},
      author = {G. H. W. Gebhardt and A. Kupcsik and G. Neumann},
      month = {February},
      title = {The kernel Kalman rule: efficient nonparametric inference with recursive least squares},
      url = {http://eprints.lincoln.ac.uk/26739/},
      abstract = {Nonparametric inference techniques provide promising tools for probabilistic reasoning in high-dimensional nonlinear systems. Most of these techniques embed distributions into reproducing kernel Hilbert spaces (RKHS) and rely on the kernel Bayes' rule (KBR) to manipulate the embeddings. However, the computational demands of the KBR scale poorly with the number of samples and the KBR often suffers from numerical instabilities. In this paper, we present the kernel Kalman rule (KKR) as an alternative to the KBR. The derivation of the KKR is based on recursive least squares, inspired by the derivation of the Kalman innovation update. We apply the KKR to filtering tasks where we use RKHS embeddings to represent the belief state, resulting in the kernel Kalman filter (KKF). We show on a nonlinear state estimation task with high dimensional observations that our approach provides a significantly improved estimation accuracy while the computational demands are significantly decreased.}
      }
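
    The KKR is derived from recursive least squares. As background, here is a minimal textbook-style RLS sketch (not the paper's RKHS-embedded version): it refines a linear weight estimate one observation at a time, with a gain vector that plays the role of the Kalman innovation weight.

    ```python
    import numpy as np

    def rls_update(w, P, x, y):
        # One recursive least squares step for the model y = w @ x + noise.
        Px = P @ x
        k = Px / (1.0 + x @ Px)       # gain vector (innovation weight)
        w = w + k * (y - x @ w)       # correct w by the prediction error
        P = P - np.outer(k, Px)       # shrink P as information accumulates
        return w, P

    rng = np.random.default_rng(0)
    w_true = np.array([2.0, -1.0, 0.5])

    w = np.zeros(3)
    P = 1e3 * np.eye(3)               # large initial P = uninformative prior
    for _ in range(200):
        x = rng.normal(size=3)
        y = x @ w_true + 0.01 * rng.normal()
        w, P = rls_update(w, P, x, y)
    ```

    Each update costs O(d^2) rather than re-solving the full least-squares problem; the KKR applies the same recursive structure to RKHS embeddings of belief states.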