ICML 2016: Model-Free Trajectory Optimization for Reinforcement Learning

Many recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy updates. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy.
In this article, we propose a new model-free algorithm that backpropagates a local, quadratic, time-dependent Q-function, allowing the policy update to be derived in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, and demonstrates improved performance in comparison to related Trajectory Optimization algorithms that linearize the dynamics.
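To make the closed-form update concrete, the sketch below shows the standard exponentiated-Q reweighting of a time-dependent linear-Gaussian policy under a quadratic Q-model: the reweighted policy stays Gaussian, so its parameters follow by completing the square. This is only an illustration of the kind of update the abstract describes, not the paper's implementation; the function name, the fixed temperature eta, and the toy dimensions are assumptions, and in the actual algorithm the temperature would be obtained by optimizing a dual function so that the KL bound is met exactly.

```python
import numpy as np

def kl_reweighted_gaussian_policy(K, k, Sigma, Quu, Qux, qu, eta):
    """Closed-form update of one time step's linear-Gaussian policy (illustrative).

    Old policy:      pi_old(u | x) = N(u; K x + k, Sigma)
    Quadratic model: Q(x, u) = 0.5 u' Quu u + u' Qux x + u' qu + (x-only terms)
    New policy:      pi_new(u | x) proportional to pi_old(u | x) * exp(Q(x, u) / eta)

    The exponent stays quadratic in u, so pi_new is again Gaussian.  `eta` is a
    given temperature here; in the paper it would come from a dual optimization
    that enforces the KL constraint exactly.
    """
    prec_old = np.linalg.inv(Sigma)
    prec_new = prec_old - Quu / eta          # must be positive definite
    Sigma_new = np.linalg.inv(prec_new)
    K_new = Sigma_new @ (prec_old @ K + Qux / eta)
    k_new = Sigma_new @ (prec_old @ k + qu / eta)
    return K_new, k_new, Sigma_new

# Toy usage: 2-d state, 1-d action, concave-in-u quadratic Q-model.
rng = np.random.default_rng(0)
K, k, Sigma = rng.normal(size=(1, 2)), rng.normal(size=1), np.eye(1)
Quu, Qux, qu = -np.eye(1), rng.normal(size=(1, 2)), rng.normal(size=1)
print(kl_reweighted_gaussian_policy(K, k, Sigma, Quu, Qux, qu, eta=1.0))
```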

  • R. Akrour, A. Abdolmaleki, H. Abdulsamad, and G. Neumann, “Model-free trajectory optimization for reinforcement learning,” in Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 4342-4352.
    [BibTeX] [Abstract] [Download PDF]

    @inproceedings{lirolem25747,
    author = {R. Akrour and A. Abdolmaleki and H. Abdulsamad and G. Neumann},
    volume = {6},
    month = {June},
    journal = {33rd International Conference on Machine Learning, ICML 2016},
    year = {2016},
    booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
    title = {Model-free trajectory optimization for reinforcement learning},
    pages = {4342--4352},
    url = {http://eprints.lincoln.ac.uk/25747/},
abstract = {Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics demonstrating improved performance in comparison to related Trajectory Optimization algorithms linearizing the dynamics.},
    }