The view of optimal control as probabilistic inference has been rediscovered repeatedly in planning and reinforcement learning. Interest in it has grown recently with the use of deep learning to represent policies and value functions and with the widespread use of entropy regularization in reinforcement learning. This seminar will introduce the class of Kullback-Leibler control problems (also known as linearly-solvable optimal control) and clarify its relation to entropy-regularized reinforcement learning. I will present the discrete and continuous formulations of this framework for control and inference, followed by recent advances that exploit its analytical properties for efficient policy optimization. These advances lead to a practical algorithm for the reinforcement learning setting that can be applied to high-dimensional robotics tasks, addressing the main challenge of translating this theory into practical methods for large-scale control problems.

References:
Adaptive Smoothing for Path Integral Control
Dominik Thalmeier, Hilbert J. Kappen, Simone Totaro, Vicenç Gómez; Journal of Machine Learning Research, 21(191):1−37, 2020.

Vicenç Gómez received his degree in Computer Science Engineering in 2002 from the Universitat Politècnica de Catalunya and his PhD in Computer Science and Digital Communication from the Universitat Pompeu Fabra (UPF), Barcelona, in 2008. He was a postdoctoral researcher at the Radboud University Medical Center (2009–2011) and at the Donders Institute for Brain, Cognition and Behavior (2011–2014) in Nijmegen (The Netherlands). He has held visiting appointments at Los Alamos National Laboratory (USA), the IAS group at Technische Universität Darmstadt (Germany), and University College London (UK). In 2014 he obtained a transnational academic career grant (FP7 Marie Curie Actions) and joined the Artificial Intelligence and Machine Learning group at the Department of Information and Communications Technologies (UPF). In 2016 he was awarded a Ramón y Cajal fellowship. He is currently a tenure-track professor at UPF. His main research interests are machine learning and optimal control, with applications to areas such as complex networks, robotics, and brain-computer interfaces.

