
Efficient and convergent natural gradient based optimization algorithms for machine learning

Borja Sánchez-López (22/Dec/2022)

An industrial PhD

Advisors: Jesus Cerquides

University:

Abstract:

Machine Learning (ML) is often cast as an optimization problem: an objective function assesses the success of an agent at a certain task, and learning is accomplished by optimizing that function. Gradient descent has proven to be a powerful optimization algorithm, becoming the cornerstone for solving most ML challenges. Among its strengths are its low computational complexity and its guaranteed convergence to an optimum of the function, under certain regularity conditions. Nevertheless, high-dimensional scenarios show sudden drops in convergence rates, which prevent further improvement within an acceptable amount of time. For this reason, the field has turned to the natural gradient to tackle this issue.
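As a point of reference (a minimal sketch for illustration, not code from the thesis), gradient descent repeatedly steps against the gradient of the objective; the quadratic objective, starting point, and step size below are assumptions of the example:

```python
import numpy as np

def gradient_descent(grad_f, x0, lr=0.1, steps=100):
    """Plain gradient descent: x <- x - lr * grad f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

# Illustrative objective: f(x) = ||x||^2 / 2, so grad f(x) = x;
# the unique minimum is at the origin.
x_min = gradient_descent(lambda x: x, x0=[3.0, -2.0])
```

Each iteration costs one gradient evaluation, which is the low computational complexity the abstract refers to.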

The natural gradient is defined on a Riemannian manifold (M, g), that is, a manifold M equipped with a metric g. The natural gradient vector of a function f at a point p of (M, g) is the vector in the tangent space at p that points in the direction in which f locally increases its value fastest, taking into account the metric attached to the manifold. It turns out that a manifold of probability distributions of the same family, as usually considered in ML, has a natural metric associated with it, namely the Fisher Information Metric (FIM). While natural gradient based algorithms show better convergence speed in some limited examples, they often fail to provide good estimates, or they even diverge. Moreover, they demand more calculations than gradient descent algorithms, increasing the order of computational complexity.
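To make the definition concrete, here is a minimal sketch (an illustration, not code from the thesis) of the natural gradient update theta ← theta − lr · G(theta)⁻¹ ∇f(theta), on the toy family of Gaussians N(mu, sigma²), whose FIM is diag(1/sigma², 2/sigma²); the objective and step size are assumptions of the example:

```python
import numpy as np

def natural_gradient_step(theta, grad_f, fisher, lr=0.1):
    """One natural gradient descent step:
    theta <- theta - lr * G(theta)^{-1} grad f(theta)."""
    G = fisher(theta)
    nat_grad = np.linalg.solve(G, grad_f(theta))  # solve, rather than invert G
    return theta - lr * nat_grad

# Toy manifold: Gaussians N(mu, sigma^2) with coordinates theta = (mu, sigma).
# The Fisher Information Metric of this family is diag(1/sigma^2, 2/sigma^2).
def fisher_gaussian(theta):
    _, sigma = theta
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

# Illustrative objective: KL(N(mu, sigma^2) || N(0, 1)),
# whose gradient is (mu, sigma - 1/sigma); minimized at (0, 1).
def grad_kl(theta):
    mu, sigma = theta
    return np.array([mu, sigma - 1.0 / sigma])

theta = np.array([2.0, 3.0])  # arbitrary starting point
for _ in range(300):
    theta = natural_gradient_step(theta, grad_kl, fisher_gaussian)
```

The extra cost relative to gradient descent is visible in the linear solve against G, which is exactly the overhead the abstract mentions.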

This thesis explores the natural gradient descent algorithm for the function optimization task. Our research aims at designing a natural gradient based algorithm for function optimization whose computational complexity is comparable to that of gradient based methods, while benefiting from higher rates of convergence than standard gradient based methods.

To reach these objectives, the hypothesis formulated in this thesis is that a convergence guarantee stabilizes natural gradient algorithms and gives access to fast rates of convergence. Furthermore, the natural gradient can be computed quickly on particular manifolds called Dually Flat Manifolds (DFMs), and hence fast natural gradient optimization methods become available.
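The computational shortcut on dually flat manifolds can be stated as a classical identity (a sketch of the standard result from information geometry, not a derivation taken from the thesis): if θ and η = ∇ψ(θ) are the two flat coordinate systems coupled by a convex potential ψ, the FIM in θ-coordinates is the Jacobian of the coordinate change, so the natural gradient in θ reduces to an ordinary gradient in η and no matrix inversion is needed:

```latex
G(\theta) = \frac{\partial \eta}{\partial \theta} = \nabla^{2}\psi(\theta),
\qquad
\nabla_{\theta} f
  = \Big(\frac{\partial \eta}{\partial \theta}\Big)^{\!\top} \nabla_{\eta} f
  = G(\theta)\,\nabla_{\eta} f
\;\;\Longrightarrow\;\;
\widetilde{\nabla}_{\theta} f
  = G(\theta)^{-1}\,\nabla_{\theta} f
  = \nabla_{\eta} f ,
```

where the last step uses the symmetry of the Hessian G(θ) = ∇²ψ(θ).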

The beginning of our research focuses mainly on the convergence property of natural gradient methods. We develop strategies to define natural gradient methods whose convergence can be proven; the main assumptions require (M, g) to be a Riemannian manifold and f to be a differentiable function on M. Moreover, it turns out that the Multinomial Logistic Regression (MLR) problem, a widely considered ML problem, can be adapted and solved by taking a DFM as the model. Hence, this problem is our most promising target, in which the objectives of the thesis can be fully accomplished.