Efficient and convergent natural gradient based optimization algorithms for machine learning

Borja Sánchez-López (22/Dec/2022)
 

An industrial PhD

Advisors: 

Jesus Cerquides

University: 

Abstract: 

Machine Learning (ML) is often cast as an optimization problem: an objective function assesses the success of an agent in a certain task, and learning is accomplished by optimizing that function. Gradient descent has proven to be a powerful tool for this purpose, becoming the cornerstone of most ML solutions. Among its strengths are its low computational complexity and its guaranteed convergence to the optimum of the function under certain regularity conditions. Nevertheless, in high-dimensional settings convergence rates drop sharply, preventing further improvement within an acceptable amount of time. For this reason, the field has turned to the natural gradient to tackle this issue.
The natural gradient is defined on a Riemannian manifold (M, g), that is, a manifold M equipped with a metric g. The natural gradient of a function f at a point p of (M, g) is the vector in the tangent space at p that points in the direction in which f locally increases fastest, taking into account the metric attached to the manifold. It turns out that the manifolds of probability distributions of a given family, usually considered in ML, carry a naturally associated metric, the Fisher Information Metric (FIM). While natural gradient based algorithms show better convergence speed in some limited examples, they often fail to provide good estimates or they even diverge. Moreover, they demand more computation than gradient descent algorithms, increasing the order of computational complexity.
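In local coordinates θ on the manifold, with G(θ) the matrix of the metric g (the FIM in the probabilistic case) and ∇f(θ) the ordinary gradient, the natural gradient and the resulting descent update can be written as follows (the notation is chosen here for illustration, since the abstract fixes none):

\[
  \widetilde{\nabla} f(\theta) \;=\; G(\theta)^{-1}\,\nabla f(\theta),
  \qquad
  \theta_{t+1} \;=\; \theta_t \;-\; \eta_t\, G(\theta_t)^{-1}\,\nabla f(\theta_t),
\]

where η_t > 0 is the step size; the need to form and invert G(θ) is the source of the extra computational cost mentioned above.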
This thesis explores the natural gradient descent algorithm for the function optimization task. Our research aims to design a natural gradient based algorithm whose computational complexity is comparable to that of gradient based methods and which benefits from higher rates of convergence than standard gradient based methods.
The hypothesis formulated in this thesis is that a convergence guarantee stabilizes natural gradient algorithms and gives access to fast rates of convergence. Furthermore, the natural gradient can be computed quickly on a particular class of manifolds, the Dually Flat Manifolds (DFM), and hence fast natural gradient optimization methods become available.
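Why a dually flat structure makes this computation cheap can be sketched with a standard information-geometry identity (the particular construction developed in the thesis is not detailed in this abstract): if ψ is the potential of the flat coordinates θ and η = ∇ψ(θ) are the dual coordinates, then G(θ) = ∇²ψ(θ) = ∂η/∂θ, and by the chain rule

\[
  G(\theta)^{-1}\,\nabla_{\theta} f \;=\; \nabla_{\eta} f,
\]

so the natural gradient in the θ-coordinates is simply the ordinary gradient taken in the dual coordinates, with no matrix inversion required.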
The first part of our research focuses on the convergence of natural gradient methods. We develop strategies to define natural gradient methods whose convergence can be proven; the main assumptions require (M, g) to be a Riemannian manifold and f to be a differentiable function on M. Moreover, it turns out that the Multinomial Logistic Regression (MLR) problem, a widely studied ML problem, can be adapted and solved by taking a DFM as the model. Hence, this problem is our most promising target, one on which the objective of the thesis can be completely accomplished.
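To make the kind of update under study concrete, the following is a minimal, generic sketch of natural gradient descent in Python (illustrative only, not the algorithm developed in the thesis; metric_G stands for whatever metric matrix the model provides, e.g. its FIM):

import numpy as np

def natural_gradient_descent(grad_f, metric_G, theta0, step=0.1, iters=100):
    """grad_f(theta) -> Euclidean gradient; metric_G(theta) -> metric matrix."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        g = grad_f(theta)          # ordinary gradient
        G = metric_G(theta)        # metric, e.g. the Fisher Information Matrix
        # Natural gradient step: solve G x = g instead of forming G^{-1} explicitly.
        theta = theta - step * np.linalg.solve(G, g)
    return theta

# Toy usage: minimize f(theta) = 0.5 * theta^T A theta with the metric G = A;
# a single natural gradient step with step=1 then lands on the optimum.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
theta_final = natural_gradient_descent(lambda t: A @ t, lambda t: A,
                                        theta0=[1.0, -2.0], step=1.0, iters=1)
print(theta_final)  # approximately [0, 0]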
