Analysing heavy-tail properties of stochastic gradient descent by means of stochastic recurrence equations

Published online by Cambridge University Press:  10 February 2026

Ewa Damek*
Affiliation:
University of Wrocław
Sebastian Mentemeier**
Affiliation:
University of Hildesheim
*Postal address: Uniwersytet Wrocławski, Instytut Matematyczny, pl. Grunwaldzki 2, 50-384 Wrocław, Poland. Email: ewa.damek@math.uni.wroc.pl
**Postal address: Universität Hildesheim, Institut für Mathematik – IMMI, Universitätsplatz 1, 31141 Hildesheim, Germany. Email: mentemeier@uni-hildesheim.de

Abstract

In recent works on the theory of machine learning, it has been observed that heavy-tail properties of stochastic gradient descent (SGD) can be studied in the probabilistic framework of stochastic recursions. In particular, Gürbüzbalaban et al. (2021) considered a setup corresponding to linear regression for which iterations of SGD can be modelled by a multivariate affine stochastic recursion $X_n=A_nX_{n-1}+B_n$ for independent and identically distributed pairs $(A_n,B_n)$, where $A_n$ is a random symmetric matrix and $B_n$ is a random vector. However, their approach is not completely correct and, in the present paper, the problem is put into the right framework by applying the theory of irreducible-proximal matrices.
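
To illustrate how SGD for linear regression takes the affine form $X_n=A_nX_{n-1}+B_n$, the following is a minimal sketch under stated assumptions. It assumes the squared loss $\tfrac12(a^\top x-y)^2$ averaged over a mini-batch of size $b$ with step size $\eta$, so that $A_n = I - \frac{\eta}{b}\sum_i a_ia_i^\top$ and $B_n=\frac{\eta}{b}\sum_i a_iy_i$. The function names, the standard Gaussian features and responses, and the default parameter values are illustrative choices, not the paper's exact model.

```python
import numpy as np


def affine_coefficients(a, y, eta):
    """Return (A, B) such that one mini-batch SGD step is x -> A @ x + B.

    a   : (b, d) array of feature vectors, one row per sample in the batch
    y   : (b,) array of responses
    eta : step size
    """
    b, d = a.shape
    A = np.eye(d) - (eta / b) * a.T @ a   # random symmetric matrix A_n
    B = (eta / b) * a.T @ y               # random vector B_n
    return A, B


def simulate(n_steps, d=2, eta=0.75, b=5, seed=0):
    """Iterate X_n = A_n X_{n-1} + B_n and record the norms |X_n|.

    Assumption: batches are i.i.d. standard Gaussian features and responses.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    norms = np.empty(n_steps)
    for n in range(n_steps):
        a = rng.normal(size=(b, d))       # hypothetical data distribution
        y = rng.normal(size=b)            # hypothetical responses
        A, B = affine_coefficients(a, y, eta)
        x = A @ x + B
        norms[n] = np.linalg.norm(x)
    return norms
```

Calling `simulate(10_000)` returns the sequence of norms $|X_n|$, whose empirical tail could then be inspected, for instance on a log-log plot of the survival function.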

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Applied Probability Trust
Figure 1. Contour plot of h as a function of b and s for model (Rank1Gauss) with $d=2$ and $\eta=0.75$. The black line is the contour of $k\equiv1$. The values of h have been cut at level 2 for better visualization.

Figure 2. Contour plot of h as a function of $\eta$ and s for model (Rank1Gauss) with $d=2$ and $b=5$. The black line is the contour of $k\equiv1$. The values of h have been cut at level 2 for better visualization.