From NeurODEs to AutoencODEs: A mean-field control framework for width-varying neural networks

Published online by Cambridge University Press: 08 February 2024

Cristina Cipriani
Affiliation:
School of Computation, Information and Technology, Technical University Munich, Munich, Germany; Munich Data Science Institute (MDSI), Munich, Germany; Munich Center for Machine Learning (MCML), Munich, Germany
Massimo Fornasier
Affiliation:
School of Computation, Information and Technology, Technical University Munich, Munich, Germany; Munich Data Science Institute (MDSI), Munich, Germany; Munich Center for Machine Learning (MCML), Munich, Germany
Alessandro Scagliotti*
Affiliation:
School of Computation, Information and Technology, Technical University Munich, Munich, Germany; Munich Center for Machine Learning (MCML), Munich, Germany
Corresponding author: Alessandro Scagliotti; Email: scag@ma.tum.de

Abstract

The connection between Residual Neural Networks (ResNets) and continuous-time control systems (known as NeurODEs) has enabled a mathematical analysis of neural networks that has produced results of both theoretical and practical significance. However, by construction, NeurODEs can only describe constant-width layers, making them unsuitable for modelling deep learning architectures whose layers have variable width. In this paper, we propose a continuous-time Autoencoder, which we call AutoencODE, based on a modification of the controlled vector field that drives the dynamics. This adaptation extends the mean-field control framework originally devised for conventional NeurODEs. In this setting, we tackle the case of low Tikhonov regularisation, which results in potentially non-convex cost landscapes. While the global results available for high Tikhonov regularisation may no longer hold, we show that many of them can be recovered in regions where the loss function is locally convex. Inspired by our theoretical findings, we develop a training method tailored to this specific type of Autoencoder with residual connections, and we validate our approach through numerical experiments on various examples.
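To make the construction concrete, the following sketch (ours, not the authors' code; the explicit-Euler discretisation x_{k+1} = x_k + h·f(x_k, θ_k), the tanh activation, the step size and the width profile are all illustrative assumptions) shows a discretised NeurODE together with the AutoencODE device of embedding all layers in a common ambient dimension and deactivating the components that fall outside the current layer width:

    import numpy as np

    def sigma(z):
        """Smooth activation (tanh), applied componentwise."""
        return np.tanh(z)

    def euler_step(x, W, b, h, mask):
        """One residual layer: x <- x + h * mask * sigma(W @ x + b).

        `mask` is a 0/1 vector marking the components that are active at
        this depth; inactive components receive zero velocity, which is how
        a width-varying profile is modelled in a fixed ambient dimension.
        """
        return x + h * mask * sigma(W @ x + b)

    d, n_layers = 8, 20                        # ambient dimension, depth
    h = 1.0 / n_layers                         # step size of the discretisation
    rng = np.random.default_rng(0)
    x = rng.normal(size=d)                     # an input datum
    # Illustrative width profile 8 -> 4 -> 8: encoder, bottleneck, decoder.
    widths = [8] * 7 + [4] * 6 + [8] * 7
    for k in range(n_layers):
        mask = np.zeros(d)
        mask[:widths[k]] = 1.0                 # active components at layer k
        W = rng.normal(size=(d, d)) / np.sqrt(d)
        b = rng.normal(size=d)
        x = euler_step(x, W, b, h, mask)

Here the 0/1 mask plays the role of the modified controlled vector field: deactivated components have zero velocity, so the effective width of the network can shrink and grow along the depth while the state space stays fixed.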

Information

Type
Papers
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Left: network with an encoder structure. Right: autoencoder.

Figure 2. Left: embedding of an encoder into a dynamical system. Right: model for an autoencoder.

Figure 3. Embedding of the U-net into a higher-dimensional dynamical system.

Algorithm 1: Shooting method
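The listing of Algorithm 1 is given in the paper. As a rough orientation only, one sweep of a generic Pontryagin-type shooting method for dynamics of the above form could look as follows (our sketch; the quadratic terminal cost, the Tikhonov weight lam and the learning rate tau are illustrative assumptions, not the authors' implementation):

    import numpy as np

    def sigma(z):  return np.tanh(z)
    def dsigma(z): return 1.0 - np.tanh(z) ** 2

    def shooting_sweep(x0, target, Ws, bs, h, lam, tau):
        """One sweep of a Pontryagin-type shooting method.

        Forward pass of the states, backward pass of the costates, then one
        gradient-ascent step on the Tikhonov-regularised Hamiltonian for the
        controls (W_k, b_k). Dynamics: x_{k+1} = x_k + h*sigma(W_k x_k + b_k);
        terminal cost: 0.5*|x_N - target|^2 (a simple choice for this sketch).
        """
        N = len(Ws)
        xs = [x0]
        for k in range(N):                             # forward: states
            xs.append(xs[-1] + h * sigma(Ws[k] @ xs[-1] + bs[k]))
        ps = [None] * (N + 1)
        ps[N] = -(xs[N] - target)                      # p_N = -grad of the loss
        for k in reversed(range(N)):                   # backward: costates
            z = Ws[k] @ xs[k] + bs[k]
            ps[k] = ps[k + 1] + h * Ws[k].T @ (dsigma(z) * ps[k + 1])
        for k in range(N):                             # control update
            g = dsigma(Ws[k] @ xs[k] + bs[k]) * ps[k + 1]
            Ws[k] += tau * h * (np.outer(g, xs[k]) - lam * Ws[k])
            bs[k] += tau * h * (g - lam * bs[k])
        return Ws, bs

Iterating such sweeps over the data, with the control update averaged across samples, gives the mean-field flavour of the training procedure; lam is the Tikhonov weight whose low-regularisation regime the paper analyses.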

Figure 4. Left: classification task performed when the turned-off component is the natural one. Right: sketch of the AutoencODE architecture considered.

Table 1. Minimum and maximum eigenvalues of the Hessian matrix across epochs.

Figure 5. Left: initial phase, i.e., separation of the data along the $y$-axis. Center: encoding phase, i.e., only the second component is active. Right: decoding phase and classification result after the ‘unnatural turn off’. Notice that, to obtain a cleaner clustering of the classified data, we increased the number of layers from $20$ to $40$; however, the network accomplishes the task even with the same structure as in Figure 4.

Figure 6. Top left: initial phase. Top right: encoding phase. Bottom left: decoding phase. Bottom right: network’s reconstruction with alternative architecture.

Figure 7. Architecture used for the MNIST reconstruction task. The inactive nodes are marked in green.

Figure 8. Reconstruction of some numbers achieved by AutoencODE.

Figure 9. Left: comparison of two similar latent means. Center: another pair of similar latent means. Right: mean and standard deviation of the encoded vectors in the bottleneck.

Figure 10. Left: faulty autoencoder detected through the analysis of $\Delta$. Right: corrected version of the same autoencoder.

Figure 11. Left: Shannon entropy across layers. Right: our measure of entropy across layers.