A new analytical approach to consistency and overfitting in regularized empirical risk minimization


This work considers the problem of binary classification: given training data $x_1, \dots, x_n$ from a certain population, together with associated labels $y_1, \dots, y_n \in \{0,1\}$, determine the best label for an element $x$ not among the training data. More specifically, this work considers a variant of the regularized empirical risk functional which is defined intrinsically to the observed data and does not depend on the underlying population. Tools from modern analysis are used to obtain a concise proof of asymptotic consistency as regularization parameters are taken to zero at rates related to the size of the sample. These analytical tools give a new framework for understanding overfitting and underfitting, and rigorously connect the notion of overfitting with a loss of compactness.
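The trade-off between overfitting and underfitting mentioned in the abstract can be illustrated with a toy computation. The sketch below is not the functional studied in the paper: it assumes a squared loss, a quadratic penalty on a path graph joining consecutive sample points, and real-valued rather than binary outputs. The names `chain_regularized_erm`, `lam`, and `labels` are hypothetical and chosen only for this example.

```python
def chain_regularized_erm(y, lam):
    """Minimize (1/n) * sum_i (u_i - y_i)^2 + lam * sum_i (u_{i+1} - u_i)^2.

    Illustrative assumption: samples are ordered on a line and the regularizer
    is the quadratic energy of a path graph. Setting the gradient to zero
    gives the tridiagonal system (I + n * lam * L) u = y, where L is the
    Laplacian of the path graph.
    """
    n = len(y)
    if n == 0:
        return []
    c = n * lam
    # Tridiagonal coefficients: sub-diagonal, diagonal, super-diagonal.
    sub = [0.0] + [-c] * (n - 1)
    sup = [-c] * (n - 1) + [0.0]
    # Node degree in the path graph: 1 at the two ends, 2 in the interior.
    diag = [1.0 + c * ((i > 0) + (i < n - 1)) for i in range(n)]
    rhs = [float(v) for v in y]

    # Thomas algorithm: forward elimination.
    for i in range(1, n):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]

    # Thomas algorithm: back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - sup[i] * u[i + 1]) / diag[i]
    return u


# Hypothetical noisy labels along a line.
labels = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
nearly_interpolating = chain_regularized_erm(labels, 1e-9)  # tiny regularization
nearly_constant = chain_regularized_erm(labels, 1e6)        # heavy regularization
```

In this toy setting, shrinking `lam` toward zero lets the minimizer reproduce every label, noise included, while a very large `lam` flattens it toward the mean of the labels. How fast the regularization parameter may shrink as the sample grows is the question the paper addresses for its own functional.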

European Journal of Applied Mathematics
  • ISSN: 0956-7925
  • EISSN: 1469-4425
  • URL: /core/journals/european-journal-of-applied-mathematics