A new analytical approach to consistency and overfitting in regularized empirical risk minimization

NICOLÁS GARCÍA TRILLOS; RYAN MURRAY

doi:10.1017/S0956792517000201

A new analytical approach to consistency and overfitting in regularized empirical risk minimization

Part of: Nonparametric inference Geometric probability and stochastic geometry Discrete mathematics in relation to computer science Existence theories

Published online by Cambridge University Press: 20 July 2017

NICOLÁS GARCÍA TRILLOS and

RYAN MURRAY

Show author details

NICOLÁS GARCÍA TRILLOS: Affiliation:
Division of Applied Mathematics, Brown University, Providence, RI, 02912, USA email: nicolas_garcia_trillos@brown.edu
RYAN MURRAY: Affiliation:
Mathematics Department, The Pennsylvania State University, University Park, PA 16802, USA email: rwm22@psu.edu

Article contents

Abstract
References

Rights & Permissions

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

This work considers the problem of binary classification: given training data x 1, . . ., x n from a certain population, together with associated labels y 1,. . ., y n ∈ {0,1}, determine the best label for an element x not among the training data. More specifically, this work considers a variant of the regularized empirical risk functional which is defined intrinsically to the observed data and does not depend on the underlying population. Tools from modern analysis are used to obtain a concise proof of asymptotic consistency as regularization parameters are taken to zero at rates related to the size of the sample. These analytical tools give a new framework for understanding overfitting and underfitting, and rigorously connect the notion of overfitting with a loss of compactness.

Keywords

Overfitting empirical risk minimization graph total variation discrete to continuum classification

MSC classification

Primary: 49J55: Problems involving randomness

Secondary: 49J45: Methods involving semicontinuity and convergence; relaxation 60D05: Geometric probability and stochastic geometry 68R10: Graph theory (including graph drawing) 62G20: Asymptotic properties

Information

Type: Papers
Information: European Journal of Applied Mathematics , Volume 28 , Special Issue 6: Big Data and Partial Differential Equations , December 2017 , pp. 886 - 921

DOI: https://doi.org/10.1017/S0956792517000201 [Opens in a new window]
Creative Commons: This is a work of the U.S. Government and is not subject to copyright protection in the United States.
Copyright: Copyright © Cambridge University Press 2017

References

[1] Agapiou, S., Larsson, S. & Stuart, A. M. (2013) Posterior contraction rates for the Bayesian approach to linear ill-posed inverse problems. Stoch. Process. Appl. 123, 3828–3860.Google Scholar

[2] Ambrosio, L., Fusco, N. & Pallara, D. (2000) Functions of Bounded Variation and Free Discontinuity Problems, Oxford Mathematical Monographs, The Clarendon Press Oxford University Press, New York.Google Scholar

[3] Ambrosio, L., Gigli, N. & Savaré, G. (2008) Gradient Flows: In Metric Spaces and in the Space of Probability Measures, Lectures in Mathematics, Birkhäuser, Basel.Google Scholar

[4] Billingsley, P. (2012) Probability and Measure, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., Hoboken, NJ.Google Scholar

[5] Chambolle, A., Caselles, V., Cremers, D., Novaga, M. & Pock, T. (2010) An introduction to total variation for image analysis. In: Fornasier, Massimo (editor), Theoretical Foundations and Numerical Methods for Sparse Recovery, Radon Ser. Comput. Appl. Math., vol. 9, Walter de Gruyter, Berlin, pp. 263–340.Google Scholar

[6] Dal Maso, G. (1993) An Introduction to Γ-Convergence, Springer, Birkhäuser Boston.Google Scholar

[7] Di Nezza, E., Palatucci, G. & Valdinoci, E. (2012) Hitchhiker's guide to the fractional Sobolev spaces. Bull. Sci. Math. 136, 521–573.Google Scholar

[8] Esser, E. (2009) Applications of Lagrangian Based Alternating Direction Methods and Connections to Split Bregman, CAM Report 09-31, UCLA.Google Scholar

[9] Evans, L. C. (1990) Weak Convergence Methods for Nonlinear Partial Differential Equations, vol. 74, American Mathematical Soc, Providence, RI.Google Scholar

[10] Fonseca, I. & Leoni, G. (2007) Modern Methods in the Calculus of Variations: L^p Spaces, Springer Monographs in Mathematics, Springer, New York.Google Scholar

[11] García Trillos, N., Gerlach, M., Hein, M. & Slepčev, D. (2017) Spectral convergence of emprical graph laplacians. In preparation.Google Scholar

[12] García Trillos, N. & Slepčev, D. (2016) Continuum limit of total variation on point clouds. Arch. Ration. Mech. Anal. 220 (1), 193–241.Google Scholar

[13] Garcia Trillos, N. & Slepcev, D. (2015) On the rate of convergence of empirical measures in ∞-transportation distance. Canad. J. Math. 67, 1358–1383.Google Scholar

[14] García Trillos, N., Slepčev, D., von Brecht, J., Laurent, T. & Bresson, X. (2016) Consistency of Cheeger and ratio graph cuts. to appear in J. Mach. Learning Res. 17, Paper No. 181, 46 pages.Google Scholar

[15] Ghosal, S., Ghosh, J. K. & van der Vaart, A. W. (2000) Convergence rates of posterior distributions. Ann. Statist. 28, 500–531.Google Scholar

[16] Grisvard, P. (1985) Elliptic Problems in Nonsmooth Domains, Monographs and Studies in Mathematics, vol. 24, Pitman (Advanced Publishing Program), Boston, MA.Google Scholar

[17] Leoni, G. (2009) A First Course in Sobolev Spaces, Graduate Studies in Mathematics, vol. 24, American Mathematical Society, Providence, RI.Google Scholar

[18] Nikolova, M. (2004) A variational approach to remove outliers and impulse noise. J. Math. Imaging Vision 20, 99–120.Google Scholar

[19] Pedregal, P. (1997) Parametrized Measures and Variational Principles, Progress in Nonlinear Differential Equations and their Applications, vol. 30, Birkhäuser Verlag, Basel.CrossRef Google Scholar

[20] Rockafellar, R. T. (1970) Convex Analysis, Princeton Mathematical Series, vol. 28, Princeton University Press, Princeton, NJ.Google Scholar

[21] Vapnik, V. N. (1998) Statistical Learning Theory, vol. 1, John Wiley & Sons, Inc., New York.Google Scholar

[22] Villani, C. (2003) Topics in Optimal Transportation, Graduate Studies in Mathematics, vol. 58, American Mathematical Society, Providence, RI.Google Scholar

[23] Vogel, C. R. & Oman, M. E. (1996) Iterative methods for total variation denoising. SIAM J. Sci. Comput. 17, 227–238.Google Scholar

[24] von Luxburg, U. & Schölkopf, B. (2011) Statistical learning theory: Models, concepts, and results. In: Gabbay, Dov M., Hartmann, Stephan and Woods, John (editors), Handbook of the History of Logic, Vol. 10: Inductive Logic Elsevier, North Holland, pp. 651–706.Google Scholar

Article contents

A new analytical approach to consistency and overfitting in regularized empirical risk minimization

Abstract

Keywords

MSC classification

Information

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests