Adaptive density estimation for clustering with Gaussian mixtures

Published online by Cambridge University Press: 04 November 2013

C. Maugis-Rabusseau
Affiliation:
Institut de Mathématiques de Toulouse, INSA de Toulouse, Université de Toulouse, INSA de Toulouse, 135 avenue de Rangueil, 31077 Toulouse Cedex 4, France. cathy.maugis@insa-toulouse.fr
B. Michel
Affiliation:
Laboratoire de Statistique Théorique et Appliquée, Université Pierre et Marie Curie - Paris 6, 4 place Jussieu, 75252 Paris Cedex 05, France; bertrand.michel@upmc.fr

Abstract

Gaussian mixture models are widely used to study clustering problems. These model-based clustering methods require an accurate estimation of the unknown data density by Gaussian mixtures. In Maugis and Michel (2009), a penalized maximum likelihood estimator is proposed for automatically selecting the number of mixture components. In the present paper, we consider a collection of univariate densities whose logarithm is locally β-Hölder and which satisfy moment and tail conditions. We show that this penalized estimator is minimax adaptive to the β regularity of such densities in the Hellinger sense.
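
To make the idea of a penalized maximum likelihood estimator concrete, here is a minimal Python sketch of selecting the number of components of a univariate Gaussian mixture. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not give the form of the penalty, so a BIC-shaped penalty 0.5 · D_K · log n with D_K = 3K − 1 free parameters is assumed here, and the function names (fit_gmm_1d, select_num_components), the quantile-based initialization, and the variance floor are all hypothetical choices made for this example.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log-density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gmm_1d(data, k, n_iter=100, var_floor=1e-3):
    """Run EM for a univariate k-component Gaussian mixture.

    Returns the log-likelihood at the parameters entering the last iteration.
    The variance floor (an assumption of this sketch) avoids degenerate spikes.
    """
    n = len(data)
    srt = sorted(data)
    # Deterministic start: means at evenly spaced sample quantiles.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k
    loglik = float("-inf")
    for _ in range(n_iter):
        # E-step: responsibilities via log-sum-exp for numerical stability.
        resp = []
        loglik = 0.0
        for x in data:
            logs = [math.log(weights[j]) + log_normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            total = sum(math.exp(v - top) for v in logs)
            loglik += top + math.log(total)
            resp.append([math.exp(v - top) / total for v in logs])
        # M-step: weighted updates of proportions, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10  # guard against empty components
            weights[j] = nj / (n + k * 1e-10)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
    return loglik


def select_num_components(data, candidates):
    """Pick k minimizing -loglik + pen(k), with an assumed BIC-shaped penalty."""
    n = len(data)
    crit = {}
    for k in candidates:
        dim = 3 * k - 1  # (k - 1) proportions + k means + k variances
        crit[k] = -fit_gmm_1d(data, k) + 0.5 * dim * math.log(n)
    best_k = min(crit, key=crit.get)
    return best_k, crit


# Usage on synthetic data: two well-separated Gaussian clusters.
rng = random.Random(0)
sample = ([rng.gauss(-3.0, 1.0) for _ in range(150)]
          + [rng.gauss(3.0, 1.0) for _ in range(150)])
best_k, crit = select_num_components(sample, [1, 2, 3])
print(best_k, crit)
```

The penalty grows with the model dimension, so adding a component is accepted only when the gain in log-likelihood outweighs it. Replacing the assumed penalty with a different shape or a data-driven constant only requires changing the single line that computes crit[k].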

Type
Research Article
Copyright
© EDP Sciences, SMAI, 2013

References

Baudry, J.-P., Maugis, C. and Michel, B., Slope heuristics: overview and implementation. Stat. Comput. 22 (2011) 455–470.
Birgé, L., A new lower bound for multiple hypothesis testing. IEEE Trans. Inform. Theory 51 (2005) 1611–1615.
W. Cheney and W. Light, A course in approximation theory, vol. 101 of Graduate Studies in Mathematics. Amer. Math. Soc., Providence, RI (2009).
Ghosal, S., Ghosh, J.K. and Ramamoorthi, R.V., Posterior consistency of Dirichlet mixtures in density estimation. Ann. Stat. 27 (1999) 143–158.
Ghosal, S. and van der Vaart, A., Entropy and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Stat. 29 (2001) 1233–1263.
Ghosal, S. and van der Vaart, A., Posterior convergence rates of Dirichlet mixtures at smooth densities. Ann. Stat. 35 (2007) 697–723.
U. Grenander, Abstract inference. John Wiley and Sons Inc., New York (1981).
Hangelbroek, T. and Ron, A., Nonlinear approximation using Gaussian kernels. J. Functional Anal. 259 (2010) 203–219.
J.A. Hartigan, Clustering algorithms, Probab. Math. Stat. John Wiley and Sons, New York-London-Sydney (1975).
T. Hastie, R. Tibshirani and J. Friedman, The elements of statistical learning, Data mining, inference, and prediction. Statistics. Springer, New York, 2nd edition (2009).
Kruijer, W., Rousseau, J. and van der Vaart, A., Adaptive Bayesian Density Estimation with Location-Scale Mixtures. Electron. J. Statist. 4 (2010) 1225–1257.
B. Lindsay, Mixture Models: Theory, Geometry and Applications. IMS, Hayward, CA (1995).
P. Massart, Concentration Inequalities and Model Selection. École d’été de Probabilités de Saint-Flour, 2003. Lect. Notes Math. Springer (2007).
C. Maugis and B. Michel, Adaptive density estimation for clustering with Gaussian mixtures (2011). arXiv:1103.4253v2.
Maugis, C. and Michel, B., Data-driven penalty calibration: a case study for Gaussian mixture model selection. ESAIM: PS 15 (2011) 320–339.
Maugis, C. and Michel, B., A non asymptotic penalized criterion for Gaussian mixture model selection. ESAIM: PS 15 (2011) 41–68.
G. McLachlan and D. Peel, Finite Mixture Models. Wiley (2000).
A.B. Tsybakov, Introduction to nonparametric estimation. Statistics. Springer, New York (2009).
Wolfowitz, J., Minimax estimation of the mean of a normal distribution with known variance. Ann. Math. Stat. 21 (1950) 218–230.