PREDICTION/ESTIMATION WITH SIMPLE LINEAR MODELS: IS IT REALLY THAT SIMPLE?

Yuhong Yang

doi:10.1017/S0266466607070016

Published online by Cambridge University Press: 06 December 2006

Affiliation: University of Minnesota

Abstract

Consider the simple normal linear regression model for estimation/prediction at a new design point. When the slope parameter is not obviously nonzero, hypothesis testing and information criteria can be used for identifying the right model. We compare the performances of such methods both theoretically and empirically from different perspectives for more insight. The testing approach at the conventional size of 0.05, in spite of being the “standard approach,” performs poorly in estimation. We also find that the frequently told story “the Bayesian information criterion (BIC) is good when the true model is finite-dimensional, and the Akaike information criterion (AIC) is good when the true model is infinite-dimensional” is far from accurate. In addition, despite some successes in the effort to go beyond the debate between AIC and BIC by adaptive model selection, it turns out that no model selection method can share both the pointwise adaptation property of BIC and the minimax-rate adaptation property of AIC. When model selection methods have difficulty in selection, model combining is a better alternative in terms of estimation accuracy.

This work was completed when the author was on leave from Iowa State University and was a New Direction Visiting Professor at the Institute for Mathematics and its Applications (IMA) at the University of Minnesota. Funding from both IMA and ISU is greatly appreciated. The work was also partly supported by NSF CAREER grant DMS0094323. The author thanks Xiaotong Shen and Hannes Leeb for very helpful discussions. The paper also benefited from the questions and comments of the participants at the statistics seminars the author gave at the University of Minnesota and Duke University. The author is very grateful to the anonymous reviewers and the co-editor Benedikt Pötscher for carefully reading earlier versions of the paper, bringing to the author's attention several closely related previous and current results, and making many very valuable suggestions, which significantly improved the paper in both content and presentation.
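The comparison described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the paper's own experiment: it assumes a known error variance, a fixed centered design on [-1, 1], and a hypothetical function name `simulate_risk`. Under the known-variance assumption, the size-0.05 test, AIC, and BIC all reduce to keeping the slope when the squared z-statistic exceeds a threshold (about 3.84, 2, and log n, respectively). The combining rule shown uses AIC-based weights, which is only one illustrative choice of model combining.

```python
import numpy as np

def simulate_risk(beta, n=50, x_new=1.0, sigma=1.0, reps=20000, seed=0):
    """Monte Carlo squared-error risk for estimating the mean at x_new.

    Assumptions (not from the paper): known sigma, intercept alpha = 0,
    and a fixed, centered design x on [-1, 1].
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n)           # centered design, sum(x) = 0
    sxx = np.sum(x ** 2)
    target = beta * x_new                   # true mean at the new design point

    # One simulated data set per row.
    y = beta * x + sigma * rng.standard_normal((reps, n))
    a_hat = y.mean(axis=1)                  # intercept estimate (x is centered)
    b_hat = y @ x / sxx                     # least-squares slope
    z2 = b_hat ** 2 * sxx / sigma ** 2      # squared z-statistic for the slope

    small = a_hat                           # intercept-only prediction
    full = a_hat + b_hat * x_new            # prediction from the full model

    # Each selection rule keeps the slope when z2 exceeds its threshold.
    thresholds = {"test_0.05": 1.959964 ** 2, "AIC": 2.0, "BIC": np.log(n)}
    risks = {}
    for name, c in thresholds.items():
        est = np.where(z2 > c, full, small)
        risks[name] = float(np.mean((est - target) ** 2))

    # Combining with AIC weights: the full model's weight is proportional
    # to exp(-AIC/2), and the AIC difference here is 2 - z2.
    w = 1.0 / (1.0 + np.exp(-(z2 - 2.0) / 2.0))
    combined = w * full + (1.0 - w) * small
    risks["AIC_weights"] = float(np.mean((combined - target) ** 2))
    risks["always_full"] = float(np.mean((full - target) ** 2))
    return risks

if __name__ == "__main__":
    for beta in (0.0, 0.25, 0.5, 1.0):
        print(beta, simulate_risk(beta))
```

Running the script over a grid of slope values gives a rough picture of how the risk of each rule changes as the slope moves away from zero. The thresholds, the design, and the sample size are all free to vary, and conclusions drawn from this sketch should not be attributed to the paper.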

Information

Type: Research Article
Information: Econometric Theory, Volume 23, Issue 1, February 2007, pp. 1–36

DOI: https://doi.org/10.1017/S0266466607070016
Copyright: © 2007 Cambridge University Press
