COMBINING FORECASTING PROCEDURES: SOME THEORETICAL RESULTS

Yuhong Yang

doi:10.1017/S0266466604201086

Published online by Cambridge University Press: 05 March 2004

Affiliation: Iowa State University

Abstract

We study some methods of combining procedures for forecasting a continuous random variable. Statistical risk bounds under squared error loss are obtained under distributional assumptions on the future, given the current outside information and the past observations. The risk bounds show that the combined forecast automatically achieves the best performance among the candidate procedures up to a constant factor and an additive penalty term. In terms of the rate of convergence, the combined forecast performs as well as if the best candidate forecasting procedure were known in advance.
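The abstract does not spell out the combining rule. As a hedged illustration only, the Python sketch below shows one common way a combined forecast can track the best candidate under squared error: weight each candidate by the exponential of its negative cumulative squared error. The function names and the tuning parameter `lam` are hypothetical, and the scheme is not presented as the paper's procedure.

```python
import math
from typing import List, Sequence


def exponential_weights(cum_losses: Sequence[float], lam: float) -> List[float]:
    """Weights proportional to exp(-lam * cumulative loss), summing to 1."""
    best = min(cum_losses)  # shift by the minimum loss to avoid underflow
    raw = [math.exp(-lam * (loss - best)) for loss in cum_losses]
    total = sum(raw)
    return [r / total for r in raw]


def combined_forecast(
    past_forecasts: Sequence[Sequence[float]],
    past_outcomes: Sequence[float],
    new_forecasts: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Weighted average of the candidates' forecasts for the next period.

    past_forecasts[t][j] is candidate j's forecast for period t and
    past_outcomes[t] is the realized value. Candidates with smaller
    cumulative squared error receive exponentially larger weight.
    """
    cum_losses = [0.0] * len(new_forecasts)
    for forecasts_t, y_t in zip(past_forecasts, past_outcomes):
        for j, f in enumerate(forecasts_t):
            cum_losses[j] += (y_t - f) ** 2
    weights = exponential_weights(cum_losses, lam)
    return sum(w * f for w, f in zip(weights, new_forecasts))


# With no history all weights are equal, so the result is the plain average.
print(combined_forecast([], [], [1.0, 3.0]))  # 2.0
```

A larger `lam` shifts weight onto the current leader more quickly; a smaller one keeps the combination closer to an even average.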

Empirical studies suggest that combining procedures can sometimes improve forecasting accuracy over the original procedures. Risk bounds are derived to quantify theoretically the potential gain and the price of linearly combining forecasts to improve on the candidates. The result supports the empirical finding that combining forecasts is not automatically a good idea: indiscriminate combining can degrade performance dramatically because of the large variability in estimating the best combining weights. An automated combining method is shown in theory to balance the potential gain against the complexity penalty (the price of combining), to take advantage of sparse combining when it helps, and to maintain the best performance (in rate) among the candidate forecasting procedures when neither linear nor sparse combining helps.

This research was supported by U.S. National Security Agency Grant MDA9049910060 and U.S. National Science Foundation CAREER Grant DMS0094323. The author sincerely thanks three reviewers and Poti Giannakouros for their very valuable comments, which led to a substantial improvement of the paper.
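To make concrete what estimating the best linear combining weights involves, here is a hedged Python sketch of one standard choice: unconstrained least-squares weights fitted to past forecasts and outcomes by solving the normal equations. The function names are hypothetical and this is not the paper's automated method; it only shows that one weight is estimated per candidate, which is where the weight-estimation variability described in the abstract enters when there are many candidates and few past observations.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: weights are not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares_combining_weights(
    past_forecasts: Sequence[Sequence[float]],
    past_outcomes: Sequence[float],
) -> List[float]:
    """Estimate unconstrained linear combining weights by least squares.

    Solves the normal equations (F^T F) w = F^T y, where row t of F holds
    the candidates' forecasts for period t and y holds the outcomes.
    """
    k = len(past_forecasts[0])
    ftf = [[sum(row[i] * row[j] for row in past_forecasts) for j in range(k)]
           for i in range(k)]
    fty = [sum(row[i] * y for row, y in zip(past_forecasts, past_outcomes))
           for i in range(k)]
    return solve_linear_system(ftf, fty)


# Outcomes generated exactly as 0.3 * (candidate 1) + 0.7 * (candidate 2).
print(least_squares_combining_weights([[1, 0], [0, 1], [1, 1]], [0.3, 0.7, 1.0]))
# approximately [0.3, 0.7]
```

With noisy outcomes and a short history, the fitted weights move around from sample to sample, and the effect grows with the number of candidates being combined.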

Information

Type: Research Article
Journal: Econometric Theory, Volume 20, Issue 1, February 2004, pp. 176–222

DOI: https://doi.org/10.1017/S0266466604201086
Copyright: © 2004 Cambridge University Press
