
Automatic selection of reliability estimates for individual regression predictions

  • Zoran Bosnić and Igor Kononenko

In machine learning and its risk-sensitive applications (e.g. medicine, engineering, business), reliability estimates for individual predictions provide more information about the individual prediction error (the difference between the true label and the regression prediction) than the average accuracy of the predictive model (e.g. relative mean squared error). They also enable users to distinguish between more and less reliable predictions. Empirical evaluations of the existing individual reliability estimates have revealed that the performance of each estimate depends on the regression model used and on the particular problem domain. In this paper, we address this problem directly: we propose and empirically evaluate two approaches for automatically selecting the most appropriate estimate for a given domain and regression model, namely the internal cross-validation approach and the meta-learning approach. Tests of both approaches showed that dynamically chosen reliability estimates outperform the individual reliability estimates. The best results were achieved with the internal cross-validation procedure, in which the reliability estimates correlated significantly and positively with the prediction error in 73% of the experiments. In addition, preliminary testing of the proposed methodology on a medical domain demonstrated its potential for use in practice.
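The internal cross-validation idea described above can be sketched as follows: on the training data, each candidate reliability estimate is scored by how well it correlates with the absolute prediction error on held-out folds, and the best-correlated estimate is then selected for use on new predictions. This is a minimal illustrative sketch, not the authors' implementation; all function names (`fit`, `predict`, the estimator callables) are assumed placeholders supplied by the caller.

```python
# Minimal sketch of selecting a reliability estimate by internal
# cross-validation: score each candidate estimate by its correlation
# with the absolute prediction error on held-out folds, then pick the
# best-scoring one. All names here are illustrative, not from the paper.
import numpy as np

def pearson(a, b):
    """Pearson correlation; 0.0 when either input is constant."""
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def select_estimate(X, y, fit, predict, estimators, k=5):
    """Return the name of the reliability estimate whose values
    correlate best with the absolute prediction error under
    k-fold internal cross-validation on the training data."""
    n = len(y)
    folds = np.array_split(np.random.permutation(n), k)
    scores = {name: [] for name in estimators}
    for fold in folds:
        mask = np.ones(n, bool)
        mask[fold] = False                      # train on the other folds
        model = fit(X[mask], y[mask])
        errors = np.abs(y[fold] - predict(model, X[fold]))
        for name, est in estimators.items():
            # correlation between the estimate and the true error
            scores[name].append(pearson(est(model, X[fold]), errors))
    return max(scores, key=lambda name: np.mean(scores[name]))
```

In use, `estimators` would map estimate names (e.g. sensitivity-based or density-based estimates) to functions producing a reliability value per test point; the selection criterion here is the mean per-fold correlation, though a significance test on the correlations could be used instead, as in the paper's evaluation.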

The Knowledge Engineering Review
  • ISSN: 0269-8889
  • EISSN: 1469-8005
  • URL: /core/journals/knowledge-engineering-review