References

Peter Flach, University of Bristol
Machine Learning: The Art and Science of Algorithms that Make Sense of Data, pp. 367–382. Cambridge University Press, 2012.
Published online by Cambridge University Press: 5 November 2012. Chapter DOI: https://doi.org/10.1017/CBO9780511973000.017

Abudawood, T. (2011). Multi-class subgroup discovery: Heuristics, algorithms and predictiveness. Ph.D. thesis, University of Bristol, Department of Computer Science, Faculty of Engineering.
Abudawood, T. and Flach, P.A. (2009). Evaluation measures for multi-class subgroup discovery. In W.L. Buntine, M. Grobelnik, D. Mladenić and J. Shawe-Taylor (eds.), Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD 2009), Part I, LNCS, volume 5781, pp. 35–50. Springer.
Agrawal, R., Imielinski, T. and Swami, A.N. (1993). Mining association rules between sets of items in large databases. In P. Buneman and S. Jajodia (eds.), Proceedings of the ACM International Conference on Management of Data (SIGMOD 1993), pp. 207–216. ACM Press.
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A.I. (1996). Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, pp. 307–328. AAAI/MIT Press.
Allwein, E.L., Schapire, R.E. and Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. In P. Langley (ed.), Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pp. 9–16. Morgan Kaufmann.
Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9(7):1545–1588.
Angluin, D., Frazier, M. and Pitt, L. (1992). Learning conjunctions of Horn clauses. Machine Learning 9:147–164.
Bakir, G., Hofmann, T., Schölkopf, B., Smola, A.J., Taskar, B. and Vishwanathan, S.V.N. (2007). Predicting Structured Data. MIT Press.
Banerji, R.B. (1980). Artificial Intelligence: A Theoretical Approach. Elsevier Science.
Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1):1–127.
Best, M.J. and Chakravarti, N. (1990). Active set algorithms for isotonic regression; a unifying framework. Mathematical Programming 47(1):425–439.
Blockeel, H. (2010a). Hypothesis language. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 507–511. Springer.
Blockeel, H. (2010b). Hypothesis space. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 511–513. Springer.
Blockeel, H., De Raedt, L. and Ramon, J. (1998). Top-down induction of clustering trees. In J.W. Shavlik (ed.), Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), pp. 55–63. Morgan Kaufmann.
Blumer, A., Ehrenfeucht, A., Haussler, D. and Warmuth, M.K. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM 36(4):929–965.
Boser, B.E., Guyon, I. and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the International Conference on Computational Learning Theory (COLT 1992), pp. 144–152.
Bouckaert, R. and Frank, E. (2004). Evaluating the replicability of significance tests for comparing learning algorithms. In H. Dai, R. Srikant and C. Zhang (eds.), Advances in Knowledge Discovery and Data Mining, LNCS, volume 3056, pp. 3–12. Springer.
Boullé, M. (2004). Khiops: A statistical discretization method of continuous attributes. Machine Learning 55(1):53–69.
Boullé, M. (2006). MODL: A Bayes optimal discretization method for continuous attributes. Machine Learning 65(1):131–165.
Bourke, C., Deng, K., Scott, S.D., Schapire, R.E. and Vinodchandran, N.V. (2008). On reoptimizing multi-class classifiers. Machine Learning 71(2-3):219–242.
Brazdil, P., Giraud-Carrier, C.G., Soares, C. and Vilalta, R. (2009). Metalearning – Applications to Data Mining. Springer.
Brazdil, P., Vilalta, R., Giraud-Carrier, C.G. and Soares, C. (2010). Metalearning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 662–666. Springer.
Breiman, L. (1996a). Bagging predictors. Machine Learning 24(2):123–140.
Breiman, L. (1996b). Stacked regressions. Machine Learning 24(1):49–64.
Breiman, L. (2001). Random forests. Machine Learning 45(1):5–32.
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and Regression Trees. Wadsworth.
Brier, G.W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review 78(1):1–3.
Brown, G. (2010). Ensemble learning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 312–320. Springer.
Bruner, J.S., Goodnow, J.J. and Austin, G.A. (1956). A Study of Thinking. Science Editions. 2nd edn 1986.
Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. In Proceedings of the European Conference on Artificial Intelligence (ECAI 1990), pp. 147–149.
Clark, P. and Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Y. Kodratoff (ed.), Proceedings of the European Working Session on Learning (EWSL 1991), LNCS, volume 482, pp. 151–163. Springer.
Clark, P. and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning 3:261–283.
Cohen, W.W. (1995). Fast effective rule induction. In A. Prieditis and S.J. Russell (eds.), Proceedings of the Twelfth International Conference on Machine Learning (ICML 1995), pp. 115–123. Morgan Kaufmann.
Cohen, W.W. and Singer, Y. (1999). A simple, fast, and effective rule learner. In J. Hendler and D. Subramanian (eds.), Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI 1999), pp. 335–342. AAAI Press / MIT Press.
Cohn, D. (2010). Active learning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 10–14. Springer.
Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning 20(3):273–297.
Cover, T. and Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13(1):21–27.
Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge University Press.
Dasgupta, S. (2010). Active learning theory. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 14–19. Springer.
Davis, J. and Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In W.W. Cohen and A. Moore (eds.), Proceedings of the Twenty-Third International Conference on Machine Learning (ICML 2006), pp. 233–240. ACM Press.
De Raedt, L. (1997). Logical settings for concept-learning. Artificial Intelligence 95(1):187–201.
De Raedt, L. (2008). Logical and Relational Learning. Springer.
De Raedt, L. (2010). Logic of generality. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 624–631. Springer.
De Raedt, L. and Kersting, K. (2010). Statistical relational learning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 916–924. Springer.
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological) pp. 1–38.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7:1–30.
Demšar, J. (2008). On the appropriateness of statistical tests in machine learning. In Proceedings of the ICML'08 Workshop on Evaluation Methods for Machine Learning.
Dietterich, T.G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10(7):1895–1923.
Dietterich, T.G. and Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2:263–286.
Dietterich, T.G., Kearns, M.J. and Mansour, Y. (1996). Applying the weak learning framework to understand and improve C4.5. In Proceedings of the Thirteenth International Conference on Machine Learning, pp. 96–104.
Ding, C.H.Q. and He, X. (2004). K-means clustering via principal component analysis. In C.E. Brodley (ed.), Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004). ACM Press.
Domingos, P. and Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29(2):103–130.
Donoho, S.K. and Rendell, L.A. (1995). Rerepresenting and restructuring domain theories: A constructive induction approach. Journal of Artificial Intelligence Research 2:411–446.
Drummond, C. (2006). Machine learning as an experimental science (revisited). In Proceedings of the AAAI'06 Workshop on Evaluation Methods for Machine Learning.
Drummond, C. and Holte, R.C. (2000). Exploiting the cost (in)sensitivity of decision tree splitting criteria. In P. Langley (ed.), Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pp. 239–246. Morgan Kaufmann.
Egan, J.P. (1975). Signal Detection Theory and ROC Analysis. Academic Press.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters 27(8):861–874.
Fawcett, T. and Niculescu-Mizil, A. (2007). PAV and the ROC convex hull. Machine Learning 68(1):97–106.
Fayyad, U.M. and Irani, K.B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 1993), pp. 1022–1029.
Ferri, C., Flach, P.A. and Hernández-Orallo, J. (2002). Learning decision trees using the area under the ROC curve. In C. Sammut and A.G. Hoffmann (eds.), Proceedings of the Nineteenth International Conference on Machine Learning (ICML 2002), pp. 139–146. Morgan Kaufmann.
Ferri, C., Flach, P.A. and Hernández-Orallo, J. (2003). Improving the AUC of probabilistic estimation trees. In N. Lavrač, D. Gamberger, L. Todorovski and H. Blockeel (eds.), Proceedings of the European Conference on Machine Learning (ECML 2003), LNCS, volume 2837, pp. 121–132. Springer.
Fix, E. and Hodges, J.L. (1951). Discriminatory analysis. Nonparametric discrimination: Consistency properties. Technical report, USAF School of Aviation Medicine, Randolph Field, Texas. Report Number 4, Project Number 21-49-004.
Flach, P.A. (1994). Simply Logical – Intelligent Reasoning by Example. Wiley.
Flach, P.A. (2003). The geometry of ROC space: Understanding machine learning metrics through ROC isometrics. In T. Fawcett and N. Mishra (eds.), Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), pp. 194–201. AAAI Press.
Flach, P.A. (2010a). First-order logic. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 410–415. Springer.
Flach, P.A. (2010b). ROC analysis. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 869–875. Springer.
Flach, P.A. and Lachiche, N. (2001). Confirmation-guided discovery of first-order rules with Tertius. Machine Learning 42(1/2):61–95.
Flach, P.A. and Matsubara, E.T. (2007). A simple lexicographic ranker and probability estimator. In J.N. Kok, J. Koronacki, R.L. de Mántaras, S. Matwin, D. Mladenic and A. Skowron (eds.), Proceedings of the Eighteenth European Conference on Machine Learning (ECML 2007), LNCS, volume 4701, pp. 575–582. Springer.
Freund, Y., Iyer, R.D., Schapire, R.E. and Singer, Y. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research 4:933–969.
Freund, Y. and Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1):119–139.
Fürnkranz, J. (1999). Separate-and-conquer rule learning. Artificial Intelligence Review 13(1):3–54.
Fürnkranz, J. (2010). Rule learning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 875–879. Springer.
Fürnkranz, J. and Flach, P.A. (2003). An analysis of rule evaluation metrics. In T. Fawcett and N. Mishra (eds.), Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), pp. 202–209. AAAI Press.
Fürnkranz, J. and Flach, P.A. (2005). ROC ‘n’ Rule learning – towards a better understanding of covering algorithms. Machine Learning 58(1):39–77.
Fürnkranz, J., Gamberger, D. and Lavrač, N. (2012). Foundations of Rule Learning. Springer.
Fürnkranz, J. and Hüllermeier, E. (eds.) (2010). Preference Learning. Springer.
Fürnkranz, J. and Widmer, G. (1994). Incremental reduced error pruning. In Proceedings of the Eleventh International Conference on Machine Learning (ICML 1994), pp. 70–77.
Gama, J. and Gaber, M.M. (eds.) (2007). Learning from Data Streams: Processing Techniques in Sensor Networks. Springer.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer.
Garriga, G.C., Kralj, P. and Lavrač, N. (2008). Closed sets for labeled data. Journal of Machine Learning Research 9:559–580.
Gärtner, T. (2009). Kernels for Structured Data. World Scientific.
Grünwald, P.D. (2007). The Minimum Description Length Principle. MIT Press.
Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research 3:1157–1182.
Hall, M.A. (1999). Correlation-based feature selection for machine learning. Ph.D. thesis, University of Waikato.
Han, J., Cheng, H., Xin, D. and Yan, X. (2007). Frequent pattern mining: Current status and future directions. Data Mining and Knowledge Discovery 15(1):55–86.
Hand, D.J. and Till, R.J. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45(2):171–186.
Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artificial Intelligence 36(2):177–221.
Hernández-Orallo, J., Flach, P.A. and Ferri, C. (2011). Threshold choice methods: The missing link. Available online at http://arxiv.org/abs/1112.2640.
Ho, T.K. (1995). Random decision forests. In Proceedings of the International Conference on Document Analysis and Recognition, p. 278. IEEE Computer Society, Los Alamitos, CA, USA.
Hoerl, A.E. and Kennard, R.W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics pp. 55–67.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the Twenty-Second Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR 1999), pp. 50–57. ACM Press.
Hunt, E.B., Marin, J. and Stone, P.J. (1966). Experiments in Induction. Academic Press.
Jain, A.K., Murty, M.N. and Flynn, P.J. (1999). Data clustering: A review. ACM Computing Surveys 31(3):264–323.
Japkowicz, N. and Shah, M. (2011). Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press.
Jebara, T. (2004). Machine Learning: Discriminative and Generative. Springer.
John, G.H. and Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995), pp. 338–345. Morgan Kaufmann.
Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley.
Kearns, M.J. and Valiant, L.G. (1989). Cryptographic limitations on learning Boolean formulae and finite automata. In D.S. Johnson (ed.), Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing (STOC 1989), pp. 433–444. ACM Press.
Kearns, M.J. and Valiant, L.G. (1994). Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM 41(1):67–95.
Kerber, R. (1992). ChiMerge: Discretization of numeric attributes. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI 1992), pp. 123–128. AAAI Press.
Kibler, D.F. and Langley, P. (1988). Machine learning as an experimental science. In Proceedings of the European Working Session on Learning (EWSL 1988), pp. 81–92.
King, R.D., Srinivasan, A. and Dehaspe, L. (2001). Warmr: A data mining tool for chemical data. Journal of Computer-Aided Molecular Design 15(2):173–181.
Kira, K. and Rendell, L.A. (1992). The feature selection problem: Traditional methods and a new algorithm. In W.R. Swartout (ed.), Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI 1992), pp. 129–134. AAAI Press / MIT Press.
Klösgen, W. (1996). Explora: A multipattern and multistrategy discovery assistant. In Advances in Knowledge Discovery and Data Mining, pp. 249–271. MIT Press.
Kohavi, R. and John, G.H. (1997). Wrappers for feature subset selection. Artificial Intelligence 97(1-2):273–324.
Koren, Y., Bell, R. and Volinsky, C. (2009). Matrix factorization techniques for recommender systems. IEEE Computer 42(8):30–37.
Kramer, S. (1996). Structural regression trees. In Proceedings of the National Conference on Artificial Intelligence (AAAI 1996), pp. 812–819.
Kramer, S., Lavrač, N. and Flach, P.A. (2000). Propositionalization approaches to relational data mining. In S. Džeroski and N. Lavrač (eds.), Relational Data Mining, pp. 262–286. Springer.
Krogel, M.A., Rawles, S., Zelezný, F., Flach, P.A., Lavrač, N. and Wrobel, S. (2003). Comparative evaluation of approaches to propositionalization. In T. Horváth (ed.), Proceedings of the Thirteenth International Conference on Inductive Logic Programming (ILP 2003), LNCS, volume 2835, pp. 197–214. Springer.
Kuncheva, L.I. (2004). Combining Pattern Classifiers: Methods and Algorithms. John Wiley and Sons.
Lachiche, N. (2010). Propositionalization. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 812–817. Springer.
Lachiche, N. and Flach, P.A. (2003). Improving accuracy and cost of two-class and multi-class probabilistic classifiers using ROC curves. In T. Fawcett and N. Mishra (eds.), Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), pp. 416–423. AAAI Press.
Lafferty, J.D., McCallum, A. and Pereira, F.C.N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In C.E. Brodley and A.P. Danyluk (eds.), Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pp. 282–289. Morgan Kaufmann.
Langley, P. (1988). Machine learning as an experimental science. Machine Learning 3:5–8.
Langley, P. (1994). Elements of Machine Learning. Morgan Kaufmann.
Langley, P. (2011). The changing science of machine learning. Machine Learning 82(3):275–279.
Lavrač, N., Kavšek, B., Flach, P.A. and Todorovski, L. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research 5:153–188.
Lee, D.D. and Seung, H.S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791.
Leman, D., Feelders, A. and Knobbe, A.J. (2008). Exceptional model mining. In W. Daelemans, B. Goethals and K. Morik (eds.), Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD 2008), Part II, LNCS, volume 5212, pp. 1–16. Springer.
Lewis, D. (1998). Naive Bayes at forty: The independence assumption in information retrieval. In Proceedings of the Tenth European Conference on Machine Learning (ECML 1998), pp. 4–15. Springer.
Li, W., Han, J. and Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules. In N. Cercone, T.Y. Lin and X. Wu (eds.), Proceedings of the IEEE International Conference on Data Mining (ICDM 2001), pp. 369–376. IEEE Computer Society.
Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis with Missing Data. Wiley.
Liu, B., Hsu, W. and Ma, Y. (1998). Integrating classification and association rule mining. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD 1998), pp. 80–86. AAAI Press.
Lloyd, J.W. (2003). Logic for Learning – Learning Comprehensible Theories from Structured Data. Springer.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2):129–137.
Mahalanobis, P.C. (1936). On the generalised distance in statistics. Proceedings of the National Institute of Science, India 2(1):49–55.
Mahoney, M.W. and Drineas, P. (2009). CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences 106(3):697.
McCallum, A. and Nigam, K. (1998). A comparison of event models for naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, pp. 41–48.
Michalski, R.S. (1973). Discovering classification rules using variable-valued logic system VL1. In Proceedings of the Third International Joint Conference on Artificial Intelligence, pp. 162–172. Morgan Kaufmann.
Michalski, R.S. (1975). Synthesis of optimal and quasi-optimal variable-valued logic formulas. In Proceedings of the 1975 International Symposium on Multiple-Valued Logic, pp. 76–87.
Michie, D., Spiegelhalter, D.J. and Taylor, C.C. (1994). Machine Learning, Neural and Statistical Classification. Ellis Horwood.
Miettinen, P. (2009). Matrix decomposition methods for data mining: Computational complexity and algorithms. Ph.D. thesis, University of Helsinki.
Minsky, M. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.
Mitchell, T.M. (1977). Version spaces: A candidate elimination approach to rule learning. In Proceedings of the Fifth International Joint Conference on Artificial Intelligence, pp. 305–310. Morgan Kaufmann.
Mitchell, T.M. (1997). Machine Learning. McGraw-Hill.
Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing 13(3&4):245–286.
Muggleton, S., De Raedt, L., Poole, D., Bratko, I., Flach, P.A., Inoue, K. and Srinivasan, A. (2012). ILP turns 20 – biography and future challenges. Machine Learning 86(1):3–23.
Muggleton, S. and Feng, C. (1990). Efficient induction of logic programs. In Proceedings of the International Conference on Algorithmic Learning Theory (ALT 1990), pp. 368–381.
Murphy, A.H. and Winkler, R.L. (1984). Probability forecasting in meteorology. Journal of the American Statistical Association pp. 489–500.
Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General) pp. 370–384.
Novikoff, A.B. (1962). On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume 12, pp. 615–622. Polytechnic Institute of Brooklyn, New York.
Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In Proceedings of the International Conference on Database Theory (ICDT 1999), pp. 398–416. Springer.
Peng, Y., Flach, P.A., Soares, C. and Brazdil, P. (2002). Improved dataset characterisation for meta-learning. In S. Lange, K. Satoh and C.H. Smith (eds.), Proceedings of the Fifth International Conference on Discovery Science (DS 2002), LNCS, volume 2534, pp. 141–152. Springer.
Pfahringer, B., Bensusan, H. and Giraud-Carrier, C.G. (2000). Meta-learning by landmarking various learning algorithms. In P. Langley (ed.), Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pp. 743–750. Morgan Kaufmann.
Platt, J.C. (1998). Using analytic QP and sparseness to speed training of support vector machines. In M.J. Kearns, S.A. Solla and D.A. Cohn (eds.), Advances in Neural Information Processing Systems 11 (NIPS 1998), pp. 557–563. MIT Press.
Plotkin, G.D. (1971). Automatic methods of inductive inference. Ph.D. thesis, University of Edinburgh.
Provost, F.J. and Domingos, P. (2003). Tree induction for probability-based ranking. Machine Learning 52(3):199–215.
Provost, F.J. and Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning 42(3):203–231.
Quinlan, J.R. (1986). Induction of decision trees. Machine Learning 1(1):81–106.
Quinlan, J.R. (1990). Learning logical definitions from relations. Machine Learning 5:239–266.
Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.
Ragavan, H. and Rendell, L.A. (1993). Lookahead feature construction for learning hard concepts. In Proceedings of the Tenth International Conference on Machine Learning (ICML 1993), pp. 252–259. Morgan Kaufmann.
Rajnarayan, D.G. and Wolpert, D. (2010). Bias-variance trade-offs: Novel applications. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 101–110. Springer.
Rissanen, J. (1978). Modeling by shortest data description. Automatica 14(5):465–471.
Rivest, R.L. (1987). Learning decision lists. Machine Learning 2(3):229–246.
Robnik-Šikonja, M. and Kononenko, I. (2003). Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning 53(1-2):23–69.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65(6):386.
Rousseeuw, P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20:53–65.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986). Learning representations by back-propagating errors. Nature 323(6088):533–536.
Schapire, R.E. (1990). The strength of weak learnability. Machine Learning 5:197–227.
Schapire, R.E. (2003). The boosting approach to machine learning: An overview. In Nonlinear Estimation and Classification, pp. 149–172. Springer.
Schapire, R.E., Freund, Y., Bartlett, P. and Lee, W.S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics 26(5):1651–1686.
Schapire, R.E. and Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning 37(3):297–336.
Settles, B. (2011). Active Learning. Morgan & Claypool.
Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.
Shotton, J., Fitzgibbon, A.W., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A. and Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In Proceedings of the Twenty-Fourth IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), pp. 1297–1304.
Silver, D. and Bennett, K. (2008). Guest editor's introduction: special issue on inductive transfer learning. Machine Learning 73(3):215–220.
Solomonoff, R.J. (1964a). A formal theory of inductive inference: Part I. Information and Control 7(1):1–22.
Solomonoff, R.J. (1964b). A formal theory of inductive inference: Part II. Information and Control 7(2):224–254.
Srinivasan, A. (2007). The Aleph manual, version 4 and above. Available online at www.cs.ox.ac.uk/activities/machlearn/Aleph/.
Stevens, S.S. (1946). On the theory of scales of measurement. Science 103(2684):677–680.
Sutton, R.S. and Barto, A.G. (1998). Reinforcement Learning: An Introduction. MIT Press.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological) pp. 267–288.
Todorovski, L. and Džeroski, S. (2003). Combining classifiers with meta decision trees. Machine Learning 50(3):223–249.
Tsoumakas, G., Zhang, M.L. and Zhou, Z.H. (2012). Introduction to the special issue on learning from multi-label data. Machine Learning 88(1-2):1–4.
Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley.
Valiant, L.G. (1984). A theory of the learnable. Communications of the ACM 27(11):1134–1142.
Vapnik, V.N. and Chervonenkis, A.Y. (1971). On uniform convergence of the frequencies of events to their probabilities. Teoriya Veroyatnostei i Ee Primeneniya 16(2):264–279.
Vere, S.A. (1975). Induction of concepts in the predicate calculus. In Proceedings of the Fourth International Joint Conference on Artificial Intelligence, pp. 281–287.
von Hippel, P.T. (2005). Mean, median, and skew: Correcting a textbook rule. Journal of Statistics Education 13(2).
Wallace, C.S. and Boulton, D.M. (1968). An information measure for classification. Computer Journal 11(2):185–194.
Webb, G.I. (1995). OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research 3:431–465.
Webb, G.I., Boughton, J.R. and Wang, Z. (2005). Not so naive Bayes: Aggregating one-dependence estimators. Machine Learning 58(1):5–24.
Winston, P.H. (1970). Learning structural descriptions from examples. Technical report AITR-231, MIT Artificial Intelligence Lab.
Wojtusiak, J., Michalski, R.S., Kaufman, K.A. and Pietrzykowski, J. (2006). The AQ21 natural induction program for pattern discovery: Initial version and its novel features. In Proceedings of the Eighteenth IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2006), pp. 523–526.
Wolpert, D.H. (1992). Stacked generalization. Neural Networks 5(2):241–259.
Zadrozny, B. and Elkan, C. (2002). Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the Eighth ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2002), pp. 694–699. ACM Press.
Zeugmann, T. (2010). PAC learning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 745–753. Springer.
Zhou, Z.H. (2012). Ensemble Methods: Foundations and Algorithms. Taylor & Francis.
