
Big Data

Published online by Cambridge University Press:  29 January 2021

Wolfgang Pietsch
Technical University of Munich


Big data and methods for analyzing large data sets, such as machine learning, have in recent times deeply transformed scientific practice in many fields. However, an epistemological study of these novel tools is still largely lacking. After a conceptual analysis of the notion of data and a brief introduction to the methodological dichotomy between inductivism and hypothetico-deductivism, several controversial theses regarding big data approaches are discussed. These include whether correlation replaces causation, whether the end of theory is in sight, and whether big data approaches constitute an entirely novel scientific methodology. In this Element, I defend an inductivist view of big data research and argue that the type of induction employed by the most successful big data algorithms is variational induction in the tradition of Mill's methods. Based on this insight, the aforementioned epistemological issues can be systematically addressed.
Online ISBN: 9781108588676
Publisher: Cambridge University Press
Print publication: 18 February 2021



