Big Data

Big data and methods for analyzing large data sets, such as machine learning, have in recent times deeply transformed scientific practice in many fields. However, an epistemological study of these novel tools is still largely lacking. After a conceptual analysis of the notion of data and a brief introduction to the methodological dichotomy between inductivism and hypothetico-deductivism, several controversial theses regarding big data approaches are discussed. These include whether correlation replaces causation, whether the end of theory is in sight, and whether big data approaches constitute an entirely novel scientific methodology. In this Element, I defend an inductivist view of big data research and argue that the type of induction employed by the most successful big data algorithms is variational induction in the tradition of Mill's methods. Based on this insight, the aforementioned epistemological issues can be systematically addressed.


Copyright: © Wolfgang Pietsch 2021

