
Understanding Deep Learning with Statistical Relevance

Published online by Cambridge University Press: 31 January 2022

Tim Räz*
University of Bern, Institute of Philosophy, Bern, Switzerland


This paper argues that a notion of statistical explanation, based on Salmon’s statistical relevance model, can help us better understand deep neural networks. It is proved that homogeneous partitions, the core notion of Salmon’s model, are equivalent to minimal sufficient statistics, an important notion from statistical inference. This establishes a link to deep neural networks via the so-called Information Bottleneck method, an information-theoretic framework, according to which deep neural networks implicitly solve an optimization problem that generalizes minimal sufficient statistics. The resulting notion of statistical explanation is general, mathematical, and subcausal.
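The link between homogeneous partitions and minimal sufficient statistics can be illustrated on a small discrete example. The sketch below is not taken from the paper: the joint distribution and all function names are hypothetical, and it assumes one simple reading of the idea, namely that values of X are grouped into the same cell exactly when they induce the same conditional distribution p(y | x). Under that assumption, the cell index T keeps all the information X carries about Y, so I(T;Y) equals I(X;Y), while T is coarser than X.

```python
from collections import defaultdict
from math import log2

def conditional(joint, x):
    """Return p(y | x) as a tuple, rounded so equal conditionals compare equal."""
    row = joint[x]
    total = sum(row)
    return tuple(round(p / total, 12) for p in row)

def homogeneous_partition(joint):
    """Group x values that share the same conditional distribution p(y | x)."""
    cells = defaultdict(list)
    for x in joint:
        cells[conditional(joint, x)].append(x)
    return sorted(cells.values())

def coarsen(joint, partition):
    """Joint distribution of (T, Y), where T is the index of the cell containing x."""
    n_y = len(next(iter(joint.values())))
    return {
        t: [sum(joint[x][j] for x in cell) for j in range(n_y)]
        for t, cell in enumerate(partition)
    }

def mutual_information(table):
    """I(A;B) in bits for a joint table given as {a: [p(a, b) for each b]}."""
    n_b = len(next(iter(table.values())))
    p_b = [sum(row[j] for row in table.values()) for j in range(n_b)]
    mi = 0.0
    for row in table.values():
        p_a = sum(row)
        for j, p_ab in enumerate(row):
            if p_ab > 0:
                mi += p_ab * log2(p_ab / (p_a * p_b[j]))
    return mi

# Hypothetical joint distribution p(x, y) over four x values and a binary y.
joint = {
    "a": [0.10, 0.10],
    "b": [0.20, 0.20],
    "c": [0.05, 0.15],
    "d": [0.05, 0.15],
}

partition = homogeneous_partition(joint)   # cells with identical p(y | x)
coarse = coarsen(joint, partition)         # joint distribution of (T, Y)

print(partition)
print(mutual_information(joint), mutual_information(coarse))
```

In this toy case "a" and "b" fall into one cell and "c" and "d" into another, and the two mutual information values agree up to floating-point error. The Information Bottleneck method mentioned in the abstract relaxes this exact equality into a trade-off between compressing X and preserving information about Y.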

© The Author(s), 2022. Published by Cambridge University Press on behalf of the Philosophy of Science Association


Achille, Alessandro, and Soatto, Stefano. 2018. “Information dropout: Learning optimal representations through noisy computation.” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
Alemi, Alexander A., Fischer, Ian, Dillon, Joshua V., and Murphy, Kevin. 2017. “Deep variational information bottleneck.” arXiv:1612.00410v5.
Baumberger, Christoph, Beisbart, Claus, and Brun, Georg. 2017. “What is understanding? An overview of recent debates in epistemology and philosophy of science.” In Explaining Understanding: New Perspectives from Epistemology and Philosophy of Science, edited by Stephen Grimm, Christoph Baumberger, and Sabine Ammon, 1–34. New York: Routledge.
Casella, George, and Berger, Roger L. 2002. Statistical Inference. 2nd ed. Duxbury.
Cover, Thomas M., and Thomas, Joy A. 2006. Elements of Information Theory. 2nd ed. Hoboken, NJ: Wiley.
Goodfellow, Ian, Bengio, Yoshua, and Courville, Aaron. 2016. Deep Learning. Cambridge, MA: MIT Press.
Greeno, James G. 1970. “Evaluation of statistical hypotheses using information transmitted.” Philosophy of Science 37 (2):279–94.
Hastie, Trevor, Tibshirani, Robert, and Friedman, Jerome. 2009. The Elements of Statistical Learning. 2nd ed. Springer Series in Statistics. Springer.
Kitcher, Philip. 1989. “Explanatory unification and the causal structure of the world.” In Scientific Explanation, Volume XIII of Minnesota Studies in the Philosophy of Science, edited by Philip Kitcher and Wesley C. Salmon, 410–505. Minneapolis: University of Minnesota Press.
Kitcher, Philip, and Salmon, Wesley C., eds. 1989. Scientific Explanation, Volume XIII of Minnesota Studies in the Philosophy of Science. Minneapolis: University of Minnesota Press.
Krishnan, Maya. 2016. “Against interpretability: a critical examination of the interpretability problem in machine learning.” Philosophy & Technology 33:487–502.
Lange, Marc. 2016. Because Without Cause: Non-Causal Explanations in Science and Mathematics. Oxford: Oxford University Press.
LeCun, Yann, Bengio, Yoshua, and Hinton, Geoffrey. 2015. “Deep learning.” Nature 521:436–44.
Lehmann, E. L., and Casella, George. 1998. Theory of Point Estimation. 2nd ed. Springer Texts in Statistics. New York, Berlin, Heidelberg: Springer.
Lipton, Zachary C. 2016. “The mythos of model interpretability.” arXiv:1606.03490.
Mancosu, Paolo. 2018. “Explanation in mathematics.” In The Stanford Encyclopedia of Philosophy, edited by E. N. Zalta. Metaphysics Research Lab, Stanford University.
Nielsen, Michael A. 2015. Neural Networks and Deep Learning. Determination Press.
Pedregosa, Fabian, Varoquaux, Gaël, Gramfort, Alexandre, Michel, Vincent, Thirion, Bertrand, Grisel, Olivier, Blondel, Mathieu, et al. 2011. “Scikit-learn: Machine learning in Python.” Journal of Machine Learning Research 12:2825–30.
Pincock, Christopher. 2015. “Abstract explanations in science.” British Journal for the Philosophy of Science 66 (4):857–82.
Räz, Tim. 2017. “The Volterra principle generalized.” Philosophy of Science 84 (4):737–60.
Räz, Tim. 2018. “Euler’s Königsberg: the explanatory power of mathematics.” European Journal for Philosophy of Science 8:331–46.
Reutlinger, Alexander, and Saatsi, Juha, eds. 2018. Explanation Beyond Causation: Philosophical Perspectives on Non-Causal Explanations. Oxford: Oxford University Press.
Salmon, Wesley C. 1971a. “Statistical Explanation.” In Statistical Explanation and Statistical Relevance, edited by Wesley C. Salmon, 29–87. Pittsburgh: Pittsburgh University Press.
Salmon, Wesley C., ed. 1971b. Statistical Explanation and Statistical Relevance. Pittsburgh: Pittsburgh University Press.
Salmon, Wesley C. 1984. Scientific Explanation and the Causal Structure of the World. Princeton: Princeton University Press.
Saxe, Andrew M., Bansal, Yamini, Dapello, Joel, Advani, Madhu, Kolchinsky, Artemy, Tracey, Brendan D., and Cox, David D. 2018. “On the information bottleneck theory of deep learning.” ICLR.
Schwartz-Ziv, Ravid, and Tishby, Naftali. 2017. “Opening the black box of deep neural networks via information.” arXiv:1703.00810.
Shamir, Ohad, Sabato, Sivan, and Tishby, Naftali. 2011. “Learning and generalization with the information bottleneck.” Theoretical Computer Science 411:2696–2711.
Spirtes, Peter, Glymour, Clark, and Scheines, Richard. 2000. Causation, Prediction and Search. Cambridge, MA: MIT Press.
Tishby, Naftali, Pereira, Fernando C., and Bialek, William. 1999. “The information bottleneck method.” In Proc. of the 37th Allerton Conference on Communication, Control and Computing, Allerton House, Monticello, Illinois, September 22–24, 1999.
Vidal, René, Bruna, Joan, Giryes, Raja, and Soatto, Stefano. 2017. “Mathematics of deep learning.” arXiv:1712.04741.
Woodward, James. 1987. “On an information-theoretic model of explanation.” Philosophy of Science 54 (1):21–44.
Woodward, James. 2019. “Scientific explanation.” In The Stanford Encyclopedia of Philosophy, edited by E. N. Zalta. Metaphysics Research Lab, Stanford University.
Zhang, Chiyuan, Bengio, Samy, Hardt, Moritz, Recht, Benjamin, and Vinyals, Oriol. 2017. “Understanding deep learning requires rethinking generalization.” arXiv:1611.03530.