Hostname: page-component-848d4c4894-wzw2p Total loading time: 0 Render date: 2024-05-25T01:16:20.552Z Has data issue: false hasContentIssue false

Shannon entropy: a rigorous notion at the crossroads between probability, information theory, dynamical systems and statistical physics

Published online by Cambridge University Press:  28 March 2014

Laboratoire de Physique Théorique de la Matière Condensée CNRS UMR 7600Université Pierre et Marie Curie-Paris 6, 4 place Jussieu, F-75252 Paris Cedex 05, France and Institut des Hautes Études Scientifiques 35 route de Chartres, F-91440, Bures-sur-Yvette, France Email:
Rights & Permissions [Opens in a new window]


Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

Statistical entropy was introduced by Shannon as a basic concept in information theory measuring the average missing information in a random source. Extended into an entropy rate, it gives bounds in coding and compression theorems. In this paper, I describe how statistical entropy and entropy rate relate to other notions of entropy that are relevant to probability theory (entropy of a discrete probability distribution measuring its unevenness), computer sciences (algorithmic complexity), the ergodic theory of dynamical systems (Kolmogorov–Sinai or metric entropy) and statistical physics (Boltzmann entropy). Their mathematical foundations and correlates (the entropy concentration, Sanov, Shannon–McMillan–Breiman, Lempel–Ziv and Pesin theorems) clarify their interpretation and offer a rigorous basis for maximum entropy principles. Although often ignored, these mathematical perspectives give a central position to entropy and relative entropy in statistical laws describing generic collective behaviours, and provide insights into the notions of randomness, typicality and disorder. The relevance of entropy beyond the realm of physics, in particular for living systems and ecosystems, is yet to be demonstrated.

Copyright © Cambridge University Press 2014 


Algoet, P. H. and Cover, T. M. (1988) A sandwich proof of the Shannon–McMillan–Breiman theorem. Annals of Probability 16 899909.CrossRefGoogle Scholar
Amari, S. and Nagaoka, H. (2000) Methods of information geometry, Oxford University Press.Google Scholar
Avery, J. (2003) Information theory and evolution, World Scientific.CrossRefGoogle Scholar
Badii, R. and Politi, A. (1997) Complexity. Hierarchical structures and scaling in physics, Cambridge University Press.CrossRefGoogle Scholar
Balding, D., Ferrari, P. A., Fraiman, R. and Sued, M. (2008) Limit theorems for sequences of random trees. TEST, DOI 10.1007/s11749-008-0092-z.Google Scholar
Balian, R. (2004) Entropy, a protean concept. In: Dalibard, J., Duplantier, B. and Rivasseau, V. (eds.) Entropy, Poincaré Seminar 2003, Birkhaüser119144.Google Scholar
Balian, R. (2005) Information in statistical physics. Studies in History and Philosophy of Modern Physics 36 323353.CrossRefGoogle Scholar
Banavar, J. R., Maritan, A. and Volkov, I. (2010) Applications of the principle of maximum entropy: from physics to ecology. Journal of Physics: Condensed Matter 22 063101.Google ScholarPubMed
Blanc, J. L., Pezard, L. and Lesne, A. (2011) Mutual information rate of pair of symbolic sequences.CrossRefGoogle Scholar
Blanc, J. L., Schmidt, N., Bonnier, L., Pezard, L. and Lesne, A. (2008) Quantifying neural correlations using Lempel–Ziv complexity. In: Perrinet, L. U. and Daucé, E. (eds.) Proceedings of the Second french conference on Computational Neuroscience (Neurocomp'08), ISBN 978-2-9532965-0-1, 4043.Google Scholar
Boltzmann, L. (1877) Über die Beziehung zwisschen dem zweiten Haubtsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung respektive dem Sätzen über das Wärmegleichgewicht. (‘On the Relation between the Second Law of the Mechanical Theory of Heat and the Probability Calculus with respect to the Propositions about Heat-Equivalence’.) Wiener Berichte 76 373435. (Included in Wissenschaftliche Abhandlungen 2, paper 42 (1909) Barth, Leipzig; reissued in 1969, Chelsea, New York.)Google Scholar
Breiman, L. (1957) The individual ergodic theorem of information theory. Annals of Mathematical Statistics 28 809811. (Correction: (1957) 31 809–810.)CrossRefGoogle Scholar
Bricmont, J. (1995) Science of chaos or chaos in science. Physicalia Magazine 17 159208.Google Scholar
Brillouin, L. (1951a) Maxwell's demon cannot operate: Information and entropy. Journal of Applied Physics 22 334337.CrossRefGoogle Scholar
Brillouin, L. (1951b) Physical entropy and information. Journal of Applied Physics 22 338343.CrossRefGoogle Scholar
Brillouin, L. (1953) Negentropy principle of information. Journal of Applied Physics 24 11521163.CrossRefGoogle Scholar
Brillouin, L. (1956) Science and Information Theory, Academic Press.CrossRefGoogle Scholar
Brin, M. and Katok, A. (1983) On local entropy. In: Palis, J. (ed.) Geometric dynamics. Springer-Verlag Lecture Notes in Mathematics 1007 3038.CrossRefGoogle Scholar
Brudno, A. A. (1983) Entropy and the complexity of the trajectory of a dynamical system. Transactions of the Moscow Mathematical Society 44 127152.Google Scholar
Buten, H. (1989) What to my wondering eyes, Harper Collins.Google Scholar
Callen, H. B. (1985) Thermodynamics and thermostatics, 2nd edition, Wiley.Google Scholar
Castiglione, P., Falcioni, M., Lesne, A. and Vulpiani, A. (2008) Chaos and coarse-graining in statistical mechanics, Cambridge University Press.CrossRefGoogle Scholar
Cercignani, C. (1988) The Boltzmann equation and its applications, Springer-Verlag.CrossRefGoogle Scholar
Cercignani, C. (1998) Ludwig Boltzmann – The man who trusted atoms, Oxford University Press.Google Scholar
Chaitin, G. J. (1966) On the length of programs for computing finite binary sequences. Journal of the ACM 13 547569.CrossRefGoogle Scholar
Chandler, D. (1987) Introduction to modern statistical mechanics, Oxford University Press.Google Scholar
Clausius, R. (1865) The mechanical theory of heat – with its applications to the steam engine and to physical properties of bodies, John van Voorst, London.Google Scholar
Cohen, E. G. D. and Gallavotti, G. (1999) Note on two theorems of nonequilibrium statistical mechanics. Journal of Statistical Physics 96 13431349.CrossRefGoogle Scholar
Cover, T. M. and Thomas, J. A. (2006) Elements of information theory, 2nd edition, Wiley.Google Scholar
Cox, R. T. (1946) Probability, frequency, and reasonable expectation. American Journal of Physics 14 113.CrossRefGoogle Scholar
Csiszár, I. (1975) I-divergence geometry of probability distributions and minimization problems. Annals of Probability 3 146158.CrossRefGoogle Scholar
Csiszár, I. (1998) The Method of types. IEEE Transactions on Information Theory 44 25052523.CrossRefGoogle Scholar
Csiszár, I. and Körner, J. (1981) Information theory, coding theorems for discrete memoryless systems, Akadémiai Kiadoó, Budapest.Google Scholar
de Finetti, B. (1970) Theory of probability – a critical introduction treatment, Wiley.Google Scholar
Dessalles, J. L. (2006). A structural model of intuitive probability. In: Fum, D., Del Missier, F. and Stocco, A. (eds.) Proceedings of the seventh International Conference on Cognitive Modeling, Edizioni Goliardiche, Trieste8691.Google Scholar
Durand, B. and Zvonkine, A. (2007) Kolmogorov complexity. In: Charpentier, E., Lesne, A. and Nikolski, N. (eds.) Kolmogorov's Heritage in Mathematics, Springer-Verlag 281300.CrossRefGoogle Scholar
Einstein, A. (1910) Theorie der Opaleszenz von homogenen Flüssigkeiten und Flüssigkeitsgemischen in der Nähe des kritischen Zustandes. Annalen der Physik (Leipzig) 33 12751298. (English translation: Theory of opalescence of homogeneous liquids and mixtures of liquids in the vicinity of the critical state. In: Alexander, J. (ed.) Colloid Chemistry, Rheinhold, 1913, Volume I, 323–329. Reprinted in: Stachel, J. (1987) (ed.) The Collected Papers of Albert Einstein, Princeton University Press 3 231–249.)CrossRefGoogle Scholar
Ellis, R. S. (1985) Entropy, large deviations and statistical mechanics, Springer-Verlag.CrossRefGoogle Scholar
Evans, D. J. and Searles, D. J. (2002) The fluctuation theorem. Advances in Physics 51 15291585.CrossRefGoogle Scholar
Falcioni, M., Loreto, V. and Vulpiani, A. (2003) Kolmogorov's legacy about entropy, chaos and complexity. In: Vulpiani, A. and Livi, R. (eds.) The Kolmogorov Legacy in Physics, Springer-Verlag 85108.CrossRefGoogle Scholar
Feldman, D. P. (2002) A brief introduction to information theory, excess entropy and computational mechanics. (Available online at Scholar
Feldman, D. P. and Crutchfield, J. P. (1998) Measures of statistical complexity: Why? Physics Letters A 238 244252.CrossRefGoogle Scholar
Ford, K. (2007) From Kolmogorov's theorem on empirical distribution to number theory. In: Charpentier, E., Lesne, A. and Nikolski, N. (eds.) Kolmogorov's heritage in mathematics, Springer-Verlag 97108.CrossRefGoogle Scholar
Frank, S. A. (2009) The common patterns of nature. Journal of Evolutionary Biology 22 15631585.CrossRefGoogle ScholarPubMed
Gallavotti, G. (1998) Chaotic dynamics, fluctuations, nonequilibrium ensembles. Chaos 8 384393.CrossRefGoogle ScholarPubMed
Gallavotti, G. (2006) Entropy, thermostats and the chaotic hypothesis. Chaos 16 043114.CrossRefGoogle ScholarPubMed
Gaspard, P. (2004) Time-reversed dynamical entropy and irreversibility in Markovian random processes. Journal of Statistical Physics 117 599615.CrossRefGoogle Scholar
Gell-Mann, M. and Lloyd, S. (1996) Information measures, effective complexity, and total information. Complexity 2 4452.3.0.CO;2-X>CrossRefGoogle Scholar
Gell-Mann, M. and Lloyd, S. (2003) Effective complexity. In: Gell-Mann, M. and Tsallis, C. (eds.) Nonextensive Entropy – Interdisciplinary Applications, Oxford University Press 387398.Google Scholar
Georgii, H. O. (2003) Probabilistic aspects of entropy. In: Greven, A., Keller, G. and Warnecke, G. (eds.) Entropy, Princeton University Press 3754.CrossRefGoogle Scholar
Gillies, D. (2000) Philosophical theories of probability, Routledge.Google Scholar
Glasner, E. (2003) Ergodic theory via joinings, American Mathematical Society.CrossRefGoogle Scholar
Gorban, A. N. (2007) Order-disorder separation: Geometric revision. Physica A 374 85102.CrossRefGoogle Scholar
Grassberger, P. (1986) Toward a quantitative theory of self-generated complexity. International Journal of Theoretical Physics 25 907938.CrossRefGoogle Scholar
Gray, R. M. (1990) Entropy and information theory, Springer. (Available at Scholar
Gruber, C., Pache, S. and Lesne, A. (2004) On the second law of thermodynamics and the piston problem. Journal of Statistical Physics 117 739772.CrossRefGoogle Scholar
Haegeman, B. and Etienne, R. S. (2010) Entropy maximization and the spatial distribution of species. American Naturalist 175 E74E90.CrossRefGoogle ScholarPubMed
Honerkamp, J. (1998) Statistical physics, Springer-Verlag.CrossRefGoogle Scholar
Ihara, S. (1993) Information theory for continuous systems, World Scientific.CrossRefGoogle Scholar
Jaynes, E. T. (1957a) Information theory and statistical mechanics Part I. Physical Review 106 620630.CrossRefGoogle Scholar
Jaynes, E. T. (1957b) Information theory and statistical mechanics Part II. Physical Review 108 171190.CrossRefGoogle Scholar
Jaynes, E. T. (1973) The well-posed problem. Foundations of Physics 3 477493.CrossRefGoogle Scholar
Jaynes, E. T. (1979) Where do we stand on maximum entropy? In: Levine, R. D. and Tribus, M. (eds.) The Maximum Entropy Formalism, MIT Press 15118.Google Scholar
Jaynes, E. T. (1980) The minimum entropy production principle. Annual Review of Physical Chemistry 31 579601.CrossRefGoogle Scholar
Jaynes, E. T. (1982) On the rationale of maximum entropy methods. Proceedings of the IEEE 70 939952.CrossRefGoogle Scholar
Jaynes, E. T. (1982) Papers on probability, statistics and statistical physics, Reidel.Google Scholar
Kagan, A. M., Linnik, Y. M. and Rao, C. R. (1973) Characterization problems in mathematical statistics, Wiley.Google Scholar
Kantz, H. and Schreiber, T. (1997) Nonlinear time series analysis, Cambridge University Press.Google Scholar
Karlin, S. and Taylor, H. M. (1975) A first course in stochastic processes, Academic Press.Google Scholar
Kay, J. J. (1984) Self-organization in living systems, Ph.D. thesis, Systems Design Engineering, University of Waterloo, Ontario.Google Scholar
Kolmogorov, A. N. (1965) Three approaches to the quantitative definition of information. Problems of Information Transmission 1 17.Google Scholar
Krieger, W. (1970) On entropy and generators of measure-preserving transformations. Transactions of the American Mathematical Society 149 453464.CrossRefGoogle Scholar
Krieger, W. (1972) On unique ergodicity. In: Proceedings Sixth Berkeley Symposium 2, University of California Press 327346.Google Scholar
Kullback, S. and Leibler, R. (1951) On information and sufficiency. Annals of Mathematical Statistics 22 7986.CrossRefGoogle Scholar
Laguës, M. and Lesne, A. (2008) Invariances d'échelle, 2nd edition, Belin, Paris. (English translation (2011) Scaling, Springer-Verlag.)Google Scholar
Landauer, R. (1961) Irreversibility and heat generation in the computing process. IBM Journal of Research and Development 5 183191.CrossRefGoogle Scholar
Lebowitz, J. L. (1993a) Boltzmann's Entropy and Time's Arrow. Physics Today 46 3238.CrossRefGoogle Scholar
Lebowitz, J. L. (1993b) Macroscopic laws, microscopic dynamics, time's arrow and Boltzmann's entropy. Physica A 194 127.CrossRefGoogle Scholar
Ledrappier, F. and Strelcyn, J. M. (1982) A proof of the estimation from below in Pesin's entropy formula. Ergodic Theory and Dynamical Systems 2 203219.CrossRefGoogle Scholar
Lempel, A. and Ziv, J. (1976) On the complexity of finite sequences. IEEE Transactions on Information Theory 22 7581.CrossRefGoogle Scholar
Lesne, A. (1998) Renormalization methods, Wiley.Google Scholar
Lesne, A. (2007) Discrete vs continuous controversy in physics. Mathematical Structures in Computer Science 17 185223.CrossRefGoogle Scholar
Lesne, A. and Benecke, A. (2008) Feature context-dependency and complexity reduction in probability landscapes for integrative genomics. Theoretical Biology and Medical Modelling 5 21.CrossRefGoogle ScholarPubMed
Lesne, A., Blanc, J. L. and Pezard, L. (2009) Entropy estimation of very short symbolic sequences. Physical Review E 79 046208.CrossRefGoogle ScholarPubMed
Leyton, M. (2001) A generative theory of shape, Springer.Google Scholar
Lévy, P. (1965) Processus stochastiques et mouvement brownien, Gauthier-Villars, Paris. (Reprinted by Éditions J. Gabay, Paris.)Google Scholar
Li, M. and Vitanyi, P. (1997) An Introduction to Kolmogorov complexity and its applications, Springer.CrossRefGoogle Scholar
Mahara, H. and Yamaguchi, T. (2010) Entropy balance in distributed reversible Gray-Scott model. Physica D 239 729734.CrossRefGoogle Scholar
Martin-Löf, P. (1966) The definition of random sequence. Information and Control 9 602619.CrossRefGoogle Scholar
McMillan, B. (1953) The basic theorems of information theory. Annals of Mathematical Statistics 24 196219.CrossRefGoogle Scholar
Mugur-Schächter, M. (1980) Le concept de fonctionnelle d'opacité d'une statistique. Étude des relations entre la loi des grands nombres, l'entropie informationnelle et l'entropie statistique. Annales de l'IHP, section A 32 3371.Google Scholar
Nicolis, G. and Gaspard, P. (1994) Toward a probabilistic approach to complex systems. Chaos, Solitons and Fractals 4 4157.CrossRefGoogle Scholar
Nicolis, G. and Prigogine, I. (1977) Self-organization in nonequilibrium systems, Wiley.Google Scholar
Parisi, G. (2003) Complexity and intelligence. In: Vulpiani, A. and Livi, R. (eds.) The Kolmogorov Legacy in Physics, Springer-Verlag 109122.CrossRefGoogle Scholar
Pesin, Y. (1997) Dimension theory in dynamical systems. Contemporary views and applications, University of Chicago Press.CrossRefGoogle Scholar
Phillips, S. J. and Dudík, M. (2008) Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography 31 161175.CrossRefGoogle Scholar
Phillips, S. J., Anderson, R. P. and Schapire, R. E. (2006) Maximum entropy modeling of species geographic distribution. Ecological Modelling 190 231259.CrossRefGoogle Scholar
Prigogine, I. (1967) Thermodynamics of irreversible processes, Interscience Publishers.Google Scholar
Rached, Z., Alajaji, F. and Campbell, L. (2001) Rényi's divergence and entropy rates for finite alphabet Markov sources. IEEE Transactions on Information Theory 47 15531562.CrossRefGoogle Scholar
Robert, C. (1990) An entropy concentration theorem: applications in artificial intelligence and descriptive statistics. Journal of Applied Probability 27 303313.CrossRefGoogle Scholar
Ruelle, D. P. (1978) Thermodynamic formalism, Addison-Wesley.Google Scholar
Ruelle, D. P. (2003) Extending the definition of entropy to nonequilibrium steady states. Proceedings of the National Academy of Sciences of the United States of America 100 30543058.CrossRefGoogle ScholarPubMed
Samengo, I. (2002) Estimating probabilities from experimental frequencies. Physical Review E 65 046124.CrossRefGoogle ScholarPubMed
Sanov, I. N. (1957) On the probability of large deviations of random variables (in Russian), Matematicheskii Sbornik 42 1144. (English translation in: (1961) Selected Translations in Mathematical Statistics and Probability I, Institute of Mathematical Statstics, Providence 213–244.)Google Scholar
Sagawa, T. and Ueda, M. (2009) Minimal energy cost for thermodynamic information processing: measurement and information erasure. Physical Review Letters 102 250602.CrossRefGoogle ScholarPubMed
Schrödinger, E. (1944) What is life? The physical aspect of the living cell, Cambridge University Press.Google Scholar
Schulman, L. S. (2010) We know why coffee cools. Physica E 42 269272.CrossRefGoogle Scholar
Shannon, C. (1948) A mathematical theory of communication. Bell System Technical Journal 27 379423.CrossRefGoogle Scholar
Shinner, J. S., Davison, M. and Landsberg, J. T. (1999) Simple measure for complexity. Physical Review E 59 14591464.CrossRefGoogle Scholar
Sinai, Ya. G. (1959) On the concept of entropy for dynamical systems (in Russian). Doklady Akademii Nauk SSSR 124 768771.Google Scholar
Sokal, A. D. (1997) Monte Carlo methods in statistical mechanics: Foundations and new algorithms. In: De Witt-Morette, C. C. and Folacci, A. (eds.) Functional integration: basics and applications (1996 Cargèse summer school), Plenum Press.Google Scholar
Sokal, A. D. and Thomas, L. E. (1989). Exponential convergence to equilibrium for a class of random-walk models. Journal of Statistical Physics 54 797828.CrossRefGoogle Scholar
Solomonoff, R. (1978). Complexity-based induction systems: comparisons and convergence theorems. IEEE Transactions on Information Theory 24 422432.CrossRefGoogle Scholar
Szilard, L. (1929) Uber die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen. (On the lessening of entropy in a thermodynamic system by interference of an intelligent being). Zeitschrift für Physik 53 840856.CrossRefGoogle Scholar
Touchette, H. (2009) The large deviation approach to statistical mechanics. Physics Reports 478 169.CrossRefGoogle Scholar
Tribus, M. and McIrvine, E. C. (1971) Energy and information. Scientific American 225 179188.CrossRefGoogle Scholar
Van Campenhout, J. M. and Cover, T. M. (1981) Maximum entropy and conditional entropy. IEEE Transactions on Information Theory 27 483489.CrossRefGoogle Scholar
Vovk, V. and Shafer, G. (2003) Kolmogorov's contributions to the foundations of probability. Problems of Information Transmission 39 2131.CrossRefGoogle Scholar
Werhl, A. (1978) General properties of entropy. Reviews of Modern Physics 50 221261.Google Scholar
White, H. (1993) Algorithmic complexity of points in dynamical systems. Ergodic Theory and Dynamical Systems 13 807830.CrossRefGoogle Scholar
Wyner, A. D. and Ziv, J. (1989) Some asymptotic properties of the entropy of a stationary ergodic data source with applications to data compression. IEEE Transactions on Information Theory 35 12501258.CrossRefGoogle Scholar
Ziv, J. and Lempel, A. (1977) A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23 337343.CrossRefGoogle Scholar
Ziv, J. and Lempel, A. (1978) Compression of individual sequences by variable rate coding. IEEE Transactions on Information Theory 24 530536.CrossRefGoogle Scholar
Zuk, O., Kanter, I. and Domany, E. (2005) The entropy of a binary hidden Markov process. Journal of Statistical Physics 121 343360. (Conference version: Aymptotics of the entropy rate for a hidden Markov process. Proceedings DCC'05 173–182.)CrossRefGoogle Scholar
Zurek, W. H. (1984) Maxwell's Demon, Szilard's engine and quantum measurements. In: Moore, G. T. and Scully, M. O. (eds.) Frontiers of nonequilibrium statistical physics, Plenum Press 151161.Google Scholar