Shannon entropy: a rigorous notion at the crossroads between probability, information theory, dynamical systems and statistical physics


Statistical entropy was introduced by Shannon as a basic concept in information theory measuring the average missing information in a random source. Extended into an entropy rate, it gives bounds in coding and compression theorems. In this paper, I describe how statistical entropy and entropy rate relate to other notions of entropy that are relevant to probability theory (entropy of a discrete probability distribution measuring its unevenness), computer science (algorithmic complexity), the ergodic theory of dynamical systems (Kolmogorov–Sinai or metric entropy) and statistical physics (Boltzmann entropy). Their mathematical foundations and correlates (the entropy concentration, Sanov, Shannon–McMillan–Breiman, Lempel–Ziv and Pesin theorems) clarify their interpretation and offer a rigorous basis for maximum entropy principles. Although often ignored, these mathematical perspectives give a central position to entropy and relative entropy in statistical laws describing generic collective behaviours, and provide insights into the notions of randomness, typicality and disorder. The relevance of entropy beyond the realm of physics, in particular for living systems and ecosystems, is yet to be demonstrated.
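
As a purely illustrative aside, the two quantities the abstract places at the centre, the entropy of a discrete probability distribution and the relative entropy between two such distributions, can be sketched in a few lines of Python. This is a minimal sketch using the standard textbook definitions H(p) = -sum_i p_i log p_i and D(p||q) = sum_i p_i log(p_i / q_i), not code taken from the paper; the function names `shannon_entropy` and `relative_entropy` and the choice of base-2 logarithms (bits) are assumptions made here for illustration.

```python
import math


def shannon_entropy(p, base=2.0):
    """Entropy H(p) = -sum_i p_i log(p_i) of a discrete distribution p.

    Terms with p_i == 0 contribute nothing (convention 0 log 0 = 0).
    The function name and the default base are illustrative choices.
    """
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be non-negative and sum to 1")
    return -sum(x * math.log(x, base) for x in p if x > 0)


def relative_entropy(p, q, base=2.0):
    """Relative entropy D(p||q) = sum_i p_i log(p_i / q_i).

    Returns infinity when q_i == 0 for some i with p_i > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi, base)
    return total


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    uneven = [0.7, 0.1, 0.1, 0.1]
    print("H(uniform) =", shannon_entropy(uniform))
    print("H(uneven)  =", shannon_entropy(uneven))
    print("D(uneven || uniform) =", relative_entropy(uneven, uniform))
```

For a distribution p over n outcomes, D(p||uniform) equals log n - H(p), so under these definitions the relative entropy to the uniform distribution is one way to quantify the "unevenness" mentioned above: the more uneven p is, the lower its entropy and the larger this gap.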

Mathematical Structures in Computer Science
  • ISSN: 0960-1295
  • EISSN: 1469-8072
  • URL: /core/journals/mathematical-structures-in-computer-science