Available energy is the main object at stake in the struggle for existence and the evolution of the world.
Gain in entropy always means loss of information, and nothing more.
The perennial question – what is life? The simple answer is that life, either considered in the totality of all its incredible diversity or even in the context of an individual organism, is a highly complex chemical system with a capacity for self-reproduction. But what fuels this system, and what drives the evolution of such extreme apparent complexity? The principle underlying the answer to the first question was initially propounded by Ludwig Boltzmann, the nineteenth-century physicist and natural philosopher. Boltzmann had a tremendous admiration for Darwin and suggested, ‘Available energy is the main object at stake in the struggle for existence and the evolution of the world’.
Thirty-six years later, Alfred Lotka (1922a, 1922b) interpreted Boltzmann’s view to imply that available energy could be the central concept that unified physics and biology as a quantitative physical principle of evolution, stating, ‘In accord with this observation is the principle that, in the struggle for existence, the advantage must go to those organisms whose energy-capturing devices are most efficient in directing available energy into channels favorable to the preservation of the species’. But, by themselves, such statements left unanswered the question of how different life forms could successfully reproduce themselves and also change. The question became: if available energy is the fuel, what is the directing force behind evolution? Although Boltzmann framed his comments in the context of Darwinian evolution (Box 1.1), his thoughts preceded the rediscovery of Mendel’s work on the genetics of peas. Consequently, in the 1940s it fell to another physicist, Erwin Schrödinger, to combine Boltzmann’s concept of available energy with that of a requirement for a heritable informational ‘code-script’ that specifies the form and function of all biological organisms. Life as we know it thus rests on the twin pillars of energy and information – consideration of both is essential for an adequate appreciation of the essence of life.
Two of the most overused, and possibly abused, terms in the evolutionary lexicon. Although there are notable exceptions (for example, Koonin, 2009; Koonin & Wolf, 2009), as with Lamarckism the precise meaning of the terminology can depend on the interpretation of an individual protagonist and may result in semantic quibbling. In this book Darwinism is used in the strict context of variation and the process of natural selection as originally proposed by Darwin and Wallace. It is not, and cannot be, restricted to purely genetically driven evolution. Lamarckism is here defined as describing the inheritance of acquired characteristics, including somatic mutations. Because both Darwin and Lamarck predated the discovery of DNA, these definitions of Darwinism and Lamarckism assume a more modern perspective.
Today virtually all living organisms depend on deoxyribonucleic acid, DNA, as their primary source of genetic information – their ‘code-script’. Some viruses utilise ribonucleic acid, RNA, a related nucleic acid, but these are very much the exception, and the amount of information encoded in RNA genomes, relative to that in DNA, is minute. So how did DNA achieve this dominant position? Why DNA – and not RNA? The current paradigm of information transfer in living organisms posits that the genetic information encoding proteins in DNA is first copied into RNA, and the RNA copies are then translated into functional proteins by an elaborate molecular machinery (Box 1.2). But it was probably not always thus. In the initial stages of the evolution of life, it is believed that there was no DNA – it was an ‘RNA world’ – and possibly the appearance of RNA even preceded or was concomitant with the evolution of proteins themselves.
The process of the transfer of genetic information in a biological system – the Central Dogma – was initially formulated by Francis Crick in 1956 and subsequently published in 1958 (for an illuminating historical perspective, see Cobb, 2017). This states that once ‘information’ has passed into protein, it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein, may be possible, but transfer from protein to protein, or from protein to nucleic acid, is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.
Although commonly interpreted as implying a unidirectional transfer of information from DNA to RNA to protein, both Crick’s text and the accompanying diagram in his notebook clearly envisaged, albeit cautiously, that in principle information could be transferred not only from DNA to RNA but also from RNA to DNA.
This caution was rewarded by the subsequent discovery that the RNA genome of certain viruses, the aptly named retroviruses, could be copied as DNA and inserted into nuclear DNA.
Both DNA and RNA are nucleic acids. They are both polymers consisting of a backbone of a long strand of alternating sugar and phosphate residues (Box 1.3).
The biological nucleic acids, DNA and RNA, are both polymers in which the monomeric units are nucleotides. Each nucleotide contains a 5-carbon sugar – ribose in RNA, deoxyribose in DNA – as well as a phosphate group and a heterocyclic nitrogenous base. The four bases most commonly found in RNA are adenine, cytosine, guanine and uracil, while in DNA thymine is found in place of uracil. Although the base-pairing rules – adenine pairs with uracil or thymine, and guanine pairs with cytosine – are specific and conserved, modified variants of the bases often occur in both DNA and RNA. These include 5-methylcytosine, N6-methyladenine, uracil, hydroxymethyluracil and glucosylated hydroxymethyluracil in double-stranded DNA, as well as N6-methyladenine and other variants in transfer RNA.
An organic base is attached to each of the sugars. In RNA there are four principal types – adenine, cytosine, guanine and uracil. In DNA thymine replaces uracil. In any given chain of RNA or DNA the specific order of these bases in the polymer constitutes the genetic information. As famously pointed out by Watson and Crick, particular combinations of these bases have the ability to form complementary pairs with each other by forming hydrogen bonds. Adenine can pair with uracil or thymine, while guanine can pair with cytosine (Figure 3.1). This property allows two nucleic acid strands to base-pair with each other provided that the base sequence of one strand is complementary to that of its neighbour. It is the formation of this base-paired duplex that provides the fundamental basis for genetic inheritance. Today DNA exists almost exclusively in a double-stranded form – the ‘double helix’. In contrast, RNA molecules are predominantly single-stranded, although this does not preclude the formation of double-stranded regions through base-pairing within a single strand. Chemically, RNA and DNA differ principally in the nature of the sugar in the backbone. The RNA sugar is ribose (hence RiboNucleic Acid), whilst that in DNA is deoxyribose (hence DeoxyriboNucleic Acid). This difference confers importantly different chemical and physical properties on the two nucleic acids, differences that allow DNA to function more efficiently as a genetic information store.
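The pairing rules above can be made concrete with a short sketch (a hypothetical illustration, not part of the original text): given one DNA strand, the sequence of its partner strand follows from complementarity, read in the opposite direction because the two strands of the double helix run antiparallel.

```python
# Illustrative sketch of Watson-Crick complementarity in DNA.
# Pairing rules: adenine (A) with thymine (T), guanine (G) with cytosine (C).

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def partner_strand(strand: str) -> str:
    """Return the complementary strand, written 5'->3'.

    The complement is read in reverse because the two strands of the
    double helix are antiparallel.
    """
    return "".join(PAIRS[base] for base in reversed(strand))

print(partner_strand("ATGC"))  # -> GCAT
```

Applying the function twice returns the original sequence, reflecting the symmetry of the pairing rules.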
By itself a description of the components of biological systems fails to capture the intrinsic nature of the process of life. The key to understanding life is that it is dynamic. It is not an ordered crystal structure but rather should be regarded as a system in constant flux where an elaborate organisation of multifarious chemical reactions is involved in maintaining individual organisms and enabling their replication for future generations. This concept of energy flux is again attributable to Lotka. Replication is dependent on the conservation of information within any biological system but, by the same logic, can allow small informational changes to determine changes in the characteristics of the chemical reactions and so direct the evolution of biological forms. In this context RNA initially had two essential attributes. Not only could it carry information in the form of a defined sequence of nucleotides, but it could also, by virtue of its chemical structure, catalyse a (rather restricted) set of chemical reactions. So by analogy to protein catalysts, aka enzymes, it could itself act as an enzyme, and indeed some RNA molecules still perform such biologically crucial functions in cells. Notably, the synthesis on the ribosome of the peptide bonds connecting individual amino acids in a protein chain is RNA-catalysed. That such an important feature of cellular information flow is still extant is one of the major indications that RNA catalysis is evolutionarily ancient and likely developed in a world devoid of DNA.
In principle, given appropriate precursor chemicals, RNA can maintain itself by self-catalysed copying and processing. Such a process is, however, likely to be inefficient. With the advent of the ability to synthesise protein molecules – even simple ones – the biological world would be transformed. Instead of the four basic structural units – the bases – in RNA, present-day proteins can contain at least 20 different fundamental units – amino acids – in a polypeptide chain. Consequently not only is the repertoire of chemical reactions that can be catalysed by proteins very much greater than that of RNA molecules, but protein molecules, either individually or in collaboration with each other, can also construct a scaffold that facilitates the close approach of the chemical participants in a reaction. Or put another way, proteins can act to increase the local concentrations of reactants not only by possessing catalytic properties themselves – this is, after all, an essential attribute of enzymatic catalysis – but also by stabilising the structures of other catalytic macromolecules – be they RNA or protein – to effect the close spatial proximity of chemically reactive groups.
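The leap in repertoire from four bases to 20 amino acids can be given a back-of-the-envelope form (an illustrative sketch; the chain length chosen is arbitrary): a polymer of length n drawn from an alphabet of k units admits k^n distinct sequences.

```python
# Sequence-space arithmetic: k alternative units at each of n positions
# gives k**n distinct possible sequences.

def sequence_space(alphabet_size: int, length: int) -> int:
    return alphabet_size ** length

n = 50  # an arbitrary illustrative chain length
rna_sequences = sequence_space(4, n)       # 4 bases
protein_sequences = sequence_space(20, n)  # 20 amino acids

# For any length n, the protein space exceeds the RNA space by a factor of 5**n.
print(protein_sequences // rna_sequences == 5 ** n)  # -> True
```

The point is not the absolute numbers but the exponential divergence: each added position multiplies the protein advantage by a further factor of five.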
Ultimately all these chemical reactions that build an organism require energy (Lane, Allen, & Martin, 2010). But how can the process of life be reconciled with physical laws? At the heart of this question lies the apparent paradox first broached by the eminent physicist Erwin Schrödinger nearly 70 years ago. Like Boltzmann and Lotka before him, he addressed the fundamental issue of the nature of life itself as seen through the lens of a physicist. How is it that, although the Second Law of Thermodynamics (Box 1.4) dictates that the universe ultimately approaches a state of maximum disorder, or entropy, biological systems, as we observe them, appear to be, at the very least, minimising disorder, or even creating a system within themselves where order is actually increasing – in Schrödinger’s terminology, creating negentropy (see also Brillouin, 1953, 1956)? The key to this apparent paradox is that biology operates in a thermodynamically ‘open’ system in which any local decrease or minimisation of entropy is more than compensated by an entropy increase elsewhere so that, on balance, overall entropy increases. In other words, because overall the system is heterogeneous, it is energetically possible for different parts of a connected system to gain or lose entropy, provided that there is no net loss of entropy. Indeed, because the energy required to reduce entropy locally is not utilised with 100% efficiency, there must overall be an increase in entropy. In this context what is important is Boltzmann’s ‘available energy’ and not the overall energy content of a system. Energy is useless if it cannot be utilised. A tautology, maybe, but as Boltzmann put it in defining available energy:
The general struggle for existence of animate beings is not a struggle for raw materials – these, for organisms, are air, water and soil, all abundantly available – nor for energy which exists in plenty in any body in the form of heat, but a struggle for [negative] entropy, which becomes available through the transition of energy from the hot sun to the cold earth.
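The entropy bookkeeping sketched in the preceding paragraph can be written compactly (an illustrative formulation, not the author's notation):

```latex
% Entropy balance for an organism treated as part of a larger, open system:
\Delta S_{\mathrm{total}} \;=\; \Delta S_{\mathrm{organism}} + \Delta S_{\mathrm{surroundings}} \;\geq\; 0
% A local decrease is permitted provided it is more than compensated elsewhere:
\Delta S_{\mathrm{organism}} < 0
\quad\text{requires}\quad
\Delta S_{\mathrm{surroundings}} \;\geq\; -\,\Delta S_{\mathrm{organism}}
```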
The Second Law of Thermodynamics concerns processes that involve the transfer or conversion of heat energy. It posits that because such processes are not 100% efficient, some energy is ‘wasted’, and therefore a system progresses in the direction of increasing disorder, also known as entropy. This statement implies irreversibility and is the basis for the ‘arrow of time’. There are many varied formulations of the Law. That of Planck states, ‘Every process occurring in nature proceeds in the sense in which the sum of the entropies of all bodies taking part in the process is increased. In the limit, i.e., for [ideal] reversible processes, the sum of the entropies remains unchanged’. In the context of the Second Law, living organisms are never in states of thermodynamic equilibrium and must be considered as ‘open’ systems because they take in nutrients and give out waste products. A biological system is not reversible but operates within a non-equilibrium ‘open’ thermodynamic environment. Nevertheless, to satisfy the requirement of the Second Law that entropy increases, there is, overall, a net increase in entropy in the system comprising an organism and the total environment in which it operates. A further crucial characteristic of the Second Law is that, as discussed in Chapter 2, it is statistical and thus probabilistic in nature. The Second Law has been, and to some extent still is, scientifically controversial, but as Arthur Eddington once said,
The law that entropy always increases, holds, I think, the supreme position among the laws of Nature. If someone points out to you that your pet theory of the universe is in disagreement with Maxwell’s equations – then so much the worse for Maxwell’s equations. If it is found to be contradicted by observation – well, these experimentalists do bungle things sometimes. But if your theory is found to be against the Second Law of Thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation.
Schrödinger (1944) points out very clearly that this statement is best appreciated in thermodynamic terms by invoking the technical definition of ‘free energy’ or, as he puts it, by recognising that considering entropy alone cannot account for a biological system that feeds ‘on matter in the extremely well-ordered state of more or less complicated organic molecules’. But this balance differs between organisms. It is true for animals but much less so for plants and, of course, both are part of the same complex system.
A fundamental question is then: what is the nature of the molecular mechanisms that drive the creation of negentropy and the establishment of organisation? The creation of negentropy – otherwise a reduction in the intrinsic entropy of a system – implies increased order, a property of complex systems. But what do we understand by complexity? Complexity can be a slippery concept, distinct from, but related to, diversity; here it is defined as a physical system containing a number of distinct components that interact directly or indirectly (Box 1.5). These components usually constitute a network. Diversity, a necessary condition for complexity, simply specifies the number of components without regard to their ability to interact. It is effectively a scalar property, while by analogy complexity has some of the attributes of a vector. Although life is arguably one of the most extreme examples of a complex system, there are many other examples of such a phenomenon – for example, a galaxy cluster, an atom or even a city. In these cases available energy is used to create organisation in an open thermodynamic system. For a city the available energy would have been provided in early times by the agricultural harvesting of light energy and latterly by fossil fuels.
A succinct and relevant definition of complexity and complex states has been provided by Chaisson (2015) – ‘a state of intricacy, complication, variety or involvement among the networked, interacting parts of a system’s structure and function; operationally, a measure of the information needed to describe a system, or of the rate of energy flowing through a system of given mass’. This definition emphasises not only that interactions between different parts of a system are a core component of complexity but also that the system can be described by information or by energy flux, as originally postulated by Lotka. For biological systems, at least, the intimate connection between information and complexity is discussed in Chapter 2. A related concept is that of ‘complexification’ (see Huxley in de Chardin, 1959), visualised as a process that is ‘accompanied by an increase in energetic tension in the resultant corpuscular organisation, or individualised constructions of increased organisational complexity’. For further discussion of the concept, see Adami (2002) and Adami, Ofria, & Collier (2000).
Non-biological examples of complex systems share characteristics with biological systems. A simple example is an atom, composed of three different interacting sub-atomic particles – electrons, protons and neutrons – which, like biological systems, form a dynamic structure. But this picture represents only one level of complexity. In the atomic nucleus both protons and neutrons are themselves made up of a mixture of three other sub-atomic particles – quarks. The precise flavour of the mixture determines whether the nucleon is a proton or a neutron. And again, the strong force between nucleons is mediated by apparently massless particles termed gluons. Like atoms, most complex systems can be considered to be a hierarchy of different levels of complexity.
Diversity, as the word implies, is a measure of difference and is shorthand for the number of distinguishable components in a system – for example, an ecosystem. Commonly used in the context of biodiversity as popularised by E. O. Wilson (1992), it is often broadly interpreted to include interactions between components and therefore complexity. Nevertheless, although it does not by itself imply interactions, diversity is the basis for complexity.
The evolution of complexity in biological systems was also highlighted in Erwin Schrödinger’s What Is Life? In his seminal lectures in 1943, he claimed that the laws of heredity required that the genetic material must contain a ‘code-script’ that determined ‘the entire pattern of the individual’s future development and of its functioning in the mature state’. In other words, the individual – and by extension the biological system – is specified by ‘information’. At that time the nature of this code-script was obscure. Schrödinger appreciated that the encoding molecules must possess a non-repetitive molecular structure, but the experiments characterising the molecules that carried genetic information were only in their infancy – and in fact the first, from Oswald Avery and his collaborators, implicating DNA as the responsible molecule, would not be published until the following year. Nevertheless, the concept that biological information is encoded chemically provided an essential link between the heritability and the thermodynamics of living biological systems. This mirrored Boltzmann’s implied view that available energy could be the central concept unifying physics and biology as a quantitative physical principle of evolution.
This link is indeed central to the mechanism of the evolution of biological systems. In this context the overriding concept of natural selection, although usually discussed in relation to organisms, actually applies to the molecules that specify by their function the phenotype of an organism. At one level it is the thermodynamic characteristics of the molecules, particularly the macromolecules, that determine the properties of a biological system as a whole – however that system is defined. Most biological macromolecules, be they DNA, RNA or proteins, are long polymeric chains whose length is very much greater than their width. Superficially they may be compared to long lengths of flexible string, such that the length of an individual molecule can be accommodated by many different pathways in three dimensions. It is this variety of pathways – more technically, configurations – that is one component of the intrinsic entropy – or disorder – of the molecule. Some of these long polymers, particularly RNA and proteins, perform structural roles. Others catalyse chemical reactions. Yet others combine these roles. But for those that catalyse chemical reactions, the close approach of the chemical groups required for catalysis within the molecule is essential. And because these groups have, in general, to be much closer to each other in space than their separation along the backbone of the polymer, only a few configurations out of a multitude will enable efficient catalysis. Selection for an efficient reaction rate will thus inevitably result in a tightening of the frequency spectrum of available configurations and so in a reduction of the intrinsic entropy.
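This narrowing of configurations can be given a rough quantitative form via Boltzmann's relation S = k_B ln W, where W is the number of equally probable configurations (a hedged sketch; the configuration counts below are invented purely for illustration):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def boltzmann_entropy(configurations: float) -> float:
    """S = k_B * ln(W) for W equally probable configurations."""
    return K_B * math.log(configurations)

# Invented numbers: a floppy polymer sampling many configurations, versus
# the same polymer after selection has confined it to a catalytic few.
w_unconstrained = 1e12
w_selected = 1e3

delta_s = boltzmann_entropy(w_unconstrained) - boltzmann_entropy(w_selected)
# The reduction depends only on the ratio: delta_s = k_B * ln(W1 / W2).
print(f"entropy reduction per molecule: {delta_s:.2e} J/K")
```

Whatever the actual counts, confining a chain to fewer configurations lowers its intrinsic entropy by an amount set only by the ratio of configurations before and after.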
On this view natural selection acts to confine macromolecules to particular preferred structures – it can act, for example, to reduce the wriggle rate of a moderately flexible polymeric string. In this context the formal description of such a string as a ‘worm-like chain’ is particularly apposite. This confinement can be accomplished in several ways: by altering the sequence of the individual units in the polymer, and so the immediate environment of the chemically reactive centre, or, more frequently, by the binding of two or more long polymers to each other, so reducing the wriggle rate of each. In this second situation, not only is the intrinsic entropy of both reduced but the number of components involved is also increased. Or, put another way, the complexity of the system is increased. Simple examples like this imply that natural selection can increase not only the apparent order in a biological system but also its complexity. To what extent this perspective can be generalised to encompass the huge diversity of the biological world that is apparent to us as an everyday phenomenon is a subject that will be explored later.
But for natural selection to act in this way, not only must a particular molecular variant be favoured for, say, an enhanced rate of a required reaction but also that variant must be replicated. It is here that the role of Schrödinger’s ‘code-script’ is paramount. The ‘code-script’ defines and stores the information that specifies the sequence of macromolecules such as RNA and proteins. Initially in an RNA world RNA could perform both functions. It can act as an informational store and can replicate. But the information-carrying capacity of RNA, like that of the genetic code itself, has limitations. These limitations reflect the physical characteristics of the molecules involved. A simple analogy may be made with computing. In the early days of ‘powerful’ computers the information necessary to process experimental data – for example, generated by X-ray crystallography – was not stored first on a memory stick or even a hard disc – but instead on flimsy punched tapes. To transport the data from an X-ray run to calculations on the central computer on the Downing site in Cambridge required two people (Figure 1.1).
Figure 1.1 A low-tech method of information transfer. Punched computer tape – the output from an X-ray crystallographic study of the muscle protein, myoglobin – being transported outside The Arts School, Cambridge in 1958 by Bror Strandberg and Richard Dickerson.
An improvement in technology led to the introduction of punched cards, but because in Cambridge the central university computer had reached the limits of its calculating capacity, the punched cards had to be stacked on shop trolleys and taken on the train to London to be processed on more powerful computers. These forms of information storage worked because there was nothing better, but in reality they were fragile, and the amount of information that could be stored on a particular structural unit – be it a card or a tape – was small. In the biological realm RNA is likewise more chemically fragile than DNA and, possibly because of their many different roles in information transfer, it is less feasible for RNA molecules to accommodate as much genetic information as the longer DNA molecules.
Another limitation of information storage is the ability to make optimal use of the potential informational capacity of a coding molecule. RNA molecules today contain predominantly four nucleotide bases: A, C, G and U. There are several scenarios for the evolution of such a heteropolymer, but in the simplest case it is possible to consider a primeval RNA molecule containing a heteropolymeric mix of all four bases. However, in such a molecule the ability to read a sequence to form a protein then depends on the stability of the interactions of the decoding reading heads with the base sequence itself. Initially only the most stable interactions would likely be accessible, leaving other information potential in the RNA molecule essentially unreadable, or effectively nonsense. But as more and more stable reading-head interactions became accessible, so the amount of nonsense would decrease. Put another way, the information storage in an RNA molecule would become more efficient, and the predictability of the reading process would increase. This is directly related to a quantity termed Shannon entropy, a central tenet of the information theory put forward by Claude Shannon in 1948. In this theory an increase in predictability decreases Shannon entropy. As foreseen and paraphrased by Gilbert Lewis, an eminent chemist and discoverer of the covalent bond, discussing chemical entropy: ‘Gain in entropy always means loss of information, and nothing more’.
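Shannon's measure can be illustrated directly on nucleotide sequences (a sketch for illustration; the function name is ours, not Shannon's): the entropy H = −Σ p_i log₂ p_i, in bits per base, is maximal (2 bits) when all four bases are equally likely and falls as the sequence becomes more predictable.

```python
import math
from collections import Counter

def shannon_entropy(sequence: str) -> float:
    """Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy("ACGU"))      # -> 2.0 : all four bases equiprobable
print(shannon_entropy("AAAACCGU"))  # -> 1.75 : biased, hence more predictable
```

As the text notes, an increase in predictability decreases the Shannon entropy; a perfectly repetitive sequence carries zero bits per base.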
There are further parallels between the nature of information encoded in DNA and that encoded by computers. As pointed out by John von Neumann (1958), in computing there are two modes of specifying information – analogue and digital. In the analogue mode the information is expressed in principle as a continuous variable – the classic clock face is an everyday example, while Charles Babbage’s Analytical Engine, one of the first real computers, used an arguably analogue mode. In contrast, digital information is stored in an essentially discontinuous fashion. Again digital clocks are an excellent everyday example, and indeed most modern computers store information in a digital mode. DNA, however, combines analogue and digital modes (Marr, 2008) and encodes different types of information in each mode. For example, the encoding of codons specifying protein and RNA molecules is purely digital, while the specification of the physical properties of DNA – how easily it bends or how easily the strands separate – depends on the ensemble of the physicochemical properties of a more extended DNA sequence. Simplistically, the digital information determines the nature of the macromolecules in a cell, while the analogue information determines the regulation of the production of those macromolecules. However, because both the analogue and digital information are encoded by the sequence of four bases in DNA, they are not independent, especially, for example, where the specifying sequences overlap. Arguably it is the ability of DNA to combine these two types of information that enables it to act as such an efficient genetic repository. There is another biological information-processing system that combines analogue and digital characteristics. Von Neumann also pointed out that within a single neuron a nerve impulse is essentially digital in character. A neuron either fires or it doesn’t.
However the transmission of the impulse signal to a neighbouring nerve cell via a synapse depends on the concentrations of certain small molecules – neurotransmitters – in the small gap between the neurons. In this case the transmission has an analogue character because concentration is a continuous variable.
Although combining analogue and digital information in the same molecule may place restrictions on the usage of both components, the use of both modes in principle increases the information-carrying capacity of the molecule. If you like, the DNA cable has two channels rather than one, and so, again, in a biological system the utilisable information is increased. In his original proposal Schrödinger framed his argument in terms of classical thermodynamics – that is, the entropy in his definition referred to the classical Boltzmann entropy. Importantly, however, the information content of the code-script is a direct reflection of the complexity of the operating system. The more information that is encoded, the better the system is defined. Or, put another way, biological mechanisms are selected so that, as far as possible, nothing is left to chance within the physical constraints acting on the system. An accumulation of information in this way thus implies a lower Boltzmann entropy or ‘negentropy’, and the organisation of information within the ‘code-script’ itself contributes to negentropy. The nature of the information in the ‘code-script’ – be it RNA or DNA – is a primary vindication of Schrödinger’s hypothesis.
The Transition from an RNA World to a DNA World: A Major Increase in Complexity and Information
The supplanting of RNA by DNA as the major information store in biological systems is a prime example of an increase in complexity. However, in present-day biological systems RNA still acts as a messenger molecule (mRNA) for directing the synthesis of proteins, still acts, in the form of tRNA, as an adaptor for decoding the nucleotide sequence in mRNA into a specific amino acid sequence in proteins, and still acts, in the ribosome, as a structural and catalytic component of information flow. RNA molecules, in different forms, have thus retained, not necessarily in their original guise, many of the essential features of information flow that would have been essential in a biological world lacking DNA. Conversely, because DNA contains deoxyribose in place of ribose in its sugar-phosphate backbone, the potential of a DNA molecule for catalysis is much diminished relative to RNA. What cellular RNA molecules – strictly, RNA molecules encoded by DNA – have lost is the ability to act directly as a template for their own replication. Because DNA is the molecule conferring heritable properties, it is the DNA sequences that are replicated, so that any heritable changes – mutations – in the sequence can be passed on to future generations.
The consequences of the increase in nucleic acid complexity from RNA only to RNA plus DNA would have been a game-changer. In essence the existence of two fundamental types of nucleic acids with different functions would require, as an absolute necessity, some form of communication – interaction – between them. Not only would the sequences of the different RNA molecules have to be encoded in the DNA but also, because within a cell the abundance of specific RNA molecules can differ by up to several thousand-fold, the DNA must contain information specifying the number of RNA molecules to be synthesised. For example, in the simple bacterium Escherichia coli there may be of the order of 20,000 copies of each of the three RNA components of the ribosome, while some mRNA molecules may be present at fewer than 10 molecules per bacterium. Indeed, some mRNA molecules may only be synthesised when the organism is exposed to very different environmental circumstances. So not only must the DNA contain the information specifying the quantity of each RNA molecule required by the cell, but it could also contain information for varying – regulating – the quantities required depending on the prevailing internal and external environmental conditions.
In this DNA world, DNA and RNA molecules are not stand-alone entities. A network of interactions is essential for the efficient operation of information flow. This is not simply an increase in the diversity of nucleic acid molecules; it is an increase in the complexity of the biological system. One consequence of an increase in complexity is the possibility of developing new functional modes that were precluded in a simpler system. Such novel ‘emergent properties’ reflect an increasing self-organisation. In a transition from RNA alone to RNA plus DNA, any constraints imposed by the limitations of an RNA genome might be relieved by a separation of function. One possible example of an emergent function is the ability of DNA molecules to adopt a higher-energy state in which the long DNA chain is coiled – known as a supercoil because the DNA double helix is itself a coil. In this form the DNA stores energy, is more easily packaged into a small volume, and creates a situation in which the activity of one gene on a DNA molecule can influence the activity of its neighbour. Supercoiling requires that the rotation of a DNA molecule about its length be restricted – most easily accomplished by having a circular DNA molecule or by attaching the two ends of a linear DNA molecule to a fixed scaffold. The dual roles of DNA as an information repository and an energy store uniquely represent the twin underpinnings of a low-entropy complex system – information coupled with efficient energy utilisation.
More importantly, not only does DNA store information, but that information is mutable, enabling modification of the phenotype and selection for characteristics of an organism better adapted to a changed environment. Critically, the ‘code-script’ is also highly dynamic. The ability of complementary DNA strands to pair with each other over limited regions not only facilitates the shuffling of genes during the formation of germ cells but also enables short DNA segments – so-called jumping genes or mobile DNA elements – to move from place to place in the genome, essentially acting as an intrinsic mutagen. Indeed, the ubiquity of jumping DNA implies that it is maintained by strong selective forces. These could reflect simply the presumed ‘selfish’ character of such DNA elements or, alternatively, the ability of such elements to create mutations in the genome that act as an evolutionary driver.
The acquisition of emergent properties through an increase in complexity from an RNA-only world does not necessarily imply that a double-stranded RNA genome – able, for example, to support supercoiling – could not exist, nor that a different means of regulating information flow with RNA alone is theoretically impossible. As with the evolution of organisms, the evolution of biological molecular systems depends on previous history as well as circumstance; in this sense it can be said to be Bayesian (Box 1.6) in character. Perhaps the selective advantages conferred by DNA would, if reproduced in RNA, have been similarly effective. That is, without the appearance of DNA, double-stranded RNA might have evolved to adopt roles analogous to those inherent in DNA. Whatever the earliest steps in evolution, once DNA became established as a separate functioning entity, its apparent intrinsic advantages as a genomic information repository would have led to its present dominant position.
Thomas Bayes (c. 1701–1761) was an English clergyman who took a deep interest in probability theory. His principal accomplishment – published posthumously – is known as Bayes’ theorem and describes the probability of an event based on prior knowledge of conditions that might be related to the event. The theorem was subsequently refined and popularised by the French mathematician Pierre-Simon Laplace (1749–1827). Biological evolution exemplifies Bayesian logic because each evolutionary step changes – usually substantially increases – the probabilities of particular interactions at the expense of others, so that the probable pathways for future evolution are severely restricted. In this context, Dawkins’ illustrative example of the implausibility of the appearance of ‘crocoducks’ is well worth reading (Dawkins, 2009). The common ancestor of crocodiles and ducks substantially predates both, precluding the evolution of the one into the other.
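In its modern form, Bayes’ theorem gives the probability of a hypothesis $H$ after observing evidence $E$ in terms of the prior probability of the hypothesis and the likelihood of the evidence:

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
```

The prior $P(H)$ encodes previous history, which is precisely why evolutionary trajectories, conditioned at every step on what has already happened, can be described as Bayesian in character.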
Information and Complexity
By recognising the informational role of the ‘code-script’, Schrödinger coupled the concept of the information contained within a DNA sequence to that of the complexity of living systems. But, as an information carrier, DNA is restricted to genetic transmission and is arguably not the sole source of information that currently drives the evolution of some complex biological systems. Complex mammalian societies – not just human ones – can, and do, transmit information through ‘cultural’ modes that do not directly require DNA transmission (Chaisson, 2013). Indeed, it can be argued that the complexity of a society or polity depends on the amount of cultural information available (Tainter, 1988). Are the physical consequences of this form of information transmission related to the ‘negentropy’ postulated by Schrödinger? If so, the implication is that the development of complexity in biological systems is directed primarily by information input and is itself adapting to inputs from different sources. Put another way, the physical phenomenon of increasing complexity in biology represents the evolution of a seamless transition from a purely DNA-based information system to one utilising additional sources of information. Such evolution is naturally conditional on an adequate supply of energy.
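The notion of information carried by a nucleotide sequence can be made quantitative with Shannon’s entropy, the information-theoretic counterpart of the thermodynamic entropy that underlies Schrödinger’s ‘negentropy’. A minimal illustrative sketch – the sequences below are invented examples, not data from this chapter:

```python
from collections import Counter
from math import log2

def shannon_entropy(seq: str) -> float:
    """Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# A sequence using the four bases equally often attains the maximum
# of 2 bits per base; a heavily biased sequence carries far less.
print(shannon_entropy("ACGT" * 25))   # → 2.0
print(shannon_entropy("AAAAAAAAAT"))  # well under 1 bit per base
```

The same counting argument is what links ‘loss of information’ to ‘gain in entropy’ in Lewis’s epigraph: the more uniform (disordered) the distribution of states, the higher the entropy and the less that any one observation tells us.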
Throughout the passage of time, biological diversity and complexity have increased, with only transient interruptions occasioned by geological or astronomical interventions (Bonner, 1988; Wilson, 1992). This observation raises two related issues: why does the immense complexity that we now perceive, which appears inherently improbable (or even, to some, implausible), exist at all, and why is the process apparently directional? An illuminating perspective on the question of probability was again provided by Gilbert Lewis (1926):
Borel makes the amusing supposition of a million monkeys allowed to play upon the keys of a million typewriters. What is the chance that this wanton activity should reproduce exactly all of the volumes which are contained in the library of the British Museum? It certainly is not a large chance, but it may be roughly calculated, and proves in fact to be considerably larger than the chance that a mixture of oxygen and nitrogen will separate into the two pure constituents. After we have learned to estimate such minute chances, and after we have overcome our fear of numbers which are very much larger or very much smaller than those ordinarily employed, we might proceed to calculate the chance of still more extraordinary occurrences, and even have the boldness to regard the living cell as a result of random arrangement and rearrangement of its atoms. However, we cannot but feel that this would be carrying extrapolation too far. This feeling is due not merely to a recognition of the enormous complexity of living tissue but to the conviction that the whole trend of life, the whole process of building up more and more diverse and complex structures, which we call evolution, is the very opposite of that which we might expect from the laws of chance.
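Lewis’s comparison can be put on a rough quantitative footing. In the back-of-envelope sketch below, all the numbers are illustrative assumptions rather than figures from Lewis or Borel: the chance of typing one specific text of N characters on a K-symbol keyboard is K⁻ᴺ, while a toy model of the gases ‘unmixing’ is the chance that each of M molecules of a 50:50 mixture happens to sit on its own half of the container, 2⁻ᴹ.

```python
from math import log10

# Illustrative assumptions (not from the source):
KEYS = 50            # distinct typewriter symbols
TEXT_LEN = 10**10    # characters in a very large library
MOLECULES = 10**22   # gas molecules in a modest container

# log10 probability of typing one specific text purely by chance
log_p_monkeys = -TEXT_LEN * log10(KEYS)

# log10 probability that every molecule of a 50:50 mixture lands
# on its 'own' half of the container (a toy unmixing model)
log_p_unmix = -MOLECULES * log10(2)

print(f"monkeys:  10^{log_p_monkeys:.3g}")
print(f"unmixing: 10^{log_p_unmix:.3g}")
```

Both probabilities are absurdly small, but the exponent for the monkeys (of order −10¹⁰) is overwhelmingly larger than that for the unmixing gases (of order −10²¹), which is the point Lewis was making.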
But if an evolving complex system is improbable, is an even more complex system even more improbable? The answer is likely no, a conclusion rooted in the fundamental thermodynamic properties of complex systems. The biological world we know is a superb example of a highly efficient mechanism for utilising its principal energy source – sunlight. In this context it is representative of an organised, low-entropy system associated with high exploitability and control of energy; conversely, a high-entropy system is associated with low exploitability. Therefore natural Darwinian selection will inevitably favour more complex, information-rich systems with more regulation, provided that sufficient energy is available. Of course, if energy is limiting, so too is the ultimate complexity attained. But within these limits the evolution of biological complexity becomes a positive feedback process and is consequently directional.