From covalent transition states in chemistry to noncovalent in biology: from β- to Φ-value analysis of protein folding

Abstract
Solving the mechanism of a chemical reaction requires determining the structures of all the ground states on the pathway and the elusive transition states linking them. 2024 is the centenary of Brønsted’s landmark paper that introduced the β-value and structure-activity studies as the only experimental means to infer the structures of transition states. The approach involves making systematic small changes in the covalent structure of the reactants and analysing changes in activation and equilibrium free energies. Protein engineering was introduced for an analogous procedure, Φ-value analysis, to analyse the noncovalent interactions in proteins central to biological chemistry. The methodology was developed first by analysing noncovalent interactions in transition states in enzyme catalysis. The mature procedure was then applied to study transition states in the pathway of protein folding – ‘part (b) of the protein folding problem’. This review describes the development of Φ-value analysis of transition states and compares and contrasts the interpretation of β- and Φ-values and their limitations. Φ-analysis afforded the first description of transition states in protein folding at the level of individual residues. It revealed the nucleation-condensation folding mechanism of protein domains with the transition state as an expanded, distorted native structure, containing little fully formed secondary structure but many weak tertiary interactions. A spectrum of transition states with various degrees of structural polarisation was then uncovered that spanned from nucleation-condensation to the framework mechanism of fully formed secondary structure. Φ-analysis revealed how movement of the expanded transition state on an energy landscape accommodates the transition from framework to nucleation-condensation mechanisms with a malleability of structure as a unifying feature of folding mechanisms.
Such movement follows the rubric of analysis of classical covalent chemical mechanisms that began with Brønsted. Φ-values are used to benchmark computer simulation, and Φ and simulation combine to describe folding pathways at atomic resolution.


Introduction
I have been fascinated with transition states for more than 60 years, a passion for understanding structure and mechanism that has directed my research at the borderlines of chemistry, physics and biology. Transition states of simple covalent reactions are traditionally studied by structure-activity relationships, whereby the perturbations of the energetics of kinetics and equilibria of reactions caused by small changes in the structure of reagents are correlated to give clues about the structure of the transition state. Much of biological chemistry is dominated by weak noncovalent interactions, especially those of proteins. The advent of protein engineering enabled structure-activity relationships to be applied to the noncovalent transition states of those biological processes. This invited review outlines the history of key steps by my research group and by others in translating those structure-activity methods of classical physical and organic chemistry to analyse noncovalent transition states. It begins with their introduction via protein engineering to the quantitative study of noncovalent interactions in enzyme catalysis and specificity and then their extension to protein folding to give Φ-value analysis. I discuss in particular how the combination of those methods and computer simulation has been used in solving problems of protein folding pathways.
It is a particularly appropriate time for this topic as it is the centenary of the publication of the landmark paper in the history of physical-organic chemistry that led to structure-activity studies: the discovery of general-base catalysis and its dependence on the strength of the base by Brønsted and Pedersen (1924). That discovery and the ensuing Brønsted β-value have inspired much of my research and the contents of this review. It is also the half-centenary of my paper that sent me down the slippery slope of analysing noncovalent interactions in transition states (Fersht, 1974). Pertinently, it is also the centenary of the chess grandmaster S. G. Tartakower's 'Die Hypermoderne Schachpartie' in which he wrote 'Die Fehler sind dazu da, um gemacht zu werden' (Tartakower, 1924, p. 90). The usual translation, 'The mistakes are all there, waiting to be made', should be the watchword of every experimentalist and theoretician as well as chess player, especially in areas as complex and as full of pitfalls as protein folding.

Transition states in covalent chemistry
Transition states are the transient structures at the peaks of plots of free energy as a reaction progresses, as opposed to intermediates, which lie in basins (Figure 1). Simple transition state theory relates the rate constant for a reaction to the energy difference between the transition and ground states, $\Delta G^{\ddagger}$, as if the two states were in equilibrium. The rate constant for the reaction going through the transition state, $k$, is given by:

$$k = \kappa \, \frac{k_B T}{h} \exp\left(-\frac{\Delta G^{\ddagger}}{RT}\right) \qquad (1)$$

where $k_B$ is the Boltzmann constant, $h$ is the Planck constant, $R$ is the gas constant, $T$ is the temperature, and $\kappa$ is a transmission coefficient (Pelzer and Wigner, 1932; Evans and Polanyi, 1935; Eyring, 1935). Examination of the transition state structure relative to the ground states gives important clues as to what drives a reaction and how its rate or even its products may change on altering the structure of the reagents, changing the reaction conditions or employing catalysts. For example, the rate of attack of a negatively charged nucleophile on a reagent can be increased by introducing electron-withdrawing substituents. Transition states are essential structures in defining reaction pathways. To solve a reaction pathway, we must characterise all the ground states and the transition states linking them.
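As a numerical illustration of the transition-state-theory expression $k = \kappa (k_B T/h)\exp(-\Delta G^{\ddagger}/RT)$, the following minimal Python sketch converts between an activation free energy and a first-order rate constant. The function names, the default temperature of 298.15 K, the default $\kappa = 1$ and the example barriers are assumptions made for illustration only; they are not taken from the text.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 1.987204e-3      # gas constant, kcal/(mol*K)


def eyring_rate_constant(dg_act, temperature=298.15, kappa=1.0):
    """First-order rate constant (s^-1) from an activation free energy (kcal/mol)."""
    prefactor = kappa * K_B * temperature / H
    return prefactor * math.exp(-dg_act / (R * temperature))


def activation_free_energy(rate_constant, temperature=298.15, kappa=1.0):
    """Inverse of the expression above: activation free energy (kcal/mol)."""
    prefactor = kappa * K_B * temperature / H
    return -R * temperature * math.log(rate_constant / prefactor)


if __name__ == "__main__":
    # Illustrative barriers only; these are not data from the review.
    for dg in (10.0, 15.0, 20.0):
        print(f"dG_act = {dg:4.1f} kcal/mol -> k = {eyring_rate_constant(dg):.3g} s^-1")
```

Because the pre-exponential factor is the same for two closely related reactions at the same temperature, only the difference in activation free energy matters when their rate constants are compared, which is the property that structure-reactivity analysis relies on.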
Ground states and intermediates are best studied by direct observation. The only state between ground states that can be characterised experimentally is the elusive transition state, and the only current experimental means of doing so is indirect evidence from structure-reactivity relationships.
Linear-free-energy relationships: LFERs and REFERs – β- and α-values

The classical physical-organic chemist's approach to analysing the structure of a transition state of a reaction is to use quantitative measurements of the changes in reactivity and equilibria on small changes in the structure of reagents. For example, Brønsted and Pedersen began the analysis of the effects of strengths of bases and acids on their powers of catalysis of simple organic reactions in solution (Brønsted and Pedersen, 1924). They found, for example, that there is often a simple equation relating the second-order rate constant ($k_2$) for catalysis of a reaction by a general base to the $pK_a$ of its conjugate acid (Eq. 2).
$$\log k_2 = A + \beta \, pK_a \qquad (2)$$

This is an example of a linear-free-energy relationship (LFER) since it is equivalent to:

$$\Delta G^{\ddagger} = A' + \beta \, \Delta G^{0} \qquad (3)$$

where $\Delta G^{\ddagger}$ is the free energy of activation and $\Delta G^{0}$ the equilibrium free energy change of a process. β is for base, but we usually call it the Brønsted β. The equation can be formulated for a wider range of reactions, $\Delta G^{\ddagger} = A + \alpha \Delta G^{0}$, as described by Leffler (1953), and the description rate-equilibrium free-energy relationship (REFER) is used as an alternative. L. P. Hammett translated these LFERs to chemical reactions involving aromatic compounds by measuring the effects of chemical substituents in the meta and para positions of benzoic acid on its $pK_a$ to assign a σ-value for each substituent (corresponding to the change it makes in the $pK_a$) and relating the sensitivity of the logarithms of rate constants for chemical reactions to σ by a parameter ρ, equivalent to the Brønsted β (Hammett, 1937). The meta and para positions are chosen to minimise direct steric interactions with the seat of reaction (Hammett, 1940).
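The Brønsted relation $\log k_2 = A + \beta\, pK_a$ is, in practice, a straight-line fit. The sketch below shows one way to extract β by ordinary least squares in plain Python. The function name and the five data points are hypothetical: the points were generated from $A = -6.0$ and $\beta = 0.5$ purely to show that the fit recovers the slope, and they do not come from any experiment discussed here.

```python
def fit_bronsted(pka_values, log_k2_values):
    """Least-squares fit of log k2 = A + beta * pKa; returns (A, beta)."""
    n = len(pka_values)
    if n < 2 or n != len(log_k2_values):
        raise ValueError("need at least two (pKa, log k2) pairs of equal length")
    mean_x = sum(pka_values) / n
    mean_y = sum(log_k2_values) / n
    sxx = sum((x - mean_x) ** 2 for x in pka_values)
    if sxx == 0:
        raise ValueError("pKa values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(pka_values, log_k2_values))
    beta = sxy / sxx                  # slope: the Bronsted beta
    intercept = mean_y - beta * mean_x  # the constant A
    return intercept, beta


# Hypothetical general-base catalysts (synthetic values, A = -6.0, beta = 0.5).
PKA = [3.0, 4.0, 5.0, 6.0, 7.0]
LOG_K2 = [-4.5, -4.0, -3.5, -3.0, -2.5]

A_FIT, BETA_FIT = fit_bronsted(PKA, LOG_K2)
print(f"A = {A_FIT:.2f}, beta = {BETA_FIT:.2f}")
```

The same routine applies to a Hammett plot if σ replaces $pK_a$ on the abscissa, in which case the slope is ρ.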
The simple reasoning behind the magnitude of the β and ρ values in many chemical reactions is that they often result from electrostatic effects. For example, in the transition state of the attack of $\mathrm{H_2O}$ on an ester with general-base catalysis by acetate ion (Figure 2), an $\mathrm{H^+}$ is in the process of being transferred from the $\mathrm{H_2O}$ to the $\mathrm{-CO_2^-}$ of the catalyst, partly neutralising its negative charge. If a substituent that has an electron-withdrawing or electron-donating propensity is put into the $\mathrm{-CH_3}$ of acetic acid, it will perturb its $pK_a$ by $\Delta\Delta G^{0}$ because of the electrostatic interactions with the negatively charged carboxylate relative to the neutral state. The electrostatic interaction of the substituent with the partly neutralised negative charge on the $\mathrm{-CO_2^-}$ in the transition state, $\Delta\Delta G^{\ddagger}$, will be less than $\Delta\Delta G^{0}$ because of the $\mathrm{H^+}$ being transferred, so that:

$$\Delta\Delta G^{\ddagger} = \beta \, \Delta\Delta G^{0} \qquad (4)$$

where β approximates to the extent of bond formation with the $\mathrm{H^+}$ in the example of Eq. (4), or in other cases to the extent of formation of a covalent bond in the transition state. β = 0 means there is no transfer of the proton to the base, β = 1 means complete transfer, and fractional values are something in between. One possible generic basis of LFERs is explained in Figure 3, where the reagents are in two energy wells that intersect at the transition state. Applying a simplified version of the treatment by Marcus (1968) of outer-sphere electron transfer reactions, I assume the energy functions are simple harmonic wells.
For the starting material S, $\Delta G_S = \lambda_1 r^2$, and for the products P, $\Delta G_P = \lambda_2 (1 - r)^2 - \Delta G^{0}$, which gives for $\alpha = \Delta\Delta G^{\ddagger}/\Delta\Delta G^{0}$ (Fersht, 2004b):

$$\alpha = \frac{\lambda_1 r^{\ddagger}}{\lambda_1 r^{\ddagger} + \lambda_2 (1 - r^{\ddagger})} \qquad (5)$$

For the special case of $\lambda_1 = \lambda_2$, $\alpha = r^{\ddagger}$. But, apart from the extreme values of the position of the transition state, $r^{\ddagger}$ = 0 or 1, $r^{\ddagger}$ does not generally equal α (or β) (Fersht, 2004b). The situation is, of course, even more complicated than the above for fractional values. The reaction coordinate diagram is not two-dimensional, and there can be movement in other dimensions with much complexity (Jencks, 1985).
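The relationship between α and the position of the transition state for two intersecting harmonic wells can be checked numerically. The sketch below is an illustration under stated assumptions, not the treatment of Fersht (2004b): it writes the product well as $\lambda_2(1-r)^2$ plus an offset `dg0` that counts a destabilisation of P relative to S as positive, finds the crossing point by bisection, and compares a finite-difference estimate of $\partial \Delta G^{\ddagger}/\partial \Delta G^{0}$ with the closed form $\lambda_1 r^{\ddagger}/[\lambda_1 r^{\ddagger} + \lambda_2(1 - r^{\ddagger})]$ that follows from that assumption. All function names and numerical values are hypothetical.

```python
def transition_state_position(lam1, lam2, dg0, iterations=200):
    """Crossing point r_ts in [0, 1] of lam1*r**2 and lam2*(1 - r)**2 + dg0."""
    if not (-lam2 <= dg0 <= lam1):
        raise ValueError("wells do not cross between r = 0 and r = 1")
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # G_S(r) - G_P(r) increases monotonically on [0, 1] for positive lam1, lam2
        if lam1 * mid ** 2 - lam2 * (1.0 - mid) ** 2 - dg0 < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def activation_energy(lam1, lam2, dg0):
    """Energy of the crossing point measured from the bottom of the S well."""
    return lam1 * transition_state_position(lam1, lam2, dg0) ** 2


def alpha_numeric(lam1, lam2, dg0, step=1e-5):
    """Central finite-difference estimate of d(dG_act)/d(dg0)."""
    upper = activation_energy(lam1, lam2, dg0 + step)
    lower = activation_energy(lam1, lam2, dg0 - step)
    return (upper - lower) / (2.0 * step)


def alpha_closed_form(lam1, lam2, dg0):
    """lam1*r_ts / (lam1*r_ts + lam2*(1 - r_ts)) under the sign convention above."""
    r_ts = transition_state_position(lam1, lam2, dg0)
    return lam1 * r_ts / (lam1 * r_ts + lam2 * (1.0 - r_ts))


if __name__ == "__main__":
    for lam1, lam2 in ((10.0, 10.0), (20.0, 5.0)):
        r_ts = transition_state_position(lam1, lam2, 0.0)
        print(lam1, lam2, round(r_ts, 4), round(alpha_numeric(lam1, lam2, 0.0), 4))
```

With equal curvatures and `dg0 = 0` the crossing lies at r = 0.5 and α is also 0.5. With `lam1 = 20`, `lam2 = 5` the crossing moves to r = 1/3 while α rises to 2/3, a concrete case in which α is not a direct measure of the position of the transition state.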
LFERs have been found in many types of physical-chemical processes, and the interpretation is usually simply phenomenological, with the value of α or β interpreted only qualitatively for mechanism and semi-quantitatively for predictive purposes. In the qualitative analysis of the effects of changes of structure on reactivity, it is just the changes in $\Delta G^{\ddagger}$ in the $\kappa (k_B T/h)\exp(-\Delta G^{\ddagger}/RT)$ term of transition state theory that are examined, with the pre-exponential component cancelling out in the comparison of rate constants. $\Delta\Delta G^{\ddagger}$ and $\Delta\Delta G^{0}$ are the key quantities. My first published paper as a graduate student centred on using LFERs to analyse transition states in chemical mechanisms, with a series of substituted aspirins (Fersht and Kirby, 1967), and LFERs figure prominently in my textbook on enzymes (Fersht, 1977, 1985).

Transition states in noncovalent chemistry: biological catalysis and specificity
Classical chemistry is dominated by covalent bonds and strong ionic interactions. Much of chemistry in biology, on the other hand, is dominated by weak noncovalent interactions, such as van der Waals interactions, hydrogen bonds, salt bridges, and the hydrophobic effect. Utilisation of these weak interactions is the hallmark of biological specificity in general and of the modulation of catalysis by enzymes.

Enzyme catalysis and binding of the transition state
The rates of enzyme-catalysed reactions are many orders of magnitude greater than those of simple reactions catalysed in solution by acids and bases or nucleophiles. To answer why, Haldane proposed that enzymes might catalyse reactions by straining the structures of the substrates towards that of the products (Haldane, 1930). Pauling refined that concept by stating that an enzyme could have a structure complementary to that of the activated complex or transition state of the substrate, and hence stabilise it (Pauling, 1948). Classical studies varying the structures of substrates of α-chymotrypsin, for example, showed that binding energy could be distributed between tighter binding of substrate and higher rate constants (Jencks, 1975). Analogues mimicking the structure of transition states of substrates may also bind more tightly than the substrates themselves (Schramm, 1998). So, free energies of activation of the covalent chemical reaction, $\Delta G^{\ddagger}_{cov}$, can be modulated by changes in binding energies, $\Delta G^{\ddagger}_{noncov}$. The Michaelis-Menten equation (6) relates the reaction rate $v$ of a substrate S to the total concentration of enzyme, $[E]_0$, an apparent first-order rate constant $k_{cat}$, and an apparent dissociation constant $K_M$:

$$v = \frac{k_{cat} [E]_0 [S]}{K_M + [S]} \qquad (6)$$

In the simplest case, $K_M$ is the dissociation constant for the E.S complex, $K_s$, and $k_{cat}$ is the rate constant for its giving products. But these apparent rate and equilibrium constants can hide a complexity of additional terms, from additional chemical steps to nonproductive binding. Crucially, however, the ratio $k_{cat}/K_M$ is an apparent second-order rate constant for the process of free enzyme, [E], and free substrate, [S], proceeding to the highest transition state on the reaction pathway to give products, and complicating factors usually cancel in the ratio $k_{cat}/K_M$, Eq. (7):

$$v = \frac{k_{cat}}{K_M} [E][S] \qquad (7)$$
Applying simple transition state theory suggests two notional processes in the evolution of maximal rate (Fersht, 1974). The enzyme evolves to have a structure that is complementary to that of the transition state of the reaction, which maximises the value of $k_{cat}/K_M$. And, if rate is the prime concern, the enzyme will also evolve to increase $K_M$ at constant $k_{cat}/K_M$ until $K_M$ is higher than the physiological substrate concentration. This is because low-energy intermediates can be thermodynamic pits, from which $\Delta G^{\ddagger}$ to the transition state is higher than it is from the initial state. The strain theories of Haldane and Pauling propose strong binding of the transition state and concomitant weak binding of the substrate, and the highest catalysis occurs when the binding energy in the E.S complex is sufficiently weak that the complex is largely dissociated and intermediates do not accumulate on reaction pathways (Fersht, 1974).

Specificity depends on the relative binding of transition states
When two substrates A and B are competing for the active site of an enzyme, their relative rate of reaction at all concentrations of free [A] and [B] is given by (Fersht, 1974):

$$\frac{v_A}{v_B} = \frac{(k_{cat}/K_M)_A \, [A]}{(k_{cat}/K_M)_B \, [B]} \qquad (8)$$

Figure 3. Illustration of one type of origin for a LFER. In the plot of G versus reaction coordinate, r, the energy function of the starting material S crosses that of the products P at the transition state. To an approximation, if the structure and energetics are perturbed such that the energy of P is increased by $\Delta\Delta G^{0}$ relative to S, the energy of the transition state will be increased by a value of $\Delta\Delta G^{\ddagger}$ that is less than $\Delta\Delta G^{0}$ and determined by the angles and so forth at the point of intersection. Apart from the extreme values of the position of the transition state, $r^{\ddagger}$ = 0 or 1, $r^{\ddagger}$ does not generally equal $\Delta\Delta G^{\ddagger}/\Delta\Delta G^{0}$, that is, $r^{\ddagger}$ ≠ α or β (Fersht, 2004b). The small change in $r^{\ddagger}$ with changes in energetics is the basis of the Hammond Postulate (Hammond, 1955), whereby as the energy of the high-energy state increases, the transition state structure moves closer to it.

Quarterly Reviews of Biophysics
As $k_{cat}/K_M$ is for the process of unbound enzyme and unbound substrate proceeding to the transition state $ES^{\ddagger}$, the specificity is independent of the interactions in the enzyme-substrate complex and depends only on the relative binding of transition states. Accordingly, both the magnitude and the specificity of enzyme catalysis depend upon the binding of transition states.
Equation (8) is very useful for measuring the apparent contributions to binding energy of parts of substrates by comparing modified versions of them. For example, a substrate containing a particular radical can be compared with the substrate modified to have, say, an -H replacing that radical to give an empirical measure of the energetics of binding of that radical. The aminoacyl-tRNA synthetases have evolved to maximise the specificity between competing amino acids, for example, the isoleucyl-tRNA synthetase with isoleucine versus valine. We measured ratios of $k_{cat}/K_M$ for cognate versus non-cognate amino acids with different aminoacyl-tRNA synthetases to explore the upper limits of binding energies under evolutionary pressure (Fersht, 1981).

Noncovalent interactions in enzyme transition states: LFER analysis
We would like to know how the structures of proteins change in the transition states of biological processes and how those changes contribute to them. The way to characterise those details experimentally, by analogy with covalent chemistry, is to use similar systematic structure-reactivity relationships, which is something I had been wanting to do since starting in enzymology. The introduction of site-directed mutagenesis at the end of the 1970s to revert mutants of bacteriophage φX174 (Hutchison et al., 1978) made this possible and laid open the new field of protein engineering, which was left largely unploughed for 4 or 5 years.
The initial paradigm: protein engineering the tyrosyl-tRNA synthetase

Gregory Winter and I began a collaboration and published the first paper on protein engineering studies on a protein of known structure (Winter et al., 1982). It may seem surprising that the practical application of the mutagenesis technology of the 1978 paper (Hutchison et al., 1978) took so long. Site-directed mutagenesis was then very difficult to do on the genes of recombinant proteins; the necessary oligonucleotides were not commercially available; only a few protein chemists were using recombinant DNA technology; and some did not believe that site-directed mutagenesis was anything more than a new form of chemical modification (reported by Bryan, 2000). I spent a sabbatical in 1978–1979 in Arthur Kornberg's laboratory to learn recombinant DNA technology and worked on reverting mutants of φX174 to study the fidelity of DNA replication (Fersht, 1979). Gregory Winter had sequenced the genes of aminoacyl-tRNA synthetases, and we chose to do protein engineering of the tyrosyl-tRNA synthetase from Bacillus stearothermophilus. His goal was to use it as an entry into making novel proteins, paralleling synthetic organic chemistry, and he subsequently pioneered antibody engineering. My goal was to use it for structure-activity studies to understand the chemistry of noncovalent interactions in biology, paralleling physical-organic chemistry. This thermophilic enzyme is an exceptional paradigm for this latter purpose: it may be expressed in Escherichia coli, and any activity of contaminating mesophilic enzyme that could obscure steady-state kinetics can be removed by heating; it is amenable to study by pre-steady-state kinetics, so intermediates can be directly observed; and, as a bonus, it is an enzyme whose chemical pathway was known but for which nothing was known about the groups involved in catalysis. The first step in the aminoacylation of tRNA is the nucleophilic attack of the carboxylate of the amino acid on the α-phosphate of ATP to generate an enzyme-bound aminoacyl-adenylate, which subsequently transfers the tyrosine to its cognate tRNA (Eq. 9). Tyrosyl-adenylate is highly reactive in solution but is sequestered and stable in the complex with the enzyme in the absence of tRNA. The crystal structure of the complex reveals a large number of protein side chains binding the intermediate, principally by making hydrogen bonds.
The strategy for structure-activity studies of transition states of proteins

The fundamental strategy for structure-activity studies is simple and taken straight from classical chemistry: make small rational changes in structure and measure the changes in the equilibrium free energies and activation free energies of the chemical steps.
Here, the steps are: (1) truncate the side chains that are hydrogen bond donors or acceptors with the substrate to give quantitative information on the effective strengths and to provide the $\Delta\Delta G^{0}$ terms for the application of LFERs; and (2) do kinetics on mutants to measure the corresponding $\Delta\Delta G^{\ddagger}$ values. Step 1 is useful in its own right as it provides empirical quantitative data on biological interactions. The same strategy is applied analogously to other processes such as protein folding.
The first experiments measured the strengths of hydrogen bonds using Eq. (8) and the ratios of $k_{cat}/K_M$ from steady-state kinetics for wild-type and mutants. The apparent energies spanned 0.5-1.5 kcal/mol (Fersht et al., 1985). I usually refer to these as apparent binding energies because they measure the relative binding energies that are found in practice but not absolute energies: all binding reactions in water represent an exchange reaction with $\mathrm{H_2O}$ of solvation (Fersht et al., 1985). In general, energies from mutagenesis experiments have complex components, a point that I have emphasised from the start but that is sometimes overlooked (Fersht, 1987, 1988).
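The conversion from a ratio of $k_{cat}/K_M$ values to an apparent binding energy can be sketched as follows. Because $k_{cat}/K_M$ refers to free enzyme and free substrate going to the transition state, transition state theory gives $\Delta\Delta G = -RT \ln[(k_{cat}/K_M)_{mut}/(k_{cat}/K_M)_{wt}]$. The function name, the sign convention (a positive value means the mutant has lost transition-state binding energy), the temperature of 298.15 K and the numbers in the example are all assumptions for illustration.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol*K)


def apparent_binding_energy(kcat_km_wt, kcat_km_mut, temperature=298.15):
    """Apparent loss of transition-state binding energy on mutation (kcal/mol).

    Computed as -RT ln[(kcat/KM)_mut / (kcat/KM)_wt]; positive when the
    mutant is the poorer catalyst.
    """
    if kcat_km_wt <= 0 or kcat_km_mut <= 0:
        raise ValueError("kcat/KM values must be positive")
    return -R * temperature * math.log(kcat_km_mut / kcat_km_wt)


if __name__ == "__main__":
    # Hypothetical values in s^-1 M^-1: a tenfold drop in kcat/KM on mutation.
    print(f"{apparent_binding_energy(2.0e5, 2.0e4):.2f} kcal/mol")
```

On this scale, and assuming 25 °C, the 0.5-1.5 kcal/mol range quoted above corresponds to reductions in $k_{cat}/K_M$ of roughly 2-fold to 13-fold.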

LFER analysis uncovers a novel enzyme mechanism just involving binding energy
The second step of the strategy was to determine $\Delta\Delta G^{\ddagger}$ and $\Delta\Delta G^{0}$ for individual steps in Eq. (9) using rapid-reaction pre-steady-state kinetics (Wells and Fersht, 1985). There is a progressive increase in the apparent binding energy of the hydrogen bonds, as illustrated in Figure 4, where Cys-35 and His-48 are truncated to Gly and the energies of the mutants relative to wild-type are plotted. These progressive curves were described in terms of difference energies (Wells and Fersht, 1986). Subsequently, the ratio $\Delta\Delta G^{\ddagger}/\Delta\Delta G^{0}$ was used and called a β-value, in homage to Brønsted (Fersht et al., 1987). This is effectively a series of two-point LFERs around the substrate, one for each interaction from a side chain. As seen in Figure 4, mutation of side chains that bind the sugar ring of ATP hardly weakens the binding of ATP in the E.Tyr.ATP complex, but their binding energy develops in the E.[Tyr-ATP]$^{\ddagger}$ transition state (Leatherbarrow and Fersht, 1987). And there is a further twist. The tyrosyl-adenylate is a high-energy compound, as well as being highly reactive, and the equilibrium constant for its formation from enzyme-bound tyrosine and ATP would normally be very low. But the side chains bind the adenylate tightest of all, and so displace the equilibrium to stabilise its formation as well as sequester it from solution (Wells and Fersht, 1989).
Interestingly, the individual values of $\Delta\Delta G^{\ddagger}$ and $\Delta\Delta G^{0}$ for the different mutations of side chains that bind the ribose of ATP could be combined to give sets of multi-point LFERs with β-value slopes (Figure 5; Fersht et al., 1986, 1987). These linear plots are not generally found in mutagenesis experiments as conformational changes are usually inhomogeneous, and so comparison of two-point plots and local clustering is the mainstay of the approach. The finding of subsets of LFERs in the sets of two-point measurements is a bonus here and in folding (Fersht and Sato, 2004). The presence of a multipoint localised LFER for the residues that bind the sugar ring shows that the enzyme generates a local pressure on the substrate to form the transition state, which validates Haldane's statement: 'Using Fischer's lock and key simile, the key does not fit the lock perfectly, but exercises a certain strain on it' (Haldane, 1930). The most dramatic mutational site, located by model building, has residues that barely affect the binding of the substrate or tyrosyl-adenylate product but greatly stabilise charges developed on the α-phosphate in the transition state, with β >> 1 (Figure 6; Leatherbarrow et al., 1985; Fersht, 1987), more consistent with Pauling's general idea of transition state stabilisation (Pauling, 1948).
There are no chemical groups on the enzyme directly involved in catalysis. The carboxylate of the substrate tyrosine is a competent nucleophile and it appears that the mechanism of catalysis is the utilisation of binding energy to stabilise the transition state and displace an unfavourable equilibrium. By good fortune, the first application of protein engineering to study noncovalent interactions in enzyme catalysis discovered the first example of a natural enzymatic reaction being catalysed purely by transition state stabilisation without any of the classical mechanisms of chemical catalysis.

Basis for Φ-analysis for folding studies
Our 1987 paper provided the template for the analysis and choice of mutations for the analysis of folding pathways (Fersht et al., 1987). In it, we introduced two-point βs for individual mutations from ratios of $\Delta\Delta G^{\ddagger}/\Delta\Delta G^{0}$ in the difference energy plots, and elaborated on the possible groupings of them together to give true multipoint LFERs. We classified the mutations into six categories for choosing them: Nondisruptive Deletion, 'a side chain is replaced by another that lacks a group involved in a specific interaction'; Disruptive Deletion, 'replacement of a side chain may lead to a perturbation elsewhere in the structure'; Conservative Substitution, 'a side chain is replaced by one that can substitute in the same interactions'; Semiconservative Substitution, 'some of the function is conserved on replacement'; Disruptive Substitution, 'substitution of a large side chain for a small one in a buried close packed region of a protein'; and Nondisruptive Addition, 'bulky groups may be added to the surface of proteins without necessarily causing perturbation of structure'. We documented the caveats about the effects of reorganisation of structure and effects of changes in solvation obscuring the analysis, which I discussed in more detail (Fersht, 1988). The protein-engineering β methodology that was developed for studying binding and catalysis was directly transferable to the problem of protein folding.

Figure 4. [...] (Eq. (9)) on mutation of residues Cys35 and His48 (data from Wells and Fersht, 1986; Fersht et al., 1987).

Figure 5. [...] (Eq. (9)) (Fersht et al., 1987).

Figure 6. Difference energy diagrams for residues in the binding site of the tyrosyl-tRNA synthetase that bind to the charged oxygens of the α-phosphate of ATP primarily in the pentacovalent transition state on the nucleophilic attack of the carboxylate of tyrosine (Fersht, 1987).
Naming the ratio $\Delta\Delta G^{\ddagger}/\Delta\Delta G^{0}$ as β, though well-intentioned, was misleading as the interpretation of protein engineering values differs in crucial ways from the Brønsted β of covalent chemistry because of the effects of mutation on denatured states, among other details. β was renamed Φ in its first application to protein folding (Matouschek et al., 1989) as Φ is not strictly a linear free energy quantity but approximates to one in certain circumstances. To avoid confusion, β is now reserved for the classical β of covalent catalysis and Φ for its counterpart in protein engineering (sections 'From β- to Φ-value analysis' and 'Differences between β- and Φ-value analysis' below).

The protein folding problem
The 'protein folding problem' consists of three closely related puzzles: (a) What is the folding code? (b) What is the folding mechanism? (c) Can we predict the native structure of a protein from its amino acid sequence? (Dill et al., 2008). Part (c), prediction of the three-dimensional structure of a protein from its linear amino acid sequence, goes back to Anfinsen (1973), and part (b), the determination of the pathway from the unfolded to the folded structure, to Levinthal (1968). The 'code' is how the information to fold is distributed along the structure. There is now a huge database of experimentally determined three-dimensional structures that has been the basis of very successful machine learning procedures for structure prediction, as embodied in AlphaFold (Jumper et al., 2021). However, it is a black box that does not reveal the code or the pathway (Ooka and Arai, 2023). Experimental determination of the pathway of folding of a protein is extremely difficult because a polypeptide chain progresses through a multitude of transient states as noncovalent interactions are formed and rearranged, and they are not amenable to direct experimental study.
The 'Levinthal Paradox' was that proteins could not fold in finite time in a random search. (See an interesting aside from Baldwin, who was present at its initial presentation (Baldwin, 2017).) To resolve this paradox, Wetlaufer proposed a nucleation-growth mechanism for the kinetics of folding, in which a small local element of secondary structure slowly forms a nucleus and the structure rapidly grows around it (Wetlaufer, 1973). Ptitsyn proposed a framework (Ptitsyn, 1973) or diffusion-collision mechanism (Karplus and Weaver, 1976), whereby a framework of elements of secondary structure formed an intermediate rapidly in which they diffused and collided to dock on each other. Another proposal was hydrophobic collapse, where non-specific tertiary interactions are rapidly made to form a molten globule, which rearranges to give the final folded structure (Ptitsyn, 1991; Figure 7). Simple theoretical models, usually based on simulations on lattices, showed that the paradox arose because the original assumption was for an unbiased search for the folded state on a flat energy surface. In contrast, mechanisms in which the acquisition of native interactions, gradual or otherwise, funnels folding to the desired state obviate the paradox (Sali et al., 1994a; Bryngelson et al., 1995; Dill et al., 1995; Onuchic et al., 1995; Karplus, 2011; Takada, 2019; Finkelstein et al., 2022). There was, however, an apparent conflict between the 'classical view' of protein folding proceeding along defined pathways with intermediates and a supposed 'new' view of folding on an energy landscape (Baldwin, 1995). From these theoretical studies, we now envisage proteins folding on multi-dimensional energy landscapes, with a large number of conformations in the high-entropy denatured state ensemble converging on progressively smaller ensembles in transition states and intermediates on the way to the final structure, and with the gain in enthalpy from native interactions compensating the loss of entropy. We can represent these ensembles as states along a two-dimensional energy diagram, Figure 8 (Eaton et al., 1996). It must be emphasised that what the experimentalist sees as the denatured state, D, under conditions that favour folding, $D_{phys}$, is not usually a random coil, U, but a more structured state varying from having flickering interactions (Figure 8a) to a fairly structured on- or off-pathway intermediate (Figure 8b). The basics of protein folding studies are discussed in more detail in Fersht (1999, 2017, 2018, Chaps. 17-19).
Nucleation mechanisms went out of favour because the early experimental examples of protein folding were found to proceed via intermediates on the pathway (Ptitsyn, 1987; Kim and Baldwin, 1990), and nucleation is characterised by not having intermediates that would accumulate.

From β- to Φ-value analysis
Studies on the effects of point mutations on folding kinetics had begun in the late 1980s with Matthews analysing natural mutants of the α-subunit of tryptophan synthase (Matthews, 1987). Goldenberg used protein engineering to make mutants of bovine pancreatic trypsin inhibitor (Goldenberg et al., 1989). We began applying the technology and Φ-strategy developed on the tyrosyl-tRNA synthetase to the folding of a small RNase, barnase (Kellis et al., 1988; Sali et al., 1988; section 'Barnase: the test bed'). The two-point LFER approach used for mapping the progress of noncovalent interactions in enzyme catalysis is directly applicable to studying transition states and transient intermediates in folding. But there are crucial refinements, which were laid out in the initial LFER paper (Matouschek et al., 1989) and subsequently expanded in more depth (Fersht et al., 1992; Fersht and Sato, 2004), relying on the thermodynamic cycles in Figure 9, which are essential to the analysis (the use of such alchemical cycles was perhaps not obvious and was queried at the time (Buchner and Kiefhaber, 1990)). Accordingly, we used the same strategy as before: (1) make chemically sensible mutations in a suitable protein by truncating side chains to remove stabilising interactions, avoiding mutations that cause stereochemical clashes or unstable charges within the protein (that is, use the nondisruptive deletions, especially of hydrophobic side chains); (2) measure the change in the free energy of folding of the protein on mutation, $\Delta\Delta G_{N-D}$ ($= \Delta G_{N'-D'} - \Delta G_{N-D}$, where N = native state, D = denatured state, and N′ and D′ refer to mutants); and (3) measure the rate constants of folding, $k_f$, of the wild-type and mutant proteins to determine the changes in the free energies of activation, $\Delta\Delta G_{\ddagger-D}$ ($= \Delta G_{\ddagger'-D'} - \Delta G_{\ddagger-D} = -RT\ln(k'_f/k_f)$), and the rate constants for unfolding, $k_u$, to give $\Delta\Delta G_{\ddagger-N}$ ($= \Delta G_{\ddagger'-N'} - \Delta G_{\ddagger-N} = -RT\ln(k'_u/k_u)$).

Figure 8. [...] (Eaton et al., 1996). Q is the relative number of pairwise native contacts in the landscape description and r is the conventional overall reaction coordinate. The number and heterogeneity of individual states decrease as the protein folds. (A, cross-section through a folding funnel (courtesy of P. G. Wolynes); B, reducing the landscape to a collection of ensembles moving along a pathway for the folding of a two-state protein such as CI2; and C, folding of a protein with a more structured denatured state.)

Quarterly Reviews of Biophysics
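We then defined a parameter Φ for folding. In the direction of folding:

$$\Phi_F = \frac{\Delta\Delta G_{\ddagger-D}}{\Delta\Delta G_{N-D}}$$

And for unfolding:

$$\Phi_U = \frac{\Delta\Delta G_{\ddagger-N}}{\Delta\Delta G_{D-N}}$$

We can derive from the thermodynamic cycles in Figure 9 that $\Delta\Delta G_{D-N} = \Delta G_{D'-D} - \Delta G_{N'-N}$; $\Delta\Delta G_{\ddagger-N} = \Delta G_{\ddagger'-\ddagger} - \Delta G_{N'-N}$; and $\Delta\Delta G_{\ddagger-D} = \Delta G_{\ddagger'-\ddagger} - \Delta G_{D'-D}$. Accordingly,

$$\Phi_F = \frac{\Delta G_{\ddagger'-\ddagger} - \Delta G_{D'-D}}{\Delta G_{N'-N} - \Delta G_{D'-D}} \quad\text{and}\quad \Phi_U = \frac{\Delta G_{\ddagger'-\ddagger} - \Delta G_{N'-N}}{\Delta G_{D'-D} - \Delta G_{N'-N}}$$

Ignoring the changes in covalent energies on mutation, as they cancel out in subsequent calculations, the term

$$\Delta G_{N'-N} = (\Delta G_{N'-N})_{\mathrm{noncovalent}} + (\Delta G_{N'-N})_{\mathrm{reorg}}$$

where $(\Delta G_{N'-N})_{\mathrm{noncovalent}}$ is the change in noncovalent interactions from the mutation and $(\Delta G_{N'-N})_{\mathrm{reorg}}$ is any energetics of reorganisation of the structure of the folded protein. There are similar equations involving $\Delta G_{\mathrm{reorg}}$ for the change in energetics of the denatured and transition states, including changes in solvation, $\Delta G_{\mathrm{solv}}$. For denatured states that are highly unfolded, $\Delta G_{\mathrm{solv}}$ is the major term in $\Delta G_{\mathrm{reorg}}$, but often for the interior of folded proteins $\Delta G_{\mathrm{solv}} = 0$.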
Building on our classification of mutations (Fersht et al., 1987) and thermodynamic analysis (Fersht, 1988), the first paper spelled out clearly what type of mutations to make in the light of the incursion of $\Delta G_{\mathrm{solv}}$, and how the choice affects the observed values of Φ (Matouschek et al., 1989). Assuming that the effects of mutation on the noncovalent interactions are localised to the site of the side chain, the two extreme situations are readily interpretable (Figure 10). If the side chain is as unstructured in the transition state as in the denatured state, $\Delta G_{\ddagger'-\ddagger} = \Delta G_{D'-D}$ and so $\Phi_F = 0$ and $\Phi_U = 1$. Conversely, if the side chain is as structured in the transition state as in the native state, $\Delta G_{\ddagger'-\ddagger} = \Delta G_{N'-N}$, and so $\Phi_F = 1$ and $\Phi_U = 0$. This is the same as the extreme cases of the Brønsted β. For mutations of larger to smaller aliphatic side chains, which are the most suitable, as we cannot emphasise enough, $\Delta G_{D'-D}$ (i.e. $\Delta G_{\mathrm{reorg}}$) should be small. For example, mutations of Ile→Ala and Ile→Val have $\Delta G_{\mathrm{solv}}$ = −0.21 and −0.16 kcal/mol, respectively. The deletion of a −CH2− group will lead to minimal $\Delta G_{\mathrm{reorg}}$. Accordingly, $\Phi_F$ is related to the extent of local structure formation in the native and transition states (Matouschek et al., 1989; Fersht et al., 1992; Fersht and Sato, 2004). This is especially so for Ala→Gly scanning in helices (section 'Ala→Gly scanning of secondary structure').
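The definitions above can be illustrated with a minimal Python sketch. It assumes a two-state protein, first-order rate constants for wild type and mutant measured under identical conditions, and energies in kcal/mol; the function name `phi_values` and the example numbers are hypothetical and are not taken from the studies cited.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def phi_values(kf_wt, ku_wt, kf_mut, ku_mut, temp_k=298.0):
    """Return (phi_F, phi_U, ddG_N_D) from folding/unfolding rate constants.

    Assumes two-state kinetics, so that ddG(N-D) = ddG(ts-D) - ddG(ts-N).
    """
    rt = R * temp_k
    # Change in activation free energy of folding: -RT ln(k'_f / k_f)
    ddg_ts_d = -rt * math.log(kf_mut / kf_wt)
    # Change in activation free energy of unfolding: -RT ln(k'_u / k_u)
    ddg_ts_n = -rt * math.log(ku_mut / ku_wt)
    # Change in free energy of folding from the two-state relation
    ddg_n_d = ddg_ts_d - ddg_ts_n
    if abs(ddg_n_d) < 1e-12:
        raise ValueError("mutation does not change stability; phi is undefined")
    phi_f = ddg_ts_d / ddg_n_d
    phi_u = ddg_ts_n / (-ddg_n_d)  # ddG(D-N) = -ddG(N-D)
    return phi_f, phi_u, ddg_n_d


# Hypothetical mutant: folding rate unchanged, unfolding ten times faster,
# i.e. the side chain is as unstructured in the transition state as in D.
phi_f, phi_u, ddg = phi_values(kf_wt=50.0, ku_wt=1e-4, kf_mut=50.0, ku_mut=1e-3)
print(round(phi_f, 3), round(phi_u, 3), round(ddg, 2))
```

In this sketch the two Φ-values sum to 1 by construction, which holds only under the two-state assumption stated in the docstring.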

Differences between β- and Φ-value analysis
In many ways, the interpretation of Φ-values is analogous to that of β, but there are important differences that must be minimised for the successful application of Φ. In the classical chemical LFERs, the structural changes made in the reagents are at positions separated from the reacting bonds and the effects of the substituents are transmitted through the molecule. $\Delta G_{\mathrm{reorg}}$ terms for β in covalent chemistry are ignored because they are relatively small or nonexistent. Basically, β (or α) $= \partial\Delta G_{\ddagger-S}/\partial\Delta G_{P-S}$ in Figure 3. In the protein engineering LFERs, the very groups making the bonds are changed and there can be a significant $\Delta G_{\mathrm{reorg}}$ in the native state and possibly in a structured denatured state. There can also be $\Delta G_{\mathrm{solv}}$ terms for both states. To acknowledge these differences, as mentioned previously, β was renamed Φ, and Φ-analysis experiments were designed to minimise or accommodate those ΔG terms (Matouschek et al., 1989). When this is done, Φ is very similar to β.
(Water molecules surrounding the reactants and catalyst in classical chemical LFER experiments may rearrange on changing a substituent and cause significant changes of $\Delta H^{\ddagger}_{\mathrm{reorg}}$ and $-T\Delta S^{\ddagger}_{\mathrm{reorg}}$, but those changes tend to compensate and cancel out in ΔΔG, although they do complicate attempts to measure the ΔH and ΔS components of the actual chemical steps.)

REFERs: β Tanford ($\beta_T$), Leffler/Brønsted plots and Φ
Protein folding has other important differences, such as the difficulty in choosing a suitable reaction coordinate. A global average may be defined for overall folding, but the formation of structure is not homogeneous and the local reaction coordinates for substructures are what define the formation of transition states and intermediates. The interpretation of Φ-values is more complicated than that of β and extra procedures may be involved. A simple overall reaction coordinate was introduced by Tanford (1968, 1970). All parts of a protein are stabilised by denaturant, Den, and the stabilisation free energy increases linearly with [Den] and with the solvent-accessible surface area (SASA). There is a decrease in SASA on going from D → ‡ → N, so the free energies of the three states depend on [Den] with different slopes (m-values), where for two-state kinetics $m_{N-D} = m_{\ddagger-D} + m_{\ddagger-N}$ (all m-values positive). The relative change in surface area in the transition state, which I renamed $\beta_T$ in homage to Tanford (Matouschek et al., 1995), is given by $\beta_T = m_{\ddagger-D}/m_{D-N}$. The Tanford plot is a true REFER.
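As a small worked illustration of the Tanford parameter, the sketch below computes $\beta_T$ from the kinetic m-values, assuming two-state kinetics and m-values expressed as positive magnitudes. The function name `beta_tanford` and the numbers are hypothetical.

```python
def beta_tanford(m_ts_d, m_ts_n):
    """Return beta_T = m(ts-D) / m(D-N) for two-state kinetics.

    m_ts_d and m_ts_n are the kinetic m-values for the folding and
    unfolding limbs, both given as positive magnitudes, so that the
    equilibrium m-value is their sum.
    """
    if m_ts_d < 0 or m_ts_n < 0:
        raise ValueError("m-values are taken as positive magnitudes")
    m_d_n = m_ts_d + m_ts_n
    if m_d_n == 0:
        raise ValueError("equilibrium m-value is zero; beta_T is undefined")
    return m_ts_d / m_d_n


# Hypothetical m-values in kcal mol^-1 M^-1
print(beta_tanford(1.2, 0.8))
```

A value near 1 would indicate a transition state that has buried about as much surface as the native state, and a value near 0 one that is about as exposed as the denatured state.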
Leffler plots, which are also called Brønsted plots, of $\Delta G_{\ddagger-N}$ versus $\Delta G_{D-N}$ or $\Delta G_{\ddagger-D}$ versus $\Delta G_{N-D}$ also give an indication of the overall change in energetics. However, they can exhibit scatter depending on the inhomogeneity of structure formation in the transition state (see later in the discussion of Figure 15, section 'Chymotrypsin inhibitor 2: computer simulations'). Just as the finding of multipoint LFERs/REFERs for the tyrosyl-tRNA synthetase is a bonus, resulting from concerted movement of parts of the binding site relative to the substrate, the same can sometimes be found for Φ-analysis. Part of a helix in barnase, for example, is uniformly present in the transition state and its formation can be benignly probed by truncating surface-exposed side chains to Ala and then Gly to give a series of overlapping 3-point Leffler/Brønsted plots (Matthews and Fersht, 1995; Fersht and Sato, 2004). Accordingly, Φ is a true REFER for those mutations.

Figure 9. Thermodynamic cycles for the basis of Φ-value analysis (relabelled from Matouschek et al., 1989).

ψ-value analysis
Disulphide crosslinks tie together residues in both the transition states and denatured states as well as native states, with predictable effects on kinetics that can detect when the linked elements of structure are formed during the folding pathway of wild-type protein (Clarke and Fersht, 1993). This is a highly specific procedure and very limited in applicability. Sosnick has pioneered a more general mutational procedure for this crosslinking approach for surface residues, ψ-value analysis (Krantz and Sosnick, 2001; Baxa and Sosnick, 2022). Pairs of histidine residues are introduced as metal-binding sites on the surface, typically close to each other in the folded state, for example, at positions i, i+4 in an α-helix or on neighbouring strands in a β-sheet ('nondisruptive additions'). A metal ion can then crosslink the pair. This contrasts with Φ-value analysis in that ψ adds new interactions to the protein and analyses their effects on the mutants, whereas Φ-analysis uses non-disruptive deletions that probe the extent of formation of interactions present in the wild-type structure. ψ-value analysis is not a REFER but the values of 1 or 0 should be interpretable (Fersht, 2004a; Bodenreider and Kiefhaber, 2005). Indeed, simulation of the transition state for the folding of ubiquitin is consistent with ψ-values of 1 or 0 but not the fractional ones. It is a useful tool for those values (Varnai et al., 2008).

Weak, medium, and strong categorisation of Φ
The values of Φ = 0 or 1 may be interpreted with confidence. Mutations such as Ile→Val, Ala→Gly, and Thr→Ser are particularly suitable and Ile→Ala can be good (see section 'Experimental approach to Φ-value analysis'). In general, the Φ-values should be interpreted only semi-quantitatively and with caution: $0 < \Phi_F < 0.2$, 'low' or 'weak', little or no structure in the transition state; $0.3 < \Phi_F < 0.6$, 'medium', significant to strong; and $0.7 < \Phi_F < 1$, 'high' or 'strong', very significant structure (with flexibility as to the boundaries). This is like the weak, medium and strong NOEs used as distance constraints in molecular dynamics (MD) calculations in structure determination by NMR (Fersht and Sato, 2004; Garcia-Mira et al., 2004), and such classification has been applied with success in computer simulations of the structure of transition states (Geierhaas et al., 2008). As discussed later, Φ-values may be powerfully combined with computer simulations of unfolding and folding trajectories to give true atomic-level descriptions of protein folding pathways. It is important to make many mutations and over-sample to find consistent results that then give reliable information. Φ-values by themselves can give gross and near-atomic-resolution details on the structures of transition states. Some areas are more problematic; I describe these next, together with how they may be resolved.
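The semi-quantitative categories above can be written as a small helper. This is only a sketch: the text stresses that the boundaries are flexible, so the treatment of the gaps between bands (0.2 to 0.3 and 0.6 to 0.7) as 'borderline', and the function name `classify_phi`, are assumptions made here for illustration.

```python
def classify_phi(phi_f):
    """Map a phi_F value onto the weak/medium/strong categories.

    Values outside 0..1 are flagged as non-classical; the gaps between
    the published bands are reported as borderline (an assumption).
    """
    if phi_f < 0 or phi_f > 1:
        return "non-classical"
    if phi_f <= 0.2:
        return "weak"
    if phi_f < 0.3:
        return "weak/medium (borderline)"
    if phi_f <= 0.6:
        return "medium"
    if phi_f < 0.7:
        return "medium/strong (borderline)"
    return "strong"


# Hypothetical phi_F values for a set of mutants
for value in (0.05, 0.25, 0.45, 0.9, 1.3):
    print(value, classify_phi(value))
```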

Φ and non-native interactions: Φ < 0 or Φ > 1
Φ, like β, is predicated on a single bond or set of bonds being formed, with limits of 0 for no formation and 1 for complete formation. It parallels in some ways the Gō model in simulation, which assumes that only native contacts are involved in the folding process and that they consolidate (Taketomi et al., 1975; Takada, 2019). If there are non-native interactions in transition states or intermediates, then unnatural values of Φ of <0 or >1 may be observed, and they are a useful signal for that. Residual structure in the denatured state can give rise to non-classical values (Cho and Raleigh, 2006). Small two-state single-domain proteins are the most likely not to involve non-native interactions (Best and Hummer, 2016).

Double-mutant cycles to identify native partners in interactions
Φ-value analysis interprets changes in energy in terms of changes in structure and assumes that the native interactions are involved, and there can be complications from non-native interactions. Strong evidence about which residues interact can be found by the procedure of double-mutant cycles (Figure 11), first introduced for the tyrosyl-tRNA synthetase (Carter et al., 1984). Two residues that interact in the native state of the protein are mutated individually, and then pairwise. An interaction energy between just those two residues, $\Delta\Delta G_{\mathrm{int}}$, is measured without complications from an unfolded denatured state (Fersht et al., 1992). The same is true for the interaction in the transition state, $\Delta\Delta G^{\ddagger}_{\mathrm{int}}$. Values of $\Phi_{\mathrm{int}} = \Delta\Delta G^{\ddagger}_{\mathrm{int}}/\Delta\Delta G_{\mathrm{int}}$ show with high certainty whether or not, and by how much, those interactions are formed in the transition state (Horovitz and Fersht, 1990, 1992; Horovitz et al., 1991; Fersht et al., 1992; Pagano et al., 2021). They can be used to provide constraints for computer simulations of transition state structure (Salvatella et al., 2005). Multi-mutant cycles can also be performed (Horovitz and Fersht, 1990, 1992).
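The arithmetic of a double-mutant cycle can be sketched as follows. The sign convention used here (interaction energy taken as the non-additivity of the two single mutants relative to the double mutant) is an assumption for illustration; conventions differ between papers, but $\Phi_{\mathrm{int}}$ is unaffected provided the same convention is used for the equilibrium and activation energies. Function names and numbers are hypothetical.

```python
def interaction_energy(ddg_a, ddg_b, ddg_ab):
    """Coupling energy of residues A and B from a double-mutant cycle.

    ddg_a, ddg_b: free energy changes for the single mutants;
    ddg_ab: free energy change for the double mutant (all relative to
    wild type). Zero means the two mutations are additive, i.e. the
    residues do not interact in the state being probed.
    """
    return ddg_a + ddg_b - ddg_ab


def phi_int(ts_a, ts_b, ts_ab, eq_a, eq_b, eq_ab):
    """Fraction of the native pairwise interaction present in the transition state."""
    ddg_int = interaction_energy(eq_a, eq_b, eq_ab)
    if abs(ddg_int) < 1e-12:
        raise ValueError("no interaction energy in the native state; phi_int is undefined")
    return interaction_energy(ts_a, ts_b, ts_ab) / ddg_int


# Hypothetical values in kcal/mol: equilibrium cycle, then activation cycle
print(round(interaction_energy(1.5, 1.2, 2.0), 2))
print(round(phi_int(0.8, 0.6, 1.05, 1.5, 1.2, 2.0), 2))
```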

Parallel pathways and fractional Φ-values
A fractional Φ-value is usually interpreted as arising from a single transition state ensemble that has weakened interactions. But there could be parallel pathways, as in Figure 12, with some having full structure at the point of mutation and others disordered, and these could give an apparent fractional value (Baldwin, 1994; Sali et al., 1994b). This can be tested, however, by making a series of additional mutants that would have different and predictable effects on the disordered and structured pathway states. The fractional values of Φ for the protein CI2 (below) are consistent with a single pathway through the transition state (Fersht et al., 1994).
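Why a series of mutants can discriminate between the two explanations may be seen from a toy model. The sketch below assumes two parallel pathways whose rate constants add: one through a transition state with the mutated site fully formed (Φ = 1) and one with it fully disordered (Φ = 0). This two-pathway model, the function name `apparent_phi_parallel`, and the numbers are illustrative assumptions, not the analysis used in the cited papers.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def apparent_phi_parallel(frac_structured, ddg_n_d, temp_k=298.0):
    """Apparent phi_F for two parallel pathways with additive rate constants.

    frac_structured: fraction of the wild-type folding flux that passes
    through the pathway in which the mutated site is fully structured.
    ddg_n_d: destabilisation of the native state by the mutation (kcal/mol).
    """
    if ddg_n_d == 0:
        raise ValueError("phi is undefined when the mutation does not change stability")
    rt = R * temp_k
    # The structured pathway is slowed by the full ddG; the disordered one is unaffected.
    relative_rate = frac_structured * math.exp(-ddg_n_d / rt) + (1.0 - frac_structured)
    return -rt * math.log(relative_rate) / ddg_n_d


# The apparent value drifts downwards as the mutation becomes more destabilising,
# because flux switches to the unaffected pathway.
for ddg in (0.5, 1.0, 2.0):
    print(ddg, round(apparent_phi_parallel(0.5, ddg), 2))
```

In this toy model the apparent Φ depends on the size of the perturbation, whereas a single transition state with uniformly weakened interactions would be expected to give a roughly constant fractional value.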

Residual structure in denatured states
Denatured states can have residual structure even at high concentrations of denaturants (Dill and Shortle, 1991; Cho and Raleigh, 2006) and especially at low concentrations, where the most stable denatured state may be a folding intermediate or an off-pathway state with non-native interactions. Residual structure is melted out less sharply by denaturants and temperature as there are smaller changes in surface area. These states can severely affect folding kinetics of all types. But unfolding kinetics and $\Delta G_{\ddagger-N}$ from the folded state are unaffected, as the denatured states come after the rate-determining transition state. $\Delta G_{D-N}$ is measured at higher concentrations of denaturant, but there could be significant $(\Delta G_{D-N})_{\mathrm{reorg}}$ terms with mutations affecting structure in the denatured state. Values of $\Phi_U$ close to 0 will be relatively unaffected but values closer to 1 may have artefacts. For these reasons, we gave up the terminology 'U' = unfolded for the denatured state and call it D, or $D_{\mathrm{phys}}$ under physiological conditions.

Experimental approach to Φ-value analysis

$\Delta G_{\mathrm{reorg}}$ and choice of mutation
The presence of $(\Delta G_{N'-N})_{\mathrm{reorg}}$ and similar terms dictates the choice of mutation. To recapitulate earlier points, a mutation of a buried side chain to a larger one will likely cause a significant $(\Delta G_{N'-N})_{\mathrm{reorg}}$, as will changes in buried charges. Accordingly, mutations that preferably delete interactions (non-disruptive deletions) or are isosteric are most suitable. The changes in energetics must be sufficiently large to be measured accurately but not too large, otherwise the position of the transition state may be perturbed or there will be a local rearrangement of structure on making a too-large deletion. Our preferred strategy is: (1) mutate the buried hydrophobic moieties Ile→Val→Ala→Gly; Leu→Ala→Gly; Thr→Ser; and Phe→Ala→Gly (deletion of a −CH2− group has minimal effects on the solvation energies of the denatured state and low $\Delta G_{\mathrm{reorg}}$ in all states); (2) make a wider range of surface mutations; (3) mutate Ala→Gly positions in secondary structural regions ('Ala→Gly scanning', see 7.2), especially in α-helices, because they provide an exquisite probe of secondary structure in the helix since mutation perturbs mainly intra-helical interactions; and (4) use, sparingly, double-mutant cycles, in which changes in solvation and reorganisation energies tend to cancel out. Mutation of a long aliphatic side chain in the hydrophobic core, such as that of isoleucine, can give information on the degree of consolidation of the core on mutation to Ala, and then on the structure of the helix during that process on subsequent mutation to Gly. Successive deletion of different parts of larger side chains may give multiple probes of structure (Serrano et al., 1992b). These types of mutation tend to give values of $\Delta\Delta G_{D-N}$ in the range of 0.6-2 kcal/mol, which can be measured with adequate precision and are typical of the interactions that report on secondary structure as well as local interactions in hydrophobic cores (Friel et al., 2003; Fersht and Sato, 2004; Garcia-Mira et al., 2004; Sato et al., 2006). Larger changes can lead to a movement of the transition state on the energy landscape (Fersht and Sato, 2004).

Ala→Gly scanning of secondary structure
Mutation of Ala→Gly in helices is a particularly clean tool (Matthews and Fersht, 1995). The CH3− side chain of Ala stabilises an α-helix relative to the H− of Gly mainly by burial of the hydrophobic surface area, by 0.4 to 2 kcal/mol, and mutation has minimal structural perturbation (Serrano et al., 1992a,c). Further, unfolded alanine- and glycine-containing peptides are approximately isoenergetic in noncovalent interactions (Scott et al., 2007) and so mutation of Ala→Gly has minimal $\Delta G_{\mathrm{reorg}}$ terms in both states. Accordingly, $\Phi_{\mathrm{Ala \to Gly}}$ is the most reliable measure of structure formation of all Φ-values.

Experimental determination of ΔGs
The changes in $\Delta G_{\ddagger}$ and $\Delta G_{D-N}$ are mostly measured from variation of the rate constants of folding and unfolding and the equilibrium constant with concentration of a denaturant such as urea or guanidinium chloride. Usually, logarithms of the rate and equilibrium constants for unfolding increase linearly with concentrations of denaturant under the accessible experimental conditions, but sometimes with small deviations at very low concentrations (Tanford, 1968, 1970). For two-state kinetics, the logarithm of the rate constant for folding decreases linearly with denaturant concentration (Tanford et al., 1973) and plots of the combinations of $\log k_u$ and $\log k_f$ give so-called chevron plots, as in Figure 13 (Jackson and Fersht, 1991a). For multi-state systems, the refolding limb is usually characterised by 'rollover', where the folding rate constant tends to plateau at low denaturant concentration as there are changes in rate-determining steps (Figure 13, inset) (Matouschek et al., 1989). The proteins in Figure 13 refold on the tens of ms time scale, with the kinetics measured by rapid-mixing stopped-flow methods. Smaller single-domain proteins can fold even faster, on the μs time scale, as for the 37-residue Formin-Binding Protein, FBP28, a canonical three-stranded β-sheet WW domain, Figure 14 (Petrovich et al., 2006). Its kinetics of folding and unfolding are too fast for rapid mixing but are readily and accurately measured using temperature-jump apparatus. The unfolding of such small proteins exposes only a relatively small amount of buried surface area and so the transition is spread out over a wide range of concentration of denaturant. The FBP28 domain has a very polarised transition state, as readily seen directly from the chevron plots. Some plots have the folding limbs nearly superposed, showing $\Delta\Delta G_{\ddagger-D} \approx 0$ and so $\Phi_F \approx 0/\Delta\Delta G_{N-D}$, that is ~0 for non-zero values of $\Delta\Delta G_{N-D}$. Conversely, other plots have the unfolding limbs nearly superposed, showing $\Delta\Delta G_{\ddagger-N} \approx 0$ and so $\Phi_U \approx 0/\Delta\Delta G_{D-N}$, that is ~0. As $\Phi_U + \Phi_F = 1$ for two-state kinetics, these chevrons of $\Phi_U \approx 0$ have $\Phi_F \approx 1$. These values of $\Phi_F \approx 0$ or 1 are also determined with the highest confidence, as the errors around $\Delta\Delta G_{\ddagger-D}$ and $\Delta\Delta G_{\ddagger-N} \approx 0$ are small. An error of, say, ±0.1 for a mean of $\Phi_U = 0.05$ is a very high percentage error in the absolute value of $\Phi_U$ but, in the context of where $\Phi_U$ is on the scale of 0 to 1, is sufficiently accurate for the purposes of interpretation. Accordingly, the most readily interpretable values of Φ, 0 and 1, are the ones most amenable to confident measurement.
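The shape of a two-state chevron plot follows directly from the linear dependences described above. The sketch below assumes that the observed rate constant is the sum of the folding and unfolding rate constants and that their natural logarithms vary linearly with [Den]; the parameter names (`kf_water`, `m_f`, `ku_water`, `m_u`, with slopes in M^-1) and the numbers are hypothetical.

```python
import math


def chevron_ln_kobs(den, kf_water, m_f, ku_water, m_u):
    """ln k_obs at denaturant concentration `den` for two-state folding.

    ln k_f falls linearly with [Den] (slope -m_f) and ln k_u rises
    linearly (slope +m_u); k_obs is their sum.
    """
    k_f = kf_water * math.exp(-m_f * den)
    k_u = ku_water * math.exp(m_u * den)
    return math.log(k_f + k_u)


def midpoint_denaturant(kf_water, m_f, ku_water, m_u):
    """[Den] at which k_f equals k_u, i.e. the midpoint of the equilibrium transition."""
    return math.log(kf_water / ku_water) / (m_f + m_u)


# Hypothetical parameters: k_f = 50 s^-1 and k_u = 1e-4 s^-1 in water
params = dict(kf_water=50.0, m_f=2.0, ku_water=1e-4, m_u=1.0)
mid = midpoint_denaturant(**params)
for den in (0.0, mid, 8.0):
    print(round(den, 2), round(chevron_ln_kobs(den, **params), 2))
```

Comparing wild type and mutant at the same value of `den`, rather than at extrapolated values in water, corresponds to measuring the differences directly under the same reaction conditions.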
To optimise precision, I advocate measuring differences in $\Delta G_{\ddagger}$ and $\Delta G_{D-N}$ directly under the same reaction conditions (same concentration of denaturant, [Den]) and not extrapolating to the absence of denaturant. In our laboratory, we can measure $\Delta\Delta G_{D-N}$ with adequate precision down to ~0.6 kcal/mol from the differences in the midpoints of equilibrium denaturation curves of wild-type and mutants (Clarke and Fersht, 1993) or from the unfolding and folding rate constants (Fersht and Sato, 2004), as do others (Friel et al., 2003; Garcia-Mira et al., 2004). First-order rate constants for unfolding and refolding can be determined with high precision. Attention to detail is important. We make up stock solutions of denaturant for each concentration, using volumetric flasks rather than diluting one concentrated stock solution into buffer. I avoid using phosphate buffer with guanidinium chloride as it lowers the pKa greatly with increasing [Den], because its ionic component displaces the ionisation equilibrium $\mathrm{H_2PO_4^-} \rightleftharpoons \mathrm{HPO_4^{2-}} + \mathrm{H^+}$ as, according to the Debye-Hückel equation, the activity coefficient of an ion depends on the charge squared (Debye and Hückel, 1923). (The application to kinetics was implemented in the Brønsted-Bjerrum equation.) Instead, I prefer an amine buffer at neutrality or, at lower pH, acetate because their ionisations parallel more closely the principal protein ionisations at those pHs: histidine/α-amino groups, and aspartate/glutamate (Fersht and Petrovich, 2013). Urea does not have this problem. To minimise problems from changes of pH with temperature, denaturant concentration and so forth, measurements are best made at pHs where free energies and kinetics are pH independent.

Figure 13. Chevron plots for CI2 (Jackson and Fersht, 1991a) and, inset, barnase (Matouschek et al., 1990). Rate constants are in units of s^-1. For CI2, the plot is for a perfect two-state transition and the arms are linear. For barnase, there is deviation at low denaturant concentration from the perfect theoretical two-state behaviour (solid line) because of a change in the structure of the denatured state or the presence of a folding intermediate.

Combining Φ-values with and benchmarking computer simulation
The complete description of folding pathways of proteins can be achieved only by computer simulation. This is possible de novo only when the energy potentials are sufficiently reliable, or black-box machine learning is applicable. The role of the experimentalist has been to provide the structures of all the states along the pathway as a starting basis for simulation and to benchmark simulation within the limitations of current energy functions. Since their initial introduction, Φ-values have provided the crucial benchmark for interactions in the transition state for the folding of the small domains, which are the most easily studied computationally because of the limitations on computing power. They are being used for testing more complex folding of large proteins (Ooka and Arai, 2023). There are methods for calculating Φ-values directly (Best and Hummer, 2016).

Barnase: the test bed
Φ-value analysis was pioneered on the 110-residue RNase, barnase, from Bacillus amyloliquefaciens. It is a most suitable small protein for structure-activity studies using protein engineering: it is readily expressed from E. coli and does not have complications from disulphide bridges or cis-prolines in the folded state. The strategy for studying it has two steps, as for the tyrosyl-tRNA synthetase studies: (1) mutate the protein sensibly and extensively to build up a library of the common interactions that stabilise proteins; and (2) select suitable mutants for kinetic analysis.

Step 1: library of interaction energies that stabilise proteins
The magnitudes of the hydrophobic effect and other interactions were usually measured from simple free energies of transfer from organic solvents to water (Fersht, 1999, 2017, 2018 Ch. 11) or, more appropriately for α-helices, the stabilities of synthetic peptides in water (Padmanabhan et al., 1990). We made the first systematic measurements of the common interactions that stabilise proteins directly in a protein, from the values of $\Delta G_{D-N}$ of wild-type barnase versus mutants whose side chains had been truncated by non-disruptive deletions. The deletion of a −CH2− group from a residue in the hydrophobic core lowers stability by up to 1.6 kcal/mol, compared with 0.68 kcal/mol in the simple chemical models (Kellis et al., 1988, 1989). The mutation of Ala→Gly in the exposed surface of helices lowers stability by 0.4-2 kcal/mol, depending on the amount of surface area of the CH3− group of Ala that is buried (Serrano et al., 1992a, 1992b). Mutants from these studies with suitable values of $\Delta G_{D-N}$ were chosen for the kinetic studies.

Step 2: kinetics
The initial study was on the unfolding of the protein, as it starts from the best-characterised state on the pathway and the folding direction can be beset by problems of residual structure in the denatured state or even intermediates (Matouschek et al., 1989). Unfolding kinetics provides in general the most reliable data and is very relevant to biology because many diseases are initiated by protein unfolding. The folded state is the best-characterised starting point also for computer simulation. The transition state for unfolding is generally the highest energy state on the folding pathway. Barnase is a multimodular protein, having regions that make more interactions within themselves than with the rest of the protein, with three hydrophobic cores and a mixed α + β architecture. Some of the regions have Φ-values near 1, others have values of 0, and some regions are intermediate. The centre of the sheet and the C-terminal portion of helix 1 have Φ-values of approximately 1. There are fractional Φ-values for the edges of the sheet and for the packing of the N-terminal α-helix on the β-sheet, which constitutes the major hydrophobic core. The second domain, containing helix 2, and the loops have Φ-values ~0. The multimodular barnase has a polarised major transition state, which occurs late on the reaction pathway, with much of the secondary structure being formed and the hydrophobic core between the major α-helix and β-sheet in the process of being consolidated (Matouschek et al., 1989; Serrano et al., 1992a).

Folding intermediate or structured $D_{\mathrm{phys}}$?
The downward curvature in the refolding limb of the $\log k_{\mathrm{obs}}$ versus [Urea] plot (Figure 13) was the initial evidence that there is either a folding intermediate or a structured denatured state, $D_{\mathrm{phys}}$, whose concentration or properties change with concentration of denaturant (Matouschek et al., 1990). A structured $D_{\mathrm{phys}}$ that progressively unfolds in a non-cooperative transition could give rise to a variable two-state process. Φ-values probe the structure of this state (Matouschek et al., 1992), which has been extensively studied by a variety of methods (Khan et al., 2003) and simulation (Caflisch and Karplus, 1995; Li and Daggett, 1998; Wong et al., 2000; Galano-Frutos and Sancho, 2019). The biophysics is consistent with a cooperative unfolding of the state (Dalby et al., 1998a, 1998b). There are probably two intermediates on the pathway (Khan et al., 2003; Sanchez and Kiefhaber, 2003). $\Phi_F$-values measured from ill-defined folding intermediates must be interpreted with caution because there may be non-native interactions involved. Time-resolved small-angle X-ray scattering indicates an expanded state (Konuma et al., 2011). The evidence is consistent with some fraction of the denatured ensemble containing residual, non-random structure, especially in helix 1 and the turn (β3-β4) in the centre of the sheet, consistent with MD simulation of the denatured state (Bond et al., 1997; Wong et al., 2000). The folding pathway is simulated atomistically by running the unfolding pathway in reverse, Figure 15 (Fersht and Daggett, 2002; Daggett and Fersht, 2003).

Chymotrypsin inhibitor 2: two-state kinetics and nucleation-condensation
Our second protein studied, chymotrypsin inhibitor 2 (CI2), is a 64-residue single-domain protein, unlike most of the proteins studied up to then, which were multi-domain. It is a single-module protein with a single α-helix docked onto a β-sheet. In contrast to those other proteins then studied (Ptitsyn, 1987; Kim and Baldwin, 1990), CI2 was found to fold by two-state kinetics without an intermediate and, for that time, relatively fast, on the 10 ms time scale (Jackson and Fersht, 1991a, 1991b). Intermediates do not detectably accumulate in its folding and the ratio of rate constants for folding and unfolding gives the correct equilibrium constant for denaturation, again unlike for the previously studied proteins. The chevron plot has perfectly linear arms, Figure 13. Its single rate-determining transition state for folding can be studied in both directions to show that unfolding and folding are the reverse pathways of each other and so microscopic reversibility is obeyed. More examples of two-state folding were quickly found (Jackson, 1998) and 89 proteins are now reported with two-state folding kinetics (Manavalan et al., 2019). The small single-domain proteins are very suitable for gaining insights into the early stages of folding before their assembly into more complex tertiary structures in larger multi-domain proteins. They often fold and unfold sufficiently fast that their denatured and native states are in rapid equilibrium in vivo and so the in vitro studies are also directly relevant to biology. There could of course be high-energy intermediates, such as in Figure 1, which are cryptic. Two-state folding without accumulating intermediates resurrected the possibility of nucleation mechanisms.

[Figure caption (fragment): The structures are coloured from red at the N-terminus to blue at the C-terminus. The denatured state is an ensemble of structures whose overall topology resembles that of the native state. The hairpin at the centre of the antiparallel β-sheet is present in the denatured state, albeit with some non-native interactions. The N-terminal helix is partly structured, stabilised by hydrophobic interactions. The final transition state consists of the largely formed N-terminal helix docked onto the β-sheet, which is strongly formed in the central regions, with the hydrophobic core in the process of being formed and other interactions consolidated (Fersht and Daggett, 2002).]

Chymotrypsin inhibitor 2: nucleation-condensation mechanism
We always perform a large number of mutations, but the Φ-value analysis of CI2 was exhaustive: 100 mutations at 45 of the 64 residues and a network of 11 double-mutant cycles (Itzhaki et al., 1995). It not only revealed nucleation but also discovered a new mechanism: the nucleation-condensation mechanism (Fersht, 1995; Itzhaki et al., 1995). The single observed transition state for folding and unfolding consists of a structure in which an extended nucleus is formed, built around the single α-helix, which is being formed at the same time as the rest of the structure is condensing around it. Apart from one residue, all the Φ-values are fractional, approaching 0 the further the residue is from the diffuse nucleus. The physical-chemistry reasoning behind this is quite simple. None of the elements of regular secondary structure, such as the α-helix, are stable in the absence of the rest of the protein structure (as is generally found for proteins), and so those regions, when separate from the rest of the structure, are largely random in solution (Epand and Scheraga, 1968). For most proteins, the secondary structure needs to be stabilised by long-range interactions. Protein folding is, accordingly, such a cooperative process that the major transition state for folding of a domain is one in which the structure is largely formed. Nucleation-condensation is now a well-established general mechanism for the folding of single domains (Nolting and Agard, 2008; Kukic et al., 2017).
The important features of nucleation-condensation are not just that the nucleus is large and extended but that its structure is like a distorted form of the native structure, where interactions are not uniform but weaken away from the nucleus. A generally useful pointer to the nucleation-condensation mechanism or a diffuse transition state is a Leffler/Brønsted plot of $\Delta G_{\ddagger-N}$ versus $\Delta G_{D-N}$ (Figure 16). As the Φ-values are mainly fractional, the plot is scattered around a linear regression of slope 0.7, with deviations for the higher and lower values of Φ. In contrast, the plot for barnase, with its polarised transition state and Φ spread from 0 to 1, has the points scattered between lines of slope 0 and 1 (Itzhaki et al., 1995).
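The slope of such a Leffler/Brønsted plot can be estimated by ordinary least squares. The sketch below is a plain-Python illustration under that assumption; the function name `leffler_slope` and the data points are hypothetical and are not the CI2 or barnase measurements.

```python
def leffler_slope(ddg_eq, ddg_ts):
    """Least-squares slope and intercept of ddG(ts-N) against ddG(D-N).

    ddg_eq: changes in equilibrium free energy for a set of mutants;
    ddg_ts: corresponding changes in activation free energy.
    """
    n = len(ddg_eq)
    if n != len(ddg_ts) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(ddg_eq) / n
    mean_y = sum(ddg_ts) / n
    sxx = sum((x - mean_x) ** 2 for x in ddg_eq)
    if sxx == 0:
        raise ValueError("all equilibrium values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ddg_eq, ddg_ts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical mutants (kcal/mol) scattered around a common slope
eq = [0.6, 1.0, 1.4, 1.8, 2.0]
ts = [0.45, 0.65, 1.05, 1.2, 1.45]
slope, intercept = leffler_slope(eq, ts)
print(round(slope, 2), round(intercept, 2))
```

A tight fit with a single fractional slope would be in keeping with a diffuse transition state, whereas points spread between slopes of 0 and 1 would suggest a polarised one.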

Chymotrypsin inhibitor 2: computer simulations
CI2 is such a well-behaved system, small, and with so much experimental Φ-value data available that it stimulated and became a major test bed for computer simulation. I have had a long collaboration, beginning in 1994 (Fersht et al., 1994; Li and Daggett, 1994), with Valerie Daggett, who had performed the first all-atom simulation of the unfolding of the bovine pancreatic trypsin inhibitor (Daggett and Levitt, 1992). Our collaboration agreement was that all her simulations were done blind, without foreknowledge of our experimental data. Li and Daggett simulated the unfolding of CI2 at 498 K, the simulated high temperature being necessary for the unfolding to be on the then accessible timescale of 2.2 ns (Li and Daggett, 1994) (the pathway does not change over a range of temperature (Day et al., 2002)). The Φ-values from MD and experiment were very similar in the first study. As more experimental Φ-values became available, the good agreement remained. A simulation (Daggett et al., 1996) gave a complete atomic-level description of the transition state and recapitulated all the experimental Φ-values (Itzhaki et al., 1995). These simulations were then combined with further studies on the denatured state, including one of the first atomic views of a 'random coil' denatured state (Kazmirski et al., 2001), and transition states (Li and Daggett, 1996; Kazmirski et al., 2001), to give more detailed descriptions, reviewed by Fersht and Daggett (2002) and Daggett and Fersht (2003), Figure 17.
In multiple simulations of unfolding, single trajectories are distributed around an average 'ensemble' path (Day and Daggett, 2005). Simulations of folding and unfolding at the melting temperature showed that microscopic reversibility indeed holds (Day and Daggett, 2007). Overall, they found that conformations in the transition state ensemble (TSE) have a probability of 0.5 of refolding to the native state: approximately 50% of the structures taken from the TSE refold and the other 50% progress to the denatured state (Day and Daggett, 2007). Further, simulations pointed to mutations that could speed up folding by relieving strain in the transition state, and one, Arg48→Phe, was found that speeds up folding 40-fold to a t1/2 of 400 μs (Ladurner et al., 1998). Thus, the MD-derived TSE consists of true transition states, validating the use of transition state theory underlying all Φ-value analyses, and also showing the power of simulation.
The results of multiple simulations of unfolding reconciled the 'new view' of folding on an energy landscape with the classical view of protein folding along a defined pathway: there is a statistically preferred pathway on a funnel-like average energy surface (Lazaridis and Karplus, 1997). The funnelled nature of the energy landscape arising from Wolynes' minimal frustration principle (strong native bias) is consistent with unusual Φ-values being infrequent and with the transition state being a distorted version of the native state. Also, because the energy landscape is funnelled, mutations are not prone to change the structure of the native state (Oliveberg and Wolynes, 2005). CI2 Φ-values helped the theoreticians to clarify their views (Pande et al., 1998).
CI2 occupies an important position in the development of protein folding studies because it was the first example of a single-domain protein showing two-state kinetics, its Φ-value analysis discovered the nucleation-condensation mechanism, and it stimulated so much theoretical advance.

Movement of TS on the energy landscape: Hammond and anti-Hammond effects
The transition state lies on a saddle point in the energy landscape and can move in a direction along the reaction coordinate (Hammond effect) or perpendicular to it (anti-Hammond) as the energetics are perturbed, Figures 3 and 18 (Jencks, 1985). We found both Hammond and anti-Hammond effects in folding transition states (Matouschek and Fersht, 1993; Matouschek et al., 1995; Matthews and Fersht, 1995; Dalby et al., 1998c) by comparing the extent of overall folding using Leffler/Brønsted plots of ΔG ‡ 0 versus ΔGD-N or βT with Φ-values for local structure (Matthews and Fersht, 1995; Fersht and Sato, 2004). A Leffler/Brønsted plot of successive mutations in helix 1 of barnase has a slope for unfolding of -0.09 for mutations with ΔGD-N < 2 kcal/mol, showing that it is ~90% folded in the transition state, but for ΔGD-N > 3 kcal/mol, the slope steepens to -0.6, so that the helix is only ~60% folded. The overall position of the transition state, measured by βT, moves closer to that of the native structure as it becomes less stable (the Hammond effect), but the helix itself follows anti-Hammond behaviour and moves away from native. The anti-Hammond behaviour could result from a changing balance in parallel pathways (Matthews and Fersht, 1995) or from true movement perpendicular to the reaction coordinate. Simulation supports the latter (Daggett et al., 1998). Movement of the transition state on large destabilising mutations signals caution in interpreting changes in Φ for them. Importantly, it points to how a series of mutations in a family of homologous proteins can lead to changes of mechanism.
The protein folds from the intermediate via a framework mechanism. EnHD has a very stable helix 1, which is up to ~40-50% α-helical in the absence of the rest of the protein, and helices 2 and 3 together form a helix-turn-helix motif which is not only structured in that folding intermediate (Mayor et al., 2003a) but also stable as an independent sequence (Religa et al., 2007). This intermediate is the most stable denatured state under conditions that favour folding, the more unfolded form being less stable, and its structure has been determined by NMR (Religa et al., 2005). Φ-values show that the final rate-determining transition state is the docking of helix 1 onto the structure of helices 2 and 3 to form the hydrophobic core (Figure 19; Mayor et al., 2003a).
Three members of the homeodomain-like protein family that share the same overall topology with EnHD, human TRF1 Myb domain (hTRF1), human RAP1 Myb domain (hRAP1) and c-Myb-transforming protein (c-Myb), have decreasing propensity for α-helix formation in helix 1 (Figure 20), and helices 2 and 3 do not form independently stable helix-turn-helix motifs. These proteins vary widely in sequence, just having fold homology. There is a spectrum of folding processes that spans the complete transition from framework to nucleation-condensation mechanism as the helical propensity decreases, Figure 21 (Gianni et al., 2003). The common factor in their mechanisms is that the transition state for (un)folding is expanded and very native-like, with the proportion and degree of formation of secondary and tertiary interactions varying. It appears that framework and nucleation-condensation are different manifestations of an underlying common mechanism, Figure 21 (Daggett and Fersht, 2003; Gianni et al., 2003).

Folding close to the speed limit
Pit1, the 63-residue homeodomain from pituitary-specific transcription factor, folds via an intermediate in phases that are more widely separated than those of EnHD, with t1/2 of 2.3 and 46 μs, allowing Φ-values to be measured for both phases (Banachewicz et al., 2011). Its helix-turn-helix motif does not fold independently but is folded in the intermediate, docked to a misfolded helix 1, which rearranges to fold correctly. Pit1 is on the slide from the framework mechanism of EnHD folding to nucleation-condensation for Myb, TRF1 and RAP1.
The folding rate constant of 3 × 10⁵ s⁻¹ for the fast phase decreases with increasing viscosity and is only slightly sensitive to mutation or denaturant concentration. The formation of the intermediate is partly rate-limited by chain diffusion and partly by an energy barrier to give a very diffuse transition state. The process is rather like the association of barnase with its protein inhibitor barstar, which proceeds via an encounter complex that is diffusion-limited and relatively insensitive to mutations, and then precisely docks and makes specific interactions in a slower step (Schreiber and Fersht, 1995, 1996). The folding is approaching the downhill-folding scenario of energy landscape theory (Gelman and Gruebele, 2014).
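The rate constant quoted for the fast phase and the corresponding half-life are related by the first-order expression t1/2 = ln 2 / k. A minimal Python check of that arithmetic follows; the function names are illustrative only.

```python
import math


def half_life_from_rate(k_per_s):
    """Half-life (s) of a first-order process with rate constant k (s^-1)."""
    return math.log(2) / k_per_s


def rate_from_half_life(t_half_s):
    """First-order rate constant (s^-1) corresponding to a half-life (s)."""
    return math.log(2) / t_half_s


# Rate constant quoted in the text for the fast phase of Pit1 folding.
fast_phase_t_half = half_life_from_rate(3e5)
print(f"t1/2 = {fast_phase_t_half * 1e6:.1f} microseconds")
```

For k = 3 × 10⁵ s⁻¹ this gives about 2.3 μs, in agreement with the half-life quoted for the fast phase.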
The free energy barrier that separates the native and denatured state ensembles in the energy landscape model may disappear under extreme conditions that greatly energetically favour the native state (Bryngelson et al., 1995), similar to extreme Hammond behaviour for the movement of transition states in covalent chemistry, Figure 18, where the transition state moves closer in structure to the denatured state as the product becomes more stable (Hammond, 1955). Under these conditions, the protein folds downhill energetically. The transition-state energy barrier reappears as conditions change to stabilise the denatured state ensemble, such as going through the thermal or denaturant unfolding transitions. The finding of very fast folding small domains, 'miniproteins' that fold on the μs time scale or faster, led to increased interest in what happens to pathways at folding close to the speed limit (Kubelka et al., 2004; Gelman and Gruebele, 2014). Barriers of <3 kBT (<1.8 kcal mol⁻¹ at 298 K) are suggested to be consistent with this type of downhill folding (Carter et al., 2013; Prigozhin and Gruebele, 2013). However, 'downhill folding on a rough energy landscape versus rapid folding through very shallow intermediates is in the eye of the beholder' (Gelman and Gruebele, 2014). All the states along the pathway/landscape are ensembles of structures (Figure 8). There is residual native and non-native structure in the denatured state, and this coexists with folding intermediates and the native structure in varying proportions with changing conditions. The folded state is dynamic, with regions locally unfolding as demonstrated by hydrogen-deuterium exchange (Englander et al., 1997; Englander, 2023). The energy landscape has many local minima, which can contribute to kinetics when the transition state energy barrier is low. These problems are exacerbated for the small fast-folding domains because their folding equilibrium and activation energies are often low and the structure of domains taken from their parent is sensitive to the choice of domain boundaries.
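The conversion of the 3 kBT barrier into molar units can be checked with a short Python sketch. The value of the gas constant is the standard one expressed in kcal mol⁻¹ K⁻¹; the function name is illustrative.

```python
# Molar gas constant R = N_A * k_B, expressed in kcal mol^-1 K^-1.
GAS_CONSTANT_KCAL = 1.987204e-3


def thermal_energy_kcal_per_mol(temperature_k, multiples=1.0):
    """Return `multiples` x RT in kcal/mol at the given temperature (K)."""
    return multiples * GAS_CONSTANT_KCAL * temperature_k


barrier = thermal_energy_kcal_per_mol(298.0, multiples=3)
print(f"3 kBT at 298 K = {barrier:.2f} kcal/mol")
```

The result, just under 1.8 kcal mol⁻¹, matches the figure given for the suggested downhill-folding threshold.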

Transition states across PSBD family: nucleation-condensation in very fast folding
The more thermostable two-helix bundle PSBD from B. stearothermophilus (E3BD) folds cooperatively and very rapidly, and its separated constituent α-helical regions have little helical tendency, showing that fast folding does not require the docking of preformed elements (Spector et al., 1998, 1999a, 1999b; Spector and Raleigh, 1999). Φ-value analysis at 325 K by T-jump relaxation kinetics (Ferguson et al., 2005) and at 298 K by rapid mixing and some T-jump (Ferguson et al., 2006) shows a nucleation-condensation mechanism, which has a very diffuse transition state but with helix 2 the most structured. There is good consistency with calculated values from MD simulation.
Comparison of Φ-values with two other members of the PSBD family that have significant sequence identity but different helix-forming propensities, POB from Pyrobaculum aerophilum (Sharpe et al., 2008) and BBL (Neuweiler et al., 2009), Figure 22, provides information about conservation of folding mechanism in closely related, very fast folding proteins. They all fold via nucleation-condensation, with Φ-values summarised in Figure 23. There are differences in that folding of E3BD and POB nucleates in Helix 2, but interactions in the folding transition state of BBL are more evenly dispersed across the structure, perhaps because of the high helical propensity of its Helix 1 (Neuweiler et al., 2009). The folding rate constants for E3BD, BBL, and POB at 298 K are 27,500 ± 500 s⁻¹, 124,000 ± 5000 s⁻¹, and 210,000 ± 5000 s⁻¹, respectively, and follow the predicted helical propensities of the sites in the second helix. An increased helical propensity at the nucleation site appears to stabilise the folding nucleus and results in an increased folding rate constant.

The robustness and validity of Φ-analysis: Φ-Φ plots
The above examples show the wide and successful application of Φ-analysis. There have been criticisms of Φ-analysis, which have been critiqued by Gianni and Jemth (2014). They argue nicely that plots of Φ versus Φ for processes in common demonstrate the robustness of Φ-analysis. Such plots on homologous proteins are used to compare folding transition states (Calosci et al., 2008; Wensley et al., 2009; Wensley et al., 2010). Sequences of identical proteins, such as circular permutants and circularised proteins, or homologous proteins with high sequence identity are aligned, and the values of Φ at the same position in one are plotted against those in the other, as in Figure 24. The probability that the pairs in each are not linearly related, P, is infinitesimal, consistent with their containing structural information. Provided that mutations are chosen as described and analysed in the first Φ-value paper (Matouschek et al., 1989) and earlier (Fersht et al., 1987), and changes in ΔΔGD-N that are too high or too low are not used (Fersht and Sato, 2004), Φ-value analysis is robust. The weak, medium, and strong categorisation provides adequate constraints for simulation.
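A Φ-Φ comparison of this kind can be sketched in Python as a Pearson correlation with a significance estimate. The sketch below is not the procedure of Gianni and Jemth: it swaps in a simple permutation test, which estimates how often a correlation at least as strong arises when the pairing of positions is shuffled. The Φ-values are invented for illustration and are not taken from any protein.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def permutation_p(xs, ys, n_perm=10000, seed=0):
    """Two-sided permutation estimate of the chance of |r| >= observed."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical Phi-values at aligned positions in two related proteins
# (illustrative numbers only, NOT experimental data).
phi_protein_a = [0.10, 0.20, 0.30, 0.45, 0.50, 0.70, 0.80, 0.95]
phi_protein_b = [0.15, 0.15, 0.35, 0.40, 0.60, 0.65, 0.85, 0.90]

r = pearson_r(phi_protein_a, phi_protein_b)
p = permutation_p(phi_protein_a, phi_protein_b)
print(f"r = {r:.2f}, permutation P = {p:.4f}")
```

A parametric P-value from the t-distribution would be the more usual choice; the permutation form is used here only because it needs no statistical library.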
Φ-value analysis has stood the test of time over three decades and we have gone from knowing virtually nothing about the fine structure of transition states for folding in the late 1980s to having a wealth of detailed information about many individual proteins. But can we draw generalisations?

The expanded transition state as a unifying mechanism for domain folding
Proteins have evolved for optimal function in vivo and not the greatest stability or fastest folding. Protein activity often requires flexibility and dynamics for function, a stability that is high enough but not too high to prevent turnover where necessary, a rate of unfolding for some that is sufficiently slow to inhibit aggregation via unfolding, and a trade-off between overall stability and local instability of binding and active sites. For example, simple mutations can change the rate constants for the folding of CI2 over three orders of magnitude: wild-type folds at 25°C at 56 s⁻¹, the double mutant A16G/I57A in the folding nucleus at 2.4 s⁻¹, and R48F at 2300 s⁻¹. The active site of barnase is a source of instability (Meiering et al., 1992) and mutations elsewhere can greatly stabilise it without loss of activity (Serrano et al., 1993). Those factors will conspire to complicate the formulation of simple models for folding and its kinetics and cause exceptions to mechanisms.
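The CI2 rate constants quoted above can be translated into changes in activation free energy with a short Python sketch. It assumes transition state theory with a pre-exponential factor that is unchanged by mutation, so that ΔΔG‡ = -RT ln(k_mut/k_wt); the function name and the sign convention (negative means a lower barrier) are choices made for this illustration.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal mol^-1 K^-1


def ddg_activation(k_mut, k_wt, temperature_k=298.15):
    """Change in folding activation free energy (kcal/mol), mutant minus
    wild type, assuming an unchanged pre-exponential factor."""
    return -GAS_CONSTANT_KCAL * temperature_k * math.log(k_mut / k_wt)


# Folding rate constants (s^-1) quoted in the text for CI2 at 25 °C.
K_WILD_TYPE = 56.0
K_A16G_I57A = 2.4
K_R48F = 2300.0

print(f"R48F:      {ddg_activation(K_R48F, K_WILD_TYPE):+.2f} kcal/mol")
print(f"A16G/I57A: {ddg_activation(K_A16G_I57A, K_WILD_TYPE):+.2f} kcal/mol")
```

Under these assumptions, the roughly 40-fold acceleration in R48F corresponds to a barrier lowered by about 2.2 kcal/mol, and the slowing in A16G/I57A to a barrier raised by about 1.9 kcal/mol.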
'In their search for order, chemists invented Brønsted and Hammett correlations and other free energy relationships': so begins Jencks in his review of the movement of transition states across energy landscapes (Jencks, 1985). So, here is an attempt to bring some order, bearing in mind that there will be many exceptions. The unifying feature across the folding of most domains that comes from Φ-value analysis is that the highest energy transition state is an expanded, distorted form of the native structure, Figure 25 (Fersht, 2000). It varies from the pure nucleation-condensation mechanism at one extreme, with mainly low to mid-range Φ-values, to framework mechanisms at the other extreme, with highly polarised transition states and Φ-values from 0 to 1. The expanded nature of the transition state and its observed malleability, both naturally across protein families and unnaturally on protein engineering, accommodate the slide from pure nucleation-condensation to framework mechanism, Figure 26.

Envoi
My research career has spanned seven decades that have seen ground-breaking innovations, beginning in the 1960s with the first high-resolution structures of proteins from X-ray crystallography, followed by recombinant DNA technology, DNA sequencing, new enabling biological and biophysical technologies, and advances in computational methods from simulation to machine learning today. It has been my privilege and pleasure to have been a participating protein scientist using directly or indirectly all these advances as they were introduced (Fersht, 2008, 2021). Over the same period, we have gone from being just observers of the properties of proteins to being able to manipulate their structures and activities. We have progressed from the pathway of protein folding being a mysterious unknown to using those methodologies to solve the folding pathways of small domains at atomic resolution. There is much more experimental work to be done on more complex systems, where Φ-values will continue to provide otherwise inaccessible information. I hope that the Φ-values gathered by us all will be used as benchmarks for computation far into the future. It has been a marvellous time to have been a protein scientist. The best is still to come as we progress to unravelling the folding and mechanisms of complex protein systems and combine our acquired experimental knowledge with improved computation to design novel, functional proteins.

Quarterly Reviews of Biophysics

Figure 24. … (Gianni and Jemth, 2014). (a) PDZ domains (Gianni et al., 2007; Calosci et al., 2008), (b) circularly permuted PDZ domain (Ivarsson et al., 2009), (c) circularisation of LysM domain (Nickson et al., 2008), (d) tryptophan as a fluorescence probe inserted in turn into each of the three helices of the B-domain of Protein A (Sato et al., 2006), and (e) the spectrin R16 domain with different neighbouring domains (Batey and Clarke, 2008). The P-value is the probability that the two variables are not correlated.

Figure 1. Transition state is at a maximum for free energy, G, versus reaction coordinate, r.

Figure 2. Transition state for the general-base-catalysed attack of water on an ester.

Figure 4. Difference energy plot for mutations of side chains of the tyrosyl-tRNA synthetase. The values of ΔΔGmut-wt (mutant - wild type) for the ΔG of binding Tyr, ATP, [T-A]‡, T-A·PPi and T-A in the formation of tyrosyl-adenylate (Eq. (9)) on mutation of residues Cys35 and His48 (data from Wells and Fersht, 1986; Fersht et al., 1987).

Figure 8. Reduction of an energy landscape to a conventional reaction coordinate diagram. This reconciles the classical view of a pathway with the 'new view' of an energy landscape with an ensemble of conformations (after Eaton et al., 1996). Q is the relative number of pairwise native contacts in the landscape description and r is the conventional overall reaction coordinate. The number and heterogeneity of individual states decreases as the protein folds. (A, cross-section through a folding funnel (courtesy of P.G. Wolynes); B, reducing the landscape to a collection of ensembles moving along a pathway for the folding of a two-state protein such as CI2; and C, folding of a protein with a more structured denatured state.)

Figure 10. Free energy profiles for mutations giving Φ = 0, when the mutated residue A is in a disordered region (left), or Φ = 1, when it is in a fully native region (right). The energy profiles are simplified with the energies of the denatured states D for wild-type and D' for mutant being set at the same level.

Figure 11. Double-mutant cycles. X and Y are mutated individually and as a pair, and the values of ΔGD-N or ΔG‡-N measured. Interaction energies of X and Y with other residues cancel in the ΔΔGint cycles and are perturbed only by ΔΔGreorg terms in the folded state. For the denatured state, ΔΔGint = 0 when the residues X and Y do not interact with each other. Accordingly, the measured values of $\Delta\Delta G_{\mathrm{int}} = \Delta G_{\mathrm{EY-EXY}} - \Delta G_{\mathrm{E-EX}} = \Delta G_{\mathrm{EX-EXY}} - \Delta G_{\mathrm{E-EY}}$ give the interaction energies between X and Y in the native state at equilibrium or in the transition state for kinetics.

Figure 13. Chevron plot for the folding of CI2 determined by stopped-flow kinetics (Jackson and Fersht, 1991a) and, inset, barnase (Matouschek et al., 1990). Rate constants are in units of s⁻¹. For CI2, the plot is for a perfect two-state transition and the arms are linear. For barnase, there is deviation at low denaturant concentration from the perfect theoretical two-state (solid line) because of a change in the structure of the denatured state or presence of a folding intermediate.

Figure 14. Chevron plots for folding of FBP28, which nicely illustrate Φ = 0, B, where the refolding limbs overlap, or 1, A, where the unfolding limbs overlap, and C and D for fractional values. T-jump was required for the rate constants in the range of 10,000 s⁻¹ (Petrovich et al., 2006).

Figure 15. Barnase folding from experiment and simulation. An MD unfolding simulation from the native state N to the denatured state D at 225 °C is shown in reverse. The structures are coloured from red at the N-terminus to blue at the C-terminus. The denatured state is an ensemble of structures whose overall topology resembles that of the native state. The hairpin at the centre of the antiparallel β-sheet is present in the denatured state, albeit with some non-native interactions. The N-terminal helix is partly structured, stabilised by hydrophobic interactions. The final transition state consists of the largely formed N-terminal helix docked onto the β-sheet, which is strongly formed in the central regions, with the hydrophobic core in the process of being formed and other interactions consolidated (Fersht and Daggett, 2002).

Figure 16. Brønsted (Leffler) plots of ΔΔG‡-N versus ΔΔGD-N for CI2, which has a diffuse transition state, and barnase, which has a polarised one.

Figure 17. CI2 folding from experiment and simulation. An MD unfolding simulation from the native state N to the denatured state(s) D at 225 °C is shown in reverse. The structures are coloured from red at the N terminus to blue at the C terminus. The transition state is built around an extended nucleus, in which L49 and I57 pack against Ala16 (shown in magenta), towards the N terminus of the α-helix. There is flickering structure around Ala16 in the denatured state.

Figure 18. Hammond and anti-Hammond behaviour for the folding of a protein. Left top: Conventional Hammond behaviour as the transition state moves closer to the folded state (F) along the reaction coordinate with increasing destabilisation of F. Left bottom: Cross-section of the energy profile perpendicular to the reaction coordinate at the transition state. Anti-Hammond behaviour as the transition state moves closer to the unfolded state in a direction perpendicular to the reaction coordinate on destabilisation of F (see Jencks, 1985). Right: Correlation diagrams of the average degree of folding, say βT, for the whole protein and Φ, the degree of formation of the helix, in the transition state. Top right: Average degree of folding in the transition state increases as the transition state moves along the reaction coordinate closer to F as the protein is destabilised by a mutation. Bottom right: Concurrent with the movement of the transition state along the reaction coordinate in the direction of F as the protein is destabilised by a mutation, there is anti-Hammond movement perpendicular to the reaction coordinate that leads to the helix becoming less folded, and Φ decreases (Matthews and Fersht, 1995).

Figure 19. Folding pathway of Engrailed Homeodomain (EnHD) from experiment and simulation. From right to left: native state (NS) structure solved by nuclear magnetic resonance and X-ray crystallography; transition state (TS) by Φ-analysis of secondary structure (colour-coded from Φ = 0, red, to Φ = 1, blue); the folding intermediate (I) stably generated by protein engineering and solved by NMR; the denatured state (U), under conditions that favour folding, simulated using molecular dynamics; and the entire unfolding pathway was simulated by molecular dynamics.

Figure 25. A transition state that is an expanded, distorted, native structure being common to framework and nucleation-condensation mechanisms.

Figure 26. Combining elements of Figures 19 and 21 illustrates how movement of the expanded transition state on an energy landscape according to the classical principles of physical-organic chemistry unifies the slide between a diffuse nucleation-condensation transition state and the framework mechanism via a polarised transition state. Top: Reaction coordinate diagram for a framework mechanism with preformed secondary structure in a low energy intermediate that slides to nucleation-condensation as the secondary structure becomes less stable and requires tertiary interactions to stabilise it. The transition state can move along and perpendicular to the reaction coordinate according to Hammond and anti-Hammond effects, respectively. Both mechanisms involve an extended network of long-range native-like tertiary interactions in the expanded transition state. Bottom: Correlation diagram of formation of native secondary and tertiary interactions illustrating the above.