Transformation of aqueous protein attenuated total reflectance infra-red absorbance spectroscopy to transmission

Infrared (IR) spectroscopy is increasingly being used to probe the secondary structure of proteins, especially for high-concentration samples and biopharmaceuticals in complex formulation vehicles. However, the small path lengths required for aqueous protein transmission experiments, due to high water absorbance in the amide I region of the spectrum, means that the path length is not accurately known, so only the shape of the band is ever considered. This throws away a dimension of information. Attenuated total reflectance (ATR) IR spectroscopy is much easier to implement than transmission IR spectroscopy and, for a given instrument and sample, gives reproducible spectra. However, the ATR-absorbance spectrum varies with sample concentration and instrument configuration, and its wavenumber dependence differs significantly from that observed in transmission spectroscopy. In this paper, we determine, for the first time, how to transform water and aqueous protein ATR spectra into the corresponding transmission spectra with appropriate spectral shapes and intensities. The approach is illustrated by application to water, concanavalin A, haemoglobin and lysozyme. The transformation is only as good as the available water refractive index data. A hybrid of literature data provides the best results. The transformation also allows the angle of incidence of an ATR crystal to be determined. This opens the way to using both spectral shape and spectra intensity for protein structure fitting.


Introduction
Protein infrared (IR) data, especially from the amide I (1,700-1,600 cm À1 ) band, are recognised as containing information about protein secondary structure. Normalised and second derivative spectra have both been used to estimate protein secondary structures (Chittur, 1998a(Chittur, , 1998b. The magnitude of the water signal (21.8 mol À1 dm 3 cm À1 at 1,643 cm À1 ) means a path length of less than 8 μm is required (Bertie and Eysel, 1985;Bertie et al., 1989;Venyaminov and Prendergast, 1997;Max and Chapados, 2009), and the protein in a 20 mg ml À1 aqueous sample contributes about 2% of the signal. Although the Beer-Lambert law for absorbance, A, (where I 0 is intensity incident on the sample, I final the final intensity, ε is the wavelengthdependent extinction coefficient, C is concentration, and ℓ is the total path length) holds for transmission IR spectroscopy, it is extremely challenging to collect sample and baseline spectra with the same, accurately known, path length on the micron scale. Protein IR spectroscopy has, therefore, often been performed in D 2 O, whose corresponding band occurs at 1,209 cm À1 . However, biological samples contain H 2 O, and biopharmaceutical products are formulated in H 2 O. Raman spectroscopy might be an alternative as Raman is less influenced by H 2 O scattering than IR absorbance is affected by H 2 O absorbance. However, the side chain contributions to Raman spectra seem to be sufficient to invalidate attempts at secondary structure fitting from Raman spectra (Pinto-Corujo, 2020). An alternative approach for IR spectroscopy was provided by the invention of attenuated total reflectance (ATR, Fig. 1) IR spectroscopy in 1959-1960 by Fahrenfort (Harrick, 1960, 1965;Harrick and du Pré, 1966;Mirabella, 1992). With ATR, the sample is placed in contact with a transparent dense crystal whose refractive index (speed of light in a vacuum, c 0 , divided by speed of light in the medium, c) is so much higher than that of the sample that if the angle of incidence (with 0°being perpendicular to the surface) is high enough, the light beam does not pass into the sample but is totally internally reflected (Fig. 1a). Maxwell's equations (see Appendix) require that there is electric (and magnetic) field intensity from the light present above the surfacethis is usually referred to as an evanescent wave, even though the light beam does not actually propagate away from the surface. These fields, which decay away from the surface, interact with the sample and can cause the molecule to move to an excited state so that less light reaches the detector than in the absence of the sample. Thus, a spectrum is generated that depends on the absorbance of the sample.
ATR spectroscopy has been widely adopted because of its ease of use. However, the spectrum that is measured with ATR is not the same as in transmission and differs from instrument to instrument: the relative magnitudes of different parts of the spectrum depend on the wavenumber, the angle of incidence of the light and the refractive indices of both the crystal and the samplewith the latter in turn being dependent on its absorbance. ATR-IR has, therefore, seldom been used quantitatively.
A cursory perusal of the literature leads one to believe the equations required to relate ATR-IR spectra to transmission spectra are to be found somewhere, as everyone quotes the same equations. However, we have failed to find coherent derivations, which hindered our ability to transform data quantitatively to an instrumentindependent spectrum to be used for regulatory purposes. The goal of our work was therefore to establish a method to transform aqueous protein ATR spectra into what could be measured in a transmission experiment. In this way, we can benefit from the reproducible simplicity of the ATR experiment, without having instrument-to-instrument variations.
This paper is structured in terms of increasing complexity, beginning with how to collect data, then how to transform an ATR aqueous protein spectrum to transmission and finally, the theory behind the equations used in the transformation is derived in the Appendix. Understanding of the theory is not required to perform the transformations, but it helps to understand the literature. Most of the analysis in this paper relates to water, for which reasonably accurate transmission and refractive index data are available. Application to three aqueous protein examples are also included.

Materials
18.2 MΩ water was used for all experiments. Proteins were obtained from Sigma-Aldrich (Gillingham, UK).

Instrument
IR data were collected with a Jasco V-470 IR spectrometer at 4 cm À1 resolution with a CaF 2 demountable cell without spacer (approximately, 1 μm pathlength) or a PIKE MIRacle single reflection ZnSe ATR unit and a triglycine sulfate (TGS) detector. A Specac Golden Gate single reflection ATR unit with a Jasco FTIR-4200 IR spectrometer was also used for comparison (Appendix). We chose TGS as it has a wider dynamic range than mercury cadmium telluride (MCT) detectors, and we found it was more reproducible, as well as having the pragmatic advantage of not requiring liquid nitrogen. ZnSe has reasonable energy throughput (~25-30% of that in a transmission experiment) over the range we wanted and a refractive index of n i~2 .4. Although germanium ATR crystals have a higher refractive index than ZnSe, which is attractive for reducing the wavenumber dependence of the signal, its smaller penetration depth means any surface effects (such as protein binding or The crystal surface is the x-y plane; +z is above the crystal and the incident light beam propagates in the x-z plane. (b) Light amplitude above and below the crystal surface. (c) Overlay of ATR-IR spectra of 20.0 mg ml À1 aqueous samples of three proteins after water baseline correction. Data collected with a Jasco V-470 IR spectrometer equipped with a PIKE MIRacle single reflection ZnSe ATR unit. orientation at the surface) are more significant. As baselinecorrected aqueous protein ATR spectra in the amide I/amide II region are proportional to concentration up to at least 50 mg ml À1 (data not shown) and since water essentially determines the penetration depth of the light beam (see below), we can conclude that concentration or significant orientation of the proteins is not occurring at the surface.
ATR-IR spectra were measured directly after taking a new background (air only) spectrum. Proteins were dissolved in water. A water baseline spectrum was subtracted from the aqueous protein spectrum. Baseline correction was sometimes improved by modifying the water spectrum by up to 1% of its magnitude. Baseline correction was deemed of good quality if the spectrum was flat in the liquid water libration/bend region (2,100 cm À1 ). If necessary, we then added or subtracted a water vapour spectrum until the water vapour bands around 1,700 cm À1 disappeared. The water vapour spectrum was generated from the difference between successive air spectra and scaled to the magnitude required. In general, if backgrounds (air only) were collected immediately prior to measuring a spectrum, this was not needed. Spectra were zeroed at 4,400 cm À1 unless the spectrum had been collected over a narrower window in which case, for aqueous samples, we used an intensity at 4,000 cm À1 of about 0.001 or at 2,620 cm À1 of about 0.003 as 'zero'.
Protein concentrations used in this work are in mg ml À1 of pure protein determined from measuring the A 280 of the solution in 1 mm pathlength cuvettes and using protein theoretical extinction coefficients determined from the sequence.

Parameters used
Water transmission spectra were scaled to the literature extinction coefficient of 21.8 mol À1 dm 3 cm À1 at 1,643 cm À1 (concentration 55.506 M). The initial refractive index of water was determined from Bertie et al. (1989) and Bertie and Eysel (1985) by linearly interpolating between published data points (referred to as Bertie). It was refined using Max and Chapados' (Max and Chapados, 2009) above~3,350 cm À1 (with manual smoothing in the cross-over region). The refractive index of ZnSe was taken from Madelung et al. (1999). The angle of incidence of our ATR PIKE unit was determined by establishing the value of θ i , which gave the best overlay of a transmission spectrum and a transformed-ATR spectrum. The value was 44.7°. The Specac unit (see Appendix), which had been aligned by eye, proved to have θ i = 48.0°.

Calculations
Excel was used to implement the transformations between transmission and ATR. The supplementary material contains a spreadsheet with typical calculations and data for the plots. Literature water absorbance and refractive index data were used (21.8 mol À1 dm 3 cm À1 at 1,643 cm À1 and~101 mol À1 dm 3 cm À1 at 3,408 cm À1 (Bertie and Eysel, 1985;Bertie et al., 1989;Venyaminov and Prendergast, 1997;Max and Chapados, 2009). Fig. 1c shows baseline-corrected ATR-IR spectra for three proteins of different secondary composition (fraction helix: concanavalin = 0.04; lysozyme = 0.42; haemoglobin = 0.76) at 20.0 mg ml À1 of protein in water. As the mean residue weights of the proteins are concanavalin: 108.5 Da, lysozyme: 111.6 Da and haemoglobin: 112.5 Da, the magnitude differences are more significant when molar concentrations are used. This means that discarding intensity information by normalising the amide I band is throwing away a dimension of information. With analogous circular dichroism data, normalisation leads to poor α-helix estimates (Hall et al., 2014). We are therefore motivated to provide a quantitative transformation of ATR data to transmission.

Results
Transmission versus ATR: Step 1, the penetration depth Aqueous protein absorbance spectra are dominated by water, so we first consider how to relate transmission and ATR water spectra. Fig. 2a shows the overlay of scaled transmission (dashed red line) and ATR (black line) water spectra. The 1,650 and 3,250 cm À1 bands correspond to water vibrations. The 2,125 cm À1 band is a combination liquid water bend + libration band (Max and Chapados, 2009), which is absent from water vapour (and proteins) and is thus extremely useful for ensuring good baseline subtractionthis region should be flat as in Fig. 1c. 2,350 cm À1 is CO 2 and indicates the stability of purging. The presence of water vapour is indicated by oscillations from 1,700 cm À1 downwards.
The relative magnitudes of the ATR and transmission spectra vary with wavenumber due in part to the 'depth of penetration' or effective path length, d p , of an ATR experiment, which here ranges from 0.34 μm (at 4,000 cm À1 ) to 1 μm (at 1,500 cm À1 ) (see below and Fig. A1a). As shown in the Appendix, where n = c 0 /c is the refractive index, i refers to incident light in the crystal, t refers to light transmitted above the crystal surface, λ 0 is the free space wavelength and θ i is the angle of incidence of the light on the crystal (Fig. 1a). The wavenumber variation in d p is often calculated assuming the refractive index, n t , is independent of sample (in our case, water plus protein) absorbance. This correction (Fig. 2a, dashed green line) transforms the ATR spectrum to be similar in magnitude to the transmission. Further improvement comes from including the wavenumber dependence of the refractive index ( Fig. 2a, blue dotted line). However, the difference is still larger than the protein absorbances of Fig. 1c, so the problem is not solved just by considering d p .
Transmission versus ATR: Step 2, the light intensity In ATR-IR, the light beam interacts with the sample via the electric field that exists above the ATR crystal. The two variable aspects of the flux of light that analyte molecules encounter are: (i) the intensity of the light just above the crystal surface and (ii) how quickly the intensity of the light decays away from the crystal surface Both of these factors depend on the nature of the ATR crystal, the frequency of the light, the sample absorbance as a function of frequency and the parameters of the instrument being used. What the instrument nominally plots as absorbance in the ATR experiment is Nominal absorbance ¼ A ATR ¼ log 10 light hitting the detector for the reference light hitting the detector for the sample where I air is the intensity that reaches the detector when only air is above the ATR crystal (the background) and I sample is the intensity that hits the detector in the presence of a sample on the crystal. The component strengths of the incident electric field light in the ATR experiment is different in the x, y and z directions of Fig. 1. They are also different just below and just above the surface of the crystal and different in the crystal from in air. As derived in the Appendix, the light intensity components transmitted to just above the surface of the crystal are: where, for example, I t0x is the x component of the light intensity, f AtoC is the air to crystal change in light intensity and f x is the x-direction change in light intensity of an unpolarised beam on passing from below to above the crystal surface. It follows from Fresnel's equations (see Appendix) that For an unpolarised beam and an unoriented sample, the total light intensity at the surface is simply Eqs. (4)-(7) are essential for quantitative inter-transformation between ATR and transmission absorbance intensities and differ from those found in the literature (Harrick, 1960(Harrick, , 1965Harrick and du Pré, 1966) by (i) f AtoC , which recognises the loss of light intensity in going from air into the crystal (which affects how much light the sample sees); (ii) inclusion of the dependence on θ i of the x and z component intensities just below the surface and (iii) a factor of 2 included for an unpolarised light beam.
At any position above the surface, the intensity is attenuated by both the decay of the evanescent wave plus any absorbance, so (see Appendix) where εC is the extinction coefficient (absorbing power) times concentration and f is as defined in Eq. (8).
In order to determine how the frequency dependence of the light intensity affects the nominal ATR absorbance signal (Eq. (3)), we integrate the absorbance from z = 0 to infinity. Thus (see Appendix), Water infrared (IR) spectra. (a) Transforming ATR towards transmission with penetration depth corrections: Overlay of: ZnSe ATR water spectrum; transmission water spectrum in a cell without spacer, ℓ ¼ 2:6 μm (based on absorbance at 1,643 cm À1 ); conversion according first term in Eq. (11) using the infinite wavenumber fixed refractive index and conversion according first term in Eq. (11) using the wavelength-dependent refractive index (Fig. A1). (b) Transforming transmission spectra to ATR: Overlay of ZnSe ATR water spectrum with first-order transformation (Eq. (10), term 1) transformation and full transformation (Eq. (9)). θ = 44.7°, C water = 55.506 M, ε 1643 = 21.8 mol À1 dm 3 cm À1 , water refractive index data from Bertie et al. (1989) and Bertie and Eysel (1985) (Bertie). d p in cm is used in all calculations. lg denotes log 10 . The difference between the ATR spectrum calculated from transmission data and the ATR spectrum is also shown.
(h.o.t. indicates higher order terms). Eq. (9) can be used as an accurate transformation of a transmission spectrum to ATR. Eq. (10) enables the reverse transformation. Its first term is comfortingly familiar as the transmission absorbance intensity times f, but it is only an approximation.
Transforming transmission spectra to ATR spectra Fig. 2b shows an overlay of the experimental water ATR spectrum and the water ATR spectrum calculated from transmission data using Eq. (9) (not an approximation), Bertie's water refractive index (not perfect, see Appendix) and an angle of incidence of 44.7°. The spectrum from only the first term in Eq. (10) is also shown. The Eq. (9) spectrum (dashed green line) overlays with the experimental ATR except in the regions of absorption maxima (see residual spectra). Literature water refractive indices vary noticeably (Fig. A1), especially in regions of high absorbance (Bertie and Eysel, 1985;Bertie et al., 1989;Max and Chapados, 2009). If we choose to use Bertie's data until about 3,350 cm À1 and Max and Chapados' after that point, we get better results for two different instruments and ATR units (Fig. A1). We therefore conclude that this hybrid refractive index is best for applications. The first term in Eq. (10) is reasonably good for absorbances less than 0.1 and corresponds to the low absorption limit of the literature (Harrick, 1965;Milosevic, 2004). However, for aqueous proteins, due to the water absorbance, the low absorption limit is not satisfactory. Including the second term improves the result as much as the quality of available refractive index data allows. If one normalises the data to 1 at the maximum near 1,645 cm À1 (a common practice), the discrepancies are masked in the amide I region of the spectrum but are still there and affect any further applications of the data.
Transforming ATR spectra to transmission spectra As ATR spectra vary from instrument to instrument, we are more likely to want to transform ATR spectra into transmission than the converse. As shown in Fig. 2b, for water, we need more than the first term of Eq. (10), so we use the first two terms to give Multiplication by the experimental path length is required to turn Eq. (11) into transmission absorbance, and division by concentration gives extinction coefficients. The difference between the solid black and dashed green lines in Fig. 3 at 3,500 cm À1 is a combination of losing the higher order terms (a small effect, see above) and the experimental errors in water transmission absorbance and refractive index in that region. However, the intensities and shapes give us confidence that even in the high-absorbing regions of the spectrum, Eq. (11) is adequate.

Application to aqueous protein spectra
To perform the ATR to transmission transformation for aqueous protein samples, we first assume d p is the same for both the protein in buffer and the buffer since amide I protein absorbance is about 2% of the H 2 O absorbance at 20 mg ml À1 protein. Then, from Eq. (11), removing the ATR subscripts (so A denotes the nominal ATR absorbance), we write where P refers to the protein contribution to absorbance, PW denotes the absorbance of the protein plus water/buffer solution and W denotes the water/buffer absorbance. A P is determined using the baseline correction process outline in the Methods section. In what follows, we use Bertie's refractive index data modified by Max and Chapados' data at high wavenumbers (see Appendix).
When the different levels of correction are applied and the results all scaled to unit normalised absorbance (Fig. 4a for lysozyme) at the amide I maximum, we reduce the comparison to band shape only. Most protein IR spectroscopy is presented in this way to avoid needing accurate concentrations and path lengths. As illustrated for lysozyme in Fig. 4a, everything looks artificially good for the normalised amide I band, though the amide II is poor except for the second-order correction. Alternatively, one could make the amide II look good at the expense of the amide I band.
When we do not normalise the spectra, it is clear from Fig. 4b that the second-order correction is required. Concanavalin (Fig. 2c) and haemoglobin (Fig. 2d) present a similar picture to that of lysozyme.
The apparently disappointing aspect of Fig. 4a,b is the 1-2 cm À1 mismatch of the transmission spectrum and the second-order ATR to transmission transformation (which includes the second term in Eq. (12) with full wavenumber dependence of the refractive index in d p and f ). Rather than optimising data collection and baseline Fig. 3. Overlay of water extinction coefficients for transmission and first-and secondorder terms in the transformation of ATR to transmission (Eq. (11)) determined using θ = 44.7°, 55.506 M as concentration, 21.8 mol À1 dm 3 cm À1 as the extinction coefficient and Bertie's refractive index data (Bertie and Eysel, 1985;Bertie et al., 1989). d p in cm is used in calculations.
correction to give an answer we would like to see, we simply note that quite small changes in water vapour and baseline move the precise position of peaks, especially for transmission spectra. In our experience, maxima of 18 independently collected lysozyme spectra ranged by 3 cm À1 (data not shown). This emphasises the importance of having magnitude information retained for analysis.

Discussion
The main conclusion from this work on the analysis of water transmission and ATR IR spectra is that we have presented equations that for the first time can be used to transform transmission to ATR IR absorbance spectra using Eq. (9). The second-order correction of Eq. (11) is a reasonable approximation for the converse ATR to transmission transformation. However, the transformations are only as good as the available refractive index data and literature data at absorbance maxima vary by a few percent. As a result of confidence in our ATR-IR transformations, a refined water refractive index spectrum that is a hybrid of two literature spectra is available in the supplementary material. A constant value of the refractive index can only be assumed for samples with very small experimental absorbances.
Transformation of an ATR spectrum to transmission also depends on the angle of incidence of the light passing from air into the ATR crystal. By optimising the overlay of the transmission spectrum and the transformed ATR spectrum, we determined our angles of incidence on two systems to be 44.7°and 48°. This is an indirect benefit of this work, as in our experience, manufacturers simply declare the value is 45°and do not provide a method for determining it.
When we come to consider proteins in the amide I region, the absorbance and refractive indices are dominated by the contribution of water. It therefore follows that the refractive index is very close to that of water. We have shown that for the amide I band of aqueous protein samples (up to at least 50 mg ml À1 ), Eq. (12) gives the transformation from ATR to transmission with spectra are divided by a path length estimated from the water spectrum used for the baseline. Vertical black lines are an estimate of the pathlength and baseline correction error for the transmission spectra. ATR spectra are presented as data divided by d p at the protein absorbance maximum (~1,645 cm À1 ) to facilitate comparison. This arbitrary choice leads to a magnitude match at the maximum. d p n = 1.33 denotes d p calculated assuming the refractive index has no wavenumber dependence. d p denotes a calculation using literature wavenumber-dependent refractive index. d p f is first-order correction of Eq. (11).
significantly less variation than we get from the buffer and water vapour corrections required for data analysis. Further improvements would follow by including protein absorbance into the refractive index.
If in practice, only the shape of the amide I band is requiredthis is current practice as all spectra are usually normalised to 1then, as long as wavenumber-dependent water refractive indices are used, one can get away with only the first-order correction (the d p f ) or even just d p . However, if the information contained in the intensity is also important, or both amide I and amide II bands are to be considered, then the d p f and second-term corrections are required.
For future protein structure-fitting work using IR spectra, we therefore recommend the simplicity of ATR-IR data collection followed by transformation to transmission shape and intensity using Eq. (12). Eq. (12) in turn requires d p (Eq. (2)) and (5) and (6)), which depend on the refractive indices of water and the ATR crystal and the angle of incidence.
Finally, it is important to note that in contrast to transmission spectroscopy, in ATR spectroscopy, the evanescent field has different amplitude electric field components along x, y and z. If one's sample is unoriented (or largely so) then that is not a problem, however, if one's sample is oriented, then it will interact to different extents with light polarised in the x, y and z directions, giving a spectrum that is a combination of absorption and linear dichroism. Given the penetration depth in our experiments is~1,000 nm and a typical protein is 5-10 nm in size, we can ignore an oriented monolayer or two, even allowing for the higher intensity electric field close to the surface. However, if an oriented structure propagates, ATR is unlikely to be an appropriate technique for accurate spectroscopy. Germanium ATR crystals have higher refractive index than ZnSe, which is attractive for reducing the wavenumber dependence of f , but its smaller penetration depth means any surface effects are more significant.

Appendix
This appendix presents a derivation of equations used in the main text.
Although some of what follows may be found in the literature (e.g. Harrick, 1965;Milosevic, 2004), we have been unable to find their full derivations and found that the literature equations miss some factors needed for quantitative transformation rather than just ratios. In addition, assumptions about low absorbance are usually made, which are simply not valid for water. We here use the Maxwell boundary conditions on the electric and magnetic fields at the crystal surface boundary as the starting point.
The structure of what follows is first a discussion of absorbance in a linear transmission mode followed by ATR. Refractive index is then explored as a means of getting to the light intensity at each height above the ATR crystal in the presence of an absorbing sample. Finally, the ATR absorbance equations are derived to determine equations for what the spectrometer plots as absorbance (i.e. the log of the fraction of light that does reach the detector). The analysis could work for any solvent but is applied here to water using literature refractive index data. We use our analysis and experimental spectra to select literature values for the refractive index at different frequencies.

Equations for absorbance
Absorbance occurs when the electric field of light interacts with a molecule causing it to jump to an excited state. Usually, the light-molecule coupling is dominated by the electric dipole transition moment, μ. Within a quantum electrodynamic formalism, the transition rate from 0 j 〉 ! f j 〉 follows the Fermi golden rule (Craig and Thirunamachandran, 1984;Tokmakoff, 2009) is an analogue of the unpolarised Einstein B-coefficient. Here, N is the number of absorbers, ε 0 the permittivity of vacuum, b e the unit vector parallel to the electric field of the light and μ f 0 the electric dipole transition moment from the ground state to the final state. The intensity of the light, I ω ð Þ, is the time averaged value of the Poynting vector, E Â B, where E is the electric field vector and B the magnetic field vector andk for b k, the unit vector parallel to the direction of propagation. Thus, So, to allow for complex fields and wavefunctions (though they are not needed here), we use tensor notation and write where inverting the order of the states of the transition dipole moments means taking the complex conjugate (denoted by *) of any complex wavefunctions.

Transmission absorbance
When light passes through a sample, if it is the right frequency, it may be absorbed and its intensity reduced as it proceeds along the light path (x). In a transmission experiment, we usually ignore the polarisation information in Eq. (17) unless we need it and we replace N in Eq. (13) with C and write, for light propagating along the x-direction, where κ, the Eulerian extinction coefficient, is a constant for a given sample and frequency (as long as molecules do not interact). The light at the exit edge of the sample, I ℓ , is therefore related to the intensity at the entrance edge, I 0 , by the Beer-Lambert law for a sample of path length ℓ. ε, the standard decadic extinction coefficients is related to κ by κ ¼ ln 10ε (20) since log 10 I 0 =I ℓ ð Þ¼log 10 e ln I 0 =I ℓ ð Þ is what spectrometers output as absorbance, A. I 0 at the entry edge of the sample is reduced by 2= 1 þ n t ð Þ ð Þ 2 (Eq. (39)) when it crosses the air/sample boundary at normal incidence. However, as the detector is outside the sample and the absorbance is linear with concentration, the inverse scaling factor is operative as the light emerges from the sample so, for transmission, we may ignore the intensity change at the interfaces of the sample and the air.

ATR absorbance
In ATR spectroscopy, what we measure is more complex than in transmission: the light beam does not pass through the sample, rather the evanescent wave has x, y and z components of the electric field that interact with molecules in the sample leading to absorbance if frequencies match energy-level gaps. Further, the evanescent wave decays exponentially from the surface whether absorbance occurs or not. We use i for light within the crystal and t for light transmitted above the surface. Although we are interested in isotropic samples and unpolarised light in this work, the initial equations below can be applied to oriented samples. To derive Eqs. (9) and (12) of the main text, it is convenient to proceed in three stages: (i) understand the electric field and penetration depth of the evanescent light in the absence of an absorbing sample, (ii) express the evanescent light in terms of the incident (air) values and (iii) determine what the instrument outputs as nominal ATR absorbance.

Refractive index as a function of wavenumber
For refractive index in the crystal, we write and in the sample For this work, we used literature values for water refractive index (Querry, 1969;Querry et al., 1969;Bertie and Eysel, 1985;Bertie et al., 1989;Max and Chapados, 2009) and ZnSe (Madelung et al., 1999) (which hardly changes) linearly interpolating between published data points (Fig. A1a). A further correction could be undertaken to improve n t for a protein sample by first estimating it, assuming that the aqueous protein refractive index is the same as that of water, then iteratively recalculating n t using a newly transformed ATRto-transmission spectrum to include the protein absorbance using a Kramers-Kronig (Ohta and Ishida, 1988;Max and Chapados, 2009;Ogilvie and Fee, 2013) transformation, where n t∞ is the sample refractive index at infinite frequency and ℘ denotes the Cauchy Principal value. In practice, for protein concentrations less than 50 mg ml À1 , the difference is negligible (data not shown), so we worked with water values for our protein solutions. Literature values for the refractive indices of water vary, especially in regions of high absorbance, so we began with the data from Bertie et al. and refined them by using the data of Max and Chapados at high wavenumber to give an improved transmission to ATR transformation for water. We confirmed the improvement by using it on data from a different instrument and ATR unit (Fig. A1b,c). The Fig. 2 and Fig. A1 plots were generated by determining the value of θ i which minimised the difference between water ATR and transformed transmission spectra: PIKE ATR θ i = 44.7°and Specac ATR θ i = 48.0°. The success in transforming the Specac ATR unit data using the hybrid water refractive index suggested by the Pike ATR unit data (purple dashed line of Fig. A1c) indicates that the error in the transmission to ATR transformation is less than 0.5%. The second-order (Eq. (10)) ATR to transmission transformation has a similar error. The experiment-to-experiment variation and water baseline-correction error of an ATR experiment is typically 1-3% depending on the operator.
The electric field and penetration depth of the evanescent light in the absence of an absorbing sample on the ATR crystal

The fields
Consider refraction at the crystal boundary ( Fig. 1): the incoming wave has wave vector k i with an angle of incidence θ i to the normal. The interface is in the x-y plane with the normal along z. The incoming electric and magnetic fields are, respectively, where c.c. denotes complex conjugate, r ¼ x, y, z ð Þis the position in space, E i0 is the electric field in the crystal incident on the crystal-sample boundary, j ¼ ffiffiffiffiffiffi À1 p and k 0 ¼ 2π=λ 0 ¼ 2πν=c 0 ¼ ω=c 0 is the magnitude of the wave vector of light in vacuum, etc.
Without loss of generality, we take the incident wave to propagate in the x-z plane and, similarly, above the surface but using the subscript t. Temporal translational invariance requires that the frequency (but not necessarily speed) of the light is unchanged on transmission across the crystal surface. Spatial translational invariance in the x-y plane requires that the x-component of the wave vector is conserved which leads to Snell's law. We may also write Above the critical angle, where total internal reflection occurs, it is better to write where α is real and j ¼ ffiffiffiffiffiffi À1 p . The analogue of Eq. (24) for the electric field above the crystal is thus So, with an ATR instrument, above the crystal surface, we have an evanescent wave which propagates along the x-axis, bound to the interface, decays exponentially in field strength with distance from the surface (Fig. 1b) and has nonzero field components in arbitrary directions, depending on the incident polarisation.

The penetration depth
A common concept in ATR spectroscopy is the penetration depth, d pthe depth that enables the sample to 'see' the same flux of light as it would in a transmission experiment with pathlength d p . It is the distance from the surface at which the light intensity has decayed to 1/e of its amplitude at the surface (see below).
In a transmission experiment, the total flux seen by a sample of path length ℓ is proportional to I t0 ℓ=2, with the factor of 2 coming from the time average of the wave (Schellman, 1975). By way of contrast, in ATR (assuming no absorbance), the electric field decays away from the surface with rate Àα (Eq. (29)), so the light intensity (Eq. (16)) decays with rate À2α. Writing I t0 as the intensity just on top of the crystal surface, the total energy flux seen by a (nonabsorbing) ATR sample is proportional to I t0 ð ∞ 0 e À2αz dz ¼ I t0 = 2α ð Þ: If we equate Eq. (31) with I t0 ℓ=2, it follows that 1/(α) is an effective path length for low (strictly, non) absorbing samples. So, we write the penetration depth as The ATR electric field strength has decreased to 1/√e of its original value at this point and the light intensity to 1/e of its original value. This means that with θ i = 45°for a ZnSe ATR crystal (n i = 2.4) and n t~1 .3 (refractive index of water away from absorbance (Max and Chapados, 2009;Kitadai et al., 2014) and d p~0 .8 μm (Fig. A1a). This is an ideal path length for infrared measurements on aqueous samples.

Evanescent light in terms of the incident values
To calculate the nominal ATR absorbance (i.e. what the spectrometer plots), we need to know the light that is removed from the incident beam. We therefore need the electric field of the light that interacts with the sample. For s-polarisation (E = E o y in our axis system, Fig. 1) at the crystal surface (z = 0) from Maxwell's boundary conditions we write where i is incident, r is reflected and t transmitted. The tangential magnetic field obeys a similar condition B i0s cosθ i À B r0s cosθ i ¼ B t0s cosθ t , which, using Eq. (15), becomes E i0s n i cos θ i À E r0s n i cos θ i ¼ E t0s n t cosθ t : Similarly, for the perpendicular p-polarisation and E i0p n i À E r0p n i ¼ E t0p n t : These equations in different combinations lead to the Fresnel relations for transmitted (and reflected) waves including and It follows that the amplitudes of the electric fields of the two polarisations at the surface, from Eqs. (30), (39) and (40), are E t0s ¼ E i0s n i cos θ i n i cos θ i þ n t cos θ t ¼ E i0s n i cosθ i n i cosθ i þ jn t ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi n 2 it sin 2 θ i À 1 p ¼ E i0s n i cos θ i n i cosθ i þ j ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi n 2 i sin 2 θ i À n 2