Engineering polymerases for applications in synthetic biology

Abstract DNA polymerases play a central role in biology by transferring genetic information from one generation to the next during cell division. Harnessing the power of these enzymes in the laboratory has fueled an increase in biomedical applications that involve the synthesis, amplification, and sequencing of DNA. However, the high substrate specificity exhibited by most naturally occurring DNA polymerases often precludes their use in practical applications that require modified substrates. Moving beyond natural genetic polymers requires sophisticated enzyme-engineering technologies that can be used to direct the evolution of engineered polymerases that function with tailor-made activities. Such efforts are expected to uniquely drive emerging applications in synthetic biology by enabling the synthesis, replication, and evolution of synthetic genetic polymers with new physicochemical properties.


Introduction
Natural polymerases
Fundamentals of DNA synthesis
Visualizing DNA synthesis through snapshots of trapped intermediates
Capturing phosphodiester bond formation by time-resolved crystallography

Introduction
DNA polymerases are an ancient family of enzymes responsible for replicating the genomes of organisms during cell division. Their movement along a template likens them to molecular motors that are powered by the free energy of nucleotide polymerization (Gelles and Landick, 1998). However, in contrast to most molecular motors, which are largely responsible for transporting cargo along protein tracks (Schliwa and Woehlke, 2003), polymerases are a class of enzymes whose movements facilitate the transfer of information from parent to daughter strands using well-established Watson-Crick base pairing rules (Watson and Crick, 1953). This impressive feat of chemical synthesis is accomplished through a complicated reaction pathway where each cycle of nucleotide addition involves a set of carefully orchestrated conformational changes that allow the enzyme to form a covalent bond between the growing primer strand and the correct incoming nucleotide (Steitz, 1999). The sophistication of these motors is further demonstrated through the use of accessory domains that enable the enzyme to recognize and correct mistakes that arise due to misincorporation events. Thus, polymerases can be thought of as biological scribes capable of forward and reverse motions that allow for the writing and editing of genetic messages with unparalleled speed and accuracy.
The process of nucleotide selection, insertion, and extension is regulated by a series of checkpoints that control the efficiency and fidelity of nucleotide synthesis. Elegant biochemical, kinetic, and structural studies reveal the importance of induced fit in distinguishing correct nucleotides from incorrect nucleotides (Bryant et al., 1983; Johnson, 2008; Ludmann and Marx, 2016), noting that Watson-Crick hydrogen bonds are not always necessary for the replication of a DNA base pair (Moran et al., 1997b). Other factors that affect nucleotide recognition include hydrogen bonding to minor groove heteroatoms, base stacking, solvent exclusion, and shape (Kool, 2002). Once present in the active site, chemical bond formation requires the substrate to adopt a productive geometry that leads to phosphodiester bond formation. In all cases, this involves a combination of side chain and divalent metal ion interactions that orient the substrate in a position that is suitable for in-line nucleophilic attack by the terminal 3′-hydroxyl group on the primer strand (Steitz et al., 1994; Genna et al., 2016). In cases where a polymerase is able to incorporate a modified nucleotide, additional molecular recognition events are available to detect changes in the duplex geometry, which often leads to polymerase stalling (Miller and Grollman, 1997). Although these parameters can vary between individual polymerases, the checkpoints of nucleotide selection, chemical bond formation, and primer extension place severe limitations on the synthesis of unnatural nucleic acid polymers by natural DNA polymerases.
One striking example of substrate specificity is the ability of polymerases to discriminate between DNA and RNA substrates inside the cell. The molecular difference between 2′-deoxyribonucleoside triphosphates (dNTPs) and ribonucleoside triphosphates (NTPs) is the presence of a 2′-hydroxyl group on the ribose sugar, which causes the furanose ring to adopt a different sugar pucker (C2′-endo versus C3′-endo for DNA and RNA, respectively) (Anosova et al., 2016). Even though intracellular NTP levels are elevated relative to dNTP levels (>10-fold) (Traut, 1994), DNA polymerases, such as Escherichia coli (E. coli) DNA polymerase I, are able to discriminate against NTPs by a factor of up to 10⁵-fold (Astatke et al., 1998). This remarkable level of substrate specificity is achieved by a single bulky amino acid residue, referred to as the 'steric gate', which packs against the 2′ sugar position, preventing the insertion of NTPs into the enzyme active site. The steric gate is now recognized as a common feature of most DNA polymerases (Bonnin et al., 1999; Brown and Suo, 2011).
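The two figures quoted above can be combined into a rough estimate of how often a ribonucleotide would be inserted in place of a deoxynucleotide. The sketch below is only a back-of-the-envelope illustration. It assumes simple competition kinetics, in which the relative insertion frequency of two competing substrates equals their concentration ratio divided by the discrimination factor. The function name and the boundary values (a 10-fold NTP excess and a 10^5-fold discrimination) are illustrative choices, not measurements for any specific enzyme.

```python
def relative_ntp_insertion(ntp_to_dntp_ratio: float, discrimination: float) -> float:
    """Ratio of NTP to dNTP insertion events for two competing substrates.

    Assumes the polymerase prefers dNTPs by the factor `discrimination`
    and that the NTP pool is `ntp_to_dntp_ratio` times larger than the dNTP pool.
    """
    return ntp_to_dntp_ratio / discrimination


# Boundary values taken from the text: >10-fold NTP excess, up to 10^5-fold discrimination.
ratio = relative_ntp_insertion(ntp_to_dntp_ratio=10.0, discrimination=1e5)
print(f"about 1 ribonucleotide per {1 / ratio:,.0f} deoxynucleotide insertions")
```

Under these assumptions the steric gate more than compensates for the larger NTP pool, although the real value depends on the polymerase and on the individual nucleotide pools.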
In this review, we examine the impact of polymerase engineering on the field of synthetic biology. Special emphasis is placed on examples in which engineered polymerases have enabled the synthesis, replication, and evolution of synthetic genetic polymers with new physicochemical properties, such as enhanced ligand binding, catalysis, and biological stability. Such activities represent the forefront of polymerase engineering, as functional non-natural polymers are expected to drive future applications in synthetic biology, biotechnology, and healthcare. We begin with a review of polymerase function and structure, illustrating the latest techniques that have been used to answer fundamental questions about the mechanism of DNA synthesis. Next, we discuss examples where natural polymerases are able to recognize non-cognate and synthetic congeners as substrates either in the template or as nucleoside triphosphates. We then examine several techniques that have been applied to engineer polymerases with desired functional properties. Here we focus our attention on avant-garde strategies that are rapidly advancing the field of polymerase engineering. Finally, we conclude with examples of synthetic biology applications that have arisen due to the availability of engineered polymerases.

Natural polymerases
Fundamentals of DNA synthesis
DNA polymerases follow a primer extension mechanism in which a single strand of parental DNA is used as a template to synthesize the complementary daughter strand. In this reaction, the growing daughter strand is recognized as a primer that is extended in the 5′-3′ direction by sequentially adding the corresponding dNTP to the terminal 3′-hydroxyl group. As illustrated in Fig. 1, the template dictates the sequence of nucleotide addition following the classic Watson-Crick base pairing rules of adenine (A) pairing with thymine (T) and guanine (G) pairing with cytosine (C). Because the polymerase moves down the template in the 3′-5′ direction and the new DNA strand is generated in the 5′-3′ direction, the resulting product is an antiparallel DNA duplex.
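The primer extension logic described above can be illustrated with a short sketch. It is a toy model rather than a description of any enzyme. The function name, the example sequence, and the assumption that the primer is already annealed to the 3′ end of the template are all illustrative.

```python
# Watson-Crick pairing rules for the four natural DNA bases.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5: str, primer_5to3: str = "") -> str:
    """Extend a primer along a template that is read in the 3'->5' direction.

    The primer is assumed to be already paired with the first
    len(primer_5to3) template positions. The returned daughter strand
    is written 5'->3'.
    """
    daughter = list(primer_5to3)
    # Read the remaining template positions one at a time and add the
    # complementary nucleotide to the 3' end of the growing strand.
    for base in template_3to5[len(primer_5to3):]:
        daughter.append(WATSON_CRICK[base])
    return "".join(daughter)


template_5to3 = "ATGGCA"
template_3to5 = template_5to3[::-1]          # the polymerase reads 3'->5'
daughter_5to3 = extend_primer(template_3to5, primer_5to3="TG")
print(template_5to3, daughter_5to3)
```

Because the template is read 3′ to 5′ while the product grows 5′ to 3′, the daughter strand is the reverse complement of the template, which is what makes the duplex antiparallel.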
Phylogenetic analysis reveals that DNA polymerases organize into seven different highly homologous sequence families (A, B, C, D, X, Y, and RT) (Table 1) (Ito and Braithwaite, 1991) that allow the activity of one member to predict the activity of another member. For example, the mutations required to imbue a natural DNA polymerase with RNA synthesis activity confer similar activity when transferred to homologous enzymes. As expected, some polymerase families have been more widely studied than others. Thermostable DNA polymerases belonging to the A- and B-family categories have been extensively studied due to their importance in DNA synthesis and sequencing applications. For example, A-family DNA polymerase I isolated from the thermophilic bacterial species Thermus aquaticus (Taq) is widely used in quantitative polymerase chain reaction (qPCR) applications due to its 5′-3′ exonuclease activity, which allows for the digestion of a downstream donor-quencher fluorescent probe that quantitatively measures DNA synthesis during polymerase extension (Holland et al., 1991). Taq DNA polymerase is also routinely used for T-A ligation and cloning strategies due to its proclivity for adding a single untemplated adenosine residue to the 3′ end of the daughter strand (Clark, 1988). Hyperthermophilic archaeal B-family DNA polymerases, which include such members as Tgo (Thermococcus gorgonarius), Kod (Thermococcus kodakarensis), Pfu (Pyrococcus furiosus), and 9°N (Thermococcus 9°N-7), are the basis of several DNA-sequencing applications (Zhang et al., 2015a). These enzymes are known to function with enhanced fidelity due to the presence of a strong 3′-5′ exonuclease proofreading domain. They are also reported to be more resistant than standard Taq polymerase to the inhibitory effects of blood components and detergents (Miura et al., 2013). Interestingly, B-family polymerases have the ability to recognize and stall DNA replication when they encounter uracil residues in the template (Greagg et al., 1999). Structural studies indicate that uracil discrimination is caused by a binding pocket in the amino-terminal domain of the polymerase that accommodates uracil but prevents binding to the four natural DNA bases (Fogg et al., 2002).
Ali Nikoomanzar et al.
Despite extensive sequence diversity, X-ray crystal structures reveal that nearly all polymerases adopt a catalytic domain that closely resembles a human right hand (Steitz, 1999). The one exception is X-family polymerases, which adopt a left-handed polymerase domain (Beard and Wilson, 2000). The catalytic domain is further divided into three subdomains that are commonly referred to as the palm, fingers, and thumb (Fig. 2a). The palm subdomain is composed of a β-sheet that forms the base of a deep cleft containing the catalytic residues responsible for promoting phosphodiester bond formation. The fingers subdomain is an α-helical structure lining one side of the cleft, while the thumb subdomain is another α-helical structure lining the opposite side of the cleft. The fingers are responsible for recognizing the incoming nucleoside triphosphate, while the thumb positions the DNA primer-template duplex in the cleft and plays a role in translocation and processivity (Brautigam and Steitz, 1998).
Speed and fidelity are critical parameters for DNA synthesis in rapidly dividing cells. For each nucleotide incorporation, a polymerase must distinguish the correct nucleoside triphosphate from an excess of incorrect and non-cognate (NTP) substrates. Because of their different functional roles, the rate and fidelity of DNA synthesis can vary widely between different DNA polymerases (Fig. 2b and c). Replicative DNA polymerases found in the A- and B-families have rates that can exceed 100 nt s⁻¹ and intrinsic fidelities in the range of one error in 10⁵-10⁶ incorporation events (Kunkel, 2004). For example, Kod polymerase functions with a rate of ∼200 nt s⁻¹, making it one of the fastest B-family DNA polymerases (Griep et al., 2006). In addition, many polymerases have 3′-5′ exonuclease proofreading activity that exists as a separate domain or a tightly bound subunit, which can remove noncomplementary nucleotides after phosphodiester bond formation (Fig. 1) (Kunkel and Bebenek, 2000). These domains increase the fidelity of DNA synthesis by 10-fold (to one error in 10⁶-10⁷ incorporation events) relative to polymerases lacking a proofreading domain (Loeb and Monnat, 2008). By comparison, repair polymerases, such as pol β (X-family), are much slower at DNA synthesis and less faithful than replicative DNA polymerases, often functioning with rates in the range of ∼10 nt s⁻¹ and fidelities on the order of 1 error in 10²-10⁴ incorporation events (Fig. 2b and c) (Wu et al., 2017). However, the reduced activity of repair polymerases is expected given their functional role in repairing damaged sites in genomic DNA by various cellular repair mechanisms.
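To give a feel for what these rates and fidelities mean in practice, the following sketch estimates the expected number of errors and the synthesis time for a single uninterrupted copy of a template. It is a simplified illustration: the template length is hypothetical, each error rate is one value picked from within the quoted range, and processivity, pausing, and sequence context are ignored.

```python
def expected_errors(length_nt: int, errors_per_nt: float) -> float:
    """Mean number of misincorporations when copying length_nt nucleotides."""
    return length_nt * errors_per_nt


def synthesis_time_s(length_nt: int, rate_nt_per_s: float) -> float:
    """Time in seconds to copy length_nt nucleotides at a constant rate."""
    return length_nt / rate_nt_per_s


TEMPLATE_NT = 10_000  # hypothetical template length (assumption, not from the text)

# (errors per nucleotide, rate in nt/s); values chosen from within the quoted ranges.
scenarios = {
    "replicative, no proofreading": (1e-5, 100.0),
    "replicative, with proofreading": (1e-6, 100.0),
    "repair (pol beta-like)": (1e-3, 10.0),
}

for name, (error_rate, rate) in scenarios.items():
    errors = expected_errors(TEMPLATE_NT, error_rate)
    seconds = synthesis_time_s(TEMPLATE_NT, rate)
    print(f"{name}: ~{errors:.2g} expected errors, {seconds:.0f} s")
```

The comparison shows why a 10-fold gain from proofreading matters for long templates, and why a slow, error-prone repair polymerase is acceptable only when it fills short gaps.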

Visualizing DNA synthesis through snapshots of trapped intermediates
Since its discovery in 1958, DNA polymerase I has been viewed as a model system for DNA synthesis in cells (Lehman et al., 1958). Structural insights into the mechanism of DNA synthesis have been obtained from crystal structures that trap the enzyme at different stages of the catalytic cycle (Fig. 3) (Chim et al., 2018). Some of the most insightful data have been obtained from high- [...] (Kiefer et al., 1998; Steitz, 2002, 2004). Starting from the Bst binary complex produced from one round of dNTP addition, Tyr 714 on the O-helix occupies the insertion site, stacking above the newly added nucleotide on the growing primer strand (Chim et al., 2018). In this same structure, Tyr 719 on the O1-helix forms a second stacking interaction with the n + 1 templating base, thereby preventing the next templating base from entering the active site. In step 2, the polymerase undergoes a conformational change to adopt a pre-insertion complex with the incoming nucleotide paired opposite Tyr 714 in the enzyme active site (Chim et al., 2018). This intermediate, commonly referred to as the open ternary complex, is achieved by releasing the n + 1 templating base from its stacking interaction with Tyr 719 and retracting Tyr 714 to a position above the n templating base in the post-insertion site. In step 3, the enzyme undergoes a more significant conformational change to adopt a closed ternary complex, which defines the pre-catalytic state of the enzyme (Johnson et al., 2003). Here, the n + 1 templating base finally enters the insertion site and forms a Watson-Crick base pair with the incoming nucleotide. In this structure, the fingers have rotated ∼40° to allow several lysine and arginine residues on the O-helix to contact the triphosphate moiety of the dNTP substrate. In step 4, the enzyme adopts a post-catalytic complex in which chemical bond formation has occurred and the primer has been extended by one nucleotide (Yin and Steitz, 2004). Close examination of the enzyme active site reveals the presence of the pyrophosphate leaving group, suggesting that pyrophosphate departure coincides with opening of the fingers. To complete the cycle, the polymerase must translocate to the next position on the template to reform the binary complex. Together, crystal structures of the binary complex, pre-insertion complex, closed ternary complex, and post-catalytic complex provide a structural view of DNA synthesis by a replicative DNA polymerase.
Fig. 1. DNA synthesis and mismatch repair. Natural DNA polymerases extend a DNA primer in the 5′-3′ direction using the template to determine the sequence of the growing strand. Polymerases with 3′-5′ exonuclease activity have the ability to correct mistakes by removing terminal nucleotides that are incorrectly paired with the template.
Structural and kinetic data reveal that DNA polymerase fidelity is governed by subtle local rearrangements that are distinct from the major conformational domain movements observed in the binding and catalysis of cognate Watson-Crick base pairs. In particular, researchers have identified a distinct conformation in A-family polymerases that was suggested to be a fidelity checkpoint for correct nucleotide selection. X-ray crystal structures of Bst DNA polymerase containing mismatched substrates reveal a kink in the O-helix of the finger subdomain that results in a partially closed ternary complex, termed the 'ajar' conformation (Fig. 4a) (Wu and Beese, 2011). The wobble base pair between the templating G nucleotide and the incoming TTP substrate places the α-phosphate at a distance that is too far from the 3′-OH group of the primer to facilitate efficient in-line attack on the dNTP substrate. This observation is supported by a reduction of at least 100-fold in the rate of nucleotide addition compared to the complementary dCTP substrate (Wu and Beese, 2011).
Fig. 3. Mechanism of DNA synthesis. The four key mechanistic steps depict a replication cycle for DNA synthesis. The translocation complex (top) is stabilized by π-stacking interactions between Tyr 719 and the n + 1 templating base and between Tyr 714 and the primer strand. Tyr 714 occupies the insertion site (IS, purple) while a newly formed base pair is located in the post-insertion site (post-IS, green). In the pre-insertion complex (right), the O-helix adjusts to accommodate the incoming dNTP substrate, which binds opposite Tyr 714 in the IS. In the closed ternary complex (bottom), the polymerase undergoes a major conformational change to allow the n + 1 templating base to form a nascent base pair with the dNTP substrate in the pre-catalytic state. Following catalysis, the finger subdomain remains closed with a trapped pyrophosphate moiety observed in the active site of the post-catalytic complex (left). To complete the cycle, the finger subdomain opens, pyrophosphate is released, and the enzyme translocates to the next position on the template. The translocation (6DSY), pre-insertion (6DSU), and closed ternary complexes (1LV5) are based on crystal structures of Bst DNA polymerase. The post-catalytic complex is T7 RNA polymerase (1S77), a homolog of Bst DNA polymerase. Adapted from Chim et al. (2018).
Quarterly Reviews of Biophysics
Förster resonance energy transfer (FRET) studies performed on the large (Klenow) fragment of E. coli DNA polymerase I provide additional evidence for the existence of the ajar conformation (Berezhna et al., 2012; Hohlbein et al., 2013). Here, an intermediate FRET species, which appears to be a distinct conformation between the open and closed structures, was found to persist in the presence of mismatched substrates but only transiently exists when the complementary dNTP is present. Interestingly, structures of KlenTaq (Klenow-fragment analog of Taq DNA polymerase) containing an abasic site in the template reveal that the conserved gating tyrosine residue (Fig. 4b) can pair opposite an incoming substrate to allow for primer extension, albeit at significantly reduced rates due to the formation of a sub-optimal enzyme active site (Obeid et al., 2010, 2012). More recently, the ajar conformation was witnessed in a ternary structure of KlenTaq bound to the unnatural d5SICS:dNaMTP base pair (Fig. 4c). Collectively, these data suggest that the ajar conformation plays a functional role in nucleotide discrimination in which base pair mismatches stabilize an intermediate conformation that is not catalytically active.
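The catalytic cycle and the ajar checkpoint described in this section can be summarized as a small state machine. The sketch below is a deliberately coarse abstraction and not a kinetic model. It assumes that the ajar conformation is visited transiently on the way to the closed complex when the correct dNTP is bound, and that a mismatch simply holds the enzyme in the ajar state. The class and function names are illustrative.

```python
from enum import Enum
from typing import List


class Complex(Enum):
    BINARY = "binary (translocation) complex"
    PRE_INSERTION = "pre-insertion (open ternary) complex"
    AJAR = "ajar (partially closed) complex"
    CLOSED = "closed ternary (pre-catalytic) complex"
    POST_CATALYTIC = "post-catalytic complex"


def next_complex(state: Complex, correct_pair: bool) -> Complex:
    """Return the next state of the toy cycle for a matched or mismatched dNTP."""
    if state is Complex.BINARY:
        return Complex.PRE_INSERTION       # incoming dNTP binds
    if state is Complex.PRE_INSERTION:
        return Complex.AJAR                # fingers begin to close
    if state is Complex.AJAR:
        # Fidelity checkpoint: only a correct pair proceeds to full closure.
        return Complex.CLOSED if correct_pair else Complex.AJAR
    if state is Complex.CLOSED:
        return Complex.POST_CATALYTIC      # phosphodiester bond formation
    return Complex.BINARY                  # pyrophosphate release and translocation


def run_cycle(correct_pair: bool, steps: int = 5) -> List[Complex]:
    """Follow the enzyme for a fixed number of transitions starting from the binary complex."""
    state = Complex.BINARY
    path = [state]
    for _ in range(steps):
        state = next_complex(state, correct_pair)
        path.append(state)
    return path


for outcome in (True, False):
    print(outcome, [c.name for c in run_cycle(outcome)])
```

With a correct pair the path returns to the binary complex after one full cycle, whereas with a mismatch it never leaves the ajar state, which mirrors the idea of a catalytically inactive intermediate.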

Capturing phosphodiester bond formation by time-resolved crystallography
Data acquired from structural studies into the mechanism of DNA synthesis confirm the prediction that all polymerases catalyze the same nucleotide-transfer reaction, which involves the formation of a phosphodiester bond through nucleophilic attack of the 3′-OH group of the primer on the α-phosphate of the incoming nucleoside triphosphate with concomitant displacement of the pyrophosphate leaving group (Steitz et al., 1994). The reaction is pH-dependent and analogous to acid-base catalysis, where the nucleophile (3′-OH) needs to be deprotonated and the leaving group (pyrophosphate) needs to be protonated. It requires two metal ions that stabilize a pentacoordinate transition state in a bimolecular substitution (SN2) reaction mechanism (Fig. 5). Metal ion A activates the 3′-OH for nucleophilic attack, while metal ion B stabilizes the buildup of negative charge on the pyrophosphate leaving group via coordination to the β- and γ-phosphates. The reaction may be further activated through the formation of an intramolecular hydrogen bond between the 3′-hydroxyl and β-phosphate groups of the incoming dNTP substrate (Genna et al., 2016).
Despite the accumulation of significant structural data showing polymerases from all domains of life trapped in various stages of DNA synthesis, the actual step of chemical bond formation has long remained elusive. This problem was elegantly solved when Yang and colleagues applied the technique of time-resolved X-ray crystallography to follow the course of phosphodiester bond formation by human polymerase η (pol η) (Nakamura et al., 2012). In this study, inactive pol η crystals were obtained by crystallizing a ternary complex of the polymerase bound to DNA and dATP in the presence of Ca²⁺ ions, a catalytically inactive divalent metal ion. The nucleotide-transfer reaction was then initiated in crystallo by transferring individual crystals first to a wash buffer and then to a reaction buffer containing Mg²⁺ ions, which displace the Ca²⁺ ions and allow phosphodiester bond formation to proceed. The reaction was stopped at defined times by freezing the crystals in liquid nitrogen for structural analysis. Electron density maps reveal that Mg²⁺ ions displace the Ca²⁺ ion within the first 40 s, forming the two-metal-ion complex required for nucleotide transfer. From 40 to 230 s, the structures show a steady increase in the nucleotide addition product, thus capturing the chemical bond forming step (Fig. 6). Transient densities identified the rate-limiting step of the reaction as deprotonation of the 3′-OH group, which is accompanied by a change in the sugar pucker conformation of the terminal nucleotide from C2′-endo to C3′-endo. Interestingly, a third Mg²⁺ ion was found to be essential for DNA synthesis (Gao and Yang, 2016). Similar results have also been observed for time-resolved reactions performed on DNA polymerase β (pol β) (Freudenthal et al., 2013).

Promiscuous activities of natural polymerases
Although DNA polymerases are generally thought of as remarkably specific catalysts, many examples now exist where natural DNA polymerases are able to incorporate limited numbers of non-cognate or unnatural substrates into an otherwise natural DNA strand. The catalytic activity and fidelity of these reactions varies significantly depending on the type of chemical modification and the number of chemically modified nucleotides incorporated into the growing strand. In general, natural polymerases are more accepting of base modifications made to the 5-position of pyrimidines and the 7-position of purines than modifications made to the sugar moiety. Reactions of this type are typically performed using polymerases that are either naturally or intentionally deficient in exonuclease activity (exo-), which prevents removal of the modified residue after nucleotide incorporation.
Most of the examples cataloged to date involve the incorporation of one or a small number of modified nucleotides into an otherwise natural DNA strand. However, a few cases are known where the template or extension product is composed entirely of nonnatural nucleotides. The varying degrees of tolerance exhibited by natural polymerases for unnatural substrates have played an important role in elucidating the mechanistic underpinnings behind how polymerases recognize their substrates. These details are not easily discerned from crystal structures obtained for polymerases caught at a specific step in the DNA synthesis cycle. Instead, they require chemical analogs that probe the enzyme active site in ways that are not possible purely with natural substrates. As illustrated in the section below, such experiments demonstrate that: (1) Watson-Crick hydrogen bonding groups can be rearranged or removed altogether, and (2) substrate tolerance varies considerably depending on the type of polymerase and chemical modification. Such information has provided insights into the limits of substrate specificity and identified starting points for evolving new variants with improved activity.
Natural DNA polymerases that function with reverse transcription activity
In 1973, Loeb and colleagues were the first to discover the promiscuous activities of natural polymerases by demonstrating that natural RNA templates can be copied into DNA using E. coli DNA polymerase I (Loeb et al., 1973). This activity, commonly known as reverse transcription (RT), makes it possible to synthesize the cDNA products of RNA sequences. In nature, reverse transcription is mediated by reverse transcriptases, a class of polymerases that are responsible for replicating the genomes of RNA viruses (Coffin and Fan, 2016). Nearly two decades later, other laboratories recognized that Taq and Thermus thermophilus (Tth) DNA polymerases have measurable RT activity with Tth exhibiting 100-fold greater activity than Taq (Jones and Foulkes, 1989; Tse and Forget, 1990; Myers and Gelfand, 1991). Although this activity helped establish the first examples of a coupled RT-PCR process for detecting and quantifying cellular RNAs, Tth's requirement for manganese ions results in higher error rates during cDNA synthesis. More recently, Bergquist and coworkers identified polymerases from other thermophilic organisms that exhibit RT-PCR activity under standard magnesium conditions (Shandilya et al., 2004).

Expanding the genetic alphabet with new hydrogen-bonding base pairs
In 1987, Benner and coworkers suggested that the functional activity of nucleic acid catalysts could be improved by incorporating additional chemical diversity into DNA and RNA (Benner et al., 1987). Toward this goal of augmenting nature's genetic alphabet, several non-natural base pairs were envisioned that would allow for novel hydrogen-bonding schemes between the various hydrogen-bond donor and acceptor groups found on the Watson-Crick face of designer nucleobases (Fig. 7). In 1989, Switzer and Benner demonstrated that this concept was physically possible by enzymatically synthesizing natural genetic polymers containing an unnatural iso-guanine:iso-cytosine (iso-G:iso-C) base pair (Switzer et al., 1989). These experiments were performed using Klenow DNA polymerase and T7 RNA polymerase to synthesize DNA and RNA, respectively. Although iso-G was found to suffer from a minor enol tautomer that leads to mispairing opposite T and iso-C was susceptible to deamination, this foundational study paved the way for what would eventually become an artificially expanded genetic information system that includes the four canonical bases found in DNA plus four additional genetic letters that make up the S:B and Z:P base pairs (Piccirilli et al., 1990; Benner, 2004; Hoshika et al., 2019).
As the first unnatural base pair, iso-G:iso-C was widely studied in a variety of different contexts. Tor and Dervan used N⁶-(6-aminohexyl)isoguanine (6-AH-isoG) to establish a general protocol for site-specifically labeling RNA (Tor and Dervan, 1993). In this protocol, T7 RNA polymerase is used to transcribe RNA molecules that contain the 6-AH-isoG nucleotide at a defined position that is then post-transcriptionally modified by coupling biotin or a fluorescent dye to the primary amino group attached to the iso-G nucleobase. Whereas iso-C was originally reported to be prone to deamination, Horn and colleagues found that the 5-methyl iso-C (iso-CMe) analog ameliorates this problem (Horn et al., 1995). The ability to chemically synthesize oligonucleotides containing iso-C and iso-G led to thermal and thermodynamic studies on duplexes arranged in both the antiparallel and parallel strand configurations (Roberts et al., 1997a; Seela et al., 1999) as well as the formation of iso-G tetraplex and pentaplex motifs that self-assemble around monovalent cations (Roberts et al., 1997b; Chaput and Switzer, 1999; Kang et al., 2012). Moreover, iso-G:iso-C base pairing has been visualized inside the duplex of DNA crystals (Robinson et al., 1998), evaluated in the context of the hammerhead ribozyme (Ng et al., 1994), shown to replicate nonenzymatically, and found to be a viable substrate for RecA-mediated DNA recombination (Rice et al., 2000). More recently, the iso-G:iso-CMe base pair has been renamed the S:B pair in honor of its inventors, Switzer and Benner (Hoshika et al., 2019).
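The idea of an expanded alphabet with additional, mutually exclusive pairing rules can be made concrete with a short sketch. It assumes only the pairings named in the text (A:T, G:C, S:B, and Z:P) and treats them as perfectly orthogonal, which ignores the tautomer-driven mispairing noted above. The function names and the example sequence are illustrative.

```python
# Pairing rules for an eight-letter alphabet: four natural and four added letters.
PAIRS = [("A", "T"), ("G", "C"), ("S", "B"), ("Z", "P")]
COMPLEMENT = {}
for first, second in PAIRS:
    COMPLEMENT[first] = second
    COMPLEMENT[second] = first


def reverse_complement(seq_5to3: str) -> str:
    """Return the complementary strand, written 5'->3', under the eight-letter rules."""
    return "".join(COMPLEMENT[base] for base in reversed(seq_5to3))


def sequence_space(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length for a given alphabet size."""
    return alphabet_size ** length


example = "ASZG"
print(example, reverse_complement(example))
# Ratio of eight-letter to four-letter sequence space for a 10-mer.
print(sequence_space(8, 10) // sequence_space(4, 10))
```

The second calculation illustrates the motivation stated by Benner and coworkers: each added base pair enlarges the sequence space available to functional nucleic acids of a given length.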

Expanding the genetic alphabet with hydrophobic base pairs
An alternative approach to generating unnatural base pairs began in 1997 when Kool and colleagues made the surprising discovery that hydrogen bonding is not an absolute requirement for DNA synthesis (Moran et al., 1997b). Steady-state kinetic measurements showed that Klenow DNA polymerase recognizes difluorotoluene (F), a non-hydrogen bonding isostere of thymine ( Fig. 8a), only ∼4-fold less efficiently than natural TTP (Moran et al., 1997a). Additional polymerase studies revealed that selectivity for the insertion of A opposite F rather than C, T, or G was strikingly similar to that of T, making F a strong shape mimic of T (Moran et al., 1997a). Subsequent study on 4-methylbenzimidazole (Z), a nonpolar analog of adenine (Fig. 8a), led to the first demonstration in which a hydrophobic base pair was replicated by a DNA polymerase (Morales and Kool, 1998). This study showed that the unnatural Z:F base pair exhibits strong selectivity against natural nucleotides, with the noted exception of dATP mispairing opposite F in the template. Nevertheless, the ability to replicate hydrophobic base pairs in vitro cultivated the notion that hydrophobicity and shape complementarity contribute to the recognition of DNA substrates (Kool, 2002).
Inspired by the success of the Z:F base pair, Schultz and Romesberg applied a more traditional medicinal chemistry approach to identify an array of nonpolar molecules that are recognized as base pairs by natural DNA polymerases (Ogawa et al., 2000). One of the more successful early examples was 7AI, an indole ring system that is capable of self-pairing (7AI:7AI) in duplex DNA (Fig. 8a) (Tae et al., 2001). Using Klenow DNA polymerase, 7AI showed modest incorporation efficiency (∼200-fold less efficient than natural bases), but high selectivity against natural nucleotides (Tae et al., 2001). However, 7AI is poorly extended after nucleotide incorporation, which limits its utility as an orthogonal third base pair. This problem was partially solved using a second DNA polymerase, mammalian polymerase β (pol β), which allows DNA synthesis to continue from a primer that has been extended with 7AI (Tae et al., 2001).
In 2003, Hirao and colleagues extended the number of nonpolar base pairs that are recognized by DNA polymerases by demonstrating strong shape complementarity between the adenosine analog Q and a new pyrimidine analog pyrrole-2-carbaldehyde (Pa) (Mitsui et al., 2003). The Q:Pa base pair ( Fig. 8a) was designed to be more selective than the original Q:F base pair, which permits modest to high levels of misincorporation opposite A and T nucleotides (Morales and Kool, 1999). Unlike the 7AI:7AI base pair, Klenow DNA polymerase is able to efficiently incorporate and extend the Q:Pa base pair in both sequence contexts with Q or Pa present in the template strand. Mispairing experiments reveal that dATP inserts opposite either Q or Pa, but that the resulting mispair leads to chain termination in the subsequent extension step. However, PaTP is inserted and extended with low efficiency opposite A, indicating that the geometry of the terminal Pa nucleotide is not a complete impediment to further extension. These data indicate that the Q:Pa pair is an improvement over the original F:Q pair in terms of selectivity and extension efficiency but that further engineering would be required to achieve true orthogonality.

Expanding the genetic alphabet with metal-mediated base pairs
Metal-mediated base pairs represent a third approach for expanding the genetic alphabet beyond the four bases found in nature. Metal-mediated base pairs consist of two artificial bases that coordinate a suitable metal ion in the Watson-Crick base pairing region of a natural base pair (Jash and Muller, 2017). Dozens of examples have been described that coordinate metal ions, such as Cu²⁺, Ag⁺, Hg²⁺, Pd²⁺, and Cd²⁺, in synthetic DNA produced by solid-phase synthesis. Although metal-mediated base pairs have been described in the architectures of several DNA nanostructures (Jash and Muller, 2017), significantly less is known about their recognition properties in the context of DNA replication. One of the more successful examples is the dS-Cu-dS base pair (Fig. 9), which is fully orthogonal and can be PCR amplified in the presence of the canonical A:T and G:C base pairs (Kaul et al., 2011). However, the requirement for an organic co-factor (ethylene diamine) in addition to the inorganic co-factor (Cu²⁺) may limit the application of the dS-Cu-dS base pair relative to other pairs that rely on an inorganic co-factor alone (Kim and Switzer, 2013; Kobayashi et al., 2016; Rothlisberger et al., 2017). Despite this minor weakness, the ability to design unnatural base pairs based on metal ion coordination chemistry provides ample room for further development. For example, Shionoya and colleagues recently found that Cu²⁺-mediated artificial base pairing offers a novel approach for controlling the allosteric regulation of catalytic DNA molecules (Nakama et al., 2020). One could imagine applying similar design principles toward the development of metal-responsive materials and logic circuits.

Replicating six-letter genetic alphabets with increased efficiency and fidelity
Early efforts toward the development of orthogonal base pairs led to the realization that many first-generation base pairs suffer from problems that limit their use in practical applications. In some cases, the efficiency of nucleotide incorporation was low when compared to natural bases, while in other cases extension kinetics were poor, with the polymerase pausing after nucleotide insertion (Hamashima et al., 2018). Another common problem was nucleotide selectivity in the enzyme active site, with unnatural bases mispairing to varying degrees with natural bases (Hamashima et al., 2018). To overcome these problems, organic chemistry was used to design new versions of unnatural base pairs that replicate with higher catalytic efficiency and fidelity. Benner and colleagues, for example, developed the Z:P base pair (Fig. 7), which is more stable than a conventional G:C base pair (Wang et al., 2017). In the context of a six-letter genetic alphabet, the Z:P base pair is sufficiently robust that it can be enzymatically synthesized (Yang et al., 2007), amplified by PCR and sequenced (Yang et al., 2011), transcribed into RNA and reverse transcribed back into DNA (Leal et al., 2015), subjected to iterative rounds of in vitro selection, and used to evolve aptamers, a type of synthetic antibody, that bind to breast and liver cancer cell lines (Sefah et al., 2014; Zhang et al., 2015b). In a subsequent study, DNA aptamers containing Z and P were generated with high specificity to mammalian cells overexpressing glypican 3, a known biomarker for liver cancer (Zhang et al., 2016). Similarly, Romesberg and Hirao also developed second-generation unnatural base pairs that faithfully replicate using natural DNA polymerases (Malyshev and Romesberg, 2015; Hamashima et al., 2018). For example, the hydrophobic TPT3:NaM base pair (Fig. 8b) generated by Romesberg and coworkers achieves 99.98% selectivity per doubling by PCR using OneTaq DNA polymerase (Li et al., 2014), and the hydrophobic Ds:Px base pair (Fig. 8b) produced by Hirao and colleagues achieves 99.97% selectivity per doubling by PCR using Deep Vent DNA polymerase (Okamoto et al., 2016). The Ds:Px base pair was used to evolve high affinity DNA aptamers containing five genetic letters (A, C, G, T, Ds) to the protein targets vascular endothelial growth factor 165 (VEGF165), interferon-γ (INFγ), and von Willebrand factor A1 domain (vWF) (Kimoto et al., 2013; Matsunaga et al., 2017). The increased chemical diversity of these libraries led to the production of aptamers with significantly higher affinity for their targets than comparable libraries using only natural bases. In a subsequent study, the Ds-containing DNA aptamers were shown to inhibit VEGF165 and INFγ binding to their cognate cellular receptors (Matsunaga et al., 2015; Kimoto et al., 2016), which advances the use of aptamers as synthetic affinity reagents.
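The per-doubling selectivities quoted above compound over the course of a PCR, so it can help to see what they imply after many doublings. The following is a minimal sketch, assuming that each doubling retains the unnatural base pair independently with the same probability and that an amplicon carries a single unnatural pair; the function names are illustrative and do not come from the cited studies.

```python
import math


def retention_after_doublings(selectivity_per_doubling, doublings):
    """Fraction of amplicons expected to retain the unnatural base pair.

    Assumes every doubling retains the pair independently with the same
    probability, so losses compound multiplicatively.
    """
    if not 0.0 <= selectivity_per_doubling <= 1.0:
        raise ValueError("selectivity must be a fraction between 0 and 1")
    if doublings < 0:
        raise ValueError("doublings must be non-negative")
    return selectivity_per_doubling ** doublings


def doublings_for_fold(fold_amplification):
    """Number of doublings needed to reach a given fold amplification."""
    if fold_amplification < 1:
        raise ValueError("fold amplification must be at least 1")
    return math.log2(fold_amplification)


if __name__ == "__main__":
    # Per-doubling selectivities as quoted in the text.
    reported = {"TPT3:NaM, OneTaq": 0.9998, "Ds:Px, Deep Vent": 0.9997}
    for label, selectivity in reported.items():
        for doublings in (10, 20, 40):
            kept = retention_after_doublings(selectivity, doublings)
            print(f"{label}: {doublings} doublings -> {kept:.2%} retained")
```

Under these assumptions, roughly 99.6% of amplicons would still carry TPT3:NaM after 20 doublings, and close to 99% would still carry Ds:Px after 40 doublings.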
Given the propensity for natural polymerases to replicate unnatural base pairs, structural studies were undertaken to compare the geometry of unnatural base pairs to those found in nature. Three different ternary structures have now been solved with an unnatural base pair occupying the insertion site of KlenTaq DNA polymerase. The examples (Fig. 10) feature the unnatural base pairs of NaM-5SICS, Ds-Px, and P-Z in which the nucleotides NaM, Ds, and P occupy the templating position and 5SICS, Px, and Z are the incoming substrates, respectively (Betz et al., 2012; Singh et al., 2018). The collection of structures shows the artificial base pairs adopting planar geometries that are structurally similar to natural base pairs. Interestingly, a solution structure of duplex DNA containing a NaM-5SICS base pair unconstrained by a DNA polymerase reveals an intercalated structure rather than the more typical coplanar structure with edge-on-edge packing (Malyshev et al., 2010). Similar structures have also been observed for other hydrophobic base pairs (Brotschi et al., 2001; Matsuda et al., 2007; Wojciechowski and Leumann, 2011), indicating that the polymerase induces the Watson-Crick geometry required for DNA replication.

Testing hypotheses about polymerase recognition
Beyond the immediate implications of establishing new hydrophobic base pairs, the ability to construct synthetic analogs of natural bases provides a unique opportunity to test hypotheses about how polymerases recognize their substrates (Jung and . In the mid-1990s, some of the first crystal structures of polymerases bound to their substrates were solved to high resolution (Pelletier et al., 1994;Doublie et al., 1998). These structures, which include Bst DNA polymerase (Kiefer et al., 1998), a close structural analog of Klenow DNA polymerase, reveal the presence of hydrogen bonding interactions between polar side chains and hydrogen bond acceptor atoms (N3 of purines and O2 of pyrimidines) found on the minor groove side of A:T and G:C base pairs. The observation of these interactions in the enzyme active site suggested that minor groove hydrogen bonding is an important aspect of DNA substrate recognition. To test this hypothesis, DNA synthesis reactions were performed using hydrophobic bases that either contain or lack minor groove hydrogen-bonding acceptor atoms (Morales and Kool, 1999). The resulting data clearly show that minor groove hydrogen bonding is critical for base pair recognition. Moreover, these interactions are more prevalent at the nucleotide extension step than the nucleotide insertion step and are stronger for the growing primer strand than the templating strand (Morales and Kool, 1999). Interestingly, each of the second-generation unnatural base pairs described above (Z:P, 5SICS:NaM, and Ds:Px, see Fig. 8b) have hydrogen bond acceptor atoms on the minor groove side of the Watson-Crick base pair to facilitate polymerase recognition.

Recognizing chemical modifications made to nucleobase positions
Structure-activity studies indicate that thermophilic DNA polymerases exhibit broad tolerance for chemical modifications made to the C5 position of pyrimidines and the C7 position of 7-deazapurines (Fig. 11) (Jager and Famulok, 2004; Jager et al., 2005; Hollenstein, 2012; Kielkowski et al., 2014; Cahova et al., 2016). Notable examples include the use of Kod and Vent DNA polymerases to evolve slow off-rate modified aptamers (SOMAmers) from diversity-enhancing libraries containing C5-modified deoxyuridine residues (Vaught et al., 2010; Gawande et al., 2017). This strategy led to the development of an array-based platform for monitoring protein levels in human serum (Ostroff et al., 2010; Williams et al., 2019). Interestingly, the ability to synthesize DNA strands with multiple consecutive modifications uncovered strong substrate preferences between thermophilic A- and B-family DNA polymerases. Famulok and coworkers, for example, found that archaeal B-family DNA polymerases are more accepting of base-modified nucleotides than thermophilic A-family DNA polymerases (Jager et al., 2005). Sawai and colleagues made similar observations for C5-modified pyrimidines (Kuwahara et al., 2006). Together, these observations suggest that A- and B-family polymerases have different structural constraints in the major groove region of the polymerase active site. Marx and coworkers investigated the substrate specificity of A- and B-family DNA polymerases by solving high resolution crystal structures of KlenTaq and Kod DNA polymerases bound to natural and base-modified substrates (Bergen et al., 2012, 2013; Kropp et al., 2018; Kropp et al., 2019). The structures indicate that bulky modifications pass through a cavity that extends outside the enzyme active site. This cavity enables members of both polymerase families to incorporate C5-modified pyrimidines and C7-modified purines into the growing DNA strand and to continue DNA synthesis afterward. Consistent with polymerase activity observed by Famulok and Sawai (Jager et al., 2005; Kuwahara et al., 2006), the cavity is larger and more accessible for Kod DNA polymerase than KlenTaq DNA polymerase (Fig. 12). In addition, the structures also show that substrate specificity is impacted by the location of the thumb subdomain. In the case of KlenTaq, the tip of the thumb (residues 506-509) extends into the major groove region of the DNA duplex, whereas the analogous region in Kod (residues 668-675) interacts with the phosphodiester backbone.

Propagation and evolution of an artificial genetic system
In a striking example of enzyme promiscuity, we recently discovered two naturally occurring DNA polymerases that will faithfully replicate 2′-fluoroarabino nucleic acid (FANA) (Wang et al., 2018b), which is an unnatural genetic polymer that contains 2′-fluoroarabino residues in place of natural ribose or deoxyribose nucleotides (Damha et al., 1998). Kinetic measurements collected using polymerase kinetic profiling (PKPro), a technique that monitors nucleotide synthesis using high-resolution fluorescent dyes that intercalate into the growing duplex (Nikoomanzar et al., 2017), reveal that Tgo DNA polymerase catalyzes the synthesis of FANA polymers on DNA templates with a rate of ∼15 nt min⁻¹, while Bst DNA polymerase promotes DNA synthesis on FANA templates with a rate of ∼1 nt min⁻¹ (Wang et al., 2018b). The replication process occurs with a mutational rate of ∼8 × 10⁻⁴ and an overall fidelity of 99.9% (Fig. 13a), making it the most faithful replication system for a xeno-nucleic acid (XNA) polymer (Chaput and Herdewijn, 2019).
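The relationship between the reported mutational rate, the overall fidelity, and the synthesis rates can be made concrete with a short calculation. This is a minimal sketch, assuming the mutational rate is expressed per nucleotide and that the polymerases proceed at a constant average rate; the 60 nt template length is a hypothetical value chosen only for illustration.

```python
def fidelity_percent(error_rate):
    """Percent fidelity implied by a per-nucleotide error rate."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error rate must be a fraction between 0 and 1")
    return (1.0 - error_rate) * 100.0


def expected_errors(error_rate, length_nt):
    """Average number of errors expected in a product of the given length."""
    return error_rate * length_nt


def synthesis_minutes(length_nt, rate_nt_per_min):
    """Time to copy a template, assuming a constant average rate."""
    if rate_nt_per_min <= 0:
        raise ValueError("rate must be positive")
    return length_nt / rate_nt_per_min


if __name__ == "__main__":
    error_rate = 8e-4    # mutational rate quoted in the text
    template_nt = 60     # hypothetical template length
    print(f"fidelity: {fidelity_percent(error_rate):.2f}%")
    print(f"errors per {template_nt} nt: {expected_errors(error_rate, template_nt):.3f}")
    print(f"DNA -> FANA at 15 nt/min: {synthesis_minutes(template_nt, 15):.0f} min")
    print(f"FANA -> DNA at 1 nt/min: {synthesis_minutes(template_nt, 1):.0f} min")
```

A rate of 8 × 10⁻⁴ corresponds to 99.92% fidelity, consistent with the rounded value of 99.9% given above.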
An obvious application of the FANA replication system is the evolution of XNA aptamers and catalysts with enhanced nuclease resistance for diagnostic and therapeutic applications (Houlihan et al., 2017). Toward this goal, an efficient RNA-cleaving FANA enzyme (FANAzyme, Fig. 13b) was generated that functions at a rate of >10⁶-fold over the uncatalyzed reaction and achieves substrate saturation with Michaelis-Menten kinetics (Fig. 13c) (Wang et al., 2018b). The enzyme comprises a small 25 nt catalytic domain that is flanked by substrate-binding arms that can be engineered to recognize diverse RNA targets. Divalent metal ion and pH profiles, together with mass spectrometry analyses, indicate that the reaction follows a metal- and pH-dependent transesterification mechanism to produce an upstream cleavage product carrying a cyclic 2′,3′-monophosphate and a downstream strand with a 5′-OH group. In addition to expanding the chemical space of nucleic acid enzymes, this example provides a framework for evolving new types of FANA enzymes that can be generated using commercially available reagents, which is not the case for other XNA systems (Wang et al., 2018b).

Structural insights into Bst DNA polymerase as an XNA reverse transcriptase
Bst DNA polymerase is unusual among naturally occurring replicative DNA polymerases, as it exhibits innate reverse transcriptase activity on nucleic acid templates of diverse chemical composition. Primer-extension studies reveal that Bst will copy templates composed of non-cognate RNA (Shi et al., 2015) and the synthetic congeners glycerol nucleic acid (Tsai et al., 2007), FANA (Wang et al., 2018b), and threose nucleic acid (TNA) into full-length DNA products. We obtained the first structural insights into an enzyme with XNA reverse transcriptase activity by solving crystal structures of Bst DNA polymerase that capture the post-translocated product of DNA synthesis on templates composed entirely of FANA and TNA (Fig. 14). Comparison of these structures with Bst DNA polymerase bound to the natural DNA primer-template duplex (Chim et al., 2018) reveals differences, particularly at the enzyme active site as well as in protein interactions with the duplexes. The DNA/FANA and DNA/TNA duplexes within the active site adopt distinct conformations from the natural system (Fig. 14a), whereas the number of protein contacts to the phosphodiester backbone increases by 8 and 13, respectively, presumably to better position the template for DNA synthesis. Interestingly, despite strikingly different backbone conformations, both FANA and TNA adopt B-type helical structures when hybridized to DNA (Fig. 14b). Taken together, these data suggest the importance of structural plasticity as a possible mechanism for XNA-dependent DNA synthesis and offer a preliminary rationale for designing variants with improved functional activity. However, it should be stressed that further structural studies are needed to fully understand how gain-of-function mutations change the active site conformation of engineered polymerases.

Engineering polymerase functions by rational design
Rational design has been used to discover new polymerase activities without resorting to molecular evolution. Early strategies utilized natural sequence diversity and residue or domain swapping to change the substrate specificity or biological stability of a polymerase. Structure-guided approaches have also been used to predict specific amino acid changes that would lead to a desired activity. Together, these strategies provide insight into the mechanism of DNA synthesis, the functional role of accessory domains, and the potential for new or improved activities to arise from natural sequence variation. The following section illustrates a number of cases where the deletion or transfer of residues between DNA polymerases leads to enhancements in enzyme performance (Table 2). Other cases, however, show the limitations of rational design and the need for more advanced approaches to enzyme engineering.

Structural permutations of natural DNA polymerases
In 1970, Klenow and Henningsen were the first to use limited proteolysis as a way to evaluate the mechanism of a DNA polymerase (Klenow and Henningsen, 1970). Using affinity chromatography to purify DNA polymerase I from crude E. coli lysate, two distinct polymerase elution profiles emerged with different enzymatic properties and molecular weights. Although both enzymes retained their cognate polymerase and 3′-5′ exonuclease proofreading activities, only the larger enzyme (∼150 kDa) exhibited 5′-3′ exonuclease activity. Speculating that the 5′-3′ exonuclease domain had been removed by proteolytic digestion, the larger DNA polymerase was treated with subtilisin to produce a smaller fragment (∼70 kDa) with the same size and enzymatic properties observed for the smaller polymerase isolated by affinity chromatography. This version of DNA polymerase I, now commonly known as Klenow DNA polymerase, is routinely used to form blunt-ended DNA by filling in 5′ overhangs and removing 3′ overhangs (Sambrook et al., 1989) and for second strand cDNA synthesis after reverse transcription of RNA back into DNA (Gubler, 1987). Klenow DNA polymerase holds a special place among synthetic biologists, as it was the first DNA polymerase used to replicate an unnatural base pair in DNA (Switzer et al., 1989). Following the invention of PCR (Saiki et al., 1988), a significant effort was made to improve the isolation of Taq DNA polymerase by recombinant protein expression in E. coli so that the enzyme could be used as a tool for molecular biology research. In addition to optimizing the promoter sequence (Lawyer et al., 1989; Engelke et al., 1990), researchers sought to increase protein expression levels by truncating the enzyme. In two separate cases, shorter versions of Taq DNA polymerase (94 kDa) were produced by removing segments of the gene encoding the 5′-3′ exonuclease domain (Fig. 15). The first example was a 705 bp 5′-truncation that yielded a 67 kDa variant called KlenTaq, which is the Klenow-fragment analog of Taq DNA polymerase (Barnes, 1992). The second example was a truncation that removed the first 867 bp region to yield a 61 kDa derivative known as Stoffel (Lawyer et al., 1993). Although full-length Taq DNA polymerase is widely used in quantitative real-time PCR applications due to its 5′-3′ exonuclease activity, the smaller KlenTaq and Stoffel polymerases are often used to amplify DNA containing modified nucleotides and as starting points for directed evolution (Malyshev et al., 2009; Yamashige et al., 2012). In recent years, KlenTaq has become a favorite polymerase among X-ray crystallographers wishing to capture the structures of DNA polymerases synthesizing non-cognate and synthetic congeners of natural nucleotides (Betz et al., 2012; Singh et al., 2018).
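The sizes quoted for the truncated Taq variants can be checked with a back-of-the-envelope estimate. This is a rough sketch, assuming an average residue mass of about 110 Da (a common approximation, not a value taken from the cited work) and that the deleted gene segment is entirely in-frame coding sequence.

```python
AVG_RESIDUE_MASS_KDA = 0.110  # assumed average mass of one amino acid residue


def residues_removed(bp_deleted):
    """Number of codons (amino acids) encoded by an in-frame deletion."""
    if bp_deleted % 3 != 0:
        raise ValueError("an in-frame deletion must be a multiple of 3 bp")
    return bp_deleted // 3


def estimated_mass_kda(full_length_kda, bp_deleted):
    """Approximate mass of the truncated protein in kDa."""
    return full_length_kda - residues_removed(bp_deleted) * AVG_RESIDUE_MASS_KDA


if __name__ == "__main__":
    taq_kda = 94  # full-length Taq DNA polymerase, as quoted in the text
    for name, bp in (("KlenTaq", 705), ("Stoffel", 867)):
        print(f"{name}: {residues_removed(bp)} residues removed, "
              f"~{estimated_mass_kda(taq_kda, bp):.0f} kDa estimated")
```

The estimates of roughly 68 and 62 kDa fall close to the reported 67 and 61 kDa, which is as much agreement as an average-residue approximation can be expected to give.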

Exonuclease silencing
The 3′-5′ exonuclease proofreading activity associated with many DNA polymerases is designed to correct single-nucleotide mismatches that occur during the course of normal DNA synthesis. Mutations that silence this activity are often advantageous for synthetic biology applications that require polymerases to incorporate unnatural nucleotides into the growing strand. In the absence of these mutations, modified nucleotides are often difficult to incorporate as the rate of nucleotide addition must compete with the rate of DNA editing. Early attempts at silencing the 3′-5′ exonuclease domains led to the surprising discovery that certain exonuclease-silent (exo-) polymerases can function with enhanced activity. Tabor and Richardson, for example, discovered that T7 DNA polymerase (exo-) functions with ∼10-fold higher activity than natural T7 DNA polymerase, which enables the enzyme to read through difficult hairpins (Tabor and Richardson, 1989b). Similar activity-silencing mutations led to the production of a Bst DNA polymerase variant that functions with elevated thermal stability (Riggs et al., 1996).

Accelerating DNA synthesis with non-specific DNA-binding domains
Improving the performance of thermophilic DNA polymerases that are capable of PCR amplification was an important early goal in molecular biology. Efforts to study this problem led to the realization that replicative DNA polymerases often use complicated mechanisms that cannot be applied in a general way to in vitro assays. For example, many replicative DNA polymerases rely on accessory proteins, such as thioredoxin (Das and Fujimura, 1979) or ring-shaped protein complexes that make up the 'sliding clamp' (Baker and Bell, 1998), which are highly specific to individual polymerases. One exception is the double-stranded DNA-binding protein Sso7d isolated from Sulfolobus solfataricus, which provides general enzyme enhancing activity when fused to standard DNA polymerases (Wang et al., 2004). Examples where DNA polymerases have been fused to the Sso7d DNA-binding domain include Taq and Stoffel (both A-family members) and Pfu, a hyperthermophilic archaeal B-family DNA polymerase isolated from P. furiosus. Activity assays show a ∼5-20-fold increase in processivity for the three enzymes tested with the greatest increase observed for Stoffel (2.9 nt versus 51 nt per binding event). Importantly, addition of the Sso7d DNA-binding domain to the polymerase did not alter the catalytic properties of the enzyme, which is critical for high-fidelity DNA synthesis. Polymerases engineered with the Sso7d domain reduced the cycle times required for DNA amplification, generated amplicons of increased length, and provided increased tolerance against salt inhibition. Phusion DNA polymerase is an example of a DNA polymerase (Pfu fused to Sso7d) that was engineered for rapid, high-fidelity DNA synthesis of long amplicons.
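To give a feel for what the Stoffel processivity values mean in practice, the sketch below computes the fold increase and then applies a simple geometric model of dissociation. The model is an assumption made here for illustration, not one taken from the cited study: the polymerase is assumed to incorporate at least one nucleotide per binding event and to dissociate with a constant probability after each incorporation, so that the mean processivity m corresponds to a per-step dissociation probability of 1/m.

```python
def fold_increase(before_nt, after_nt):
    """Ratio of processivity after versus before the modification."""
    if before_nt <= 0:
        raise ValueError("processivity must be positive")
    return after_nt / before_nt


def prob_at_least(mean_processivity_nt, n):
    """P(at least n incorporations in one binding event).

    Geometric model: N >= 1 with mean m, so P(N >= n) = (1 - 1/m)**(n - 1).
    """
    if mean_processivity_nt < 1:
        raise ValueError("mean processivity must be at least 1 nt")
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1.0 - 1.0 / mean_processivity_nt) ** (n - 1)


if __name__ == "__main__":
    stoffel, stoffel_sso7d = 2.9, 51.0  # nt per binding event, from the text
    print(f"fold increase: {fold_increase(stoffel, stoffel_sso7d):.1f}")
    for label, mean_nt in (("Stoffel", stoffel), ("Sso7d-Stoffel", stoffel_sso7d)):
        print(f"{label}: P(>= 20 nt per event) = {prob_at_least(mean_nt, 20):.4f}")
```

The fold increase of about 17.6 sits within the ∼5-20-fold range quoted above. Under the assumed model, extending 20 nt in a single binding event is very unlikely for Stoffel alone and becomes the typical outcome for the fusion enzyme.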
Helix-hairpin-helix (HhH) motifs found in DNA modifying enzymes, including nucleases, ligases, polymerases, and helicases, are a second example where a general DNA-binding motif has been used to enhance the activity of a DNA polymerase. In nature, two-thirds of DNA topoisomerase V is composed of HhH motifs (Slesarev et al., 1993). When these motifs are removed from the enzyme, topoisomerase retains activity but is more sensitive to salt inhibition than the full-length version. Guided by this observation, variants of Stoffel and Pfu DNA polymerases were constructed with HhH motifs fused to their N- and C-terminal regions (Pavlov et al., 2002). The engineered polymerases exhibit increased resistance to inhibition under high salt conditions with more HhH motifs providing greater protection. As an example, an engineered Stoffel polymerase remains active in the presence of 250 mM NaCl, whereas the natural polymerase was found to be completely inactive. One hypothesis drawn from this result is that the lack of enzymatic activity observed under high salt conditions is not due to the presence of monovalent ions interacting with the enzyme active site, but rather an inability to form the complex needed for DNA synthesis.
The addition of helicase to the reaction mixture represents a third approach for improving polymerase activity. Helicase-dependent amplification (HDA) uses the energy from ATP and helicase to produce a single-stranded template that can be copied under ambient conditions (Vincent et al., 2004). As such, HDA has become an attractive technique for point-of-care diagnostics that require minimal instrumentation. Versions of HDA have been performed using UvrD helicase, Klenow DNA polymerase, MutL, and single-stranded binding protein. Thermophilic versions of this technique do not require accessory proteins but are limited to short amplicons of only 200 bp in length. To produce longer amplicons, a non-covalent system termed 'helimerase' was developed that relies on a coiled-coil motif to synchronize the activities of the helicase and polymerase (Motre et al., 2008). The complex forms in vitro as well as in vivo and can be used to produce amplicons that exceed 1 kb in length.

Determinants of sugar recognition
Natural DNA polymerases are significantly less tolerant toward chemical modifications made to the sugar moiety than the nucleobase. One of the few early reports on sugar recognition is an acyclic peptide nucleic acid derivative that functions as a chain terminator of DNA synthesis (Martinez et al., 1997). However, despite facile preparation, this analog is less efficient compared to 2′,3′-dideoxyribonucleoside triphosphates, which are the standard reagents used for Sanger sequencing (Sanger et al., 1977). A slightly different example is C4′-acylated thymidine triphosphates developed to study DNA strand repair (Marx et al., 1997, 1999). Other prominent examples where sugar modifications have been evaluated in DNA polymerase reactions include the recognition of: 2′,5′-isomeric DNA by Klenow and HIV RT (Sinha et al., 2004); acyclic nucleotides by Vent (Gardner et al., 2004); glucose nucleotides by Vent (Renders et al., 2009); flexible nucleic acids by Klenow (Heuberger and Switzer, 2008); locked nucleic acid (LNA) by Superscript III (Crouzier et al., 2012); cyclohexenyl nucleic acid (CeNA) by HIV RT and Vent (Kempeneers et al., 2005); hexitol nucleic acid (HNA) by Klenow and Taq (Pochet et al., 2003); and TNA by Superscript II and MMLV RT. However, the activity observed with these substrates is significantly less than the wild-type activity observed with natural substrates. In some cases, manganese ions are used to loosen the enzyme active site, which is a common technique for increasing the tolerance of a DNA polymerase for unnatural nucleoside triphosphates (Dube and Loeb, 1975; Tabor and Richardson, 1989a). However, as noted previously, supplementing the reaction with manganese ions often leads to higher rates of nucleotide misincorporation.
Tabor and Richardson were among the first to explore the determinants of substrate specificity by a DNA polymerase (Tabor and Richardson, 1995). Recognizing that bacteriophage T7 DNA polymerase incorporates chain terminating ddNTPs into DNA more efficiently than DNA polymerases from E. coli and Taq, polymerases bearing hybrid sequences in the enzyme active site were constructed and tested to determine the molecular basis of substrate specificity. The mutational study uncovered a single hydroxyl group on Tyr526 that was responsible for the observed substrate specificity. Substitution of Tyr526 in T7 DNA polymerase with phenylalanine increases the discrimination against ddNTPs by >2000-fold, while replacing the analogous Phe residue in either E. coli DNA polymerase I or Taq DNA polymerase with Tyr decreases discrimination against ddNTPs up to 8000-fold. Since E. coli DNA polymerase I binds ddTTP and dTTP with equal affinity, the source of discrimination likely occurs at a subsequent step in the catalytic cycle.
Related studies on Vent DNA polymerase (exo-) isolated from Thermococcus litoralis demonstrated that mutating the active site residue Ala488 to a larger side chain increases the incorporation of sugar-modified nucleotides, including ddNTPs, NTPs, and 3′-dNTPs (cordycepin) (Gardner and Jack, 1999). The pattern of relaxed specificity at this position roughly correlates with the size of the amino acid substitution with larger residues showing a higher tolerance for sugar-modified substrates. Similar effects were observed when the Vent Ala488 mutation was transferred to other archaeal DNA polymerases, including Pfu (exo-) (Evans et al., 2000) and Kod (exo-) (Hoshino et al., 2016). Addition of the Vent A488L mutation to 9°N produced a commercial enzyme known as Therminator polymerase, which found early widespread use as a research tool for DNA sequencing using acyclic nucleotide analogs (Gardner and Jack, 2002).
Since its discovery, Therminator DNA polymerase has become the most widely studied and experimentally utilized engineered polymerase for synthesizing modified nucleotides (Gardner et al., 2019). Derived from the hyperthermophilic euryarchaeon Thermococcus sp. 9°N, this B-family polymerase carries an A485L mutation in the O-helix of the finger subdomain along with the 3′-5′ exonuclease silencing mutations D141A and E143A. Despite the fact that position 485 faces away from the polymerase active site and does not directly interact with the incoming nucleoside triphosphate, this mutation imparts strong gain-of-function activity for a wide variety of sugar, base, and backbone modified substrates (Bergen et al., 2013; Kropp et al., 2017). This observation is thought to be due to a change in the dynamics between the open and closed state of the fingers, which increases the occupancy of the closed conformation necessary for chemical catalysis. This relatively straightforward mechanism could explain the ability of Therminator to accept a broad range of substrates, including noncognate substrates (NTPs).
Recognizing the importance of the A485L mutation as a critical determinant of substrate specificity, a significant effort has been made to further improve the activity of this mutation through rational design (Gardner et al., 2019). In the case of RNA synthesis, combining the A485L mutation with Y409G and E664K, the steric gate and so-called second steric gate, respectively, enabled Tgo DNA polymerase to synthesize RNA strands up to 1.7 kb in length. The attachment of two biotinylated 'peptide legs' to Therminator led to a polymerase complex with streptavidin that increased the processivity of DNA synthesis from less than 20 nucleotides to several thousand nucleotides per binding event (Williams et al., 2008). The A485L mutation has also been used to improve XNA synthesis wherein an engineered polymerase named TgoT (V93Q, D141A, E143A, and A485L) provided the backbone for generating new polymerase variants that can synthesize a variety of artificial nucleic acids, including CeNA, ANA, FANA, HNA, TNA, and LNA. In the case of next-generation sequencing (NGS), Therminator was used as the starting point for generating a polymerase that facilitates the synthesis of fluorescently-tagged nucleotides (Gardner et al., 2012). If the past is any indication of the future, it would seem likely that the next generation of engineered polymerases will benefit from further exploration of the Therminator position.

Improving DNA polymerase performance for PCR
PCR has had a major impact on molecular biology by providing a simple method for amplifying DNA (Saiki et al., 1988). Early experiments required fresh polymerase to be added during each extension cycle due to the high temperatures (>95°C) required for denaturing the DNA strands prior to the start of another round of synthesis. This arduous task greatly reduced the speed of amplification, as it not only required the physical presence of a researcher to add new enzyme but also lowered the theoretical limit of DNA replication due to the presence of increasing quantities of inactive enzyme. A solution to this problem came when a thermophilic DNA polymerase was isolated from the bacterium species T. aquaticus (Chien et al., 1976). Taq DNA polymerase was harnessed for its intrinsic thermal stability, which allows for uninterrupted cycles of DNA replication. PCR has since found widespread use in DNA cloning, NGS, criminal forensics, molecular diagnostics, epigenetic mapping, and pathogen detection (Garibyan and Avashia, 2013). However, as the demand for PCR amplification has grown, so has the need for new variants that can function under more demanding conditions. Genotyping biological samples requires precise DNA amplification to distinguish single-nucleotide polymorphisms from random mutations. Recognizing that motif C in A- and B-family DNA polymerases may contribute to mismatch extension through indirect H-bonding between the minor groove and a histidine side chain (Franklin et al., 2001), Marx and coworkers applied a structure-guided approach to identify variants of Taq DNA polymerase that function with increased fidelity. An automated fluorescent screen was established to evaluate 1316 variants of Klenow DNA polymerase (exo-) bearing mutations at positions 879-881 (Summerer et al., 2005). Protein expression was conducted in 96-well plates and crude lysate was queried for activity in 384-well format.
Fidelity values were assigned based on the ratio of extension from primers containing matched and mismatched 3 ′ -terminal residues. A Klenow variant with LVL at positions 879-881 exhibited strong kinetic discrimination against mismatch extension. Transferring the LVL mutations to analogous positions in wild-type Taq DNA polymerase produced an engineered version of Taq DNA polymerase with increased discrimination against transitions and transversions (Summerer et al., 2005).
Taq DNA polymerase is readily inactivated by hemoglobin and humic acid present in blood and soil samples used for DNA analysis. Surprisingly, Klentaq1, a truncated version of Taq DNA polymerase with a 278 aa N-terminal deletion, can amplify single-copy genomic DNA in the presence of 5-10% whole blood (Abu Al-Soud and Radstrom, 1998; Abu Al-Soud and Radstrom, 2000). To generate an enzyme with improved activity, Barnes and coworkers screened a library of 40 arbitrary but functional variants with mutations at positions 626 and 706-708 for improved PCR performance under increasing amounts of whole blood (Kermekchiev et al., 2009). The screen revealed that mutation of E708 to K, L, or W resulted in enhanced resistance to various inhibitors, including plasma, hemoglobin, lactoferrin, serum IgG, soil extracts, and humic acid. The resulting polymerase facilitates the amplification of single-copy human genomic targets from whole blood, which eliminates the need for a sample treatment step.
Archaeal B-family DNA polymerases are widely used enzymes for PCR because of their high thermal stability and presence of a strong 3′-5′ proofreading exonuclease domain. However, despite high sequence and structural homology, the Thermococcales order of archaeal DNA polymerases exhibits strikingly different kinetic properties that affect their PCR performance. Kod DNA polymerase, for example, possesses higher processivity (defined as the number of dNTP incorporations per binding event) than its related homologs but is 10°C less stable (83 versus 93°C) than Pfu DNA polymerase, which limits its utility as an enzyme for PCR. To improve the processivity of Pfu DNA polymerase, Connolly and coworkers transferred residues from the forked-point (polymerase junction between the template-binding and editing cleft consisting of seven arginine residues) and entire thumb regions of Kod DNA polymerase to Pfu (Elshawadfy et al., 2014). The resulting polymerase with the combined forked-point and thumb regions from Kod DNA polymerase retained the high thermal stability of Pfu while gaining an increased capacity for PCR performance.
Similar efforts to explore the natural diversity of DNA polymerases were performed by recombining gene fragments of A-family DNA polymerases taken from soil samples of microorganisms found near thermal hot springs (Yamagami et al., 2014). Corresponding regions of the pol gene for Taq DNA polymerase were substituted with the amplified gene fragments and the chimeric variants were tested for activity. Biochemical analysis led to the identification of two mutations, E742R and A743R, that impart higher DNA-binding affinity and faster primer extension activity on Taq DNA polymerase. Both factors resulted in improved PCR performance, suggesting that natural diversity is a promising strategy for finding new amino acid positions with strong gain of function activity.
The ability to sequence epigenetic modifications is an important goal of genomic research. Of all possible epigenetic modifications, none is more prevalent than 5-methylcytosine (5mC). This subtle chemical change has far-reaching implications for normal cellular growth and development as well as several neurological diseases and cancer (Allis and Jenuwein, 2016). Bisulfite treatment converts natural cytosine bases to a 5,6-dihydrouracil 6-sulfonate (dhU6S) intermediate that is subsequently hydrolyzed to deoxyuracil (dU). Because 5mC is resistant to bisulfite treatment, this approach can be used to identify 5mC epigenetic markers by mapping the conversion of bases that are read as dC before and after bisulfite treatment. Unfortunately, this approach leads to significant degradation of the genomic DNA sample, which hampers genome-wide association studies. Holliger and coworkers recently discovered that the engineered polymerase 5D4, previously developed to recognize hydrophobic base analogs, is able to amplify DNA carrying the bisulfite intermediate (Millar et al., 2015). This discovery greatly improves the workflow and sensitivity of 5mC detection in genomic DNA samples.
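The read-out logic of bisulfite mapping described above can be illustrated with a small sketch. It is a toy model rather than an analysis pipeline: it assumes a single strand, complete conversion of every unmethylated cytosine, and error-free reads, and the function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as it is read after bisulfite treatment.

    Unmethylated C is deaminated to U and read as T after amplification;
    5mC (listed by 0-based index) is protected and is still read as C.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(seq):
        if base == "C" and index not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_5mc(reference, converted):
    """Report positions that read as C both before and after treatment."""
    return [
        index
        for index, (ref_base, conv_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and conv_base == "C"
    ]


reference = "ACGTCCGATC"          # hypothetical untreated read
converted = bisulfite_convert(reference, methylated_positions=[4])
print(converted)                  # ATGTCTGATT
print(call_5mc(reference, converted))  # [4]
```

Only the cytosine at index 4 survives as C in the converted read, so it is the only position called as 5mC.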
Mutagenic DNA polymerases that function with low fidelity have value as reagents for creating degenerate libraries for directed evolution studies and offer clues into the mechanistic underpinnings of substrate recognition during DNA synthesis. To investigate this phenomenon, Loeb and coworkers created a library of ∼ 200 000 mutant Taq DNA polymerase variants comprising random mutations in the dNTP binding pocket of motif A (residues 605-617) (Patel et al., 2001). The library was screened for activity using a temperature-sensitive complementation assay in E. coli and a subset of active variants were tested for fidelity using a mismatch primer extension assay. Taq polymerase variants with strong mismatch extension activity each contain substitutions at I614, indicating that a single, highly mutable, active amino acid is critical for DNA polymerase fidelity. A Taq DNA polymerase variant bearing the I614K mutation was shown to function with a 20-fold higher error rate than wild-type Taq DNA polymerase and can bypass damaged and abasic sites in DNA templates. This example provides an approach for producing polymerases that function with error-prone activity during PCR.

Engineering polymerases by directed evolution
In the last 20 years, the field of polymerase engineering has benefited from the growth of new technologies that make it possible to generate custom polymerases by directed evolution. Whether searching designer libraries that carefully sample all possible single-point mutations at defined positions or less sophisticated libraries that contain random mutations at unknown positions, the technologies available today allow users to rapidly search large combinatorial libraries (>10^7 unique members) in timeframes ranging from days to weeks. These efforts have been aided by the development of clever strategies for establishing genotype-phenotype linkages that make it possible to determine the sequence of active variants with valuable gain-of-function mutations. The most common approaches perform the activity step in vitro, which allows for greater control over the reaction conditions and substrate chemistries, including the use of synthetic congeners that bear little or no resemblance to natural nucleotides. In addition to establishing new enzymes with practical applications in biotechnology and medicine, these studies also provide a wealth of information about how polymerases function. As these studies continue, sufficient knowledge may be gained that will enable future generations to one day bypass the need for directed evolution and allow computational methods to predict individual sequences with desired activities. However, realizing these dreams will require a greater understanding of the determinants that govern substrate specificity, which is a major goal of most polymerase-engineering efforts.
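To give a sense of scale for the library sizes mentioned above, the following sketch counts variants at the amino-acid level. It assumes the 20 canonical amino acids and ignores codon degeneracy, stop codons, and sampling bias, all of which enlarge the DNA library needed in practice; the function names are illustrative only.

```python
def saturation_library_size(n_positions, alphabet=20):
    """Protein variants when n_positions are fully randomized together."""
    return alphabet ** n_positions


def single_point_mutants(n_positions, alphabet=20):
    """Single-point mutants when each position is varied one at a time."""
    return n_positions * (alphabet - 1)


def max_saturated_positions(max_library, alphabet=20):
    """Largest number of co-randomized positions that fits in max_library."""
    n = 0
    while alphabet ** (n + 1) <= max_library:
        n += 1
    return n


print(saturation_library_size(3))       # 8000
print(single_point_mutants(13))         # 247
print(max_saturated_positions(10**7))   # 5
```

Under these assumptions, a library of 10^7 members can exhaustively cover five co-randomized positions (20^5 = 3.2 × 10^6) but not six (20^6 = 6.4 × 10^7), which is why focused libraries target a small number of carefully chosen residues.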

Phage display
Phage display is one of the oldest and most successful methods for evolving peptides and proteins with ligand binding activity (Smith and Petrenko, 1997). With this technique, a gene encoding a protein of interest is inserted into a phage coat protein gene, which causes the phage to display the protein on its outside surface while retaining the encoding genetic information inside the bacteriophage. A modified version of phage display was originally developed by Jestin, and subsequently refined by Romesberg, to facilitate the evolution of polymerases with new activities (Jestin et al., 1999; Xia et al., 2002). In this method, phage particles are engineered to display the DNA primer-template duplex and polymerase variant in close proximity. The polymerase library is expressed as an N-terminal fusion of the minor M13 phage coat protein pIII in such a way that the phage surface contains one copy of the polymerase and four copies of a short acidic peptide. Separately, a complementary basic peptide is conjugated to the DNA primer, annealed to a DNA template, and combined with the phage particle to form a coiled-coil linking the DNA primer-template duplex to the phage surface. Activity screens are then performed in-cis by enriching for polymerase variants that can incorporate a biotin-tagged nucleotide into the growing DNA strand, which is used to capture the phage particle on streptavidin-coated beads (Fig. 16). The beads are washed to remove inactive variants and the genes encoding functional polymerases are recovered by eluting the bacteriophage with DNase I. The population of enriched phage particles is then amplified by infecting a fresh E. coli culture. Recently, the technique has been improved by incorporating p-azidophenylalanine into the pIII protein, which allows for an alkynyl-modified primer-template duplex to be conjugated to the phage surface using click chemistry (Chen et al., 2016). The revised protocol avoids the need to synthesize and purify peptide-DNA conjugates comprising the basic peptide and DNA primer.
Phage display was used by Jestin and coworkers to evolve a population of Taq DNA polymerase variants with thermostable reverse transcriptase activity (Vichier-Guerre et al., 2006). Romesberg and coworkers have used this technique to identify variants of the Stoffel fragment (SF) of Taq DNA polymerase that function with improved activity for ribonucleoside triphosphates (Xia et al., 2002), 2′-methoxy (OCH3) nucleoside triphosphates (Fa et al., 2004), and the unnatural PICS:PICS self-pair (Leconte et al., 2005). Further characterization of the polymerase with 2′-OCH3 activity revealed that this variant (SM19) could also recognize substrates with 2′-fluoro (F), 2′-azido (N3), and 2′-amino (NH2) modifications (Schultz et al., 2015). Using the click-chemistry version of phage display, SM19 was evolved to yield a thermostable polymerase that can PCR amplify DNA containing the 2′-OCH3 and 2′-F modifications on pyrimidine residues (Chen et al., 2016).
The major benefit of the phage display approach is the ability to detect a single nucleotide incorporation event using biotinylated substrates. Anticipated weaknesses include complications of phage particle assembly, the potential for low multiple turnover activity caused by the in-cis selection strategy, and the possibility for high background due to non-specific binding to the solid support. In the case of SM4-9, for example, the selection required the screening of 500-1000 individual variants between each of the four rounds of selection (Chen et al., 2016).

Compartmentalized self-replication
In 2001, Holliger and coworkers developed a polymerase evolution strategy called compartmentalized self-replication (CSR) that is based on a simple feedback loop in which a polymerase replicates its own gene by PCR (Ghadessy et al., 2001). With this technique (Fig. 17), a population of E. coli expressing different polymerase variants is encapsulated along with the reaction buffer, dNTPs, and primers into emulsions that are produced by vigorous bulk mixing of aqueous and organic phases. During thermocycling, E. coli lysis occurs, releasing the polymerase and encoding plasmid into the surrounding solution. The emulsion serves as a barrier separating each polymerase extension assay into an individual reaction compartment. If the polymerase is able to amplify its own gene using the gene-specific primers supplied in the aqueous phase, then adaptive gains are made that directly and proportionately translate to an increase in the number of amplicons present that encode the active polymerase variant. Through iterative rounds of selective amplification, active polymerases will outcompete the inactive variants. CSR has proven useful for generating polymerases with enhanced thermostability and increased resistance to a range of blood and other environmental inhibitors that prevent DNA samples from being amplified using natural polymerases (Ghadessy et al., 2001; Baar et al., 2011). Molecular breeding experiments performed on thermophilic polymerases led to the isolation of a chimeric polymerase with an increased ability to amplify DNA from ice-age specimens (d'Abbadie et al., 2007). CSR has been used to generate polymerases that can recognize a broad range of nucleoside triphosphates, including α-phosphorothioate dNTPs (Ghadessy et al., 2004), dNTPs with hydrophobic base analogs (Loakes et al., 2009), and γ-modified dNTPs for sequencing and kinetic assays (Hansen et al., 2011). More recently, Benner and coworkers used CSR to evolve a polymerase that could amplify DNA with a six-letter genetic alphabet that includes the unnatural base pair P:Z (Laos et al., 2013).
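The feedback loop at the heart of CSR, in which the number of amplicons recovered for a variant scales with how well that variant amplifies its own gene, can be sketched as an idealized enrichment model. The starting frequency and the 1000-fold amplification advantage below are hypothetical values chosen for illustration, and the model assumes one variant per compartment and ignores droplet polydispersity.

```python
def csr_round(frequencies, amplification):
    """One selection round: weight each variant by its self-amplification
    factor, then renormalize so the frequencies sum to 1."""
    weighted = [f * a for f, a in zip(frequencies, amplification)]
    total = sum(weighted)
    return [w / total for w in weighted]


def enrich(frequencies, amplification, rounds):
    """Apply csr_round repeatedly and return the final frequencies."""
    for _ in range(rounds):
        frequencies = csr_round(frequencies, amplification)
    return frequencies


# Hypothetical library: one active variant per million inactive ones.
start = [1e-6, 1.0 - 1e-6]
# Assumed per-round amplification: 1000-fold (active) versus 1-fold (inactive).
gain = [1000.0, 1.0]

for n in range(1, 4):
    active_fraction = enrich(start, gain, n)[0]
    print(f"round {n}: active fraction = {active_fraction:.4f}")
```

Under these assumptions the active variant rises from one in a million to roughly half of the pool after two rounds and dominates it after three, which illustrates why a few iterative rounds are often sufficient when the feedback loop is strong.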
Modified versions of CSR have been developed to reduce the adaptive burden of amplifying the entire polymerase gene (>2 kb). The first modified version, termed short-patch CSR (spCSR), focuses the amplification step on a narrow segment of the polymerase gene, which is then incorporated into the full-length gene when the plasmid is reconstructed between rounds of selection (Ong et al., 2006). spCSR enables the isolation of Taq DNA polymerase variants with enhanced activity for 2′-modified nucleotides including NTPs (Ong et al., 2006) as well as Pfu variants capable of replacing dCTP with fluorescent Cy3- and Cy5-labeled dCTP substrates in PCR reactions (Ramsay et al., 2010). Ellington and coworkers developed another version of CSR called reverse-transcription CSR (RT-CSR), which enables the screening of up to 10^9 polymerase variants for RT activity (Ellefson et al., 2016). RT-CSR was used to produce a thermostable polymerase that actively proofreads DNA synthesis during RT-PCR. CSR benefits from a strong feedback loop that enables the identification of new polymerase variants that are capable of PCR. However, the PCR reactions take place in polydisperse droplets, which could lead to uneven levels of PCR amplification. CSR is also limited to the range of polymerase functions that promote DNA, or RNA in the case of RT-CSR, templated synthesis.
Fig. 16. Phage display. Bacteriophage particles are constructed using a proximity strategy that places the polymerase and DNA primer-template duplex in close proximity on the phage surface. Activity screening leads to the identification of polymerase variants that incorporate a biotin-tagged substrate that is captured on streptavidin-coated beads. Functional variants are recovered by eluting the beads with DNase and amplified by infecting a fresh E. coli culture.

Compartmentalized self-tagging
Efforts to establish engineered polymerases with increased tolerance for challenging substrates with highly modified sugars led to the development of compartmentalized self-tagging (CST). CST is based on a positive selection loop where a polymerase tags its encoding DNA plasmid with a biotinylated primer that hybridizes to a complementary region of the plasmid (Fig. 18). The initial primer-plasmid complex is a weak affinity interaction that becomes stabilized when the primer is extended by the polymerase. After extension, the primer-plasmid complexes are captured on streptavidin-coated beads, which are washed with mild denaturants to remove the unextended primer-plasmid pairs. Plasmids encoding active library members are then recovered from the beads, PCR amplified, and used to initiate another round of selection and amplification.
CST enabled the discovery of engineered polymerases that could synthesize XNA polymers with backbone structures that are distinct from those found in DNA and RNA. By exploring diverse library repertoires of Tgo DNA polymerase that sampled mutations within a 10 Å shell of the polymerase active site, novel polymerase variants were identified that could copy DNA templates into HNA, CeNA, TNA, FANA, and ANA. In this same study, a statistical coupling analysis was used to identify polymerases that could copy the XNA strands back into DNA. Together, these polymerase pairs demonstrate the capacity for artificial genetic polymers to replicate using engineered polymerases to facilitate the passage of genetic information back and forth between DNA and XNA. CST is widely recognized as a major advance in synthetic genetics, a field which aims to explore the structural and functional properties of XNA by in vitro selection (Joyce, 2012).
The major advantage of CST is that it allows for the evolution of polymerases that can synthesize nucleic acid polymers with diverse sugar-phosphate backbones. However, the range of functions is limited to DNA-templated reactions (i.e. DNA-dependent XNA polymerases), as the selection strategy uses the plasmid DNA as the template for the primer-extension reaction. CST also requires affinity purification on a solid support matrix, which lowers the partitioning efficiency of the selection due to unwanted non-specific binding of DNA to the matrix. Finally, the reliance on a metastable primer-plasmid complex requires fine-tuning of the denaturing conditions to ensure proper separation of the plasmids encoding active and inactive variants.

Droplet-based optical polymerase sorting
To overcome some of the weaknesses of previous in vitro selection technologies, our laboratory established a general strategy for evolving new polymerase functions called droplet-based optical polymerase sorting (DrOPS) (Larsen et al., 2016). DrOPS is a high-throughput approach that combines the ultrafast screening power of microfluidics with the high sensitivity of optical sorting. With this technique, a library of polymerase variants is expressed in E. coli and single cells are encapsulated in microfluidic droplets containing a fluorescent sensor that is responsive to polymerase activity (Fig. 19). As with CSR and CST, the surrounding oil acts as a barrier preventing the contents of one droplet from mixing with the contents of another droplet. However, unlike CSR and CST, microfluidic devices are used to generate a uniform population of droplets. The latest microfluidic designs are capable of generating 18 μm droplets at a rate of 30 000 per second, which allows for the production of >10^8 droplets in 1 h. Following droplet production, the polymerase and encoding plasmid are released into the droplet by lysing the E. coli with heat. Polymerases that successfully copy the template into full-length product produce a fluorescent signal by disrupting a donor-quencher pair located at the 5′-end of the template strand. The population of droplets can then either be sorted directly using a custom microfluidic fluorescence-activated droplet sorting (FADS) device or converted to double emulsion droplets that are compatible with a traditional fluorescence-activated cell sorting instrument. Despite being a relatively new technique for polymerase engineering, DrOPS has been used to evolve polymerase variants that can synthesize TNA, an artificial genetic polymer in which the natural ribose sugar found in RNA has been replaced with an unnatural threose sugar (Schöning et al., 2000). In its first demonstration, DrOPS was used to identify a manganese-independent TNA polymerase from a site-saturation library of 8000 unique variants after a single round of high-throughput screening (Larsen et al., 2016). More recently, DrOPS was combined with the protein-engineering approach of deep mutational scanning (Araya and Fowler, 2011) to map the sequence-function relationships of a replicative DNA polymerase, Kod, isolated from the thermophilic archaeon T. kodakarensis. The resulting enrichment profile provided an unbiased view of the ability of each single-point mutant to synthesize TNA.
From a single high-throughput screen, two cases of epistasis were discovered, where double-mutant variants functioned with higher activity than the sum of the contributions from either of the individual mutations. This new polymerase, termed Kod-RS, recognizes TNA substrates with nearly the same efficiency as DNA substrates, suggesting that the mutations are beginning to reshape the enzyme active site. An engineered variant with even greater TNA polymerase activity was discovered by performing deep mutational scanning across the entire polymerase domain (Nikoomanzar et al., 2020).
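The droplet production figures quoted for DrOPS (18 μm droplets generated at 30 000 per second) can be checked with a short calculation. The sketch below assumes perfectly spherical, monodisperse droplets and uninterrupted production for 1 h; the function names are illustrative.

```python
import math


def droplets_produced(rate_per_second, seconds):
    """Total droplets generated at a constant production rate."""
    return rate_per_second * seconds


def droplet_volume_pl(diameter_um):
    """Volume of a spherical droplet in picoliters.

    1 cubic micrometer equals 1 femtoliter, so divide by 1000 for picoliters.
    """
    radius_um = diameter_um / 2.0
    volume_fl = (4.0 / 3.0) * math.pi * radius_um ** 3
    return volume_fl / 1000.0


def total_volume_ml(n_droplets, diameter_um):
    """Combined aqueous volume of all droplets in milliliters (1 pL = 1e-9 mL)."""
    return n_droplets * droplet_volume_pl(diameter_um) * 1e-9


n = droplets_produced(30_000, 3600)
print(n)                                      # 108000000
print(round(droplet_volume_pl(18.0), 2))      # 3.05
print(round(total_volume_ml(n, 18.0), 2))     # 0.33
```

One hour of production therefore yields about 1.08 × 10^8 droplets of roughly 3 pL each, for a combined aqueous volume of about a third of a milliliter under these assumptions.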
The DrOPS technique compares favorably with other polymerase-engineering technologies in several important ways. First, it provides enormous control over the composition of the primer, template, and nucleoside triphosphates, which should make it possible to select for any type of polymerase activity (i.e. transcription, reverse transcription, and replication). Second, it relies on physical methods for identifying and sorting individual droplets with active polymerases, which greatly increases the partitioning efficiency of the selection and reduces the occurrence of background DNA contamination relative to bead binding assays. Third, microfluidic approaches provide a more economical approach to library screening by allowing researchers to screen ∼10^8 variants per day using ∼10^6-fold less sample volume than is typically required for automated screening approaches (Price and Paegel, 2016). The economy of scale is especially important when using unnatural nucleic acid substrates that can only be obtained by chemical synthesis and are not readily available from a commercial supplier.
Fig. 18. Compartmentalized self-tagging. E. coli cells expressing different polymerase variants are encapsulated in bulk emulsions. Following E. coli lysis, the polymerase is challenged to extend a biotinylated primer annealed to the plasmid. Active polymerases that extend the primer increase the stability of the primer-plasmid complex. After disruption of the emulsion, the primer-plasmid complexes are captured on streptavidin beads, and plasmids annealed to unextended primers are removed with washing. Plasmids annealed to extended primers are recovered, PCR amplified, and used to initiate another round of selection.

Applications in synthetic biology
The slow but growing availability of engineered polymerases that can synthesize artificial genetic polymers (XNAs) with high efficiency and fidelity has already started to make an impact on applications in synthetic biology, biotechnology, and medicine. The following section summarizes major achievements that have been attributed to the discovery of engineered polymerases. Most notably, these examples focus on the generation of biologically stable versions of synthetic antibodies (aptamers) and catalytic enzymes that are composed entirely of XNA. Such efforts have made it possible to bypass the arduous task of introducing modifications post-selection, in which medicinal chemists painstakingly modify the backbone structure for improved biological stability while carefully avoiding chemical changes that lead to losses in activity. Since many XNAs are resistant, if not recalcitrant, to nuclease digestion, research efforts have focused on establishing methods for the discovery of XNAs with desired functional properties (Culbertson et al., 2016). Mastering the production of these reagents by in vitro selection will lead to a new generation of diagnostic and therapeutic agents for the detection and treatment of human diseases.

Synthetic antibodies
Aptamers are nucleic acid molecules that mimic antibodies by folding into tertiary structures that can bind to a broad range of targets from ions and small molecules to proteins and whole cells. Although some aptamers exist naturally as the binding domain of riboswitches (Doudna and Cech, 2002), most are generated by in vitro selection or SELEX (systematic evolution of ligands by exponential enrichment) (Wilson and Szostak, 1999). Similar to natural selection, in vitro selection is a Darwinian evolution process in which a large population of nucleic acid molecules (typically >10^14 unique sequences) is challenged to bind a target (Joyce, 2004). Molecules that bind to the target are recovered and amplified to generate a new population of molecules that has become enriched in members with the desired activity. The process of selection and amplification is continued until the pool becomes dominated by members that bind the target with high affinity. The ability to amplify individual molecules with desired properties and to optimize their functions by directed evolution is a defining feature that separates nucleic acid molecules from other types of organic compounds that cannot replicate because they lack a genotype-phenotype connection (Szostak, 1992).
Aptamers are often compared to antibodies due to their ability to function with high ligand binding affinity and specificity (Jayasena, 1999). However, unlike antibodies, aptamers do not require animals for their production, thus freeing them from the constraints of cellular biology and allowing for greater flexibility in their evolution under in vitro conditions. Once discovered, aptamers are produced through a chemical process rather than a biological process, which avoids the problem of viral or bacterial contamination and greatly reduces the potential for batch-to-batch variation. Aptamers developed for therapeutic purposes generally exhibit a lower immune response than proteins, and their small size (<30 versus ∼150 kDa) provides access to biological areas that are inaccessible to antibodies (Nimjee et al., 2005; Keefe et al., 2010). Aptamers are able to fold reversibly, which overcomes the cold-chain problem that limits the shelf-life, reproducibility, and performance of antibodies. Therapeutic aptamers can also be deactivated with antisense oligonucleotides that recognize the binding domain, thereby providing a valuable antidote that can alleviate unwanted symptoms (Rusconi et al., 2004). Finally, because aptamers are nucleic acid molecules, they can be seamlessly integrated into sensors, actuators, and other devices that are central to emerging technologies (Cho et al., 2009). Despite the many benefits of aptamers relative to antibodies, aptamers composed of natural DNA and RNA are poor candidates for diagnostic and therapeutic applications, as these molecules are rapidly degraded by nucleases present in biological samples. In one case, an unmodified DNA aptamer developed as an inhibitor of α-thrombin exhibited an in vivo half-life of <2 min when assayed in a primate animal model (Griffin et al., 1993). Overcoming this problem led to the use of chemical modifications that protect the 2′-OH group against nucleases that utilize this position for cleavage of the phosphodiester bond. In particular, substitution of the 2′-OH group with amino (NH2), fluoro (F), and methoxy (OCH3) groups has led to enhanced nuclease stability (Keefe and Cload, 2008). For example, Macugen, the first FDA-approved aptamer, is an RNA sequence in which most of its 2′-OH groups have been replaced with 2′-F and 2′-OCH3 groups (Ng et al., 2006). However, it is important to note that these modifications are often still prone to nuclease digestion (Cummins et al., 1995; Noronha et al., 2000).
Fig. 19. Droplet-based optical polymerase sorting. E. coli cells expressing different library members are encapsulated in water-in-oil droplets using a microfluidic device. The droplets are collected and lysed off-chip to release the polymerase and encoding plasmid into the solution. Polymerases that extend the primer to full-length product trigger a fluorescent sensor by disrupting a fluorescent donor-quencher pair. Fluorescent droplets are sorted using a custom FADS device. Recovered DNA is PCR amplified and used to initiate another round of selection.
While numerous examples of 2′-modified aptamers have been described in the literature (Keefe and Cload, 2008), most were generated by transcription using T7 RNA polymerase. Recently, Romesberg and colleagues reported the directed evolution of variants of the Stoffel fragment of Taq DNA polymerase that accept a broad range of 2′-modified substrates (Chen et al., 2016). One variant, SFM4-3, was found to PCR amplify substrates with 2′-F and 2′-azido (N3) groups (Chen et al., 2016). This engineered polymerase was subsequently used to evolve aptamers that bind to human neutrophil elastase (HNE), a serine protease associated with inflammatory diseases, using libraries that are partially substituted with 2′-modified nucleotides (Thirunavukarasu et al., 2017; Shao et al., 2020). In a related study, Romesberg and colleagues used engineered polymerases to evolve fully 2′-OCH3 modified aptamers to HNE. Structure-activity assays reveal that the 2′ modifications are necessary for aptamer activity.
A fundamentally different approach to nuclease stability involves the use of XNAs, which are artificial genetic polymers in which the ribose and deoxyribose sugars found in RNA and DNA have been replaced with a different sugar moiety (Chaput and Herdewijn, 2019). TNA and HNA are particularly interesting as their backbone structures are recalcitrant to nuclease digestion, making them valuable systems for diagnostic and therapeutic applications (Hendrix et al., 1997; Culbertson et al., 2016). The first XNA aptamers appeared in 2012 with the evolution of TNA sequences that can bind to human α-thrombin and HNA sequences having affinity for the HIV trans-activating response element and hen egg lysozyme (Yu et al., 2012). These results were viewed as a milestone in synthetic biology, as they demonstrated that heredity and evolution are no longer limited to DNA and RNA (Joyce, 2012).
The last few years have witnessed tremendous growth in the field of XNA aptamer research, with improved enzymes and selection techniques having given rise to higher quality aptamers whose oligonucleotide sequences derive entirely from building blocks with sugar moieties unrelated to those found in nature. DeStefano and colleagues, for example, have developed a FANA aptamer that binds HIV RT with low picomolar affinity (Alves Ferreira-Bravo et al., 2015). Similarly, Herdewijn and colleagues evolved an HNA aptamer to rat vascular endothelial growth factor 164 (VEGF164) that distinguishes VEGF164 from the VEGF120 isoform (Eremeeva et al., 2019). In addition, new TNA aptamers have been discovered with affinity to the small molecule target ochratoxin A and the proteins thrombin and HIV RT (Mei et al., 2018; Rangel et al., 2018). More recently, our lab has developed a DNA-display strategy for evolving XNA aptamers in which each XNA strand is physically linked to its encoding DNA template (Dunn et al., 2020). This strategy is analogous to protein display technologies, such as mRNA display, that provide a covalent link between the encoding mRNA and translated protein (Roberts and Szostak, 1997). This approach is generalizable to any XNA system where an XNA polymerase is available to copy DNA templates into XNA. It also avoids the need for an XNA reverse transcriptase, which improves the recovery of functional sequences that are present in low abundance after stringent washing has been performed to remove weaker affinity binders. Using this strategy, a TNA aptamer to HIV RT was produced that rivals the best monoclonal antibodies in terms of binding affinity and thermal stability (Dunn et al., 2020). As these studies continue, it will be interesting to see how effective XNA aptamers will be at disrupting extracellular targets, such as the interaction of the viral spike protein of SARS-CoV-2 with the ACE2 receptor of human lung cells.

Catalysts for RNA modifying reactions
Nucleic acid enzymes provide powerful tools for precision medicine by allowing viral or disease-associated RNAs to be cleaved at specific nucleotide positions. The most widely studied member of this family of enzymes is DNAzyme 10-23 (Santoro and Joyce, 1997), which has been evaluated in phases I and II clinical trials for a variety of diseases ranging from basal cell carcinoma to bronchial asthma (Fokina et al., 2015). 10-23 is a magnesium-dependent enzyme that catalyzes the hydrolysis of a phosphodiester bond at a specific dinucleotide junction in the RNA substrate (Santoro and Joyce, 1998). Unlike other gene-silencing technologies (e.g. antisense, siRNA, and CRISPR), DNAzymes benefit from a mechanism that does not require the recruitment of endogenous enzymes. Instead, Watson-Crick base pairing directs the enzyme to a cleavage site that is cut via an in-line attack by a deprotonated form of the 2′-OH group to produce an upstream cleavage product carrying a 2′,3′-cyclic phosphate and a downstream strand with a 5′-OH group. The enzyme is made generalizable by designing the substrate binding arms to be complementary to the cleavage site. This property of chemical simplicity, coupled with its ease of synthesis, has allowed 10-23 to become a popular tool for clinical and basic research. Over the years, numerous chemical modifications have been made to protect 10-23 from nuclease digestion and increase its efficacy in vivo (Fokina et al., 2017). These include the introduction of an inverted 3′-3′ nucleotide and substitution of the deoxyribose sugar for other sugar moieties (Fokina et al., 2017).
Although 10-23 is known to function with high activity (kcat/Km ∼ 10^9 M^-1 min^-1) under optimized in vitro conditions, biological assays show that its capacity for RNA cleavage activity is greatly diminished in cellular environments where Mg2+ ions are present in lower abundance (Young et al., 2010). One approach to solving this problem involves developing catalysts that carry functional groups that augment the chemical functionality of natural bases. Perrin and colleagues, for example, have evolved a divalent-metal independent DNAzyme that cleaves RNA with multiple turnover activity using imidazole side chains that mimic the mechanism of RNase A (Wang et al., 2018a). Another approach is to evolve XNA enzymes (XNAzymes) that are able to bind divalent metal ions with higher affinity. In 2015, Holliger and colleagues described the first examples of XNA catalysts with RNA cleavage and ligation activity (Taylor et al., 2015). Using engineered polymerases, XNAzymes were isolated from four different backbone chemistries: FANA, ANA, CeNA, and HNA. However, despite the presence of high concentrations of Mg2+ ions, the XNAzymes produced from this study functioned with relatively weak activity. More recently, we have discovered a FANAzyme, termed FANAzyme 12-7, by in vitro evolution that functions at a rate enhancement of ∼10^6-fold over the uncatalyzed reaction and exhibits substrate saturation kinetics typical of most enzymes (Wang et al., 2018b). Remarkably, FANAzyme 12-7 cleaves chimeric DNA substrates (DNA substrates having a riboG residue at the cleavage site) under physiological conditions with an activity rivaling that of known DNAzymes that were intentionally selected to recognize such substrates (Wang et al., 2020).
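To put the catalytic efficiency quoted above for 10-23 into more intuitive terms, the sketch below uses the standard Michaelis-Menten limit for substrate concentrations well below Km, where the cleavage rate is v = (kcat/Km)[E][S] and the substrate decays with an apparent first-order rate constant (kcat/Km)[E]. The 1 nM enzyme concentration is an assumed value for illustration and is not taken from the studies cited here.

```python
import math


def k_obs_per_min(kcat_over_km, enzyme_molar):
    """Apparent first-order rate constant (per min) when [S] << Km."""
    return kcat_over_km * enzyme_molar


def half_life_min(kcat_over_km, enzyme_molar):
    """Substrate half-life in minutes under the same assumption."""
    return math.log(2) / k_obs_per_min(kcat_over_km, enzyme_molar)


def fraction_cleaved(kcat_over_km, enzyme_molar, minutes):
    """Fraction of substrate cleaved after a given time."""
    return 1.0 - math.exp(-k_obs_per_min(kcat_over_km, enzyme_molar) * minutes)


KCAT_OVER_KM = 1e9   # M^-1 min^-1, the value quoted in the text for 10-23
ENZYME = 1e-9        # 1 nM enzyme, an assumed concentration

print(round(k_obs_per_min(KCAT_OVER_KM, ENZYME), 3))          # 1.0
print(round(half_life_min(KCAT_OVER_KM, ENZYME), 3))          # 0.693
print(round(fraction_cleaved(KCAT_OVER_KM, ENZYME, 5.0), 3))  # 0.993
```

These numbers describe optimized in vitro conditions only. As noted above, the activity of 10-23 is greatly diminished in cells, so the values should not be read as estimates of intracellular performance.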

Future directions
Many synthetic biology applications are currently limited by a lack of polymerases that can perform a specific function with optimal activity. However, with the advent of new polymerase-engineering technologies, we anticipate that many of these limitations will be overcome in the near future. The following section provides some examples where polymerase-engineering technologies could help drive future innovations in synthetic biology, biotechnology, and medicine. Of course, many other examples are possible, including those that have not yet been envisioned.
Next-generation sequencing. Next-generation DNA-sequencing technologies that follow a sequencing-by-synthesis strategy require a DNA polymerase that can facilitate the incorporation of chemically modified nucleotides (Goodwin et al., 2016). Illumina technology, for example, utilizes dNTP substrates that carry a fluorescent dye and a reversible 3′-terminator, both of which are removed following nucleotide incorporation (Chen, 2014). However, the removal chemistry leaves behind a portion of the linker connecting the fluorescent dye to the base, commonly referred to as a scar, which can reduce the efficiency of polymerase-mediated primer extension. Evolving polymerases that can function with higher activity in the presence of scarred nucleotides may improve NGS technology by allowing for longer read lengths, faster turn-around times, and higher quality reads. A similar case exists for RNA-Seq applications where polymerases continually struggle to read through complex RNA structures. Polymerase evolution could overcome this problem by providing thermophilic reverse transcriptases that function at higher temperatures where larger RNA structures denature into single-stranded form.
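The link between per-cycle extension efficiency and read length can be illustrated with a deliberately simplified Python sketch. It assumes a geometric model in which each strand is extended correctly with the same probability in every cycle, so the fraction of strands still in phase after n cycles is that probability raised to the power n. This model, the function names, and the 50% default threshold are illustrative assumptions and are not taken from the cited work.

```python
import math


def in_phase_fraction(step_efficiency: float, cycles: int) -> float:
    """Fraction of strands still in phase after a number of cycles."""
    return step_efficiency ** cycles


def max_read_length(step_efficiency: float, min_in_phase: float = 0.5) -> int:
    """Largest cycle count at which the in-phase fraction stays at or above
    min_in_phase, under the assumed geometric model."""
    if not 0.0 < step_efficiency < 1.0:
        raise ValueError("step_efficiency must lie strictly between 0 and 1")
    if not 0.0 < min_in_phase < 1.0:
        raise ValueError("min_in_phase must lie strictly between 0 and 1")
    # Solve step_efficiency ** n >= min_in_phase for the largest integer n.
    return math.floor(math.log(min_in_phase) / math.log(step_efficiency))
```

Under this toy model, a small gain in per-cycle efficiency translates into a disproportionately longer usable read, which is the intuition behind evolving polymerases that tolerate scarred nucleotides.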
Oligonucleotide synthesis. Solid-phase DNA synthesis based on phosphoramidite chemistry has driven major advances in the biological sciences by providing easy access to synthetic oligonucleotides (Caruthers, 1985). Examples where this technology has made a major impact include DNA nanotechnology (Seeman and Sleiman, 2018), digital data archiving (Ceze et al., 2019), genome synthesis (Hutchison et al., 2016), PCR (Saiki et al., 1988), and SELEX (Wilson and Szostak, 1999). However, the process of industrial-scale DNA synthesis produces large quantities of hazardous waste that require appropriate disposal mechanisms. Moving to an enzymatic DNA synthesis platform would eliminate or greatly reduce this problem by allowing the reactions to proceed in an aqueous environment. Toward this goal, several groups are working to develop terminal deoxynucleotidyl transferase (TdT) as a possible paradigm for template-independent enzymatic DNA synthesis (Palluk et al., 2018; Lee et al., 2019). One could imagine that such approaches would benefit from polymerase engineering by providing access to new TdT variants that allow for longer DNA synthesis lengths, higher yields, and access to diverse nucleotide chemistries. The application of this approach to XNA, for example, could provide access to synthetic XNAs that are difficult to generate by conventional solid-phase synthesis due to low nucleotide coupling yields.
Information storage. In an age of ever-increasing data, new mechanisms for data storage are in short supply. One paradigm that has attracted significant attention involves using DNA as a soft material for low-energy, high-density storage of digital data (Ceze et al., 2019). At its maximum, 1 g of DNA can store 455 exabytes of information, which vastly exceeds the largest conventional devices (Church et al., 2012). Information storage occurs in four main steps that involve encoding digital information (e.g. text, pictures, and movies) in DNA, writing the information by massively parallel DNA synthesis, reading the information by NGS analysis, and decoding the information back into digital format. However, because DNA is a naturally occurring molecule that is prone to nuclease digestion, information stored in DNA could be unintentionally lost through an accidental encounter with nucleases present in the environment. XNAs that are recalcitrant to nuclease digestion offer a solution to this problem by providing a biologically stable alternative to DNA (Culbertson et al., 2016). For this to be possible, XNA polymerases will likely need to be optimized for higher fidelity and lower template-sequence bias, which could be achieved by directed evolution.
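The encoding and decoding steps of this workflow can be sketched in Python with a naive mapping of two bits per nucleotide. This mapping is an illustrative assumption and is not the scheme used in the cited studies; practical systems typically add addressing, error correction, and constraints on sequence composition. The function names and the base ordering are hypothetical.

```python
# Assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def encode_bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bits first."""
    nucleotides = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            nucleotides.append(BASES[(byte >> shift) & 0b11])
    return "".join(nucleotides)


def decode_dna_to_bytes(sequence: str) -> bytes:
    """Invert encode_bytes_to_dna; raises ValueError on malformed input."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    decoded = bytearray()
    for start in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[start:start + 4]:
            # str.index raises ValueError for any character outside ACGT.
            byte = (byte << 2) | BASES.index(base)
        decoded.append(byte)
    return bytes(decoded)
```

In the four-step workflow described above, `encode_bytes_to_dna` corresponds to the encoding step and `decode_dna_to_bytes` to the final decoding step; the writing and reading steps are performed by DNA synthesis and sequencing, where polymerase fidelity and template-sequence bias determine how faithfully the sequence survives the round trip.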
Engineering bacteria. One recent exciting example relevant to the field of synthetic biology is the generation of mutant E. coli strains derived from engineered bacteria that contain significant (∼40-50%) ribonucleotide content in their genome (Mehta et al., 2018). These systems have the potential to provide new insight into the origin of life by offering a better understanding of the transition from the RNA world to the DNA world. Such studies would evaluate the relevance of chimeric DNA-RNA genomes with respect to the processes of replication, transcription, and repair. In this area of research, engineered polymerases will almost certainly be required to produce new mutant E. coli strains that contain >50% RNA content in their genomes. Although a daunting task, establishing an E. coli strain with an entirely RNA-derived genome would herald a new advance in synthetic biology. Another related area where polymerase engineering could contribute to the development of engineered bacterial strains involves ongoing efforts to create bacterial cells in which all synthetic biology information is stored in XNA polymers (Schmidt, 2010). This field of study, commonly referred to as xenobiology, relies on the concept of genetic orthogonality in which synthetic biology information is stored in XNA chromosomes that are made to replicate independently from the natural host genome. In this way, the growing field of xenobiology promises to make synthetically engineered organisms safer by establishing a genetic firewall between natural biology and synthetic biology (Herdewijn and Marliere, 2009).

Conclusion
In summary, we provide a comprehensive review of polymerase engineering that travels the path from early exploratory studies to modern enzyme-engineering technologies where variants are sampled with incredible speed and accuracy. Though not explicitly discussed in this review, it should be noted that these endeavors have been supported by equally significant advances in nucleic acid chemistry, which provide access to chemical building blocks with new physical and chemical properties. This combination of nucleic acid chemistry with enzyme engineering will uniquely drive new applications in synthetic biology, medicine, and biotechnology.