Internal reference genes with the potential for normalizing quantitative PCR results for oral fluid specimens

Abstract In basic research, testing of oral fluid specimens by real-time quantitative polymerase chain reaction (qPCR) has been used to evaluate changes in gene expression levels following experimental treatments. In diagnostic medicine, qPCR has been used to detect DNA/RNA transcripts indicative of bacterial or viral infections. Normalization of qPCR using endogenous and exogenous reference genes is a well-established strategy for ensuring result comparability by controlling sample-to-sample variation introduced during sampling, storage, and qPCR testing. In this review, the majority of recent publications in veterinary (n = 136) and human (n = 179) medicine did not describe the use of internal reference genes in qPCRs for oral fluid specimens (52.9% animal studies; 57.0% human studies). Moreover, the use of endogenous reference genes has not been fully explored or validated for oral fluid specimens. The lack of valid internal reference genes inherent to the oral fluid matrix will continue to hamper the reliability, reproducibility, and generalizability of oral fluid qPCR assays until this issue is addressed.

The terms 'saliva' and 'oral fluid' are often used interchangeably to refer to fluid samples collected from the oral cavity (Kintz et al., 2000;Wong, 2006). More accurately, saliva is the fluid produced by salivary glands whereas oral fluid is a composite of saliva, serum transudate, mucosal cells and cellular debris, microorganisms, digestive enzymes, and food residues (Schramm et al., 1993;Crouch, 2005;Cone and Huestis, 2007). This review will use the term 'oral fluid' as defined by Atkinson et al. (1993): 'The fluid obtained by insertion of absorptive collectors into the mouth'.
Although various sampling strategies are used for human beings, oral fluid samples in veterinary medicine are usually collected by introducing an absorbent material into the oral cavity (Palmer et al., 2001;Shin et al., 2004;Cavalcante et al., 2018). Depending on the size of animals, oral fluid samples could be collected by allowing large animals and primates to chew on absorbent material, e.g. cotton rope, or swabbing oral and buccal cavities in small animals (Larghi et al., 1975;Thomas et al., 1995;Lutz et al., 2000;Shin et al., 2004;Smith et al., 2004;Gomes-Keller et al., 2006;Prickett et al., 2008;Dietze et al., 2018;Cheng et al., 2020).
The presence of viable viral pathogens, pathogen-specific antibody, and nucleic acids in oral fluids has been well-described (Sirisinha and Charupatana, 1970;Garrett, 1975;Archibald et al., 1986). In people, the presence of infectious viruses in oral fluid was first demonstrated by bioassay, e.g. clinical signs in cats and monkeys inoculated with oral fluids from humans with mumps (Wollstein, 1918;Johnson and Goodpasture, 1934;Henle et al., 1948). Later, bioassay was used to confirm rabies infection in an infant by intracerebral inoculation of Swiss mouse pups with oral fluids from the child (Duffy et al., 1947). Several other viruses, including cytomegalovirus, human immunodeficiency virus (HIV) (Groopman et al., 1984), herpesviruses (Kaufman et al., 1967;Douglas and Couch, 1970), Zika virus (Bonaldo et al., 2016), and influenza virus (Vinagre et al., 2003), were subsequently found in oral fluids, adding further evidence for the role of oral fluids as a source of pathogens. In animals, Coxsackie B-1 virus from rabbits (Madonia et al., 1966), rabies virus from dogs (Larghi et al., 1975), foot-and-mouth disease virus (FMDV) from cattle (Sellers et al., 1968), and influenza A virus and porcine reproductive and respiratory syndrome virus (PRRSV) from pigs (Wills et al., 1997;Detmer et al., 2011) can be isolated from oral fluid specimens.

Statement of the problem
In both basic research and diagnostic medicine, the repeatability of quantitative polymerase chain reaction (qPCR) testing is affected by the variation introduced at any point between sample collection and the final test report (Heid et al., 1996;Klein, 2002;Hoorfar et al., 2004). Ideally, proper controls can be used to verify the integrity of the process accounting for variation. Internal controls that were extracted or amplified concurrently with test samples verify that the procedure was performed correctly and functioned within expected parameters. In addition, external positive amplification controls (template control) containing fixed quantities of purified PCR target nucleic acids may be used to identify run-to-run variation, e.g. concentration of reagents, qPCR profiles, instrument settings. In contrast, external negative amplification controls (non-template controls) are used to detect reagent contamination.
Internal controls are nucleic acids that are either inherent to the specimen matrix (endogenous reference genes) or added ('spiked') into test samples (exogenous reference genes) prior to nucleic acid extraction. Importantly, qPCR results can be 'normalized' in the context of internal control results to compensate for variation arising from the initial sample nucleic acid quantity and/or concentration, differences among reverse transcription and amplification efficiencies, assay protocols, and/or instrument settings (Vandesompele et al., 2002;Bustin and Nolan, 2004;Huggett et al., 2005;Bustin et al., 2009;Biassoni and Raso, 2014). A number of qPCR normalization-compatible internal reference genes have been described for diagnostic matrices in human medicine, e.g. reticulocytes, keratinocytes, oral fluids, bronchoalveolar lavage fluids, tissue samples (Glare et al., 2002;Silver et al., 2006;Bar et al., 2009;Chervoneva et al., 2010;Koppelkamm et al., 2010;Martin, 2016). In contrast, the use of internal reference genes is less frequently reported in veterinary research, perhaps because of the diversity of specimens and animal species (McIntosh et al., 2009;Pol et al., 2013;Yan et al., 2020). Therefore, the objective of this review is to compare the use of internal reference genes reported in recent human and veterinary qPCR research involving the oral fluid matrix.

Inherent variations in real-time PCR
Although real-time PCR has been used to precisely quantify molecular substances, the data should be interpreted with caution because variation can be introduced throughout the process. PCR results are typically reported as quantitation cycles (Cq), i.e. the number of cycles required for the cumulative fluorescent intensity to meet a pre-determined threshold (Schmittgen and Livak, 2008;Rao et al., 2013). In general terms, samples with a higher initial concentration of target DNA/RNA will require fewer PCR amplification cycles to reach the threshold than those with a lower initial concentration (Schmittgen and Livak, 2008). However, in the laboratory, the Cq of any given sample may be affected by extraneous factors, e.g. technicians' proficiency, test protocols, reagents, PCR conditions, and instruments (Johnson et al., 2013;Kralik and Ricchi, 2017). For example, a recent study concluded the process of collecting nasopharyngeal swabs was a significant source of variability and could produce false-negative results in a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) real-time PCR assay (Basso et al., 2020). To address the problem of variability introduced by extraneous factors, results can be expressed as the DNA/RNA copy number in the sample (absolute quantification) or expressed as the difference in target DNA/RNA (relative quantification) relative to known negative samples (Klein, 2002;Schmittgen and Livak, 2008;Kralik and Ricchi, 2017).

Real-time PCR quantification
Absolute quantification converts a Cq result to DNA/RNA copy number using either digital qPCR or absolute standard curves. Digital qPCR is done by partitioning a sample into subsamples and then performing qPCR separately on each subsample. Thereafter, the distribution and proportion of subsamples containing molecules of interest are used to estimate the number of DNA/RNA copies based on the Poisson distribution (Dube et al., 2008;Huggett et al., 2013). Alternatively, absolute quantification based on standard curves uses the relationship between the sample Cq and known concentrations of DNA/RNA to interpolate the concentration of target in the sample. Absolute standard curves are typically established by generating Cq results of serially diluted standards with known copy numbers of target DNA/RNA. However, identifying the change in target quantity in unknown samples relative to negative calibrators may be sufficient for disease surveillance and diagnostic medicine (Livak and Schmittgen, 2001). Relative quantification of qPCR data may be achieved through two approaches: the relative standard curve and the comparative Cq (Liu and Saint, 2002). Relative standard curves use methods similar to absolute standard curves except the standards do not have known DNA/RNA copy numbers. Instead, relative standard curves describe the relationship between Cq values and the mass of total DNA/RNA for each dilution. The sample Cq result can then be interpreted in the context of the relative standard curve. Because both absolute and relative standard curve methods require that standard curves for targets and references be generated in each PCR run to account for run-to-run variation, a comparative Cq method (ΔΔCq, pronounced 'double delta Cq'), has been used in gene expression studies (Livak and Schmittgen, 2001;Pfaffl, 2001).
This method quantifies the expression of a target gene in a treated sample relative to an untreated calibrator in terms of the fold change in gene expression (Rao et al., 2013). Conveniently, the treated sample and untreated calibrator can be collected at different time points, may be derived from different tissues, or obtained from individuals in different treatment groups (Rao et al., 2013). Unlike standard curve methods, the comparative Cq method eliminates the need to generate standard curves in each PCR run and, therefore, may be used in high-throughput molecular laboratories performing routine disease diagnostic and surveillance testing.
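The quantification approaches described in this section can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the function names, the dilution series, and the Cq values are hypothetical, the standard curve is an ordinary least-squares line of Cq against log10 copy number, and the comparative Cq calculation assumes that target and reference assays both amplify with a factor of about 2 per cycle.

```python
import math


def poisson_copies_per_partition(n_positive, n_total):
    """Digital PCR: mean copies per partition from the fraction of positive partitions."""
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    return -math.log(1.0 - n_positive / n_total)


def fit_standard_curve(log10_copies, cq_values):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, cq_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_factor(slope):
    """Per-cycle amplification factor implied by the slope (2.0 means doubling)."""
    return 10 ** (-1.0 / slope)


def copies_from_cq(cq, slope, intercept):
    """Interpolate a copy number for a sample Cq from the standard curve."""
    return 10 ** ((cq - intercept) / slope)


def fold_change_ddcq(cq_target_treated, cq_ref_treated, cq_target_cal, cq_ref_cal):
    """Comparative Cq: fold change = 2 ** -(delta-delta Cq), assuming doubling per cycle."""
    delta_treated = cq_target_treated - cq_ref_treated
    delta_calibrator = cq_target_cal - cq_ref_cal
    return 2.0 ** -(delta_treated - delta_calibrator)


# Hypothetical ten-fold dilution series (log10 copies) and the Cq values observed.
slope, intercept = fit_standard_curve([6, 5, 4, 3, 2], [15.0, 18.32, 21.64, 24.96, 28.28])
estimated_copies = copies_from_cq(21.64, slope, intercept)
fold_change = fold_change_ddcq(22.0, 18.0, 25.0, 18.0)
```

With these made-up inputs the fitted slope is close to -3.32, which corresponds to near-doubling per cycle; a slope that departs substantially from this value is one reason to prefer an efficiency-corrected calculation over the plain comparative Cq.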

Real-time PCR data normalization
Data normalization is a statistical procedure designed to control variations introduced in the sampling/testing process and to ensure that results are comparable within and between laboratories (Bylesjö et al., 2009;Biassoni and Raso, 2014;Filzmoser and Walczak, 2014). For example, Dahdouh et al. (2020) contended that direct estimation of SARS-CoV-2 viral load based on raw Cq values could neglect variation introduced during the sample collection process, e.g. patient tolerance to nasal swabbing, and concluded that the normalization of raw Cq values against marker nucleic acid genes inherent to sampled cell masses or mucosal surfaces should be implemented to ensure the comparability of coronavirus disease 2019 (COVID-19) qPCRs (Dahdouh et al., 2020;Walsh et al., 2020).
Three methods commonly used for PCR normalization include consistently testing the same amount (mass) of sample, measuring total RNA/DNA, or using endogenous/exogenous reference genes (Huggett et al., 2005):
(1) Testing the same amount of sample is standard practice in molecular and diagnostic laboratories that use standardized protocols, albeit the concentration of detectable target in clinical samples is still affected by sample collection, storage, and handling and, therefore, may not fully represent the initial concentration.
(2) Normalizing qPCR results against the total RNA/DNA content in sample extracts, i.e. prior to PCR, is a more precise approach for controlling sample-to-sample variation. Quantification of total RNA/DNA can be achieved by spectrophotometrically measuring the optical absorbance (OD260) or the fluorescence of dyes that are randomly bound to nucleic acids of the extracted sample (Jones et al., 1998;Green and Sambrook, 2018). However, using total DNA/RNA for data normalization assumes that the efficiency of reverse-transcription and PCR amplification is identical for each sample, i.e. does not take sample-to-sample variation in these efficiencies into account (Bustin, 2002).
(3) The most common approach for qPCR data normalization is to express the Cq of target DNA/RNA in the context of the Cq of one or more reference genes (Wittwer et al., 1997). To serve this purpose, reference genes must have genetic sequences that differ from the target and be present at predictable concentrations in the sample (Vandesompele et al., 2002;Huggett et al., 2005;Bylesjö et al., 2009;Guenin et al., 2009). Pfaffl (2001) proposed an approach that integrated data normalization and qPCR relative quantification using test sample and negative calibrator results (Equation 1). This method calculates the target-to-reference ratio (R) of the Cq difference between a sample and a calibrator (ΔCq) while taking PCR amplification efficiencies for target (E_target) and reference (E_ref) sequences into account (Pfaffl, 2001):
$$R = \frac{(E_{target})^{\Delta C_q^{target}(calibrator - sample)}}{(E_{ref})^{\Delta C_q^{ref}(calibrator - sample)}} \qquad (1)$$
In gene expression studies, samples collected from individuals with no treatment, or prior to treatment, may be used as negative calibrators and/or as a baseline relative to the expression/detection level of target genes in samples from treated individuals. Therefore, the relative quantity of a target gene in a treated sample is expressed as the fold change relative to an untreated calibrator, using a reference gene as a normalizer (Rao et al., 2013).
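The efficiency-corrected ratio described in approach (3) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a validated implementation: the function name and all Cq values are hypothetical, and the efficiencies are expressed as per-cycle amplification factors (2.0 corresponds to perfect doubling), which is the convention used by Pfaffl (2001).

```python
def pfaffl_ratio(e_target, e_ref,
                 cq_target_calibrator, cq_target_sample,
                 cq_ref_calibrator, cq_ref_sample):
    """Target-to-reference ratio R with separate amplification factors.

    e_target and e_ref are per-cycle amplification factors (2.0 = doubling).
    Each delta Cq is taken as calibrator minus sample.
    """
    delta_target = cq_target_calibrator - cq_target_sample
    delta_ref = cq_ref_calibrator - cq_ref_sample
    return (e_target ** delta_target) / (e_ref ** delta_ref)


# Hypothetical values: target assay slightly less efficient than the reference assay.
ratio_corrected = pfaffl_ratio(1.9, 2.0, 28.0, 25.0, 20.0, 20.5)

# With both factors equal to 2.0 the ratio reduces to the comparative Cq fold change.
ratio_ideal = pfaffl_ratio(2.0, 2.0, 25.0, 22.0, 18.0, 18.0)
```

Keeping the two amplification factors as explicit arguments makes it easy to see how much of a reported fold change depends on the assumed efficiencies.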
Both exogenous and endogenous reference genes have been used for data normalization at the individual sample level (Ke et al., 2000). Exogenous reference genes are artificially synthesized nucleic acids with genetic sequences distinct from the target's (Huggett et al., 2005). These heterologous genes may be spiked into test samples prior to the DNA/RNA extraction procedure at a fixed copy number or concentration (Yan et al., 2020) to monitor the efficiency of DNA/RNA extraction and the integrity of reverse transcription and PCR amplification in test samples (Guenin et al., 2009;Johnston et al., 2012). In contrast, endogenous reference genes are host-specific nucleic acids inherent to the specimen (Yan et al., 2020). Since endogenous reference genes are processed concurrently with target DNA/RNA, the detection of these genes reflects both the sample-to-sample variation in the quantity and quality of initial amplifiable DNA/RNA and the variation introduced by the extraction and amplification procedures (Radonic et al., 2004).

Internal reference genes in oral fluids
Endogenous reference genes have been widely used in gene expression analyses for the purpose of representing sample nucleic acid concentration and as the gold standard for qPCR data normalization (Vandesompele et al., 2002;Bustin et al., 2005;Huggett et al., 2005). However, the expression of common reference genes depends on a variety of factors, e.g. cell/specimen types, sample quality and handling, age of subjects, animal species, and disease/treatment status (Zhong and Simons, 1999;Hamalainen et al., 2001;Selvey et al., 2001;Deindl et al., 2002;Glare et al., 2002). Thus, endogenous reference genes must be validated for their consistency of expression and/or detection in test specimens and under the conditions in which target genes will be evaluated (Mestdagh et al., 2009). Typically, this involves comparing the variation in endogenous gene Cq values in samples from subjects with potentially impactful biological characteristics, e.g. age, gender, and disease status (Huggett et al., 2005;Robinson et al., 2007;Becker et al., 2010).
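As a rough illustration of the validation step described above, the following Python sketch summarizes the spread of Cq values for each candidate reference gene and ranks the candidates from least to most variable. The gene labels and Cq values are invented for the example, and the standard deviation of raw Cq is only one simple descriptive criterion assumed here; it is not the stability measure used in any particular cited study.

```python
from statistics import mean, stdev


def cq_variation(cq_by_gene):
    """Mean, standard deviation, and range of Cq for each candidate gene."""
    return {
        gene: {"mean": mean(cqs), "sd": stdev(cqs), "range": max(cqs) - min(cqs)}
        for gene, cqs in cq_by_gene.items()
    }


def rank_by_sd(cq_by_gene):
    """Candidate genes ordered from most to least consistent (smallest SD first)."""
    summary = cq_variation(cq_by_gene)
    return sorted(summary, key=lambda gene: summary[gene]["sd"])


# Hypothetical Cq values pooled across subjects differing in age and disease status.
candidates = {
    "GENE_A": [20.1, 20.3, 19.9, 20.2],
    "GENE_B": [18.0, 21.5, 19.2, 23.1],
}
ranking = rank_by_sd(candidates)
```

In practice the same summary would be computed within and between subject groups, since a gene that is stable overall may still shift systematically with age or disease status.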
In this review, qPCR-based gene expression and disease diagnostic studies were evaluated for the use of endogenous and/or exogenous reference standards in oral fluid specimens from nonhuman vertebrate and human subjects. Initially, the MEDLINE® database was searched (title and abstract fields) on 24 October 2020 for refereed scientific publications containing the following search terms: ('saliva*' or 'oral fluid*' or 'oral swab*') and ('qpcr*' or 'quantitative pcr*' or 'real time pcr*' or 'real-time pcr*' or 'realtime pcr' or 'RT-qPCR' or 'qRT-PCR' or 'real time RT-PCR' or 'real-time RT-PCR' or 'realtime RT-PCR') not (review[Publication Type]). Articles were excluded if not written in English, if not applicable to non-human vertebrate animals, if the oral fluid specimen was not collected by insertion of an absorptive collector into the mouth (Atkinson et al., 1993), or if only components of oral fluids, e.g. microorganisms, biofilms, salivary extracellular vesicles, were evaluated. The remaining publications were evaluated for the use of internal endogenous and/or exogenous reference genes. A total of 1566 articles were retrieved from MEDLINE®. For the period 2003-2020, 136 met the language, research subject, and full-text criteria (Table 1). Among these, exogenous reference genes were used in 25.7% (35/136), endogenous reference genes in 27.2% (37/136), and 52.9% (72/136) did not include sufficient information on the use of internal reference genes.
A similar strategy was used to retrieve oral fluid-based qPCR studies on human subjects from the MEDLINE® database for the articles published between 2016 and 2020. Among the 772 articles retrieved, 179 met the language, species, and content criteria (Table 1).
Exogenous reference genes were used in 14.0% of reviewed studies (25/179), endogenous reference genes in 31.8% (57/179), and 57.0% (102/179) of the studies did not include sufficient information on the use of internal reference genes. As shown in Table 1, β-actin (ACTB) mRNA, ribosomal RNAs (18S and 28S rRNA), and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) mRNA, respectively, were the most frequently used endogenous reference genes in published studies on non-human vertebrates. In human studies, ACTB mRNA, GAPDH mRNA, and U6 small nuclear RNA (snRNA) were the most commonly reported.

Ribosomal RNAs
In mammalian cells, gene expression begins by transcribing DNA into single-stranded messenger RNA (mRNA) in the cell nucleus. From the nucleus, mRNA migrates to the cytoplasm, where ribosomes translate it into proteins (Sergiev et al., 2018). Ribosomes comprise two subunits, the small subunit containing the 18S ribosomal RNA (rRNA) and ribosomal proteins, and the large subunit containing 5S, 5.8S, 28S rRNA, and ribosomal proteins (Lafontaine and Tollervey, 2001). Since ribosomal genetic material is highly conserved and nearly universal in cell-rich specimens, e.g. cell culture, peripheral blood mononuclear cells, and tissue samples, rRNA is one of the most commonly used internal reference genes for qPCR normalization in gene-expression research (Kozera and Rapacz, 2013;Ban et al., 2014). Because the 18S and 28S rRNAs are cleaved from the same single-stranded RNA transcript, the 28S:18S rRNA ratio has been used as an index of the integrity and quality of extracted RNA for electrophoresis-based PCR (Schroeder et al., 2006;Becker et al., 2010). For example, De Ketelaere et al. (2006) and Zhao et al. (2016) reported 18S rRNA as one of the most consistently expressed genes in bovine polymorphonuclear leukocytes and peripheral blood mononuclear cells (De Ketelaere et al., 2006;Zhao et al., 2016). Zhong and Simons (1999) concluded that the expression level of 28S rRNA was more consistent in hypoxia-cultured cells than ACTB mRNA, GAPDH mRNA, and cyclophilin mRNA (Zhong and Simons, 1999;Wang and Heitman, 2005).
However, the use of rRNAs as internal reference genes in qPCR has several shortcomings. First, their quantity and concentration can vary within specimens from the same species (Ingerslev et al., 2006;Rekawiecki et al., 2013). Second, ribosomes are absent from red blood cells and rRNA detection can be inconsistent in specimens in which blood is a significant component. Finally, rRNAs may be overabundant in cell-rich specimens, e.g. peripheral blood mononuclear cells, tissue, and laboratory-cultured cells (Tong et al., 2009). As a consequence, when reverse-transcribed and/or amplified simultaneously with the target, they may compete with the target for PCR components, e.g. polymerase, magnesium ions, and dNTP. Furthermore, an overabundance of rRNA increases the risk of cross-contamination during sample handling and testing (Yan et al., 2020).

ACTB mRNA
β-Actin, encoded by ACTB mRNA, is an isoform of non-muscle actin protein that primarily serves as a component of the cytoskeleton of eukaryotic cells (Bunnell et al., 2011). ACTB has been used for sample quality assessment and qPCR normalization because of its ubiquitous expression in cells (Hunter and Garrels, 1977;Biederman et al., 2004;Johansson et al., 2007;Robinson et al., 2007;Ruan and Lai, 2007;Bar et al., 2009;Die et al., 2017), but recent studies have found the expression level of ACTB to vary by animal species, cell and/or specimen type, sample storage time, growth stage, medical treatment, and disease state (Gutala and Reddy, 2004;Nishimura et al., 2008;Spalenza et al., 2011;Panahi et al., 2016;Khanna et al., 2017;Alshehhi and Haddrill, 2019). For example, in human beings, lower expression of ACTB was reported in bronchoalveolar lavage fluid cells and airway endobronchial biopsy samples from asthmatic patients versus clinically normal subjects or subjects treated with inhaled corticosteroids (Glare et al., 2002). Hamalainen et al. (2001) reported up to 11-fold down-regulation of ACTB expression in T-cells over a 14-day course of T-cell differentiation (Hamalainen et al., 2001). In a qPCR reference gene validation study using peripheral blood mononuclear cells and whole blood from healthy and tuberculosis-positive subjects, ACTB showed >30-fold variability in both specimens and was determined to be unsuitable for data normalization (Dheda et al., 2004).
In veterinary studies, the expression of ACTB depends on a number of factors and its use as an internal reference standard requires assessment on a case-by-case basis. Stable expression of ACTB has been reported in feline tissue samples and bovine peripheral blood mononuclear cells (Ingerslev et al., 2006;Robinson et al., 2007;Kessler et al., 2009;Jursza et al., 2014), but the expression of ACTB has been reported as low in bovine polymorphonuclear leukocytes (De Ketelaere et al., 2006). In a study evaluating the expression of 11 housekeeping genes in canine tissue specimens, including bone marrow, various enteric tissues, heart, muscle, pancreas, and spleen, ACTB was found to be the least consistently expressed (Peters et al., 2007). Thus, although ACTB has been widely used for data normalization in qPCR studies, care should be taken to validate its consistency of expression in the target species and specimen.

GAPDH mRNA
Encoded by GAPDH mRNA, GAPDH is a cytoplasmic enzyme that facilitates glycolysis, a metabolic pathway to release energy, by converting glyceraldehyde-3-phosphate to 1,3-bisphosphoglycerate (Tristan et al., 2011;Nicholls et al., 2012;Alfarouk et al., 2014). The ubiquitous expression of GAPDH mRNA in living cells has led to its common use as an endogenous reference control for qPCR normalization in gene expression and disease diagnostic studies (Rebouças et al., 2013). However, like rRNAs and ACTB mRNAs, the expression of GAPDH mRNA may vary among subjects and treatments. Consistent GAPDH mRNA expression has been reported in oral fluid specimens from premature human neonates, human cervical tissues, and neonatal cardiac ventricular myocytes (Winer et al., 1999;Shen et al., 2010;Maron et al., 2012). In contrast, inconsistent expression of GAPDH mRNA has been reported under a number of experimental conditions, e.g. growing collateral arteries of rabbits, asthmatic human subjects with/without corticosteroid treatment, cells cultured under hypoxic conditions, and whole blood from tuberculosis patients (Zhong and Simons, 1999;Deindl et al., 2002;Glare et al., 2002;Dheda et al., 2004). Barber et al. (2005) reported up to a 15-fold difference in the expression level of GAPDH mRNA across 72 human tissues (Barber et al., 2005). Therefore, GAPDH mRNA may not be an appropriate endogenous reference control for the comparison of qPCR results across specimen matrices.

U6 snRNA
After DNA transcription, RNA transcripts undergo modification to become functional mRNAs able to perform protein synthesis (Moore and Proudfoot, 2009). This pre-mRNA processing involves (1) removing introns from pre-mRNAs (splicing); (2) adding a modified guanine nucleotide at the 5′ end (5′ capping); and (3) adding a long chain of adenine nucleotides at the 3′ end (3′ poly-A tailing). In mammalian cells, U1, U2, U4, U5, and U6 snRNAs complex with RNA-binding proteins to form small nuclear ribonucleoproteins able to perform the splicing activity required to functionalize mRNA (Maniatis and Reed, 1987;Brow and Guthrie, 1988;Stefl et al., 2005). Among the five snRNAs, U6 snRNA was the most conserved in size, sequence, and structure across yeast, bean, fly, and mammalian cells (Brow and Guthrie, 1988). Because of its small size (∼100 nucleotides), U6 snRNA has been used to research the expression of microRNAs, a group of small single-stranded RNAs known for silencing and interfering with mRNA expression in plants, animals, and viruses (Bushati and Cohen, 2007;Mase et al., 2017;Didychuk et al., 2018). For human samples, U6 snRNA has been used as an internal reference gene for the study of microRNA expression in human urinary sediment and serum samples from colorectal adenoma, colorectal adenocarcinoma, and healthy human subjects (Zheng et al., 2013;Duan et al., 2018). However, as observed in other endogenous reference genes, the expression level of U6 snRNA varies among specific specimens and treatments. For example, variation in U6 snRNA expression has been reported in 13 normal and 5 tumorous tissues including colon, esophagus, lung, lymphoid, and prostate (Peltier and Latham, 2008). Lou et al. (2015) evaluated the expression of U6 snRNA in normal and carcinomatous tissues and showed higher levels of U6 snRNA in carcinoma tissues of human breast, liver, and intrahepatic bile ducts compared to normal adjacent tissues (Lou et al., 2015).
Therefore, the constancy of U6 snRNA expression should be ascertained prior to implementing its use as an endogenous reference control for qPCR normalization.

Exogenous reference genes
The use of exogenous mRNAs or DNAs added ('spiked') to specimens is well-described for qPCR normalization (Johnston et al., 2012). Exogenous genes are often artificially synthesized and simultaneously detected by primers and probes distinct from those designed for the target genes. Unlike endogenous reference genes, they reflect variation in nucleic acid extraction and qPCR amplification procedures, but not sample collection and handling. For diagnostic qPCRs, exogenous reference genes provide the advantage of consistency, i.e. to avoid the variation reported for endogenous genes, and, therefore, may be a more reliable normalizer than endogenous genes. However, their use in gene expression research is limited because they do not provide a baseline for the comparison of treated and untreated subjects. Among animal qPCR publications reviewed, internal positive controls included in commercial qPCR assays were the most frequently used while heterologous genes, e.g. algal and enhanced green fluorescent genes, were described as well (Hoffmann et al., 2006;Henderson et al., 2013).
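One way a spiked exogenous control might be used to monitor extraction and amplification is sketched below in Python. The approach is an assumption made for illustration, not a procedure described in the reviewed publications: each sample's spike-in Cq is compared with the run median, and samples that deviate by more than a hypothetical tolerance (`max_shift`, here 3 cycles) are flagged for review. Sample identifiers and Cq values are invented.

```python
from statistics import median


def flag_spike_in_outliers(spike_cq_by_sample, max_shift=3.0):
    """Return {sample: Cq shift} for samples whose spike-in Cq departs from the run median.

    max_shift is a hypothetical tolerance in cycles, not a published threshold.
    """
    run_median = median(spike_cq_by_sample.values())
    return {
        sample: cq - run_median
        for sample, cq in spike_cq_by_sample.items()
        if abs(cq - run_median) > max_shift
    }


# The same quantity of exogenous control is assumed to have been added to every sample.
spike_cqs = {"S1": 30.1, "S2": 29.8, "S3": 35.2, "S4": 30.0}
flagged = flag_spike_in_outliers(spike_cqs)
```

A delayed spike-in Cq, as for the hypothetical sample S3, would suggest poor extraction recovery or amplification inhibition in that sample, but it says nothing about how well the specimen itself was collected.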

Use of endogenous and exogenous reference genes in routine oral fluid diagnostics
Exploration of the diagnostic use of PCR technologies for the detection of pathogen-specific nucleic acids in human oral fluids began in the 1990s (Mandel, 1993;Streckfus and Bigler, 2002) and early successes included Epstein-Barr virus, human herpesvirus type 6, HIV, human cytomegalovirus, and human papillomavirus (Goto et al., 1991;Saito et al., 1991;Garweg et al., 1993;Tominaga et al., 1996). This developmental work led to PCR testing of oral fluid samples for the surveillance of human papillomavirus, HIV, measles, and others (Johnson et al., 1988;Frerichs et al., 1992;Ramsay et al., 1997;Ahn et al., 2014). More recently, SARS-CoV-2 has been detected in oral fluids, suggesting that oral fluid could facilitate the efficient surveillance of the ongoing worldwide coronavirus pandemic (COVID-19) (Azzi et al., 2020;Pasomsub et al., 2020). As for human beings, PCR technology has been applied to the detection of viral pathogens in animal oral fluid specimens, including feline herpesvirus 1 in oral swabs from experimentally inoculated cats (Reubel et al., 1993), canine distemper virus in dogs (Shin et al., 2004), Borna disease virus in rodents (Sierra-Honigmann et al., 1993), FMDV in sheep (Callens et al., 1998), and PRRSV in swine (Wills et al., 1997). As in human diagnostic medicine, PCR testing has been used in oral fluid-based surveillance and herd-level detection of various swine viral diseases, e.g. porcine circovirus type 2, PRRSV, porcine epidemic diarrhea virus, influenza A virus (Ramirez et al., 2012;Bjustrom-Kraft et al., 2018), and others.
Several fundamental concerns arise when considering the routine use of endogenous reference genes in oral fluid specimens. First, oral fluid is not a cell-rich specimen and the quantity/concentration of target genes, e.g. viral DNA/RNA, may not be biologically associated with the concentration of endogenous reference genes, as it would in specimens with cellular context (Nybo, 2012). For that reason, endogenous reference genes commonly used with cell-rich specimens may not be valid for qPCR normalization in oral fluids. Second, the quality of oral fluid specimens can be affected by sample collection methods. Rogers et al. (2007) reported that oral fluid specimens collected via spitting or oral rinse resulted in a higher concentration and quality of DNA compared to oral brush and swab samples (Rogers et al., 2007). Third, few studies have evaluated the expression of common endogenous reference genes in oral fluid specimens.
The ideal endogenous reference gene for the normalization of diagnostic qPCRs would be abundant and consistent across specimen types, stable in diagnostic specimens over time, and independent from the effect of the pathogen (or the treatment) on the host (Thellin et al., 1999;Dheda et al., 2004;Radonic et al., 2004;Mestdagh et al., 2009;Chervoneva et al., 2010). Such a reference gene has not been identified (Peltier and Latham, 2008); however, other genes inherent to oral fluid specimens merit consideration.
Ubiquitous in epithelial tissues throughout the body, mucins are a family of high molecular weight glycoproteins that protect and lubricate mucosal surfaces (Gendler and Spicer, 1995;Debailleul et al., 1998;Moniaux et al., 2001). The 21 types of mucin identified to date may be divided into gel-forming mucins, soluble mucins, and transmembrane mucins (Kumar et al., 2017). Among these, MUC1, MUC4, MUC5B, MUC7, and MUC19 are secreted by salivary glands (Nielsen et al., 1997;Thornton et al., 1999;Sengupta et al., 2001;Alos et al., 2005;Linden et al., 2008), with MUC5B and MUC7, the two major mucins in saliva, constituting ∼20% of the total salivary protein (Takehara et al., 2013). Data are lacking at present, but future research should determine whether mRNAs that encode critical mucin domains might serve as endogenous reference standards for oral fluid specimens (Debailleul et al., 1998).
As an alternative to a single endogenous reference gene, normalizing qPCR data against the geometric mean of multiple endogenous reference genes has been used in gene expression research. As opposed to using a single reference gene, this strategy lowers the risk of introducing additional variation into research data (Vandesompele et al., 2002;Bustin et al., 2009). For example, in a study comparing the mRNA levels of eight common endogenous reference genes in oral fluid specimens between healthy (n = 9) and autistic (n = 9) males (∼4 years of age), the most consistent detection was found for GAPDH mRNA, but the combination of GAPDH and YWHAZ (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide) mRNAs provided the best qPCR normalization (Panahi et al., 2016). Regardless, data normalization using multiple endogenous reference genes is impractical in high-throughput testing laboratories performing diagnostic qPCRs. For that reason, exogenously synthesized genetic sequences spiked into oral fluid samples have been utilized to monitor the DNA/RNA extraction and qPCR testing processes (Howson et al., 2018;Weiser et al., 2018;Nagel et al., 2020;Nagura-Ikeda et al., 2020). Although they cannot reflect sample quality, exogenous reference genes can be used for qPCR normalization to provide consistent comparisons across clinical samples (Johnston et al., 2012;O'Connell et al., 2017).
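Normalization against the geometric mean of several reference genes, as mentioned above, can be sketched in Python as follows. The sketch assumes that each Cq is first converted to a quantity relative to a calibrator sample using a per-cycle amplification factor (2.0 by default); the function names and Cq values are hypothetical, and the two reference genes stand in for any validated pair.

```python
from statistics import geometric_mean


def relative_quantity(cq_sample, cq_calibrator, efficiency=2.0):
    """Quantity relative to the calibrator; efficiency is the per-cycle amplification factor."""
    return efficiency ** (cq_calibrator - cq_sample)


def normalized_expression(target_rq, reference_rqs):
    """Target relative quantity divided by the geometric mean of the reference quantities."""
    return target_rq / geometric_mean(reference_rqs)


# Hypothetical Cq values for one sample and one calibrator.
target_rq = relative_quantity(24.0, 26.0)   # target: 2 cycles earlier than the calibrator
ref1_rq = relative_quantity(19.0, 20.0)     # first reference gene
ref2_rq = relative_quantity(22.0, 21.0)     # second reference gene
normalized = normalized_expression(target_rq, [ref1_rq, ref2_rq])
```

The geometric mean is used rather than the arithmetic mean because relative quantities are multiplicative, so one reference gene with an unusually large value has less influence on the normalization factor.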

Conclusion
Endogenous and exogenous reference genes are used in gene-expression studies to control for variation inherent in the qPCR testing process and achieve qPCR normalization using well-described mathematical approaches, e.g. the ΔCq method proposed by Pfaffl (2001). Although qPCR normalization is recommended to ensure the comparability of results, the majority of oral fluid-based qPCR publications evaluated for this review (52.9% animal studies; 57.0% human studies) did not describe the use of internal controls (Table 1). As oral fluid-based PCRs become more widely implemented in human and veterinary diagnostic settings, this shortcoming should be addressed through the routine use of validated endogenous and/or exogenous reference genes in qPCR testing. The problems inherent with the use of endogenous reference genes include variation in the concentration of endogenous reference genes introduced by specimen matrices, sample quality and handling, subject age, animal species, and/or disease status (Bustin, 2002;Glare et al., 2002;Bustin and Nolan, 2004;Silver et al., 2006;Nishimura et al., 2008;Kozera and Rapacz, 2013). One possible solution is to normalize qPCR data using two or more validated endogenous reference genes (Vandesompele et al., 2002), but in the high-throughput diagnostic setting, a more efficient and practical approach would be spiking samples with a universally synthesized exogenous gene. Notably, this approach does not control for sample quality (Kavlick, 2018). Finally, because of their robust and consistent expression in oral fluids, specific mucin genes should be evaluated for the potential to serve as endogenous reference genes for qPCR normalization.