NOMENCLATURE
 ^{14,12}R:
absolute measured ratio
 fm:

same as F^{14}C
 dF^{14}C:

uncertainty of an individual measurement or quoted error
 Bottom-up:

uncertainty approach represented by dF^{14}C of a measurement
 Top-down (u_{expand}):

long-term repeatability and bias
 bg:

background or blank
 cal:

calibrant, standard or reference material
 σ _{counts,rel}:

relative standard error of the counts $$\left( {1/\sqrt {{N_T}} } \right)$$ . N _{ T } is the total counts
 σ _{counts}:

standard error of the counts in the absolute scale of ^{ 14,12 } R
 σ_{bg-long term}:

standard error of the background in the long term
 Δreplicates:

mean absolute difference (MAD) of the replicates of the same calibrant
 u _{Rw}:

long-term repeatability, calculated from the std. dev. of the replicate means of a calibrant
 Δreplicates u _{Rw}:

repeatability, calculated from std. dev. of Δreplicates of a calibrant
 n:

number of individual replicates for a range of batches or measurement dates
 replicate size:

number of elements of a replicate set
 sample:

(statistics) data set or group of observations of a population
 pseudo u _{Rw} (u _{pRw}):

calculated from std. dev. of individual replicates of a calibrant
 u _{Rw, instrument}:

each type of the above u_{Rw} contains an instrumentation component, as follows: u_{Rw, instrument}, Δreplicates u_{Rw, instrument} and pseudo u_{Rw, instrument}
 u _{Rw, graphite}:

each type of the above u _{ Rw } contains a graphitization component
 u _{bias}:

root mean square (RMS) of the biases of the mean of replicates relative to nominal
 u_{pbias} (pseudo bias):

RMS of biases of individual replicates relative to the nominal value
 u _{bias, combined}:

each type of u_{bias} is calculated for the combined instrument and graphitization components
 ^{14}C sample:

material to be measured for ^{14}C content
 ⟨ · ⟩:

mean
INTRODUCTION
In radiocarbon accelerator mass spectrometry (^{14}C AMS), it has been observed that the quoted error of single measurements is, most of the time, an underestimate when compared with replicate measurements of the same ^{14}C sample (Boaretto et al. 2002; Scott et al. 2007). The underlying mechanism of this discrepancy is still unknown. An empirical, ^{14}C sample-dependent error multiplier has been employed to increase the quoted errors to account for this "dark" uncertainty (Aerts-Bijma et al. 2021). The current method of calculating the radiocarbon quoted error is by error propagation of uncertainties calculated from a measurement magazine or batch. From a wider metrological perspective, this method follows the bottom-up approach of uncertainty measurement. The top-down approach is another widely used method in science. Its uncertainty is based on the long-term variability of the measurand, and it is usually larger than the bottom-up uncertainty (Thompson and Ellison 2011). It would benefit the radiocarbon field to consider top-down components in order to obtain more realistic expanded quoted errors.
The bottom-up approach, as defined by the JCGM Guide to the Expression of Uncertainty in Measurement, requires the determination of all possible sources of uncertainty for an individual measurement (JCGM 1995). A measurand equation must be defined that accurately models the measurement by relating the value to be reported to the analytical instrument response, the calibration and any other influencing variable, as in Eq. (A1) of the supplemental appendix. Eq. (A1) includes the measured ratios (^{14,12}R) of the blank (bg) and of the reference material or standard calibrant (cal), as well as the isotopic fractionation correction using the drift of the stable isotope ratio (δ^{13}C). The bottom-up uncertainty, shown in Eq. (1), combines the standard errors of: the counting statistics on the ^{14,12}R scale (σ_{counts}), the measurement of ^{14,12}R of the blank and the calibrant (σ_{bg-long term}, σ_{cal}) and the measurement of the stable isotope (σ_{δ13C}). The standard error of the counts on the ^{14,12}R scale is calculated by $$\sigma_{counts} = \left\langle {}^{14,12}R_{sample} - {}^{14,12}R_{bg} \right\rangle \, \sigma_{counts,rel}$$ where ⟨·⟩ denotes the average; in Eq. (1), ⟨·⟩ is the average of the pass means. σ_{counts,rel} is the relative Poisson error of the total counts N_{T}, $$\left( 1/\sqrt{N_T} \right)$$. Eqs. (A1) and (1) are based on Aerts-Bijma et al. (2021), but the equations have been modified to include the symbol for the mean of means. Despite this modification, both equations keep their original form, and an explanation has been included in the supplemental appendix. The bottom-up uncertainty equation follows from the law of error propagation, which combines the partial derivatives of the measurand equation with respect to the different input variables of the measurement.
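As a minimal sketch of the σ_{counts} relation, assuming the inputs are the per-pass mean ^{14,12}R of the sample, a single blank ^{14,12}R value and the total ^{14}C counts N_{T}, the calculation can be written as follows. The function name, argument layout and example values are hypothetical and are not taken from the authors' R script.

```python
import math


def sigma_counts(r_sample_pass_means, r_bg, total_counts):
    """Counting error on the absolute 14,12R scale (hypothetical helper).

    sigma_counts = <R_sample - R_bg> * sigma_counts_rel, with sigma_counts_rel = 1/sqrt(N_T).
    Inputs (assumed layout): per-pass mean 14,12R of the sample, one blank 14,12R,
    and the total 14C counts N_T of the sample.
    """
    if total_counts <= 0:
        raise ValueError("total_counts must be positive")
    # Average of the blank-subtracted pass means, <R_sample - R_bg>
    net = [r - r_bg for r in r_sample_pass_means]
    mean_net = sum(net) / len(net)
    # Relative Poisson error of the total counts
    sigma_counts_rel = 1.0 / math.sqrt(total_counts)
    return mean_net * sigma_counts_rel


# Hypothetical values: three pass means, a small blank ratio and 7e5 total counts
print(sigma_counts([1.20e-12, 1.21e-12, 1.19e-12], 3.0e-15, 700_000))
```

Multiplying the relative Poisson error by the blank-corrected ratio is what moves the counting term onto the same absolute ^{14,12}R scale as the other terms of Eq. (1).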
Bottom-up approach: error propagation of uncertainties of measurement variables
The top-down approach combines the random and systematic effects on the reported value. Basically, the systematic effect, or bias, is determined by measuring secondary standards and comparing the results with the nominal or consensus value. The random effects are measured by long-term replication. NIST (Possolo 2015) and the ISO norm for medical and clinical laboratories recommend the top-down approach (International Organization for Standardization 2019; Braga and Panteghini 2020). The most popular protocols for applying the top-down approach are Nordtest (Magnusson et al. 2011; Näykki et al. 2012) and Eurachem (Ellison 2000). The ISO norm 11352 for water analysis is based on both approaches (International Organization for Standardization 2012). Interlaboratory comparison tests (Scott et al. 2010) and intralaboratory proficiency tests are types of top-down approaches. In many fields of science, it has been observed that the bottom-up uncertainty tends to be smaller than the top-down uncertainty (Burr et al. 2021) because some unknown components are not accounted for. The repeatability of purely physical processes is resilient over the long term, but the case is different when complex chemical processes are involved (Thompson and Ellison 2011), e.g., ionization and combustion/reduction for radiocarbon. Systematic errors affect the variability of the reported value because systematic errors, known or not, can change over the long term. Nevertheless, systematic errors are not considered in the calculation of the bottom-up quoted error. A more accurate report should include both random and systematic effects, as recently proposed by a unified theory of measurement errors and uncertainties (Huang 2018).
In this way, the bottom-up and top-down approaches can be made coherent. In radiocarbon studies, some long-term components have been taken into account, such as long-term repeatability and bias for modern ^{14}C samples only (Miller et al. 2013; Turnbull et al. 2015), long-term blank uncertainty, error multipliers, and error propagation for graphitization and chemical treatment (Scott et al. 2007; Schuur et al. 2016). It would be helpful to explore long-term uncertainty concepts, which have been extensively applied by dedicated metrological laboratories.
We hypothesize that, by including long-term random and bias effects, it is possible to infer more realistic, linearly expanded uncertainties. The calculation has been applied to our graphite data acquired over 7 years using an N_{2} stripper and 1 year after changing to a He stripper. The quoted errors are compared with the uncertainty inferred from our top-down historical analysis in order to correct the discrepancy. The analysis was carried out up to the graphitization step; specific chemical treatment and field sampling were not considered.
METHODS
Bottom-Up Approach for Uncertainty Measurement
A computer script written in the R language (R Development Core Team 2013) was developed to automatically query, process and analyze graphite data from our database. The data were analyzed by measurement batch (magazine). Every batch had already been preprocessed by the MICADAS software BATS (IonPlus AG, Zurich, Switzerland), which saves the results in the database, including information on rejected cycles and passes. Blanks and calibrants with a C mass lower than 0.8 mg were rejected by the R script. The weighted-mean calculations followed those of BATS (Wacker et al. 2010), and the mean ^{14,12}R was corrected with δ^{13}C at every pass (Steier et al. 2004). The final calculation consists of a mean-of-means method that groups the data by passes. Furthermore, σ_{bg-long term} was taken as the blank uncertainty determined by a long-term top-down approach. The other uncertainties, for the calibrant and δ^{13}C, were calculated as standard errors, i.e., the standard deviation of the pass means divided by the square root of the number of passes (p).
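The mass filter and the standard error of the pass means described above can be sketched as below. This is an illustrative Python version, not the authors' R script; the function names are hypothetical, and the BATS weighted mean and the per-pass δ^{13}C correction are deliberately left out.

```python
import math
import statistics

MIN_CARBON_MASS_MG = 0.8  # blanks/calibrants below this C mass were rejected


def keep_target(carbon_mass_mg):
    """Mass filter applied to blanks and calibrants before the calculation."""
    return carbon_mass_mg >= MIN_CARBON_MASS_MG


def mean_of_pass_means(pass_means):
    """Mean of the pass means and its standard error s / sqrt(p)."""
    p = len(pass_means)
    if p < 2:
        raise ValueError("at least two passes are needed for a standard error")
    mean = statistics.fmean(pass_means)
    standard_error = statistics.stdev(pass_means) / math.sqrt(p)
    return mean, standard_error


# Hypothetical per-pass values for one calibrant target
print(keep_target(0.75), mean_of_pass_means([1.0, 2.0, 3.0, 4.0]))
```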
The procedure described above was applied automatically to every standard and blank of each batch combusted and graphitized by our EA (Elementar GmbH, Germany)–AGE (IonPlus) system. The R script could query the database for batches within a specific range of measurement dates. It could therefore automatically analyze and accumulate the results for all calibrants from all batches belonging to the periods when we used the N_{2} or the He stripper.
Top-Down Approach for Uncertainty Measurement
Nordtest is a well-known and easy-to-understand protocol. Third-party researchers have applied it to ^{13}C determination by NMR (Pironti et al. 2017) and to clinical studies (Cui et al. 2017). The Nordtest expanded uncertainty (u_{expand}) is the quadrature sum of the long-term repeatability (u_{Rw}) and bias (u_{bias}) components (Eq. 2). Each component can be broken down into instrumentation and graphitization effects, as shown in Eq. (3). This approach basically analyzes the variability of the reported F^{14}C (fm) of historical replicates. An R script queried the reported values for any replicated blank or calibrant within a batch and for any non-replicated secondary calibrant. Our primary calibrant was Oxa2 (SRM 4990C, NIST), and the secondary calibrants were Oxa1 (NIST), C5, C2, C7 and C6 (IAEA) (Le Clercq et al. 1997). The blank was sodium acetate (Sigma-Aldrich, No. 71180). After the replicates were found, outliers were rejected by a two-sided recursive Grubbs' method in which the z-score of the data was compared with a threshold value. Our z-score was calculated as $$z = \left( fm - \langle fm \rangle \right)/\sigma$$ where the difference between the individual value and the mean of the data set is compared with the standard deviation of the data set (σ). Similarly to Scott et al. (2010), our acceptance range was –2 to 2. We used the standard deviation of the data instead of the individual uncertainties or quoted errors because we wanted the top-down results to reflect only the scatter and to be independent of how the quoted error is calculated. The mean of each replicate set, ⟨fm⟩, was calculated for all batches. Then u_{Rw} was calculated as the standard deviation of the replicate means ⟨fm⟩ for a specific calibrant, as shown in Eq. (4) and illustrated in Figure 1.
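The rejection rule and the conventional u_{Rw} described above can be sketched as follows. Two assumptions apply: the fixed |z| ≤ 2 acceptance range from the text is used instead of a tabulated Grubbs critical value, and the level at which rejection is applied (individual values versus replicate means) is left to the caller. The function names are hypothetical.

```python
import statistics


def reject_outliers(values, z_max=2.0):
    """Recursive two-sided rejection: drop the most extreme value while |z| > z_max.

    z = (fm - <fm>) / sigma, where sigma is the standard deviation of the data set.
    """
    data = list(values)
    while len(data) > 2:
        mean = statistics.fmean(data)
        sd = statistics.stdev(data)
        if sd == 0:
            break
        worst = max(data, key=lambda v: abs(v - mean))
        if abs(worst - mean) / sd <= z_max:
            break
        data.remove(worst)  # remove one value, then recompute mean and sd
    return data


def u_rw(replicate_sets):
    """Std. dev. of the replicate-set means <fm> for one calibrant (Eq. 4 as described)."""
    means = [statistics.fmean(s) for s in replicate_sets if s]
    return statistics.stdev(means)


# Hypothetical fm values with one obvious outlier
cleaned = reject_outliers([1.0, 1.01, 0.99, 1.0, 1.02, 0.98, 1.5])
print(cleaned, u_rw([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))
```

Removing one value per iteration and recomputing the mean and σ is what makes the procedure recursive; rejecting all values with |z| > 2 at once would let a single large outlier mask or distort the rest.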
The Nordtest protocol uses the replicate means ⟨fm⟩ in order to minimize the bias effect on the repeatability parameter u_{Rw}. Statistically speaking, the replicate sets are statistical samples drawn from a population. The central limit theorem (Evans and Rosenthal 2004) tells us that the standard deviation of the means of statistical samples can be approximated by the standard deviation of the population divided by the square root of the sample size. In this paper, the sample size is the number of elements in a replicate set, usually 2 to 4. Thus, a pseudo u_{Rw} (Eq. 5) was calculated as the bootstrap standard deviation of the n individual replicates, which approximates the standard deviation of the population, divided by the square root of the mean replicate-set size. n is defined in Figure 1. The pseudo u_{Rw} has the advantage of having many more data points than the conventional u_{Rw}. The conventional u_{bias} was calculated for secondary calibrants and is defined as the root mean square of the biases of the means, as in Eq. (6). The bias is the difference between each ⟨fm⟩ value and its respective nominal value. A pseudo bias (Eq. 7) was defined as the root mean square of the biases of the n individual replicate values for any primary or secondary calibrant, taking into account the mean size of the replicate sets. The conventional u_{Rw} and u_{bias} were not calculated for the primary calibrant Oxa2 because its ⟨fm⟩ is fixed. Oxa2 can be used for the pseudo parameters because they measure the distribution of the individual fm values, not of the ⟨fm⟩ value. The bias was not calculated for the blank because its true nominal value is unknown.
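The pseudo u_{Rw} and the conventional u_{bias} can be sketched as below, under two stated assumptions: the plain sample standard deviation stands in for the bootstrap estimate used in the text, and the pseudo bias of Eq. (7) is omitted because its exact replicate-size weighting is not reproduced here. Function names are hypothetical.

```python
import math
import statistics


def pseudo_u_rw(replicate_sets):
    """SD of all individual replicates divided by sqrt(mean replicate-set size).

    The text uses a bootstrap estimate of the SD; the plain sample SD is used here.
    """
    individual = [fm for s in replicate_sets for fm in s]
    mean_size = statistics.fmean([len(s) for s in replicate_sets])
    return statistics.stdev(individual) / math.sqrt(mean_size)


def u_bias(replicate_sets, nominal):
    """Root mean square of the biases of the replicate means <fm> from the nominal value."""
    biases = [statistics.fmean(s) - nominal for s in replicate_sets]
    return math.sqrt(statistics.fmean([b * b for b in biases]))


# Hypothetical duplicate sets of a secondary calibrant with nominal F14C = 1.0
sets = [[1.01, 1.03], [0.99, 0.97]]
print(pseudo_u_rw(sets), u_bias(sets, 1.0))
```

Dividing by the square root of the mean set size puts the pseudo u_{Rw} on the same footing as a standard deviation of set means, which is why it can be compared with the conventional u_{Rw}.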
We had to select the right replicate type in order to estimate the instrument (u_{Rw,inst.}) and graphitization (u_{Rw,graphite}) uncertainty components. If the starting material (e.g., a calibrant) was divided before combustion and the graphitized fractions were analyzed in the same batch, then this replicate set was used to infer the combined graphitization+instrument components (u_{Rw,combined}). If the starting material was divided after graphitization and measured in the same batch, then it was used to infer the instrument uncertainty. The graphitization uncertainty was calculated as the quadrature difference of the two (Eq. 8): $$u_{Rw,graphite} = \sqrt{u_{Rw,combined}^{2} - u_{Rw,inst.}^{2}}$$
For every calibrant material in our database, the number of individual replicates for the combined uncertainty (n_{c}) was much smaller than that for the instrument uncertainty (n_{i}). This created a problem when comparing u_{Rw}, u_{bias} and their pseudo values for both replicate types in Eq. (3): it is difficult to compare standard deviations and RMS values of two data sets of very different sizes, also known as unpaired data sets (Mudelsee and Alkio 2007). The bootstrapping technique solved this problem by resampling the larger (instrumentation) replicate data set of size n_{i} 1000 times, each time taking a random subsample with replacement of the same size as the smaller data set (n_{c}) and calculating the statistic of interest (e.g., u_{Rw} or u_{bias}). The 1000 values were then averaged. The statistic of the smaller data set (instrumentation + graphitization components) was calculated conventionally using its whole data set.
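The size-matched bootstrap described above can be sketched generically: resample the larger data set with replacement at the size of the smaller one, compute the statistic each time, and average. The function name, the seed handling and the synthetic Gaussian data are assumptions for illustration only.

```python
import random
import statistics


def bootstrap_matched(large, n_small, statistic, n_boot=1000, seed=None):
    """Average `statistic` over n_boot resamples of `large`, each of size n_small.

    Each resample is drawn with replacement so that the statistic of the large
    (instrumentation) data set is computed at the same size as the small
    (instrumentation + graphitization) data set.
    """
    if n_small < 1:
        raise ValueError("n_small must be positive")
    rng = random.Random(seed)
    values = [statistic(rng.choices(large, k=n_small)) for _ in range(n_boot)]
    return statistics.fmean(values)


# Hypothetical use: SD of 300 instrument replicates evaluated at the size of 12 combined replicates
data_rng = random.Random(0)
instrument_fm = [data_rng.gauss(0.9, 0.003) for _ in range(300)]
u_inst_matched = bootstrap_matched(instrument_fm, 12, statistics.stdev, seed=1)
print(u_inst_matched)
```

The elements of `large` need not be numbers: passing a list of replicate sets together with a function that computes u_{Rw} from such a list resamples whole sets instead of individual values.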
For comparison, u_{Rw} was also estimated with the method of duplicates, which has been applied to radiocarbon by, e.g., Aerts-Bijma et al. (2021). Δ_{duplicates} is the difference between the reported ^{14}C contents of duplicates. Instead, we used the Δreplicates concept, calculated as the mean absolute difference (MAD) (Hyslop and White 2009), because we had many cases of triplicates and quadruplicates. Aerts-Bijma et al. (2021) analyzed the quotient of Δ_{duplicates} to the quoted error. The collection of these normalized quotients from many batches led to a Gaussian distribution whose standard deviation is equal to the error multiplier. We instead worked with the distribution of the absolute Δreplicates values (Thompson and Howarth 1973), which led to half-Gaussian curves because Δreplicates cannot be negative. Δreplicates outliers were rejected by a one-sided recursive Grubbs' method. The Δreplicates u_{Rw} was estimated as the zero-centered standard deviation of the Δreplicates, including the replicate size as:
RESULTS
First, a graphical illustration of the replicates and of the top-down approach for data accumulated over two years is presented. Next, we show how much the long-term repeatability of the top-down approach (average of the Δreplicates u_{Rw} and pseudo u_{Rw} values) differs from the bottom-up approach, also known as the quoted error (dF^{14}C population mean), in Figure 3a,b. The discrepancy is corrected by adjusting the σ_{bg-long term} parameter of the bottom-up approach using ^{14}C blanks. Then the three types of u_{Rw} long-term uncertainties and the two types of u_{bias} are calculated for each type of ^{14}C calibrant for the data accumulated over several years using N_{2} or He stripping. The plots of all u_{Rw} versus F^{14}C form two groups: the instrumentation effect, and the instrumentation effect combined with the graphitization effect. Which group a result belongs to depends on how the calibrant material was treated and processed before measurement. The graphitization component is calculated from the quadrature difference of both groups. Finally, taking advantage of the linear trend of u_{Rw} and u_{bias} versus F^{14}C, the instrumentation, graphitization and bias components are added in quadrature to obtain an expanded uncertainty for the ^{14}C range from blank to Oxa2. This expanded uncertainty is compared with long-term repeatability studies from other laboratories.
Graphical Illustration of the Top-Down Approach
The reported F^{14}C values calculated by BATS were nearly identical to those from the R script calculation; this check served as quality control for our script. The quoted error (dF^{14}C) is calculated with Eq. (1), which is based on the bottom-up approach. Eq. (1) is the complete propagation of the uncertainties corresponding to: the counts from the ^{14}C sample, the calibrant ^{14,12}R, the long-term blank ^{14,12}R, and the δ^{13}C of the ^{14}C sample and calibrant. All these uncertainties, except for the blank, are calculated with the data of a specific batch as standard errors of the pass means. On the other hand, the top-down uncertainty is composed of the long-term repeatability and bias components. Each component can be further broken down into instrumentation and graphitization components. For the top-down approach, another R script searched the database for the F^{14}C of replicates for the measurement dates corresponding to N_{2} and He stripping. Three types of repeatability parameters are calculated (u_{Rw}, pseudo u_{Rw} and Δreplicates u_{Rw}) together with two types of bias (u_{bias} and pseudo u_{bias}). Each type of u_{Rw} has instrumentation and graphitization components. Both types of u_{bias} are calculated with the components combined.
Figure 2 illustrates the top-down approach, where the long-term standard deviation is used for the pseudo u_{Rw}. The zero-centred bootstrap standard deviation of the collection of Δreplicates is used for the Δreplicates u_{Rw}. The bias is the difference between the mean of each replicate set (thick line) and the nominal value, and u_{bias} is the mean effect of all the individual biases. All these parameters are calculated from the same raw data but with different equations (Eqs. 4–7). Imagine, for a moment, a hypothetical data set with the same u_{Rw} as in Figure 2 but with zero biases of ⟨fm⟩: all its replicate means would be aligned with the corresponding nominal value. In contrast, the scatter of the biases in the real case (Figure 2) decreases the certainty of the reported values compared with the hypothetical case. Therefore, an accurate long-term uncertainty should include u_{bias} added in quadrature, as in Eq. (2). The primary standard Oxa2 is the only case that matches this hypothetical case: the biases of ⟨fm⟩ are zero, but each individual fm does have a bias. Another feature of the top-down approach is shown by the two replicate sets marked with red rectangles. The calculated Δreplicates values for both replicate sets are quite similar. However, their contributions to u_{Rw} are quite different because of their different scatter around the global mean. We think that the information from Δreplicates and from u_{Rw} is important and complementary for the long-term repeatability. The number of selected data points and rejected outliers for the calculation of the pseudo and conventional parameters is shown in Table A1 of the supplemental appendix.
Correction of the Discrepancy between the Top-Down and Bottom-Up Approaches with the Blank
The main goal of this section is to compare the average bottom-up uncertainty with the average instrumentation repeatability and to bring the two into agreement using the blank. The need for agreement between the bottom-up approach and the top-down approach without bias has already been pointed out for metrological labs by Horwitz (2003). The bottom-up approach is represented by the distribution mean of the dF^{14}C quoted error. The instrumentation repeatability is represented by the 1σ uncertainty of the F^{14}C and Δreplicates distributions. Initially, the long-term blank uncertainty (σ_{bg-long term} in Eq. 1) is set to zero. Figure 3a shows the distribution of the dF^{14}C values of the population of blanks for each stripper gas. The dF^{14}C distribution mean for N_{2}, for example, is 1.0 × 10^{–4}; this result is compared with Figure 3b in the next paragraph. Figure 3b shows a half-Gaussian distribution of Δreplicates with a 1σ uncertainty of 4.0 × 10^{–4}, which, divided by the replicate size, represents the Δreplicates u_{Rw}. Figure 3b also shows a distribution of blank F^{14}C values with a global mean of 0.0031; its 1σ uncertainty (1.0 × 10^{–3}) factored by the replicate size is the pseudo u_{Rw}. The F^{14}C distribution is centred at zero for visual purposes, so that its scale fits the scale of the Δreplicates distribution. The statistics for the blank and Oxa2 do not need bootstrapping because their instrumentation and combined-component data sets are both similarly large. The Figure 3b results tell us that, 68% of the time for N_{2} stripping, the F^{14}C difference between blanks measured in the same batch should be 4.0 × 10^{–4} or lower, and the F^{14}C difference between blanks measured on different days or in different batches should be 1.0 × 10^{–3} or lower. The Δreplicates u_{Rw} and the pseudo u_{Rw} are two different ways of calculating the effect of the AMS instrument on the long-term repeatability for the top-down approach.
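The Δreplicates statistics can be sketched as follows: the mean absolute difference (MAD) over all pairs of a replicate set (|x_{1} − x_{2}| for duplicates), and the zero-centred standard deviation, i.e., the root mean square about zero, which recovers σ of the parent Gaussian when the values are absolute differences. As a hedge, the replicate-size factor applied in the text is not reproduced here; for context, the difference of two values that each have standard deviation σ has standard deviation √2·σ. The function names and blank values are hypothetical.

```python
import itertools
import math


def mean_absolute_difference(replicates):
    """Mean of |x_i - x_j| over all pairs of a replicate set (|x1 - x2| for duplicates)."""
    pairs = list(itertools.combinations(replicates, 2))
    if not pairs:
        raise ValueError("a replicate set needs at least two values")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def zero_centred_sd(deltas):
    """Root mean square about zero: estimates sigma of the Gaussian behind a half-Gaussian."""
    return math.sqrt(sum(d * d for d in deltas) / len(deltas))


# Hypothetical blank replicate sets (F14C) from different batches
batches = [[0.0030, 0.0034], [0.0028, 0.0031, 0.0029], [0.0033, 0.0030]]
deltas = [mean_absolute_difference(s) for s in batches]
print(zero_centred_sd(deltas))
```

The standard deviation is centred at zero rather than at the mean because the Δreplicates are absolute values: centring at their mean would measure the spread of the half-Gaussian, not the width of the underlying distribution.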
The mean of the two parameters, Δreplicates u_{Rw} and pseudo u_{Rw}, is 7.0 × 10^{–4} for N_{2}, while for He stripping the mean is 5.0 × 10^{–4} (Figure A1). In contrast, the average bottom-up uncertainty (dF^{14}C) in Figure 3a is lower (∼1 × 10^{–4} for N_{2} and ∼8 × 10^{–5} for He). The quadrature difference between the long-term repeatability and the bottom-up uncertainty gives an approximate value of σ_{bg-long term}. Figure 3c shows the new histograms of dF^{14}C after adjusting σ_{bg-long term} to 7.5 × 10^{–4} for N_{2} and 4.7 × 10^{–4} for He. The new dF^{14}C distribution means (∼7.6 × 10^{–4} for N_{2} and ∼4.8 × 10^{–4} for He) now approximate the average top-down long-term repeatability u_{Rw} for the blank. The new dF^{14}C distributions for Oxa2, shown in Figure 3d, can be characterized by their averages and 1σ ranges: they cover (2.3–6.0) × 10^{–3} for N_{2} and (2.0–3.5) × 10^{–3} for He, with averages of 4.1 × 10^{–3} for N_{2} and 2.9 × 10^{–3} for He. These Oxa2 dF^{14}C averages approximate the instrument top-down repeatability averages for the respective gases, 3.1 × 10^{–3} for N_{2} and 2.3 × 10^{–3} for He, as shown in Figure 4(a,b). Thus, Oxa2 shows practically no discrepancy between the top-down and bottom-up approaches. In fact, the Oxa2 distributions with and without σ_{bg-long term} (data not shown) overlap because σ_{bg-long term} is too small to make a difference in the Oxa2 uncertainty range. In short, applying the σ_{bg-long term} magnitude is enough to bring the bottom-up and top-down approaches into agreement for the blank. This also holds for Oxa2 at the other end of the radiocarbon range. It seems that the level of discrepancy depends on the ^{14}C content.
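The blank adjustment can be checked with a few lines of arithmetic using only the values quoted in the text. The sketch assumes that σ_{bg-long term} enters a blank's dF^{14}C with unit sensitivity, so that the new quoted error is the old one and σ_{bg-long term} added in quadrature; that simplification of Eq. (1) is an assumption, and the helper names are hypothetical.

```python
import math


def quadrature_difference(total, part):
    """sqrt(total^2 - part^2); requires total >= part."""
    if total < part:
        raise ValueError("total must be at least as large as part")
    return math.sqrt(total ** 2 - part ** 2)


def quadrature_sum(*terms):
    return math.sqrt(sum(t * t for t in terms))


# Blank values quoted in the text (F14C units)
top_down = {"N2": 7.0e-4, "He": 5.0e-4}    # mean of Δreplicates u_Rw and pseudo u_Rw
bottom_up = {"N2": 1.0e-4, "He": 8.0e-5}   # approximate mean dF14C with σ_bg-long term = 0
adopted_bg = {"N2": 7.5e-4, "He": 4.7e-4}  # σ_bg-long term values adopted in the text

for gas in ("N2", "He"):
    first_estimate = quadrature_difference(top_down[gas], bottom_up[gas])
    # Assumes σ_bg-long term enters a blank's dF14C with unit sensitivity
    new_mean = quadrature_sum(bottom_up[gas], adopted_bg[gas])
    print(f"{gas}: estimate {first_estimate:.1e}, adjusted dF14C ~{new_mean:.1e}")
```

With these inputs the quadrature difference gives first estimates of about 6.9 × 10^{–4} (N_{2}) and 4.9 × 10^{–4} (He), of the same order as the adopted values, and adding the adopted values back in quadrature reproduces the ∼7.6 × 10^{–4} and ∼4.8 × 10^{–4} distribution means quoted above.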
We suspect that the difference between the uncertainties for the two stripper gases is due to the more intense target sputtering conditions used when N_{2} was the stripper. The historical average number of passes per batch is 15 for N_{2} and 12 for He. Moreover, the average low-energy current is 55 μA for N_{2} and 44 μA for He. Under these conditions, the Oxa2 targets registered on average (7.83 ± 1.80) × 10^{5} and (7.04 ± 1.42) × 10^{5} total counts per batch for N_{2} and He, respectively. The blank registered (2.5 ± 1.0) × 10^{3} and (1.8 ± 0.7) × 10^{3} counts per batch for N_{2} and He, respectively. These data show that, to meet our Oxa2 counting goal of ∼7 × 10^{5}, the targets (blanks and Oxa2) had to be sputtered for longer and at higher intensity with the N_{2} stripper because of its different transmission efficiency compared with He. The more a target is sputtered, the larger the scatter of ^{14,12}R and δ^{13}C throughout the batch analysis, owing to the physical change of the target. This beam distortion at the source is further amplified by N_{2} angular straggling, which is higher than He straggling at their respective areal densities (Schulze-König et al. 2011). Probably, this within-batch ^{14,12}R scatter causes the inter-batch scatter, increasing the long-term scatter for N_{2} relative to He. The blank F^{14}C repeatability seems to be very sensitive to the stripper gas (Figure 3c). In contrast, the Oxa2 uncertainty is not very sensitive to the stripper gas. For the Oxa2 distributions in Figure 3d, an overlapping area of 64% was calculated from their normalized density distributions using the R package "overlapping" (Pastore 2018). The distributions are therefore partly separated (36% non-overlap), indicating that Oxa2 uncertainties tend to be lower with He stripping than with N_{2}.
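The overlapping area of two normalized density distributions can be approximated by integrating the minimum of two kernel density estimates. The sketch below is a standard-library Python illustration of that idea, not a port of the R package "overlapping"; the Gaussian kernel, the rule-of-thumb bandwidth and the integration grid are assumptions and may differ from the package defaults.

```python
import math
import statistics


def gaussian_kde(data, bandwidth):
    """Gaussian kernel density estimate that integrates to 1."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density


def rule_of_thumb_bandwidth(data):
    # Silverman-style rule of thumb; the R package may use a different bandwidth.
    sd = statistics.stdev(data)
    if sd == 0:
        raise ValueError("data must not be constant")
    return 1.06 * sd * len(data) ** -0.2


def overlap_area(a, b, grid_points=2000):
    """Area under min(f_a, f_b): 1 for identical densities, 0 for disjoint ones."""
    bw_a, bw_b = rule_of_thumb_bandwidth(a), rule_of_thumb_bandwidth(b)
    f_a, f_b = gaussian_kde(a, bw_a), gaussian_kde(b, bw_b)
    pad = 5.0 * max(bw_a, bw_b)  # extend the grid so kernel tails are included
    lo = min(min(a), min(b)) - pad
    hi = max(max(a), max(b)) + pad
    step = (hi - lo) / (grid_points - 1)
    ys = [min(f_a(lo + i * step), f_b(lo + i * step)) for i in range(grid_points)]
    # Trapezoidal rule on the uniform grid
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
```

Usage: `overlap_area(dF_n2, dF_he)` on two lists of Oxa2 dF^{14}C values returns a fraction between 0 and 1; one minus that value is the non-overlapping (separated) fraction.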
Reassessment of the Overall Uncertainty for 7 Years of Data Using Nitrogen or Helium Stripping
Figure 4 summarizes the results for the several types of long-term repeatability (u_{Rw}) and bias (u_{bias}) parameters. Pseudo u_{Rw} and Δreplicates u_{Rw} are basically calculated from the bootstrap standard deviation of the distributions shown in Figures 3(b) and A1, factored by the square root of the replicate size. u_{bias} is calculated similarly but using the root mean square. It is not possible to show a histogram for every data point in Figure 4 because the number of individual points is not always high. Table A1 shows that some data points are composed of as few as 3 to 4 individual points. However, the error in calculating u_{Rw} and u_{bias} in those cases is the same as that of calculating a standard deviation from 3–4 points, which is not uncommon in science. Figure 4 includes the instrumentation component and the graphitization+instrumentation combined components of each of the three types of long-term repeatability u_{Rw}. The instrumentation component is the effect of the AMS instrument alone. The graphitization component includes the effect of the combustion and reduction reactions. It is not necessary to break down the bias; thus, it is calculated only for the graphitization+instrumentation combined effects. The idea is that the graphitization component can be inferred by subtracting the instrumentation component from the combined components. As explained for Figure 3, the bottom-up uncertainty was brought close to the instrument long-term repeatability component by adjusting the σ_{bg-long term} parameter. This agreement is not exact because it holds only on average, but at least the order of magnitude is correct. The bottom-up uncertainty, usually referred to as the quoted error, changes with the conditions of individual measurements. Nevertheless, the method is useful to infer the trend of the graphitization component.
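The inference of the graphitization trend can be sketched as a pointwise quadrature subtraction of the instrumentation component from the combined components at each F^{14}C value, followed by a straight-line fit. Ordinary least squares and the clipping of negative differences to zero are assumptions of this sketch, and the function names are hypothetical. The final loop evaluates the N_{2} graphitization fit reported in the text at the blank (F^{14}C ≈ 0.0031) and at Oxa2, taking 1.3407 as the assumed nominal F^{14}C of Oxa2.

```python
import math


def graphitization_component(u_combined, u_instrument):
    """Pointwise quadrature difference sqrt(u_comb^2 - u_inst^2) at each F14C value.

    Negative differences (combined < instrument, possible with scattered data) are
    clipped to zero here; that handling is an assumption of this sketch.
    """
    return [math.sqrt(max(c * c - i * i, 0.0)) for c, i in zip(u_combined, u_instrument)]


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# N2 graphitization fit reported in the text: y = 2.4e-3 x + 7.0e-4
for f14c in (0.0031, 1.3407):  # blank and Oxa2 (assumed nominal value)
    print(f14c, 2.4e-3 * f14c + 7.0e-4)
```

The two evaluations, about 0.7 × 10^{–3} and 3.9 × 10^{–3}, agree with the N_{2} graphitization range quoted for the span from blank to Oxa2.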
The first objective of this section is to calculate the total expanded uncertainty as the quadrature sum of the instrumentation u_{Rw}, the top-down graphitization u_{Rw} and the bias of the combined components, as shown in Eq. (3). Figure 4(a,b) shows that the instrumentation and combined components can be expressed as linear fits. Therefore, after subtracting both components using Eq. (8), the linear fits for the graphitization u_{Rw} component are y = 2.4 × 10^{–3} x + 7.0 × 10^{–4} for N_{2} and y = 1.6 × 10^{–3} x + 6.0 × 10^{–4} for He. The graphitization equations are inferred not algebraically but by subtracting the instrumentation component from the combined components at each F^{14}C value, as illustrated in Figure 4(a,b) with an arrow at 0.9 F^{14}C; the linear fit for graphitization is then carried out. The subtraction yields the same result using relative or absolute values because the denominators of the relative forms are the same at each F^{14}C data point. The graphitization uncertainty ranges corresponding to the radiocarbon range from blank to Oxa2 are (0.7–3.9) × 10^{–3} for N_{2} and (0.6–2.8) × 10^{–3} for He. Graphitization involves oxidation, reduction and handling, which also add uncertainty in the form of contamination and losses. The long-term repeatability for He stripping is considerably lower than for N_{2}. The instrumentation component depends on the stability and tuning of the instrument, but it also includes the inhomogeneous ^{14}C concentration in the solid graphite target. After adding the bias component to the graphitization, the new linear fits are
In absolute F^{14}C units, the graphitization+bias uncertainty ranges are (0.7–4.1) × 10^{–3} for N_{2} and (0.7–3.0) × 10^{–3} for He, depending on the F^{14}C of the standard. To calculate the total expanded uncertainty, the bottom-up uncertainty (quoted error) can then be added in quadrature to the graphitization+bias component. The total expanded uncertainty and the apportionment of its components are converted into radiocarbon age uncertainty, as shown in Figure 4(d) and Figure A2, using $$u\left[ yrs \right] = 8033\,u\left[ F^{14}C \right]/fm$$
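The expansion and the age conversion can be sketched in two functions: a quadrature sum of the instrument term (the quoted error dF^{14}C can stand in for it), the graphitization u_{Rw} and the bias, and the conversion u[yrs] = 8033 u[F^{14}C]/fm, where 8033 yr is the Libby mean life (5568 yr / ln 2). The function names are hypothetical, and any inputs passed to them are the caller's own values.

```python
import math

LIBBY_MEAN_LIFE_YR = 8033.0  # 5568 yr / ln 2, the factor in u[yrs] = 8033 u[F14C] / fm


def expanded_uncertainty(u_instrument, u_graphite, u_bias):
    """Quadrature sum of instrument u_Rw (or the quoted error), graphitization u_Rw and bias."""
    return math.sqrt(u_instrument ** 2 + u_graphite ** 2 + u_bias ** 2)


def age_uncertainty_years(u_f14c, fm):
    """Convert an F14C uncertainty into a radiocarbon age uncertainty in years."""
    if fm <= 0:
        raise ValueError("fm must be positive")
    return LIBBY_MEAN_LIFE_YR * u_f14c / fm


# Hypothetical components for a modern sample (F14C units)
u_total = expanded_uncertainty(2.5e-3, 2.0e-3, 1.5e-3)
print(u_total, age_uncertainty_years(u_total, 1.0))
```

Dividing by fm is why the same absolute F^{14}C uncertainty translates into a much larger age uncertainty for old samples, whose fm values are small.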
The results of Figure 4 show some scatter because this work was not originally designed as a long-term study; rather, we used the replicates available in the database. Some measurement batches were dedicated to replicates; however, the carbon mass, total ^{14}C counts and beam currents were not specifically controlled. Therefore, the results reflect our routine long-term output of processing standards with diverse characteristics. The estimate of the long-term combined components is weak because data were only available for the blank and Oxa2. Figure 4 shows that the pseudo parameters are close to the conventional u_{Rw} and the conventional bias; thus, we consider it acceptable to include the pseudo parameters. The number of selected data points and rejected outliers for the calculation of the pseudo and conventional parameters is shown in Table A1 of the supplemental appendix. The y-axis scales in Figure 4 indicate that the magnitude of the bias component is only slightly smaller than that of the long-term repeatability. Therefore, the bias should not be ignored. Usually, the bias is considered not to increase the uncertainty of the measurement because it is a constant systematic error. However, it must be included if the bias varies randomly over time. The novelty of this work for radiocarbon is the application of a protocol that allows the calculation and apportionment of the bias as a variable separate from the repeatability, and the addition of both components as indicated by the top-down protocol. Using the mean F^{14}C (⟨fm⟩) of the replicates eliminates the bias effect on the repeatability calculation and eliminates random effects on the bias calculation, as stated in the discussion and conclusions of Näykki et al. (2012).
Comparison with Other Laboratories
If we define the error multiplier as the ratio of the total expanded uncertainty to the instrumentation uncertainty (which approximates the quoted error), the multiplier ranges from 1.5 to 1.7 for N_{2} and from 1.8 to 1.6 for He, going from the blank to Oxa2. The error multiplier for He is high because its bias is of similar magnitude to the instrumentation component. Excluding the bias, the He error multiplier ranges from 1.4 to 1.2, in agreement with Aerts-Bijma et al. (2021).
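The effect of the bias on the multiplier can be illustrated with a short calculation. This sketch assumes the total expanded uncertainty is the quadrature sum of the instrumentation u_{Rw}, graphitization u_{Rw} and bias; the component values are hypothetical and chosen only so that the bias is comparable to the instrumentation term.

```python
import math


def error_multiplier(u_instr, u_graph, u_bias=0.0):
    """Ratio of the total expanded uncertainty to the instrumentation u_Rw.

    Inputs share the same units (absolute F14C here); the total is the
    quadrature sum of instrumentation, graphitization and bias terms.
    """
    if u_instr <= 0:
        raise ValueError("u_instr must be positive")
    u_total = math.sqrt(u_instr ** 2 + u_graph ** 2 + u_bias ** 2)
    return u_total / u_instr


# Hypothetical components (not fitted values from this study)
u_instr, u_graph, u_bias = 2.0e-3, 1.0e-3, 2.0e-3
print(round(error_multiplier(u_instr, u_graph, u_bias), 3))  # 1.5
print(round(error_multiplier(u_instr, u_graph), 3))          # 1.118 (bias excluded)
```

Because the multiplier is a ratio, it is dimensionless and unchanged if all components are expressed in relative rather than absolute units.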
In Figure 5 and Table A2, we compare our results with those of other laboratories to assess the realism of our additive uncertainty expansion. Although the individual bottom-up quoted errors of the measurements should be used for the expansion, here we use the linear fit of the top-down instrumentation u_{Rw}. This component is added to the graphitization u_{Rw} and to the bias to obtain the long-term expanded uncertainty. Table A2a compares our expanded uncertainties with Tables 2 and 3 of the Alfred Wegener Institute (AWI) report on the long-term standard deviation of calibrants measured since approximately 2018 (Mollenhauer et al. 2021). Table A2b compares them with Table 3 of the Centre for Isotope Research (CIO) report on long-term factored expanded uncertainties for data obtained over 18 months from 2017 (Aerts-Bijma et al. 2021). By factored expanded uncertainty we mean an uncertainty expanded directly by an error multiplier (in this case 1.4 for the graphitization component), as opposed to the linear additive expansion. Table A2c compares them with Tables 2 and 3 of the Chronos Carbon-Cycle Facility (CHRO) report on the long-term standard deviation of calibrants measured since approximately 2019 (Turney et al. 2021). Table A2d compares them with Table 1.6 of the FIRI report for laboratories #5 and #8 (Scott 2003). Because our work concerns intra-laboratory repeatability, we selected intra-laboratory results from FIRI; the data from laboratories #5 and #8 agree well with our results. We use fMC and F^{14}C interchangeably.
We think that the good agreement between our expanded uncertainty and the repeatability of other laboratories reflects advances in AMS technology and efforts to standardize ^{14}C sample graphitization (elemental analyzer). The effects of chemical pretreatment were not considered in this paper. Other laboratories could implement the expansion by adding, in quadrature, the combined graphitization+bias component of Eq. (9), evaluated at the measured F^{14}C, to their quoted error.
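A minimal sketch of how a laboratory might apply such an expansion. Eq. (9) is not reproduced in this section, so the sketch uses a hypothetical placeholder: the graphitization+bias component is linearly interpolated between the blank and Oxa2 end-point values quoted in the text, with the blank assumed at F^{14}C = 0 and Oxa2 assumed at F^{14}C = 1.3407. A laboratory would replace `graph_bias_component` with its own fitted relation.

```python
import math

# Hypothetical placeholder for Eq. (9): (u at blank, u at Oxa2) in absolute
# F14C units, taken from the end-point ranges quoted in the text.
END_POINTS = {"N2": (0.7e-3, 4.1e-3), "He": (0.7e-3, 3.0e-3)}
F_BLANK = 0.0     # assumed F14C of the fossil blank
F_OXA2 = 1.3407   # assumed nominal F14C of Oxa2


def graph_bias_component(f14c, gas="N2"):
    """Linearly interpolated graphitization+bias component (assumption)."""
    u_blank, u_oxa2 = END_POINTS[gas]
    frac = (f14c - F_BLANK) / (F_OXA2 - F_BLANK)
    # Clamp to avoid extrapolating outside the range spanned by the calibrants
    frac = min(max(frac, 0.0), 1.0)
    return u_blank + frac * (u_oxa2 - u_blank)


def expand_quoted_error(quoted, f14c, gas="N2"):
    """Quoted error added in quadrature to the component at the measured F14C."""
    return math.hypot(quoted, graph_bias_component(f14c, gas))


# Hypothetical sample halfway between blank and Oxa2, quoted error 1.0e-3
print(round(expand_quoted_error(1.0e-3, 0.67035, "N2"), 6))  # 0.0026
```

Clamping is a design choice for the sketch: it keeps the placeholder inside the calibrated range, whereas a laboratory fit might justify extrapolation (for example, for bomb-carbon samples above Oxa2).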
Because our expanded uncertainties are obtained by quadrature addition and subtraction of the linear fits in Figure 4(a–c), and each fit has a confidence interval, the quadratic sum of the confidence intervals can be used to assign a range to the expanded uncertainties; these ranges are shown as error bars in Table A2 and Figure 5. In the fossil range (x ≈ 0), our expanded uncertainty is truncated at 0.8 × 10^{–3} because of the constant contribution of the intercepts. Uncertainty-versus-concentration relationships with a nonzero intercept have been reported in diverse areas of metrology and science (Jiménez-Chacón and Alvarez-Prieto 2009) and are documented in the EURACHEM guide (Ellison 2000). In general, our results agree with those of other laboratories, given the very different circumstances and calculation methods. Our proposed method can close the gap between the bottom-up and top-down approaches, and the resulting expanded uncertainties are therefore more realistic.
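The procedure described above (linear fits of components against F^{14}C, quadrature combination, and a quadrature sum of the fit uncertainties) can be sketched with ordinary least squares. This is an illustration under stated assumptions: the fits use unweighted OLS, the spread is the standard error of the fitted line (multiply by a Student-t quantile for a confidence interval), and the data and helper names are hypothetical. The sketch also shows why the combined value does not go to zero at x ≈ 0: it tends to the quadrature sum of the intercepts.

```python
import math
from dataclasses import dataclass


@dataclass
class LinearFit:
    slope: float
    intercept: float
    s: float        # residual standard deviation
    n: int
    x_mean: float
    sxx: float

    def __call__(self, x):
        return self.intercept + self.slope * x

    def se_mean(self, x):
        """Standard error of the fitted line at x."""
        return self.s * math.sqrt(1.0 / self.n + (x - self.x_mean) ** 2 / self.sxx)


def fit_line(xs, ys):
    """Unweighted ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points")
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return LinearFit(slope, intercept, math.sqrt(ssr / (n - 2)), n, x_mean, sxx)


def combine(fits, x):
    """Quadrature sum of fitted components at x, with the quadrature sum
    of their standard errors as the spread."""
    value = math.sqrt(sum(f(x) ** 2 for f in fits))
    spread = math.sqrt(sum(f.se_mean(x) ** 2 for f in fits))
    return value, spread


def quad_subtract(total, part):
    """Quadrature subtraction; returns 0 if part exceeds total (assumption)."""
    diff = total ** 2 - part ** 2
    return math.sqrt(diff) if diff > 0 else 0.0


# Hypothetical component values (absolute F14C) at several F14C levels
xs = [0.002, 0.5, 1.0, 1.34]
instr = fit_line(xs, [0.5e-3, 1.1e-3, 1.8e-3, 2.2e-3])
graph = fit_line(xs, [0.4e-3, 1.0e-3, 1.6e-3, 2.1e-3])
for x in (0.0, 1.0):
    print(x, combine([instr, graph], x))
```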
CONCLUSIONS
A top-down protocol has been used to apportion the uncertainty into instrumentation u_{Rw}, graphitization u_{Rw} and bias components. For practical purposes, the bottom-up uncertainty (quoted error) is approximated by the instrumentation u_{Rw}. The components are then combined additively, in quadrature, to obtain a more realistic expanded uncertainty. In future, an individual quoted error can therefore be expanded by adding the graphitization u_{Rw} and the bias evaluated at the measured F^{14}C. In absolute F^{14}C units, the graphitization+bias uncertainty ranges are (0.7–4.1) × 10^{–3} for N_{2} and (0.7–3.0) × 10^{–3} for He, from the blank to Oxa2.
The σ_{bg long-term} parameter makes it possible to reconcile the bottom-up and top-down approaches for the blank. σ_{bg long-term} is too small to change the Oxa2 bottom-up uncertainty, but Oxa2 does not show a discrepancy. The level of discrepancy therefore appears to depend on the ^{14}C content.
The long-term repeatability uncertainty of our AMS is much lower with helium stripping than with nitrogen stripping for the blank, and probably for other ^{14}C samples with low ^{14}C content. From the repeatability point of view, this demonstrates that He stripping is better than N_{2} stripping.
The novelty of this work is the application of a protocol that calculates and apportions the bias as a variable separate from the repeatability, and the addition of both components as prescribed by the top-down approach.
Our expanded uncertainties agree with the repeatability of other laboratories, given the very different calculation methods. However, our expanded absolute uncertainty is truncated for fossil ^{14}C samples. The error multipliers inferred from our expanded uncertainty also agree with previous studies.
Acknowledgments
We gratefully acknowledge the funding of the Berne University Research Foundation for the implementation of helium stripping for our MICADAS.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/RDC.2021.96