IntCal20 Tree Rings: An Archaeological SWOT Analysis

ABSTRACT We undertook a strengths, weaknesses, opportunities, and threats (SWOT) analysis of Northern Hemisphere tree-ring datasets included in IntCal20 in order to evaluate their strategic fit with the demands of archaeological users. Case studies on wiggle-matching single tree rings from timbers in historic buildings and Bayesian modeling of series of results on archaeological samples from Neolithic long barrows in central-southern England exemplify the archaeological implications that arise when using IntCal20. The SWOT analysis provides an opportunity to think strategically about future radiocarbon (14C) calibration so as to maximize the utility of 14C dating in archaeology and safeguard its reputation in the discipline.


INTRODUCTION
Archaeologists are making increasing demands of radiocarbon (14C) calibration. When the first internationally agreed high-precision 14C calibration curve was issued (Stuiver and Pearson 1986), the median quoted error of 14C ages obtained by archaeologists was ca. 80 BP (Bayliss 1998: figure 11.9), and calibration of single samples was almost all that was attempted (e.g. Pearson 1987). Over the intervening decades, the advent of accelerator mass spectrometry (AMS) has enabled archaeologists to adopt more rigorous sampling protocols and to obtain more age determinations for their sites (Ashmore 1999); the quoted errors on 14C measurements from archaeological samples have steadily decreased (e.g. Bayliss 2016: figure 1); and formal chronological modeling of series of radiocarbon dates has become common practice (Bayliss 2015: figure 1).
The SWOT framework (strengths, weaknesses, opportunities, and threats analysis) is a straightforward strategic planning technique that helps to focus attention on the key issues affecting our ability to achieve stated goals (Sarsby 2016). In the case of the IntCal20 calibration curve, our aim is to produce robust and accurate chronologies that are precise enough to address our archaeological questions, such as the timing of the construction and use of a series of Neolithic long barrows in central-southern England. Calendrical accuracy is particularly important in the portion of the curve that is based on tree rings, because it is in this period that 14C-based chronologies for archaeological remains are interpreted within historical frameworks and within frameworks based on dendrochronology. A series of new measurements on single tree rings from a post from Lancaster Castle, UK, which had previously been dated by ring-width dendrochronology, allows us to examine the precision and accuracy of wiggle-matching (Christen and Litton 1995) timbers from historic buildings.

Data Coverage, Density, and Resolution
IntCal20 includes 9211 measurements on tree-ring samples whose calendar dates are known by dendrochronology, and a further 1498 measurements on samples from trees that have been tied to this sequence by radiocarbon wiggle-matching. Once intra-laboratory replicates have been combined (see below), there are 7946 results on known-age tree-ring samples and 1299 on wiggle-matched samples in this issue). The comparable totals for IntCal13 were 3527 results on known-age samples and 232 results on wiggle-matched samples. This more-than-doubling of the quantity of data included in the new calibration curve is its first strength.
The known-age tree rings in IntCal20 run from 0 cal BP (AD 1950)¹ to 12,308 cal BP (10,359 BC), and the wiggle-matched tree rings run from 12,293 ± 4 cal BP (10,344 ± 4 BC)² to 14,194 ± 4 cal BP (12,245 ± 4 BC)³,⁴. These results are not spread evenly across this period, however (Figure 1): the density of data varies from an average of 2.7 measurements per year in the most recent millennium to only 0.2 measurements per year in the millennia from 9000-10,999 cal BP. Three-quarters of the data are concentrated in five of the fourteen millennia covered by the tree-ring-based part of IntCal20 (i.e. the most recent four millennia, and 12,000-12,999 cal BP [11,050-10,051 BC]). In the 8000 years between 4000 and 12,000 cal BP (2051-10,051 BC), IntCal20 includes just 303 new measurements that were not included in IntCal13. This variability in the density of data coverage is a weakness of the new calibration curve.
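The date pairs in this section convert between cal BP and BC/AD (for example, 12,308 cal BP = 10,359 BC). As a minimal sketch, assuming only the convention stated above (0 cal BP = AD 1950) plus the standard absence of a year zero in the BC/AD scale, the arithmetic can be written as follows; the function names are hypothetical.

```python
def cal_bp_to_bc_ad(cal_bp: int) -> str:
    """Label a whole-year cal BP value as BC or AD (0 cal BP = AD 1950; no year zero)."""
    year = 1950 - cal_bp  # astronomical year numbering: 0 = 1 BC, -1 = 2 BC
    if year >= 1:
        return f"AD {year}"
    return f"{1 - year} BC"


def bc_ad_to_cal_bp(year: int, era: str) -> int:
    """Inverse conversion; era is "AD" or "BC"."""
    if era == "AD":
        return 1950 - year
    if era == "BC":
        return year + 1949  # e.g. 10,359 BC -> 12,308 cal BP
    raise ValueError("era must be 'AD' or 'BC'")
```

With these conventions 12,293 cal BP becomes 10,344 BC and 4000 cal BP becomes 2051 BC, matching the pairs quoted above; BC years are always cal BP minus 1949.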
Over half of the tree-ring samples included in IntCal20 comprise single growth rings (Figure 2), with a further 18% being blocks of 2-5 rings. Single-year measurements are available for 2731 years (ca. 19% of the tree-ring-based part of IntCal20). Almost everywhere that high-resolution data are available, additional structure in the calibration curve has been revealed (for example, Figure 3). On the basis of the three 21-year blocks selected for the recent annual tree-ring inter-comparison, this additional structure can usually be expected to lie within, or very close to, the existing IntCal 1σ uncertainty envelope, although this may not always be the case in this issue: figure 1). The proportion of decadal and bi-decadal blocks in IntCal20 has fallen to 20% and 6% of the dataset respectively, but these are still the only data available for the majority of the tree-ring timescale.
For archaeologists striving to create prehistories on a generational timescale, these data can be worryingly sparse. For example, for two generations in the later part of the 53rd century BC (5261-5210 BC; 7210-7160 cal BP), a gap in the decadal dataset from Seattle (QL, set 1, division 14 in the IntCal20 database⁵ [hereafter e.g. 1-14]) means that the curve rests on just four measurements from Belfast on bi-decadal blocks (UB, 2-3 & 2-5) and two measurements from Heidelberg on a 4-year and a 5-year block (Hd, 5-5; Figure 4). This is an average data density of 0.1 measurements per year (the average density across the extent of the curve illustrated in Figure 4 is 0.2 measurements per year). This contrasts with some of the sites that are modeled against this part of the curve. For example, if the modeling for the chronology of the settlement at Versend-Gilencsa, Hungary is to be believed (Jakucs et al. 2018: figures 5 and 8), the data density for this archaeological settlement is at least double, and probably 20 times, that of the IntCal calibration curve against which it has been modeled. At a key juncture in European prehistory, as we try to understand the process by which Neolithic lifeways spread west and north out of the Danube corridor (e.g. Jakucs et al. 2016; Whittle et al. 2016; Denaire et al. 2017; Meadows et al. 2019), this is clearly unsatisfactory.

Figure 2 Resolution of tree-ring samples included in IntCal20 (intra-laboratory replicates have been combined).

⁵ All data included in IntCal20 can be downloaded from http://intcal.org. The data are divided into sets (produced by a single research group, usually a laboratory) and divisions (which may relate to measurements on a single tree by a single laboratory, but may also denote a group of measurements on a single archive). Trees that have been measured by more than one laboratory appear in multiple sets/divisions (e.g. Cott418 has been measured by MAMS (60-2) and ETH (69-19)). A full list of sets and divisions included in IntCal20 is provided by Reimer et al. (2020 in this issue: table S2).
The IntCal20 tree-ring data have been produced by 20 radiocarbon laboratories (Figure 5), with 60% produced by accelerator mass spectrometry (AMS), 31% by gas proportional counting (GPC), and the remaining 9% by liquid scintillation spectrometry (LSS). This is a marked change from the IntCal13 data, which were produced by eight laboratories, with 80% of measurements made by GPC and only 1% by AMS. After application of error multipliers to the legacy data where appropriate (Stuiver et al. 1998a: 1044-1045), the mean quoted estimates of total error produced by the three techniques in the IntCal20 dataset are equivalent: 19 ± 6 BP (AMS), 23 ± 9 BP (GPC), and 23 ± 6 BP (LSS).
In principle, on the basis of the "wisdom of crowds" (Galton 1907), more data from a larger number of laboratories should more accurately reflect the true value of atmospheric radiocarbon at a particular time. But the density of IntCal20 tree rings almost never rises above four measurements per year, and for almost 90% of its extent fails to reach two results per year (Figure 1). In these circumstances, the accuracy of the curve depends heavily on the accuracy of the new high-resolution datasets because, in periods where these exist, they will overwhelm the decadal and bi-decadal data. IntCal20 contains 4952 measurements on single-year samples (4161 on trees dated by dendrochronology and 791 on wiggle-matched trees). These results cover 2731 years, and for 1473 of these (54%) only one single-ring measurement is available. For another 347 years (13%), measurements are available for single rings from more than one tree from a single laboratory, and for the remaining 911 years (33%) results are available on more than one tree from more than one laboratory. Over half of these (61%) are for years in the most recent millennium.
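The three-way breakdown of single-year coverage above (one result; several trees from one laboratory; several trees from several laboratories) can be reproduced from a table of results. The sketch below assumes a hypothetical input of (calendar year, tree identifier, laboratory code) tuples, one per single-ring result after intra-laboratory replicates have been combined, and adds an "other" bucket for cases the text does not name, such as a single tree measured by more than one laboratory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify_single_year_coverage(records: Iterable[Tuple[int, str, str]]) -> Dict[str, int]:
    """Count calendar years by how their single-ring results are replicated.

    records: (calendar_year, tree_id, lab_code), one per single-ring result.
    """
    n_results: Dict[int, int] = defaultdict(int)
    trees: Dict[int, Set[str]] = defaultdict(set)
    labs: Dict[int, Set[str]] = defaultdict(set)
    for year, tree, lab in records:
        n_results[year] += 1
        trees[year].add(tree)
        labs[year].add(lab)

    summary = {"one_result": 0, "several_trees_one_lab": 0,
               "several_trees_several_labs": 0, "other": 0}
    for year, count in n_results.items():
        if count == 1:
            summary["one_result"] += 1
        elif len(trees[year]) > 1 and len(labs[year]) == 1:
            summary["several_trees_one_lab"] += 1
        elif len(trees[year]) > 1 and len(labs[year]) > 1:
            summary["several_trees_several_labs"] += 1
        else:
            # e.g. one tree measured by several laboratories; not one of
            # the three categories named in the text
            summary["other"] += 1
    return summary
```

Dividing each count by the number of covered years gives the percentages quoted above (for example, 1473 / 2731 ≈ 54%).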
The substantial increase in the amount of tree-ring data in IntCal20 is clearly a strength of the new calibration curve. The higher resolution of much of these new data allows detailed understanding of the shape of the calibration curve that was previously hidden (e.g. Figure 3). It is not, however, an enhancement that is spread equally (Figure 1). For almost half the extent of the tree rings, there are almost no new data and so, not only is there likely to be more shape in the calibration curve lurking within and around the IntCal20 envelope that is currently invisible, but in places the data density underlying the calibration curve is less than that of many archaeological applications (e.g. Figure 4). This is clearly an ongoing weakness. Single-year data now cover 19% of the tree-ring timescale, but well over half of this is not replicated by other measurements of a similar resolution, making the accuracy of IntCal20 vulnerable to any inaccuracy in these data.

Laboratory Replication
Not all the measurements on tree-ring samples reported in the IntCal20 database are on different samples, as some have been measured more than once. Four different kinds of replication have been reported in the dataset:
a) replicate measurements by a single laboratory on a single cellulose preparation (n=973);
b) replicate measurements by more than one laboratory on a single cellulose preparation (n=13);
c) whole-process replicate measurements on the same tree ring (or block of more than one tree ring) by a single laboratory (n=339);
d) whole-process replicate measurements on the same tree ring (or block) by more than one laboratory (n=252).
Same-cellulose, same-laboratory replicates (i.e. replicate type a) above) have been reported by three laboratories: QL- (set 1, n=446), AIX- (set 61, n=23), and ETH- (set 69, n=504). These measurements have been combined before further analysis. For the replicate groups produced by QL-, a weighted mean was taken (Ward and Wilson 1978) before application of the error multiplier (1.3; Stuiver et al. 1998a: 1044-1045). For the ETH and AIX data, the cellulose processing uncertainty (1‰ in Δ14C, or equivalently 8 BP in radiocarbon age) was removed in quadrature from each replicate measurement, a weighted mean of the replicate ages (with corresponding uncertainty) was then calculated, and the cellulose processing uncertainty was finally added back in quadrature in this issue). As these repeats form part of the protocols used by the laboratories to estimate their total laboratory error, they do not form an independent check on laboratory accuracy, and so we do not consider them further here.
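The two combination procedures described above differ in where the shared uncertainty is handled. A minimal Python sketch, assuming ages and 1σ errors in BP, the 1.3 multiplier and the 8 BP processing term quoted in the text, and hypothetical function names; it illustrates the arithmetic and is not the laboratories' own code.

```python
import math
from typing import Sequence, Tuple


def weighted_mean(ages: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean and its standard error (Ward and Wilson 1978)."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * a for w, a in zip(weights, ages)) / total
    return mean, math.sqrt(1.0 / total)


def combine_ql_style(ages: Sequence[float], errors: Sequence[float],
                     multiplier: float = 1.3) -> Tuple[float, float]:
    """Weighted mean first, then apply the error multiplier to the combined error."""
    mean, err = weighted_mean(ages, errors)
    return mean, err * multiplier


def combine_eth_aix_style(ages: Sequence[float], errors: Sequence[float],
                          processing_err: float = 8.0) -> Tuple[float, float]:
    """Remove the shared processing term in quadrature, average, then add it back once."""
    if any(e <= processing_err for e in errors):
        raise ValueError("each error must exceed the processing uncertainty")
    reduced = [math.sqrt(e ** 2 - processing_err ** 2) for e in errors]
    mean, err = weighted_mean(ages, reduced)
    return mean, math.sqrt(err ** 2 + processing_err ** 2)
```

For two replicates of 1000 ± 10 and 1010 ± 10 BP, the plain weighted mean has an error of about 7.1 BP, whereas the ETH/AIX-style route gives about 9.1 BP: the 8 BP processing term is shared by replicates on the same cellulose, so averaging does not reduce it.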
Replicate measurements on cellulose prepared in Arizona on single rings from bristlecone pine 1971#059 were dated by three laboratories (i.e. replicate type b) above; STE-, 67-1, n=10; AA-, 68-9, n=11; ETH-, 69-51, n=16; Miyake et al. 2017). The differences between the measurements provided by the three laboratories are not statistically significant at the 5% level, and they have not been combined before inclusion in IntCal20.
Ten laboratories have reported whole-process replicates on the same tree ring or block of tree rings (i.e. replicate type c) above; Table 1). Following the application of the relevant error multipliers (Stuiver et al. 1998a: 1044-1045), 31 replicate groups (out of the 339) are statistically inconsistent at the 5% significance level. This is in line with statistical expectation.

Table 1 Whole-process intra-laboratory replicates on tree-ring samples reported for IntCal20 (*quoted errors are counting errors on samples and standards only (Stuiver 1982: 5; Stuiver et al. 1986: 969); § these replicate groups were used in calculating the QL-radon offset (Stuiver et al. 1998b: 1128-1129); † earlywood/latewood replicates (Fogtmann-Schulz et al. 2017; Büntgen et al. 2018)).
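The consistency tests used throughout this section compare the scatter of a replicate group with its quoted errors. A minimal sketch of the Ward and Wilson (1978) T′ test in Python, assuming hypothetical helper names, errors in BP, and a hard-coded table of standard χ² 5% critical values (so groups are limited to 11 members unless the table is extended).

```python
from typing import Sequence, Tuple

# Standard upper 5% points of the chi-squared distribution, by degrees of freedom
CHI2_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
             6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def t_prime(ages: Sequence[float], errors: Sequence[float]) -> float:
    """Ward and Wilson (1978) T' statistic: error-weighted scatter about the weighted mean."""
    weights = [1.0 / e ** 2 for e in errors]
    mean = sum(w * a for w, a in zip(weights, ages)) / sum(weights)
    return sum((a - mean) ** 2 / e ** 2 for a, e in zip(ages, errors))


def replicates_consistent(ages: Sequence[float], errors: Sequence[float],
                          multiplier: float = 1.0) -> Tuple[bool, float, float]:
    """Test one replicate group at the 5% level after scaling errors by an
    optional error multiplier. Returns (consistent, T', critical value)."""
    scaled = [e * multiplier for e in errors]
    df = len(ages) - 1
    t = t_prime(ages, scaled)
    critical = CHI2_5PCT[df]  # KeyError for larger groups: extend the table
    return t <= critical, t, critical
```

Two results 30 BP apart with quoted errors of 10 BP fail the test (T′ = 4.5 > 3.84) but pass once a 1.3 error multiplier is applied (T′ ≈ 2.66). At the 5% level about one group in twenty is expected to fail by chance, so, for example, roughly 18 failures would be expected among 356 pairs.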
Of the 20 samples where measurements have been reported separately on the earlywood and latewood of a single tree ring, two pairs of measurements from the same year on these two ring fractions are statistically inconsistent at the 5% significance level. This is no more than would be expected if these pairs were true replicates on exactly the same samples. While the difference between earlywood and latewood measurements in any particular year appears to vary in sign as well as magnitude, the weighted mean difference in this dataset is 4.1 ± 5.1 BP (Figure 6). The samples included in IntCal20 span the AD 775 and AD 993 events, when the production of radiocarbon changed rapidly on an annual scale (Miyake et al. 2012, 2013). Any utilization in earlywood of stored carbon from the previous years' growth (Kimak and Leuenberger 2015) would therefore be expected to be magnified in the radiocarbon signal of these trees, in comparison with trees growing at times when the concentration of atmospheric radiocarbon was changing less rapidly. This finding is compatible with the suggestion, based on earlywood/latewood pairs from annual tree rings of a Danish oak dating from AD 1954-1970, that the radiocarbon in the earlywood originates from the actual growth year (Kudsk et al. 2018). These differences are statistically indistinguishable (T′=1.8; T′(5%)=6.0; ν=1), and none is statistically significant.

Whole-process replicate measurements from more than one laboratory are available on same-ring or same-block samples from 11 trees (i.e. replicate type d) above; Tables 2 and 3). Fourteen of the 356 replicate pairs in this dataset are statistically significantly different at the 5% significance level. This is fewer than the ca. 18 that would be expected on statistical grounds and may reflect a slight over-estimation of some reported errors (e.g. Pearson et al. 2020 in this issue).
The weighted mean differences between laboratories on single trees are generally 1-2‰ (8-16 BP; Tables 2 and 3). For Knet40, the AAR dataset (62-9) is significantly younger than the measurements from the three other laboratories on this tree (12.9 ± 3.1 BP; see Friedrich et al. 2020 in this issue for further discussion). Laboratory offsets need not remain constant, however, and this finding relates to samples from a single tree measured in January 2019. From the IntCal20 data, we have limited evidence for the variability or stability of inter-laboratory offsets between measurements made at different times.

Table 2 Whole-process inter-laboratory replicates on single tree rings from Knet40 included in IntCal20 (divisions 68-10, 62-9, 69-50, and 60-1; Friedrich et al. 2020 in this issue); same-cellulose and intra-laboratory replicates have been combined. Above diagonal = weighted mean difference (BP); below diagonal = χ²red (degrees of freedom).

        AA           AAR           ETH          MAMS
AA      -            −6.6 ± 11.5   −4.8 ± 9.3   0.1 ± 10.5
AAR     0.90 (9)     -             11.5 ± 4.4   15.5 ± 4.8
ETH     1.57 (11)    1.16 (36)     -            4.1 ± 3.7
MAMS    0.68 (9)     1.00 (36)     0.84 (41)    -

Table 3 Whole-process inter-laboratory replicates on same-blocks from same-trees included in IntCal20 (overall statistics include data from Knet40 (Table 2) where appropriate); † weighted mean differences are statistically significantly different (AA/OxA: T′=12.0 …).

Over the course of these six years, the observed inter-laboratory offset does not appear to vary significantly (T′=3.5, T′(5%)=7.8, df=3; Table 3). In contrast, the size of the offset between AA/OxA replicate measurements made on a Jordanian tree in 2014 is statistically significantly different (T′=12.0, T′(5%)=3.8, df=1; Table 3) from the size of the offset between AA/OxA replicate measurements made on another Jordanian tree in 2015.
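Table 2 summarises each pair of laboratories by a weighted mean difference and a reduced χ². A minimal Python sketch of one way to compute such statistics from paired results on the same rings, assuming hypothetical names, errors combined in quadrature, and χ²red measured about the weighted mean difference with n − 1 degrees of freedom (the exact definition used for Table 2 is not stated in this section).

```python
import math
from typing import Sequence, Tuple


def interlab_offset(pairs: Sequence[Tuple[float, float, float, float]]
                    ) -> Tuple[float, float, float, int]:
    """Weighted mean offset between two laboratories on the same rings.

    pairs: (age_lab1, err_lab1, age_lab2, err_lab2) in BP, one per shared ring.
    Returns (weighted mean difference, its standard error, reduced chi-squared,
    degrees of freedom), with chi-squared taken about the weighted mean.
    """
    diffs = [a1 - a2 for a1, _, a2, _ in pairs]
    errs = [math.sqrt(e1 ** 2 + e2 ** 2) for _, e1, _, e2 in pairs]  # quadrature
    weights = [1.0 / e ** 2 for e in errs]
    mean = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    err = math.sqrt(1.0 / sum(weights))
    dof = len(pairs) - 1
    chi2 = sum((d - mean) ** 2 / e ** 2 for d, e in zip(diffs, errs))
    chi2_red = chi2 / dof if dof > 0 else float("nan")
    return mean, err, chi2_red, dof


def offset_is_significant(mean: float, err: float, z_crit: float = 1.96) -> bool:
    """Two-sided test of a zero offset at roughly the 5% level."""
    return abs(mean) > z_crit * err
```

As a rough check on figures in this section: the AAR offset for Knet40 (12.9 ± 3.1 BP) lies about 4.2 standard errors from zero, whereas the Timahoe West difference (2.4 ± 7.1 BP) lies only about 0.3 standard errors from zero.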
The whole-process replicate measurements in the IntCal20 tree-ring dataset are valuable in demonstrating the reproducibility of the laboratories involved, and this is another strength of the dataset. Intra-laboratory replicates demonstrate that the laboratories involved reproduce within their quoted uncertainties. We note, however, that Table 1 includes measurements from only half of the laboratories that measured the tree rings in IntCal20, and only three laboratories (UCIAMS-, set 8, 42%; OxA-, divisions 63-15, 63-16, 63-17, and 59-1, 20%; and VERA-, set 64, 20%) have submitted whole-process replicates on more than 10% of the tree-ring samples they have dated. The inter-laboratory whole-process replicates are also valuable in demonstrating the reproducibility of measurements made in different laboratories, although some errors may be slightly over-estimated and inter-laboratory offsets generally fall between 1‰ and 2‰ (8-16 BP). Tables 2 and 3, however, include measurements from only seven laboratories, and only two laboratories (AA-, set 68, 27%; and MAMS-, set 60, 24%) have inter-laboratory replicates on more than 10% of the samples they have dated. This variable, and generally low, replication rate means that our understanding of laboratory variability in IntCal20 is incomplete. Nonetheless, it is much better than our understanding of such variability in IntCal13, where the sample size required for conventional dating meant that very few whole-process or inter-laboratory replicates could be measured. Going forward, high-precision AMS presents opportunities for replication that were not previously available.

Variation in the Radiocarbon Content of Contemporary Tree Rings
An understanding of intra- and inter-laboratory variation in the measurement of the same sample is crucial because it enables differences between contemporary tree rings that arise from laboratory variation to be distinguished from differences arising from other causes.
Some kinds of variation may be suspected on theoretical grounds but have not yet been unequivocally demonstrated in practice. These include potential variation in the radiocarbon content of contemporary trees from the same location arising from physiological factors (such as the variable use in earlywood of carbon stored from previous growing seasons), from differences between species, or from different environmental compartments. The IntCal20 tree-ring dataset includes little information on these issues. The earlywood/latewood replicates included in IntCal20 have been discussed above and are not statistically significantly different (Table 3), and there are measurements from ETH on contemporary single rings from two timbers from a trackway at Timahoe West, Co. Kildare, Ireland (Q5653 (69-53) and Q6427 (69-49); 53.3N, 6.9W; Pearson et al. 2020 in this issue). These have a weighted mean difference of 2.4 ± 7.1 BP, which is again not statistically significant.
It is also possible for atmospheric radiocarbon to vary locally for a variety of reasons, including the emission of depleted or 14C-free carbon from volcanic vents, ocean upwelling, variations in the seasonal extent of the Intertropical Convergence Zone (ITCZ), and anthropogenic sources (Hogg et al. 2019). Data which may be affected by these issues have deliberately not been included in IntCal20 in this issue). Ocean upwelling has been suggested as a possible cause for the weighted mean difference (13.6 ± 6.2 BP) between datasets 1-1 (KI tree, 57.9N, 152.6W) and 1-4 (C tree, 48.1N, 124.4W) (Stuiver and Braziunas 1998), and for the observed offsets between the IntCal data and measurements on Japanese tree rings (e.g. Nakamura et al. 2007). A small number of datasets have also been omitted from IntCal20 because they are within the ITCZ or on its boundary (e.g. Hua et al. 2004), and datasets suspected of incorporating fossil-fuel-derived carbon from industry (e.g. Tans et al. 1979) have also been omitted.
Locational effects in the radiocarbon content of tree rings within a hemisphere have long been of concern for calibration (e.g. McCormac et al. 1998), but these are extremely difficult to demonstrate convincingly as they are of similar scale to the inter-laboratory differences observed in the IntCal20 dataset (Tables 2 and 3). Growing season (Kromer et al. 2001; Dee et al. 2010; Manning et al. 2018, 2020), altitudinal (Cain and Suess 1976; Dellinger et al. 2004), and latitudinal Pearson et al. 2020 in this issue) offsets Three studies avoid this issue. Using data measured at ETH Zürich only, Büntgen et al. (2018: figure 3) identify a weak meridional north-south gradient of declining average 14C values across the AD 770s and AD 990s, although there is considerable variation around this trend. A statistically significant average weighted mean difference of −8.1 ± 1.9 BP between Irish oak (53.3N, 6.9W; 68-4-6, 69-49 and 69-53-4) and bristlecone pine (37.5N, 118.2W; 68-1-3) has been calculated, taking into account inter-laboratory differences in this issue). Data from the Heidelberg laboratory published by Manning et al. (2020: 5N, 9.8−12.1E; 63-8, 63-9, 63-10, 63-11) and the same Turkish pine. Kromer et al. (2001: 2530) argue for a time-transgressive offset to older ages for the Turkish pine in the late fifteenth and early sixteenth centuries AD. However, this proposed offset is not seen consistently over other time periods and, taken as a whole, there is no evidence that the full sequence of observed differences between the measurements from the three locations is anything other than independent random noise. A turning-point test for independence (Kendall 1973) on the time series of observed differences gives p-values of 0.33, 0.65, and 0.41 for the German-Irish, German-Turkish, and Irish-Turkish sets respectively.
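The turning-point test (Kendall 1973) asks whether a series has about as many local peaks and troughs as an independent random sequence would. A minimal Python sketch using the usual normal approximation (expected count 2(n − 2)/3, variance (16n − 29)/90) and a two-sided p-value; ties between neighbours are simply not counted as turning points, which is an assumption of this sketch, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def turning_point_test(series: Sequence[float]) -> Tuple[int, float, float]:
    """Turning-point test for independence of a time series.

    Returns (number of turning points, z-score, two-sided p-value) using the
    normal approximation. Ties with a neighbour are not counted as turning points.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least three values")
    turning = 0
    for i in range(1, n - 1):
        prev, here, nxt = series[i - 1], series[i], series[i + 1]
        if (here > prev and here > nxt) or (here < prev and here < nxt):
            turning += 1  # local peak or trough
    expected = 2 * (n - 2) / 3
    variance = (16 * n - 29) / 90
    z = (turning - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return turning, z, p_value
```

A strictly alternating series has too many turning points and a monotonic trend has too few; both give small p-values. Large p-values, like those quoted above, mean there is no evidence against independence.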
Consequently, it is difficult to determine whether these observed differences are due to locational effects rather than to other potential sources of variation. Further discussion of these issues is provided by Reimer et al. (2020 in this issue). Two points are relevant here. First, any locational variation in the radiocarbon content of contemporary tree rings within the IntCal20 dataset is likely to be of similar magnitude to the observed inter-laboratory variation within the dataset (Tables 2 and 3). Second, IntCal20 contains tree rings from a much wider range of locations than did IntCal13 in this issue: figures 1 and 8). The calibration curve is still dominated, however, by trees that grew between 46°N and 55°N, which account for more than three-quarters of the dataset. In these circumstances, no attempt has been made to disentangle the various sources of variation between measurements on contemporary trees in IntCal20. Rather, the curve is an estimate of the hemispheric average atmosphere, and the quoted uncertainty on the curve encompasses the observed variation between contemporary trees within the IntCal20 dataset from all potential sources.
It is important to note that any additional independent variation, beyond that reported by the laboratories, that is detected within the IntCal20 data is automatically incorporated into the construction of the IntCal20 calibration curve. If the measurements included in IntCal20 from the same calendar year are more widely spread (overdispersed) than the laboratory-reported uncertainties would suggest, this is propagated through curve construction and accounted for in subsequent predictive intervals. This is discussed in detail by Heaton et al. (2020 in this issue). It should therefore be possible to obtain accurate calibration of radiocarbon measurements from anywhere north of the ITCZ using IntCal20, as long as the locational and laboratory variation in the test dataset is managed adequately in the modeling process (e.g. Hogg et al. 2019: 1285-1287).
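To make the idea of overdispersion concrete, the sketch below estimates an additional variance τ² that, added to every reported error, makes the scatter of contemporaneous results agree with their uncertainties. It uses the textbook DerSimonian-Laird moment estimator as a stand-in; this is not the Bayesian method used to construct IntCal20 (Heaton et al. 2020 in this issue), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def excess_variance_dl(values: Sequence[float], errors: Sequence[float]) -> float:
    """DerSimonian-Laird moment estimate of an extra variance tau^2 common to all values."""
    k = len(values)
    if k < 2:
        return 0.0  # scatter cannot be assessed from a single value
    weights = [1.0 / e ** 2 for e in errors]
    sw = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / sw
    q = sum(w * (x - mean) ** 2 for w, x in zip(weights, values))  # Cochran's Q
    denom = sw - sum(w ** 2 for w in weights) / sw
    return max(0.0, (q - (k - 1)) / denom)  # zero when there is no excess scatter


def inflate_errors(errors: Sequence[float], tau2: float) -> List[float]:
    """Add the extra variance to each reported error in quadrature."""
    return [math.sqrt(e ** 2 + tau2) for e in errors]
```

For two results of 980 ± 10 and 1020 ± 10 BP the estimate is τ² = 700, so each inflated error becomes √800 ≈ 28 BP, at which point the two results are exactly as consistent as their errors imply.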

Dendrochronologies
Almost half the tree-ring samples included in IntCal20 derive from the Hohenheim Holocene oak chronology (HOC), the Preboreal pine chronology (PPC) which cross-dates against it, or from floating pine sequences that have been wiggle-matched against the PPC (Figure 9; Friedrich et al. 2004; Reinig et al. 2018, 2020 in this issue; Sookdeo et al. 2019 in this issue; Sookdeo et al. submitted; Hogg et al. 2016). Measurements on the independent Irish oak chronology (Brown et al. 1986) run back to 7164 cal BP (5215 BC), but before this the Hohenheim-based chronologies stand alone, apart from a very few measurements on single rings from bristlecone pine (Figure 10). Bristlecone pine has also been measured in the mid-4th millennium cal BP (mid-2nd millennium BC), but otherwise it is only in the most recent three millennia that tree rings from a range of sources have been analysed. European oak and pine constitute 77% of all the tree-ring samples in IntCal20, and European wood dominates everywhere except the centuries around 2000 cal BP (1 BC/AD 1), where Japanese wood predominates.
For 65% of the IntCal20 tree-ring samples, the raw data upon which the calendar age of the sample is based are either published or held in the IntCal archive. These are the raw ring-width data from the trees that were sampled, except for five Japanese trees which were dated by isotope dendrochronology and for which the δ18O measurements are available (divisions 65-5, 65-6, 65-7, 65-10, 65-11, and 65-16). The reference data against which these series were dated are publicly accessible for just under half of these samples. Overall in IntCal20, both the raw tree-ring data of the sample and the reference material against which it was dated are publicly accessible for 29% of the calibration samples; the raw tree-ring data of the samples are available, but the reference data are not, for 36% of samples; and neither the raw tree-ring data nor the reference data are available for 34% of samples. In comparison, neither the raw tree-ring widths nor the reference data were available for 80% of the tree-ring samples in IntCal13.
The dendrochronologies that provide an exact calendar timescale for the tree-ring samples are clearly a major strength of the IntCal20 dataset. Measurements on two independent chronologies are, however, available for only half of the extent of the tree rings. Although there is now some potential to extend the sequence of measurements from a second independent dendrochronological sequence (e.g. Nicolussi et al. 2009), the Hohenheim chronologies are still the longest available and hence inevitably stand alone at the older end of their range. Non-European chronologies occur in any quantity only over the most recent few millennia (Figure 10), and the locations of the sampled trees are clearly concentrated in a narrow band of latitude (Figure 8). For many users of IntCal, the restricted geographical range of the trees included in IntCal20 is a weakness, and potentially a threat to the accuracy of calibrated and modeled chronologies (see above). The availability of independent dendrochronologies covering the last few millennia from a large number of locations around the Northern Hemisphere (e.g. Hantemirov and Shiyatov 2002; Salzer et al. 2019), however, presents a clear opportunity to remedy this situation over the coming years.

Figure 10 Density of tree-ring samples in IntCal20 by source (intra-laboratory replicates have been combined and multi-year blocks spread proportionately across their bandwidth).
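Figure 10 counts samples per calendar year with multi-year blocks spread proportionately across their bandwidth. A minimal sketch of that bookkeeping, assuming a hypothetical input of (first year, last year, source) tuples on a whole-year cal BP scale, after intra-laboratory replicates have been combined; the function names are also hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def density_by_source(samples: Iterable[Tuple[int, int, str]]) -> Dict[str, Dict[int, float]]:
    """Spread each sample evenly over the rings it contains.

    samples: (first_year, last_year, source); a single-ring sample has
    first_year == last_year.
    """
    density: Dict[str, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for first, last, source in samples:
        lo, hi = min(first, last), max(first, last)
        n_rings = hi - lo + 1
        for year in range(lo, hi + 1):
            density[source][year] += 1.0 / n_rings  # each sample has total weight 1
    return density


def mean_density(density: Dict[str, Dict[int, float]], source: str, lo: int, hi: int) -> float:
    """Average measurements per year for one source over lo..hi inclusive."""
    per_year = density.get(source, {})
    return sum(per_year.get(y, 0.0) for y in range(lo, hi + 1)) / (hi - lo + 1)
```

Because each sample contributes a total weight of one, a decadal block adds 0.1 to each of its ten years, and averaging over a window yields measurements-per-year values of the kind quoted in the Data Coverage section.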

Wiggle-Matching Historic Buildings
Over the past 25 years, scientific dating has become central to the informed conservation of historic buildings (Clark 2001), although previous attempts to date timbers from buildings accurately by radiocarbon wiggle-matching have met with mixed success (Galimberti et al. 2004; Tyers et al. 2009; Bayliss et al. 2017; Marshall et al. 2019). figure 11). Dissection was undertaken by Alison Arnold and Robert Howard at the Nottingham Tree-ring Dating Laboratory. Prior to sub-sampling, the core was checked against the tree-ring width data. Each annual growth ring was then split from the rest of the tree-ring sample using a chisel or scalpel blade. Each radiocarbon sample consisted of a complete annual growth ring, including both earlywood and latewood. Samples were selected to target sections of the calibration curve that represent the range of slopes and plateaux that may be encountered in unknown applications (Figure 3). As in previous studies, all samples were submitted to and dated blind by the laboratories.
Radiocarbon dating of the Lancaster Castle samples was undertaken by the Centre for Isotope Research, University of Groningen (GrM-), the Netherlands in 2018-2019, and at the Laboratory of Ion Beam Physics, ETH Zürich (ETH-), Switzerland in 2019. In Groningen, each ring was converted to α-cellulose using an intensified aqueous pretreatment ) and combusted in an elemental analyzer (IsotopeCube NCS), coupled to an isotope ratio mass spectrometer (Isoprime 100). The resultant CO 2 was graphitized by hydrogen reduction in the presence of an iron catalyst (Wijma et al. 1996;Aerts-Bijma et al. 1997).
The graphite was then pressed into aluminium cathodes and dated by AMS (Synal et al. 2007; Salehpour et al. 2016). In Zürich, cellulose was extracted from each ring using the base-acid-base-acid-bleaching (BABAB) method described by Němec et al. (2010a), combusted and graphitized as outlined in Wacker et al. (2010a), and dated by AMS (Synal et al. 2007; Wacker et al. 2010b). At both laboratories, data reduction was undertaken as described by Wacker et al. (2010c), and both facilities maintain continual quality assurance programs (Sookdeo et al. 2019 in this issue; Aerts-Bijma et al. forthcoming), in addition to participating in international inter-comparison exercises in this issue).
The results (Table 4) are conventional radiocarbon ages, corrected for fractionation using δ13C values measured by AMS (Stuiver and Polach 1977). Figure 11 shows a wiggle-match that includes the complete series of results from LAN-C07. It suggests that the final ring was formed in cal AD 1161-1164 (95% probability; GrM-13353 (AD 1162); Figure 11)⁶. This result is clearly compatible with the date of AD 1162 for this ring known from dendrochronology. It is, however, rare for such a long ring sequence to be recovered from a timber in a historic building in England. For this reason, we have divided this sequence into shorter sections that reflect the kinds of tree-ring sequences that are commonly encountered in English standing buildings and that remain undated by ring-width dendrochronology.
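A radiocarbon wiggle-match fits a sequence of results with known ring spacing to the calibration curve. The Python sketch below is a simplified version in the spirit of Christen and Litton (1995), assuming a uniform prior on the date of the final ring, Gaussian errors, and curve uncertainties treated as independent between years; it runs on a synthetic toy curve, not IntCal20, and all names are hypothetical. Dedicated software such as OxCal handles these details more fully.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical curve format: cal BP year -> (14C age in BP, 1-sigma error in BP)
Curve = Dict[int, Tuple[float, float]]


def wiggle_match(ages: Sequence[float], errors: Sequence[float],
                 rings_before_final: Sequence[int], curve: Curve) -> Dict[int, float]:
    """Posterior probability for each candidate cal BP date of the final ring.

    Assumes a uniform prior, Gaussian measurement errors, and curve errors
    added in quadrature and treated as independent from year to year.
    """
    log_like: Dict[int, float] = {}
    for final in curve:
        total = 0.0
        for age, err, gap in zip(ages, errors, rings_before_final):
            year = final + gap  # a ring formed `gap` years earlier has a larger cal BP
            if year not in curve:
                break  # sequence runs off the end of the curve
            mu, sigma = curve[year]
            var = err ** 2 + sigma ** 2
            total += -0.5 * (age - mu) ** 2 / var - 0.5 * math.log(2 * math.pi * var)
        else:
            log_like[final] = total  # every ring could be compared with the curve
    peak = max(log_like.values())  # subtract the maximum to avoid underflow
    weights = {k: math.exp(v - peak) for k, v in log_like.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}


# Synthetic toy curve for illustration only (NOT IntCal20): a trend plus a wiggle
toy_curve: Curve = {t: (600 + 0.8 * (t - 500) + 30 * math.sin(t / 15.0), 15.0)
                    for t in range(500, 1001)}
true_final = 788                    # cal BP, i.e. AD 1162
gaps = list(range(0, 31, 3))        # 11 rings, every third year
ages = [toy_curve[true_final + g][0] for g in gaps]  # results placed exactly on the curve
posterior = wiggle_match(ages, [15.0] * len(gaps), gaps, toy_curve)
best = max(posterior, key=posterior.get)
print(best, 1950 - best)  # prints "788 1162"
```

On this toy curve, eleven results placed exactly on the curve every third ring recover the true final-ring date (788 cal BP, i.e. AD 1162) as the most probable value; with real measurement scatter the posterior spreads over a range of years.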
Figure 12(A) shows a wiggle-match that includes 11 measurements on single-year samples (GrM) between AD 990 and AD 1000 (960-950 cal BP), which produces a date estimate for the final ring of this timber of cal AD 1160-1165 (95% probability; AD 1162; Figure 12(A)). This sequence is very short (only 11 years) but targets the rapid increase in 14C production in AD 993/4 (957/956 cal BP; Figure 3; Miyake et al. 2013).
A wiggle-match that includes 11 measurements on single-year samples (ETH) taken every three years between AD 1030 and AD 1060 (920-890 cal BP) is shown in Figure 12(B). This estimates that the final ring of LAN-C07 formed in cal AD 1156-1167 (91% probability; AD 1162; Figure 12(B)) or cal AD 1237-1242 (4% probability). This sequence (31 rings) is also too short to be routinely datable by dendrochronology, but it targets a steeply sloping section of the calibration curve (Figure 3). IntCal20 was compiled before the candidate solar energetic particle event at AD 1052 was identified (Brehm et al. submitted), and so it does not include additional knots in the spline to maximize the identification of this feature  in this issue).
Figure 12(C) illustrates a wiggle-match that includes seven measurements on single-year samples (ETH) taken every three years between AD 1081 and AD 1099 (869-851 cal BP). It suggests that the final ring of the timber dates to cal AD 1107-1117 (16% probability; AD 1162; Figure 12(C)), cal AD 1160-1170 (7% probability), or cal AD 1173-1198 (72% probability). This sequence is also very short (19 years) and falls on a gently sloping section of the curve (Figure 3).
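Several of the estimates above are reported as more than one Highest Posterior Density range, each with its own probability. The sketch below shows one way such disjoint ranges can be read off a posterior held on a one-year calendar grid: take grid points in order of decreasing probability until the target mass is reached, then merge them into contiguous runs. The function name and the toy posterior are hypothetical, and dedicated modeling software may differ in grid resolution and rounding.

```python
def hpd_ranges(years, probs, level=0.95):
    """Highest Posterior Density ranges for a posterior on a one-year calendar grid.

    years must be ascending consecutive integers and probs must sum to 1.
    Returns a list of (start, end, probability) tuples; there may be several.
    """
    # Take grid points in order of decreasing probability until the target mass is reached.
    order = sorted(range(len(years)), key=lambda i: probs[i], reverse=True)
    included = []
    mass = 0.0
    for i in order:
        if mass >= level:
            break
        included.append(i)
        mass += probs[i]
    # Merge the selected years into contiguous ranges, summing the probability in each.
    ranges = []
    for i in sorted(included):
        if ranges and years[i] == ranges[-1][1] + 1:
            start, _, p = ranges[-1]
            ranges[-1] = (start, years[i], p + probs[i])
        else:
            ranges.append((years[i], years[i], probs[i]))
    return ranges


# Hypothetical bimodal posterior (not from any model in this paper).
years = list(range(10))
probs = [0.0, 0.1, 0.3, 0.1, 0.0, 0.0, 0.2, 0.25, 0.05, 0.0]
for start, end, p in hpd_ranges(years, probs, level=0.9):
    print(start, end, round(p, 2))
# 1 3 0.5
# 6 7 0.45
```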
Clearly these four wiggle-matches all produce date estimates that include the calendar date of AD 1162 for the final ring known from dendrochronology. They are of varying length, some extremely short (< 30 rings), and produce results of varying precision. Even on a plateau, however, it appears possible to produce an accurate chronology to decadal precision if a long enough sequence and sufficient measurements are available. This represents an opportunity for IntCal20 in areas where high-resolution data are available.
⁶ In this paper, Highest Posterior Density intervals for posterior distributions provided by Bayesian models are cited in italics, with the name of the modeled parameter and the figure on which it is illustrated provided in brackets.
We now consider the single-year case studies that have been undertaken previously (Tyers et al. 2009; Bayliss et al. 2017; Marshall et al. 2019)⁷, each of which has been recalculated using IntCal20 (Table 5). The results are very similar to those produced using IntCal13 (Bayliss et al. 2017: table 5; Marshall et al. 2019: figs 3-4), with the wiggle-match sequences from timbers BAG-B18, BCB-10, and KLV-A06 producing Highest Posterior Density intervals that do not include the date for the final ring known from dendrochronology even at 99% probability. Clearly, the absence of single-year calibration data for the medieval period was not the cause of this inaccuracy. Calculating the weighted mean offset between the measurements on these known-age timbers and the IntCal20 modeled value for the respective year shows that the sequences that produce inaccurate results have both the largest offsets against IntCal20 and the highest χ²red values (Figure 13). This suggests both that there are systematic biases in some of these data and that the quoted errors on some of these measurements may be too small. Inter-comparison studies demonstrate that issues of this kind are to be expected (Scott et al. 2019; Wacker et al. 2020 in this issue) and, as this example shows, they can threaten the accuracy of chronologies produced using IntCal20. For such studies, laboratory quality assurance protocols are paramount, and laboratory reproducibility of the type illustrated in Wacker et al. (2020 in this issue: figure 2 [green]) is clearly required.
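The weighted mean offset and χ²red statistics used above can be sketched as follows. This is a minimal illustration under stated assumptions: measurement and curve errors are combined in quadrature, weights are inverse variances, and χ²red is taken about the weighted mean offset with n - 1 degrees of freedom (the definition used for Figure 13 may differ). The function name and numbers are hypothetical and are not the BAG-B18, BCB-10, or KLV-A06 data.

```python
import math


def offset_stats(measured, measured_err, curve, curve_err):
    """Weighted mean offset (measured minus curve), its error, and reduced chi-squared.

    Assumed definition: inverse-variance weights from errors combined in quadrature;
    chi-squared about the weighted mean with n - 1 degrees of freedom.
    """
    diffs = [m - c for m, c in zip(measured, curve)]
    weights = [1.0 / (e ** 2 + ce ** 2) for e, ce in zip(measured_err, curve_err)]
    wsum = sum(weights)
    mean = sum(w * d for w, d in zip(weights, diffs)) / wsum
    mean_err = 1.0 / math.sqrt(wsum)
    chi2 = sum(w * (d - mean) ** 2 for w, d in zip(weights, diffs))
    chi2_red = chi2 / (len(diffs) - 1)
    return mean, mean_err, chi2_red


# Hypothetical example: three measurements on known-age rings against curve values.
mean, mean_err, chi2_red = offset_stats(
    measured=[1000.0, 1010.0, 990.0], measured_err=[10.0, 10.0, 10.0],
    curve=[980.0, 980.0, 980.0], curve_err=[0.0, 0.0, 0.0])
print(round(mean, 1), round(mean_err, 2), round(chi2_red, 2))  # 20.0 5.77 1.0
```

A large mean offset with χ²red near 1 points to a systematic bias; χ²red well above 1 suggests that the quoted errors are too small for the observed scatter.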
This case study demonstrates both the opportunities and the threats of using IntCal20 in periods where high-resolution single-year calibration data are now available. It is possible routinely to obtain decadal precision when wiggle-matching timbers from historic buildings, and it is possible to wiggle-match accurately to this precision with shorter ring sequences than can usually be dated by dendrochronology. The accuracy of the measurements that are calibrated or modeled against IntCal20 is, however, a material factor in whether the resulting chronologies are correct. The overdispersion of the IntCal20 tree-ring data, even though the estimate described above includes factors additional to inter-laboratory variation, is one fifth of that for tree-ring measurements reported in the international inter-laboratory comparison exercises (Scott et al. 2018; Heaton et al. 2020 in this issue: figure 5). Most laboratories are therefore clearly not producing measurements of equivalent accuracy to those included in the calibration datasets.

The new dataset is derived from a sub-fossil oak tree (sample 60) from Ebensfeld, River Main, Germany (50.1°N, 10.9°E). Its 92-year ring-width series cross-dates with a t-value of 5.5 (Baillie and Pilcher 1973) against the Holocene German Oak Chronology, where it spans 3691-3600 BC (5640-5549 cal BP; Supplementary Information 1). Dissection was undertaken by Michael Friedrich on radial sections cut from the slices taken for dendrochronology. The rings had been made visible by cleaning their surfaces with razor blades; the selected blocks were then split tangentially from the rest of the sample using a scalpel blade. Each sample consisted of two annual growth rings, including both earlywood and latewood.
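The t-value quoted above comes from the Baillie and Pilcher (1973) cross-dating approach. The sketch below follows that approach as it is commonly described (each ring width expressed as a percentage of the centred five-year running mean, log-transformed, then Pearson's r between the two index series converted to Student's t), but it is a simplified illustration on synthetic data: the function names, the 300-year "master", and the noise level are all hypothetical, and dendrochronological software may differ in detail (for example, minimum overlap and the treatment of series ends).

```python
import math
import random


def bp_index(widths):
    """Baillie-Pilcher style indices: ln of each width as a percentage of the centred
    five-year running mean. The first and last two rings have no full window and are dropped."""
    out = []
    for i in range(2, len(widths) - 2):
        window_mean = sum(widths[i - 2:i + 3]) / 5.0
        out.append(math.log(100.0 * widths[i] / window_mean))
    return out


def t_value(a, b):
    """Student's t for Pearson's r between two aligned series of equal length."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    r = sab / math.sqrt(saa * sbb)
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def best_offset(sample, master):
    """Try every full-overlap position of the sample on the master; return (offset, t)."""
    s, m = bp_index(sample), bp_index(master)
    scores = [(off, t_value(s, m[off:off + len(s)])) for off in range(len(m) - len(s) + 1)]
    return max(scores, key=lambda pair: pair[1])


# Synthetic data only: a hypothetical 300-year master and a 92-ring sample cut from it
# at position 120, with 5% multiplicative measurement noise.
rng = random.Random(42)
master = [rng.uniform(0.5, 2.0) for _ in range(300)]
sample = [w * (1.0 + rng.gauss(0.0, 0.05)) for w in master[120:212]]
offset, t = best_offset(sample, master)
print(offset, round(t, 1))
```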
A base-acid-base-acid-bleaching (BABAB) procedure was applied for cleaning and cellulose extraction . Kauri wood (ETH-44660) and brown coal (ETH-38779) from Reichwalde, Germany, both significantly older than 60 kyr, served as reference processing blanks, and a dendrochronologically dated ring (AD 1515; ETH-40759; Brehm et al. submitted: figure S5.1) of a Swiss pine was used as a secondary standard (Güttler et al. 2013). The blanks and secondary standards were prepared in parallel with the calibration samples, using the same cleaning steps. All samples, and the unprocessed OX-II standards, were graphitized using the fully automated graphitization equipment (AGE) system (Wacker et al. 2010a). Samples were analyzed on the MICADAS system ). In addition to the wood samples, each cassette contained two processed blanks and seven OX-II standards for normalization. Data analysis and evaluation were performed using the BATS software (Wacker et al. 2010c). The uncertainties in 14C age are derived from counting statistics, standard normalization, and sample preparation. The 14C counts were background-corrected using the processed blanks and normalized to the OX-II standards. An additional uncertainty (1‰ in Δ14C, equivalent to about 8 years in radiocarbon age), estimated from long-term laboratory statistics on processed secondary wood standards, was added in quadrature. The measured 14C concentrations for the Ebensfeld tree rings are given in Supplementary Information 2.
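The equivalence between 1‰ and about 8 radiocarbon years, and the addition of that term in quadrature, can be checked with a few lines. This sketch assumes the conventional-age relation t = -8033 ln(F14C) and treats a 1‰ change as a 0.1% relative change in 14C concentration, which is an approximation; the function names and the 15-year error in the usage line are hypothetical.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years; conventional 14C ages use the Libby half-life of 5568 yr


def permil_to_years(permil):
    """Approximate 14C-age equivalent of a small relative change (per mil) in 14C concentration.

    From t = -8033 ln(F14C), the magnitude of the age change is 8033 * ln(1 + dF/F).
    """
    return LIBBY_MEAN_LIFE * math.log(1.0 + permil / 1000.0)


def add_in_quadrature(*sigmas):
    """Combine independent 1-sigma uncertainties."""
    return math.sqrt(sum(s * s for s in sigmas))


extra = permil_to_years(1.0)
print(round(extra, 1))               # 8.0
print(add_in_quadrature(15.0, 8.0))  # 17.0 (hypothetical 15-yr counting/normalization error)
```

Adding in quadrature assumes that the extra term is independent of the counting and normalization errors, so a small additional uncertainty inflates a larger one only modestly.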
The four datasets that cover the 37th century BC (5648-5549 cal BP) clearly diverge across part of this period (Figure 14). The ETH and GrN data are generally closer to each other than they are to the QL and UB data (which closely follow each other). This is particularly apparent in the 3660s BC (5610s cal BP), when both the ETH and GrN datasets suggest a much larger wiggle than is apparent in the QL/UB data. Given the much higher density of the ETH dataset, IntCal20 follows it closely, whereas previous versions of the calibration curve struck a more balanced path between the higher-density, but lower-resolution, QL and UB datasets and the high-resolution, but sparse, GrN data (Figure 14A). Given the small range in latitude of the dated trees (less than seven degrees) and what we know of the expected scale of locational and inter-tree variation, this offset appears to originate from sub-decadal variations in atmospheric radiocarbon, from inter-laboratory differences, or from a combination of the two. Pending further high-resolution datasets for this period, IntCal20 represents the best estimate of the radiocarbon calibration curve for these decades. There is clearly disagreement between the underlying data, however, which is apparent only because we have multiple datasets, some of which are at high resolution. The ETH dataset in this period (69-46) constitutes 69 of the 303 new measurements in IntCal20 in the 8000 years between 4000 and 12,000 cal BP (2051-10,051 BC). For most periods of prehistory we simply do not have data of this kind, and so further issues of this type are not only possible within this timespan but should be expected.
Such as-yet unrecognized structure in the record of atmospheric radiocarbon is a potential threat of which users of IntCal20 must be aware when comparing radiocarbon-based chronologies with historical or dendrochronological timescales. As a sensitivity analysis, to explore the effects that such future refinements may have on archaeological interpretation, we have constructed two alternative calibration curves for this period using the IntCal20 methodology  in this issue): one including only datasets 4-1 and 69-46 (GrN/ETH), and the other including only datasets 1-14 and 2-3 (QL/UB) (Figure 14B). Figure 15 shows the differences between the calibrated date ranges for a measurement of 4910 BP, with errors of ±70 BP, ±35 BP, and ±15 BP, when it is calibrated using IntCal04, IntCal20, and the two variant calibration curves constructed for this sensitivity analysis. The medians of the four calibrations in each group differ by at most seven calendar years. Clearly this is not a substantive concern for archaeological interpretation.
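The comparison of medians in Figure 15 rests on calibrating a single measurement against several curves. The sketch below shows the mechanics under simplifying assumptions: a flat prior, a Gaussian likelihood with measurement and curve errors in quadrature, and two hypothetical linear curves (`toy_curve_a`, `toy_curve_b`) that differ only by an arbitrary 5-year shift. They are not IntCal04, IntCal20, or the GrN/ETH and QL/UB variants, and the resulting medians say nothing about the real curves.

```python
import math


def calibrate(age, error, curve, cal_years):
    """Normalized probability for each cal BP year, given a 14C age and a curve function.

    curve(cal_bp) -> (14C age, 1-sigma); measurement and curve errors are combined
    in quadrature. A flat prior over cal_years is assumed.
    """
    weights = []
    for t in cal_years:
        mu, sc = curve(t)
        var = error ** 2 + sc ** 2
        weights.append(math.exp(-0.5 * (age - mu) ** 2 / var) / math.sqrt(var))
    total = sum(weights)
    return [w / total for w in weights]


def median_year(cal_years, probs):
    """First grid year at which the cumulative probability reaches 0.5."""
    cumulative = 0.0
    for t, p in zip(cal_years, probs):
        cumulative += p
        if cumulative >= 0.5:
            return t
    return cal_years[-1]


def toy_curve_a(cal_bp):  # hypothetical, NOT real calibration data
    return cal_bp - 720.0, 15.0


def toy_curve_b(cal_bp):  # same shape, shifted by 5 calendar years (hypothetical)
    return cal_bp - 725.0, 15.0


years = list(range(5300, 6000))
for curve in (toy_curve_a, toy_curve_b):
    probs = calibrate(4910.0, 35.0, curve, years)
    print(curve.__name__, median_year(years, probs))
# toy_curve_a 5630
# toy_curve_b 5635
```

With real, wiggly curves the calibrated distribution is no longer Gaussian, so medians can move by more or less than the local shift between curves; that is what the comparison in Figure 15 tests.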
2009) as discussed in Supplementary Information 3, and recalculated them using IntCal04, IntCal20, and the two variant calibration curves constructed for this sensitivity analysis. Key parameters calculated using IntCal04 (red) and IntCal20 (black) are shown in Figure 16, and key parameters calculated using the GrN/ETH-only curve (orange) and the QL/UB-only curve (blue) are shown in Figure 17.
For parameters dating to the end of the 37th and the 36th centuries cal BC, differences are very slight (medians vary by an average of 5 years and a maximum of 14). The important archaeological findings of the long barrows study, that the primary phase of burial in four of the tombs ended within a decade or two of 3625 cal BC and that the initial construction of Wayland's Smithy belongs to the following human generation, appear robust. The posterior distributions of parameters that fall in the late 38th century and the early and mid-37th century cal BC are more variable, however (medians vary by an average of 22 years and a maximum of 39). The possibility that the initial construction of both Fussell's Lodge and Ascott-under-Wychwood was a generation or so later has two implications. First, these constructions may belong to an even more concentrated horizon, spanning hardly more than a single human lifetime, in the middle part of the 37th century cal BC. Second, the variation in the duration of the initial period of burial in these monuments is also reduced, so that none need have been in use for more than two or three generations. At all these sites, burial may not have outlasted the living memory of those in the community who witnessed their construction. This would have important implications for our understanding of Neolithic society.
Figure 16 Key parameters calculated using IntCal04 (Reimer et al. 2004 [red]) and IntCal20 (Reimer et al. 2020 in this issue [black]).
This case study illustrates the strengths and weaknesses of IntCal20 in the period where there are few new data (i.e. the 8000 years between 4000 and 12,000 cal BP). Decadal and bi-decadal calibration data are sufficient for calibrating single radiocarbon dates accurately (indeed, this is the purpose for which these data were originally obtained). Higher resolution data will, however, uncover sub-decadal changes in the level of atmospheric radiocarbon that are invisible in the existing data. In this example, changes in the posterior distributions from the chronological models range from less than a decade to a few decades. A few decades is a long time in a narrative at the scale of lifetimes and generations, and can have important implications for archaeological interpretation. But these differences are no greater than those observed when modeling different archaeological interpretations of the sequence at these sites in the original studies (e.g. Wysocki et al. 2007: figure 12).
Figure 17 Key parameters for the models described in Supplementary Information 3, calculated using the alternative calibration curves compiled using the IntCal20 methodology for this sensitivity analysis: GrN/ETH data only (orange), QL/UB data only (blue).

CONCLUSIONS
For the previous generation of research, the part of the radiocarbon calibration curve based on tree rings has been seen as something of a "gold standard" to which other archives can but aspire. More than 90% of the tree-ring data that IntCal20 inherited from IntCal13, however, were measured in the 1980s or 1990s. In the two decades since, while there has been some extension of the earlier part of the tree-ring dataset (e.g. Kromer et al. 2004) and some replication in periods of particular interest (e.g. Kromer et al. 2010), the major advance in radiocarbon calibration has been the provision of a calibration curve extending to the limit of the technique, based on a variety of other archives (Reimer et al. 2013).
IntCal20 clearly reflects recent technological advances in AMS that enable high-precision measurements to be made in large numbers on single tree rings . This has fostered renewed interest in obtaining an annual record of past atmospheric radiocarbon and in understanding its locational and other variations. Ultimately this will have important implications for the accuracy and precision of archaeological chronologies, particularly those based on the formal statistical modeling of suites of radiocarbon dates. At present, however, the part of the calibration curve based on tree rings varies in resolution and replication through time, with single-year data concentrated in five of the fourteen millennia covered by the tree rings, and rarely replicated outside the most recent millennium. The part of IntCal20 that is based on tree rings thus represents a transition from a calibration curve largely based on decadal and bi-decadal blocks of tree rings to one based on higher resolution datasets. Opportunities for increased precision and accuracy using IntCal20 are thus currently variable, and in any case do not come without risks, as accurate chronologies will only be produced if the radiocarbon measurements on the archaeological samples that are calibrated against the curve are of equivalent accuracy.
Radiocarbon calibration is a work in progress. IntCal20 has strengths and weaknesses (Figure 18). It contains more than twice as much data as IntCal13, and those data are of higher resolution and have been produced by a larger number of laboratories. There is also much more replication than before, although this is still on a relatively limited scale. High-resolution data are currently available for only part of the tree-ring timescale, and so for most of its extent there may be sub-decadal changes in atmospheric radiocarbon that are invisible in the data currently available. These sub-decadal changes provide an opportunity for archaeologists to produce more robust chronologies for the past, as not only will the accuracy of radiocarbon calibration improve, but the precision of date estimates, particularly modeled ones, will increase. Higher resolution data may reveal structure in what are currently intractable plateaux, and a more detailed understanding of the shape of the calibration curve may be exploited to provide precise chronologies for shorter sequences. These opportunities come with threats. They make stringent demands on the accuracy not only of the high-resolution calibration data, which are often unreplicated, but also of the measurements obtained by archaeologists on their samples. If these are lacking, chronologies that are not accurate within their quoted uncertainties may be produced. Clearly there is a threat that not all the sub-decadal changes in past atmospheric radiocarbon are visible in IntCal20, and that there may be more locational variation than is currently apparent. IntCal20 does, however, combine the extensive amount of calibration data now available using an explicit statistical methodology that accounts for the observed variability, and it provides a common standard calibration for use by archaeologists.
We would like to end on a positive note by considering how our SWOT analysis may be used to inform a strategy for future enhancements of radiocarbon calibration: building on the strengths identified, tackling the weaknesses, exploiting the opportunities, and mitigating the threats. Much research is required to maximize the utility of radiocarbon dating in archaeology and safeguard its reputation in the discipline, including the following (by no means an exhaustive list):
• secure the accuracy of datasets included in IntCal through greater measurement replication, including both repeat measurements on the same sample by a single laboratory and repeat measurements on that sample by two or more laboratories;
• extend and replicate single-year measurements through the Holocene so that additional structure in high-resolution data can be exploited for archaeological, palaeoenvironmental, climatic, and other reconstructions;
• address the uneven spread of measurements across the potential extent of tree-ring-based calibration through a more coordinated approach;
• investigate potential additional sources of variation due to intra-hemispheric locational, latitudinal, and species differences (the availability of multiple independent dendrochronologies around the hemisphere in the last few thousand years presents a particular opportunity);
• ensure the accuracy of measurements that are calibrated or modeled against future iterations of IntCal through ongoing inter-comparison exercises and the use of a common suite of laboratory standards.
Figure 18 IntCal20 tree rings SWOT analysis matrix.
Clearly, the need of archaeological users for more accurate calibration is only one, and not necessarily the principal, driver for research into the past levels of atmospheric radiocarbon. But the archaeological perspective can strengthen studies, even where their primary focus is elsewhere. As the IntCal initiative itself so powerfully demonstrates, the research community is stronger when working together across disciplinary boundaries.