Methods for Summarizing Radiocarbon Datasets

Published online by Cambridge University Press:  20 November 2017

Christopher Bronk Ramsey*
Affiliation:
Research Laboratory for Archaeology and the History of Art, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, United Kingdom
*Corresponding author. Email: christopher.ramsey@rlaha.ox.ac.uk.

Abstract

Bayesian models have proved very powerful in analyzing large datasets of radiocarbon (14C) measurements from specific sites and in regional cultural or political models. These models require the prior for the underlying processes that are being described to be defined, including the distribution of underlying events. Chronological information is also incorporated into Bayesian models used in DNA research, with the use of Skyline plots to show demographic trends. Despite these advances, there remain difficulties in assessing whether data conform to the assumed underlying models, and in dealing with the type of artifacts seen in Sum plots. In addition, existing methods are not applicable for situations where it is not possible to quantify the underlying process, or where sample selection is thought to have filtered the data in a way that masks the original event distribution. In this paper three different approaches are compared: “Sum” distributions, postulated undated events, and kernel density approaches. Their implementation in the OxCal program is described and their suitability for visualizing the results from chronological and geographic analyses considered for cases with and without useful prior information. The conclusion is that kernel density analysis is a powerful method that could be much more widely applied in a wide range of dating applications.

Information

Type
Method Development
Copyright
© 2017 by the Arizona Board of Regents on behalf of the University of Arizona 
Figure 1 The effect of using the summing of likelihood distributions for measurements with normally distributed errors. The black line shows a bimodal normal distribution (centers: ±10, standard deviations: 3.0) from which 100 random samples have been generated (shown in the rug plot). Each of the other curves shows what happens if these values are measured with different standard uncertainties (gray: 0.1, red: 0.5, green: 1.0, blue: 2.0; purple: 4.0) and then the normal distributions associated with the measurement likelihood summed. Even with this number of samples the distributions are very noisy unless the measurement uncertainty is large, at which point the original distribution is smeared and it is no longer possible to fully resolve the two original modes.
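The construction described in this caption can be sketched numerically. The following is a minimal illustration, not the OxCal implementation: it assumes that each "measurement" is the true value plus normal scatter of the stated uncertainty, that the two modes are equally weighted, and that the Sum curve is the average of the resulting normal likelihoods. The seed, grid, and function names are arbitrary choices made for the example.

```python
import numpy as np

def sum_distribution(centres, sigma, grid):
    """Average of the normal likelihoods N(centre, sigma) evaluated on grid."""
    z = (grid[:, None] - centres[None, :]) / sigma
    dens = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    return dens.mean(axis=1)

rng = np.random.default_rng(1)           # arbitrary seed (assumption)
n = 100                                  # number of samples, as in the caption
modes = rng.choice([-10.0, 10.0], size=n)  # equal mode weights (assumption)
true_values = rng.normal(modes, 3.0)     # bimodal: centres +/-10, sd 3
grid = np.linspace(-40.0, 40.0, 4001)
dx = grid[1] - grid[0]

sums = {}
roughness = {}
for sigma in (0.1, 0.5, 1.0, 2.0, 4.0):
    # Simulated measurement scatter (assumption: normal, sd = sigma).
    measured = true_values + rng.normal(0.0, sigma, size=n)
    sums[sigma] = sum_distribution(measured, sigma, grid)
    # Total variation of the curve: a crude measure of how noisy it looks.
    roughness[sigma] = float(np.abs(np.diff(sums[sigma])).sum())

for sigma, value in roughness.items():
    print(f"sigma={sigma}: total variation {value:.3f}")
```

Under these assumptions the total variation falls steeply as the measurement uncertainty grows, which is the trade-off the caption describes: small uncertainties give a spiky curve, large ones a smooth but smeared curve.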

Figure 2 An example of a sum distribution of calibrated 14C likelihood distributions. The open diamonds show the values of the randomly selected dates from the range AD100–AD500 for simulation, the red crosses on the left show the simulated 14C date central values (all errors in this case are ±25) and the light gray crosses below show the median values of the resulting calibrated likelihoods.

Figure 3 Comparison of methods of summarizing a set of 40 14C dates. The same simulations are used as in Figure 2. The open diamonds show the randomly selected dates in the range AD100–AD500, the light gray crosses show the medians of the likelihood distributions of the calibrated dates and the black crosses the medians of the marginal posterior distributions for each dated event. Panel (a) shows the sum of the likelihoods. Panels (b), (c), and (d) all use the marginal posteriors from the same simple single uniform phase model with a start and end boundary (Bronk Ramsey 2009): panel (b) shows the sum of the marginal posteriors, panel (c) shows the marginal posterior for an event simply constrained to lie between the start and end boundary and panel (d) shows a Kernel Density plot based on the dated events constrained to be within the phase. Panel (e) is a KDE plot generated from samples randomly taken from the likelihood distributions. Panel (f) shows the effect of applying the KDE_Model model, which uses the KDE distribution as a factor in the likelihood (see text). Panel (g) shows a Kernel Density plot of the original calendar dates chosen from the range AD100–AD500: ideally this is the distribution that the other estimates should reproduce. The overlain green and red distributions with their associated ranges show the marginal posterior for the First and Last events within the series: these should overlap at 95% with the first and last open diamonds, which are the actual first and last events sampled; this is the case for those based on the uniform phase model and the KDE model, but not for those based on the unconstrained Sum or KDE plot.

Figure 4 Comparison of different approaches to KDE. Panel (a) shows the comparison of different methods for bandwidth estimation for data sampled from a bimodal normal distribution with centers at ±10 and standard deviation of 3; blue: Silverman-rule estimate, red: the MATLAB KDE module (Botev et al. 2010) and green: the KDE_Plot method described here. Panel (b) is a repeat of Figure 1 showing the effect of changing measurement uncertainty (gray: 0.1, red: 0.5, green: 1.0, blue: 2.0; purple: 4.0) on the Sum distribution. Panel (c) shows the same for the KDE_Plot function on the same data. Panel (d) shows the application of the KDE_Model model implementation on the same dataset.
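For orientation, the Silverman-rule estimate mentioned for panel (a) can be sketched as follows. This assumes the normal-reference form of the rule, h = (4/(3n))^(1/5) σ, with σ the sample standard deviation; the caption does not state which variant was used, and the sample size, seed, and function names here are illustrative only. It does not reproduce the Botev et al. (2010) or KDE_Plot bandwidth choices.

```python
import numpy as np

def silverman_bandwidth(x):
    """Normal-reference (Silverman) bandwidth: (4 / (3 n))**(1/5) * sd."""
    x = np.asarray(x, dtype=float)
    return (4.0 / (3.0 * x.size)) ** 0.2 * x.std(ddof=1)

def gaussian_kde(x, h, grid):
    """Gaussian kernel density estimate of x with bandwidth h on grid."""
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(4)   # arbitrary seed (assumption)
n = 100                          # sample size (assumption; not given in the caption)
data = rng.normal(rng.choice([-10.0, 10.0], size=n), 3.0)

h = silverman_bandwidth(data)
grid = np.linspace(-60.0, 60.0, 2401)
dx = grid[1] - grid[0]
density = gaussian_kde(data, h, grid)

print(f"Silverman bandwidth: {h:.2f} (each mode has sd 3.0)")
```

Because the rule scales with the overall spread of the sample, which for well-separated modes is dominated by the distance between them, the bandwidth comes out larger than the width of either mode. This is the usual reason a Silverman-rule estimate over-smooths multimodal data.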

Figure 5 This shows 30 individual KDE estimates generated during the MCMC with slightly different kernel bandwidths (based on the parameter g). The underlying data are as for Figure 4a. The blue line shows the mean of these and the light blue band ±1σ.

Figure 6 Schematic of likelihood for a parameter within a KDE model. If 99 parameters, all with precisely known values, generate a bimodal KDE as shown in green, and the 100th parameter has a likelihood given by a measurement of 6±10 as shown in red, then the combined likelihood for the true value of this 100th parameter is shown in blue.
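The combination shown in this schematic can be sketched as a pointwise product of the KDE and the measurement likelihood, renormalized to unit area. The mode positions (±10, standard deviation 3), the fixed kernel bandwidth, and the seed below are hypothetical values chosen for illustration; the caption specifies only the 99 known values, the bimodal shape, and the measurement of 6±10.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(6)   # arbitrary seed (assumption)
# 99 precisely known values forming a bimodal distribution (hypothetical modes).
known = np.concatenate([rng.normal(-10.0, 3.0, 50), rng.normal(10.0, 3.0, 49)])
h = 1.5                          # hypothetical fixed kernel bandwidth

grid = np.linspace(-50.0, 50.0, 2001)
dx = grid[1] - grid[0]

# Green curve: Gaussian KDE of the 99 known values.
kde = normal_pdf(grid[:, None], known[None, :], h).mean(axis=1)
# Red curve: likelihood from the measurement 6 +/- 10.
measurement = normal_pdf(grid, 6.0, 10.0)
# Blue curve: product of the two, renormalized to integrate to 1.
combined = kde * measurement
combined /= combined.sum() * dx

positive_mass = float(combined[grid > 0.0].sum() * dx)
print(f"probability that the 100th value lies in the upper mode: {positive_mass:.2f}")
```

In this sketch the broad measurement alone barely discriminates between the modes, but the product concentrates the probability within the two modes and favors the one closer to the measured value.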

Figure 7 Plot showing the output of the KDE_Model method when used on the same simulated data as Sum in Figure 2. The open diamonds are a rug plot of the calendar date random samples, the light gray crosses show the medians of the likelihood distributions from the simulated 14C measurements for these dates and the black crosses show the medians of the marginal posterior distributions for the events from the KDE_Model analysis. On the left the rug plot for the central values of the simulated 14C dates is shown and the calibration curve is shown for reference. The dark gray distribution is the sampled KDE estimated distribution. The blue line and lighter blue band overlying this show the mean ±1σ for snapshots of the KDE distribution generated during the MCMC process and give an indication of the significance of any features. The light gray distribution shown above is the Sum distribution for reference.

Figure 8 Simulation of events with an underlying normal distribution N (300, 100). Each simulation has 100 events and assumes a different measurement uncertainty ranging from 25 to 150 yr. The light gray distributions show the effect of using KDE_Plot without any Bayesian modeling. The dark gray distributions are the output of a KDE model using the KDE_Model command within OxCal. The underlying distribution is shown for comparison. The rug plots show the randomly selected calendar dates (diamonds), the medians of the simulated 14C dates (light gray crosses) and the medians of the marginal posteriors of the simulated events (black crosses).

Figure 9 Details of the models used are explained in Figure 8. The estimated standard deviation of the underlying distribution is shown based on 10 different runs of each model, with the error bars showing the sample standard deviation in results. The events are sampled from a normal distribution with a standard deviation of 100. This plot shows that the KDE_Model algorithm recovers the underlying distribution independently of uncertainty, whereas a simple KDE_Plot without a Bayesian model shows over-dispersion as the measurement uncertainties increase.
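The over-dispersion of an unmodeled summary can be illustrated with a deliberately simplified sketch. It assumes purely normal measurement errors of size s on a linear scale, ignoring the calibration curve and the kernel bandwidth, so it indicates the direction and rough size of the effect rather than reproducing the plotted values. Under these assumptions the central values have spread sqrt(σ² + s²), and values drawn from the likelihoods have spread sqrt(σ² + 2s²). The seed and names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(9)          # arbitrary seed (assumption)
TRUE_SD, N_EVENTS, N_RUNS = 100.0, 100, 10

def simulated_spreads(s):
    """Mean sample sd of (central values, draws from likelihoods) over N_RUNS."""
    centre_sd, draw_sd = [], []
    for _ in range(N_RUNS):
        events = rng.normal(300.0, TRUE_SD, N_EVENTS)      # underlying events
        centres = events + rng.normal(0.0, s, N_EVENTS)    # measured central values
        draws = centres + rng.normal(0.0, s, N_EVENTS)     # one draw per likelihood
        centre_sd.append(centres.std(ddof=1))
        draw_sd.append(draws.std(ddof=1))
    return float(np.mean(centre_sd)), float(np.mean(draw_sd))

results = {s: simulated_spreads(s) for s in (25.0, 50.0, 100.0, 150.0)}
for s, (c_sd, d_sd) in results.items():
    expected_c = np.sqrt(TRUE_SD**2 + s**2)
    expected_d = np.sqrt(TRUE_SD**2 + 2.0 * s**2)
    print(f"s={s:5.0f}: centres {c_sd:6.1f} (expected {expected_c:6.1f}), "
          f"draws {d_sd:6.1f} (expected {expected_d:6.1f})")
```

Both spreads grow with s while the true standard deviation stays at 100, which is why a summary built directly from the likelihoods widens as the measurements become less precise.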

Figure 10 Two hundred dates were randomly sampled from a multimodal distribution. The events are assumed to be drawn from the trimodal distribution with modes centered on 14,400, 13,800, and 12,900 cal BP with standard deviations of 200, 150, and 150 respectively. There are 40 events in the first mode and 30 in the other two. The 14C measurements are assumed to have a standard error of 50 and measurement scatter has been simulated. The original underlying distribution is shown at the top, a simple Sum below that and the output of the KDE_Model algorithm outlined here. The rug plot is as for Figure 7.

Figure 11 Comparison of Sum distribution (above) and KDE_Model distribution for dates on raths as reported in Kerr and McCormick (2014). The features of the plot are the same as Figure 7.

Figure 12 Comparison of methods for summarizing dates on different types of bronze axe from the British Bronze Age (data from Needham et al. 1998). The phases shown in all cases are from left to right: Acton and Taunton (5 dates), Penard (12 dates), Wilburton (10 dates), and Ewart Park (9 dates). Panel (a) shows the results of using independent (overlapping) single uniform phase models for each group with the probability distribution between the boundaries being visualized using KDE_Plot. Panel (b) shows the same but using a normally distributed phases (using Sigma_Boundary, see Bronk Ramsey 2009). Panel (c) simply uses a KDE_Model for each group of dates. All methods give very similar results indicating that the Acton and Taunton, Penard and Wilburton phases follow on one after the other with a short gap before Ewart Park.

Figure 13 Comparison of Sum distribution (above) and KDE_Model distribution for dates on Paleoindian contexts as reported in Buchanan et al. (2008). The features of the plot are the same as Figure 7.

Figure 14 Comparison of different distributions related to the data from Buchanan et al. (2008) (628 dates). Panel (a) shows the simple Sum distribution; panel (b) is the KDE_Model analysis as shown in Figure 13; panel (c) shows the Sum distribution from the simulation of 628 dates uniformly spread through the period 12,950–8950 cal BP; panel (d) shows the KDE_Model analysis of the same simulated dataset; panel (e) shows the simple Sum plot for dates simulated from the medians of the marginal posterior distributions arising from the KDE_Model analysis. The fact that panel (e) is similar to panel (a) indicates that the original Sum distribution is compatible with the output of the KDE_Model analysis.

Figure 15 A plot of the Ca flux and δ18O isotope ratios recorded in NGRIP (Andersen et al. 2004; Bigler 2004; Rasmussen et al. 2006; Svensson et al. 2008) against the KDE_Model derived probability density for dates from Paleoindian contexts in North America using data from Buchanan et al. (2008).

Figure 16 KDE analysis of two sets of megafauna data from (a) Stuart et al. (2004) (77 dates on Megaloceros giganteus) and (b) Ukkonen et al. (2011) (112 dates on Mammuthus primigenius, excluding one beyond calibration range). In each case the rug plots show in light gray the median likelihoods of the calibrated dates, and in black the median of the marginal posterior distributions. The main KDE_Model distributions are shown in a darker shade with the much noisier Sum shown in the background. The blue lines and bands show the mean and ±1σ of the ensembles of snapshot KDE distributions from the MCMC analysis, showing in this case that the distributions are fairly well constrained.

Figure 17 Comparison of Sum distribution (above) and KDE_Model distribution for dates on Irish archaeological sites in the period 1200 BC to AD 400 as reported in Armit et al. (2014). The features of the plot are the same as Figure 7.

Figure 18 Example simulations of 400 14C dates on uniformly distributed events through the period 1500 BC to AD 500. The uncertainty of the dates is assumed to be ±30 and ±100 14C yr in panels (a) and (b) respectively. The blue bands show the ±1σ variability in snapshot KDE distributions generated during the MCMC analysis, with more uncertainty seen in panel (b) (±100) than in panel (a) (±30). There is no consistent patterning in where rises and falls in the distribution occur, and these are therefore consistent with stochastic effects of the simulations.

Figure 19 Plot showing the probability density from the KDE_Model analysis of archaeological 14C dates from Ireland as summarized in Armit et al. (2014) compared to the δ18O isotope record of NGRIP (Andersen et al. 2004; Rasmussen et al. 2006; Svensson et al. 2008) and the LOWESS (smooth 0.02) climate analysis of Armit et al. (2014). The NGRIP data is on the GICC05 timescale, all other data is based on 14C and so on the IntCal13 timescale (Reimer et al. 2013). See text for discussion.

Figure 20 Marginal posterior distributions for the parameter g in applications of the KDE_Model for (a) the simulated bimodal distribution shown in Figure 5, (b) the simulated multimodal distribution shown in Figure 10, (c) the Irish early mediaeval settlements example from Kerr and McCormick (2014) shown in Figure 11, (d) the Penard phase from the British Bronze Age data of Needham et al. (1998) as shown in Figure 12c, (e) the Paleoindian contexts of Buchanan et al. (2008) as shown in Figure 13, (f) the Megaloceros giganteus data of Stuart et al. (2004) as shown in Figure 16a, (g) the Mammuthus primigenius data of Ukkonen et al. (2011) in Figure 16b, and (h) the data of Armit et al. (2014) as shown in Figure 17. The model is most likely to be robust when the marginal posterior is a well-defined, approximately normal distribution. If the mode is close to 1 (as in panel (d)), a single-phase parameterized Bayesian model would be more appropriate. This parameter should be assessed in combination with the variability in output results as seen either in ensembles of outputs or in the plot of the ±1σ variability.