How Many Dates Do I Need?

Abstract As the use of large-scale radiocarbon datasets becomes more common and applications of Bayesian chronological modeling become a standard aspect of archaeological practice, it is imperative that we grow a community of both effective users and consumers. Indeed, research proposals and publications now routinely employ Bayesian chronological modeling to estimate age ranges such as statistically informed starts, ends, and spans of archaeological phenomena. Although advances in interpretive techniques have been widely adopted, sampling strategies and determinations of appropriate sample sizes for radiocarbon data remain generally underdeveloped. As chronological models are only as robust as the information we feed into them, formal approaches to assessing the validity of model criteria and the appropriate number of radiocarbon dates deserve attention. In this article, through a series of commonly encountered scenarios, we present easy-to-follow instructions for running simulations that should be used to inform the design and construction of chronological models.

Keywords: radiocarbon, Bayesian modeling, chronology, sampling, simulation
Large-scale radiocarbon dating programs at varying scales of analysis (e.g., from the single stratigraphic context to entire regions) have become a mainstay of archaeological research in North America, due in part to widespread advances in and adoption of Bayesian chronological modeling techniques (e.g., Abel et al. 2019; Birch et al. 2020; Brown et al. 2019; Cobb et al. 2015; Hamilton and Krus 2018; Holland-Lulewicz et al. 2020; Kennett et al. 2014; Krus and Cobb 2018; Krus et al. 2015; Lulewicz 2018, 2019; Manning et al. 2018; Ritchison 2018a, 2018b, 2020; Thompson and Krus 2018; Thulman 2019). As evidenced by long histories of research in Europe, advances in both radiocarbon dating and Bayesian chronological modeling have greatly increased the analytical potential of both large and small radiocarbon datasets in ways that have allowed researchers to completely rethink approaches to archaeological practice and the narrative construction of the past (see Bayliss 2009; Bayliss and Bronk Ramsey 2004; Bronk Ramsey 2009a; Buck 2004; Buck et al. 1991, 1996; Hamilton and Krus 2018; Whittle et al. 2011). Bayesian chronological modeling allows archaeologists to formally integrate archaeological knowledge (known as prior or a priori information) into the statistical analysis of radiocarbon data. Such techniques allow for more scientifically rigorous approaches to assessing the validity of radiocarbon data, the robustness of temporal interpretations, and the fit between archaeological assumptions and radiocarbon probabilities. Bayesian models are now routinely built to produce estimations of age ranges, including statistically informed starts, ends, and spans of archaeological phenomena.
Although advances in interpretive techniques have been widely adopted, the use of simulations to explore effective research designs, determine appropriate sample sizes, or model chronological hypotheses remains generally restricted to the practices and efforts of Bayesian specialists (e.g., Bamforth and Grund 2012; Bayliss and Bronk Ramsey 2004; Bayliss et al. 2007, 2008; Buck 2004; Buck and Christen 1998; Christen and Buck 1998; Contreras and Meadows 2014; Crema and Shennan 2017; Edinborough et al. 2017; Griffiths 2014; Jorgeson et al. 2020; Krus and Cobb 2018; Rhode et al. 2014). As the use of Bayesian frameworks becomes more accessible and more widespread, however, it is critical to adopt a set of standardized expectations for developing effective sampling strategies. Bayesian chronological models, like all models, are only as robust as the data and parameters that we provide (i.e., "garbage in, garbage out").
One frequently used means to increase the robustness of a given chronological modeling program has been to employ simulations during research design to explore the complexities of temporal age estimations for particular cases (e.g., Bayliss 2009; Bayliss and Bronk Ramsey 2004; Bayliss et al. 2007; Buck and Christen 1998; Christen and Buck 1998; Crema and Shennan 2017; Griffiths 2014; Jorgeson et al. 2020; Kennett et al. 2017; Krus and Cobb 2018; Manning 2006; Manning and Hart 2019; Rhode et al. 2014; Steier and Rom 2000; Thompson and Krus 2018; Thompson et al. 2019). In the context of Bayesian chronological modeling, simulations use well-informed, hypothetical datasets and model parameters, based on available observations or expected findings, to evaluate the potential representativeness of an existing or expected radiocarbon dataset, or the efficacy and value of particular prior information and model parameters in producing robust age estimates (that is, estimates that we can be confident will be generally unyielding to the routine addition of new data and information).
Although the value of simulations is well understood and recognized among specialized Bayesian modelers dealing with archaeological radiocarbon data, specific knowledge of how to actually go about using simulations in this way remains limited. As Bayesian chronological modeling continues to become more "mainstream," primarily through user-friendly and free software packages, knowledge of how to formally assess sampling strategies and sample sizes must keep pace. Simulations allow those submitting grant proposals or journal manuscripts to justify their research design, datasets, and modeling decisions formally. Additionally, simulations can identify shortcomings, weaknesses, or areas for future research. In this article, we present easy-to-follow instructions for how to run simulations via the freely available OxCal v4.4 software (Bronk Ramsey 2009a) and how to use them productively in archaeological research. We first briefly review why simulations are important in the context of the complexities of radiocarbon dating and chronological modeling. We then offer a series of hypothetical case studies that mirror some of the most common modeling efforts undertaken by archaeologists. Although we do not assume prior knowledge of utilizing simulations, we do assume that the reader has a working understanding of the principles and practice of Bayesian chronological modeling. For the sake of space, we refrain from offering explicit definitions for many of the specific terms associated with Bayesian chronological modeling and the OxCal software. For those seeking a more thorough background on the method, theory, terminology, and procedures of the Bayesian analysis of radiocarbon data, a range of published material is available to be consulted (e.g., Bayliss 2009; Bronk Ramsey 2009a, 2009b; Buck et al. 1996; Hamilton and Krus 2018; Lulewicz 2018; Whittle et al. 2011).

WHY RUN SIMULATIONS?
The purpose of using simulations is to formally assess the potential representativeness of a given set of radiocarbon dates within a particular suite of model parameters to address a research question (e.g., determining the end boundaries for a particular site or the beginning of the use of a particular diagnostic material across a region). Simply put, simulations help ensure and justify representative age models as well as justify radiocarbon sampling strategies that address well-defined research questions (Griffiths 2014). In this regard, there are a variety of factors to consider. The most notable of these, of course, is the number of available radiocarbon dates. Although archaeologists are well aware of sampling methods and the importance of matching the scale of a sample to the scale of a question or context, when building chronological models, it is important to recognize that it is not only size that matters. Both the estimated temporal range and the usefulness and power of prior knowledge will influence any determination of an appropriate number of radiocarbon dates for a given research design or model.
All of these considerations are further influenced by the location of potential radiocarbon determinations along the calibration curve (e.g., IntCal20 [Reimer et al. 2020]). For instance, determinations dating to roughly the AD 1400s exhibit a single intercept along a steep part of the calibration curve, resulting in easy-to-interpret age estimates with short ranges (Figure 1a). On the other hand, determinations dating to the range of approximately AD 1500-1700, for example, usually exhibit multiple intercepts along the calibration curve. These are sometimes called "reversals" when the curve has a significant upward inflection back in time, or "plateaus" where the curve flattens out and provides no single, clear intercept (Figure 1b). These phenomena result in calibrations with extremely wide probabilities ranging from hundreds to thousands of years. One of the more notorious examples of such plateaus, the Hallstatt plateau (ca. 2450 BP or ca. 800-400 BC), has been addressed extensively through the use of Bayesian approaches and simulations to increase the achievable precision for radiocarbon dates from this period (e.g., Hamilton et al. 2015; Jacobsson et al. 2018). Similar efforts have been extended for other plateaus dating both earlier (ca. late fourth millennium BC; e.g., Meadows et al. 2020) and later (ca. fifteenth-seventeenth centuries AD; e.g., Manning et al. 2020).
For our purposes, it is important to recognize that depending on the modeling goals, fewer dates will likely be required for producing a robust, representative model for phenomena dating to the fifteenth century AD, as compared to the sixteenth through the eighteenth centuries AD. In the case of this latter span (i.e., AD 1500-1700), models may be highly sensitive to the addition of new dates and to new or more informative prior information. In fact, for this span (and other "plateaus"), sample size alone may not be able to ameliorate these inherent effects of the calibration curve.
November 2021 | Advances in Archaeological Practice | A Journal of the Society for American Archaeology 273
We can employ simulations to explore the consequences of varying (1) the number of radiocarbon dates and (2) the kinds of prior knowledge built into the model. In regard to calibration curve "reversals" and "plateaus," for instance, simulations can be used to explore the effects of different prior knowledge on the mediation of such calibration issues. For instance, you might test how dating more or fewer stratigraphic layers or incorporating other kinds of a priori information (e.g., dates from historical documentation or other materials with known date ranges, such as coins or historic ceramic wares) might achieve more precise age estimations. Simulations may reveal that more dates from the same context or stratigraphic sequence will not overcome the effects of reversals in the calibration curve. You might then consider a different approach, perhaps dating multiple rings from single pieces of charcoal where the number of years between each date (i.e., number of rings) is known. Such procedures are known as wiggle-matching, and they can greatly increase the resolution of charcoal dates where the analysis is possible (see Bronk Ramsey et al. 2001; Galimberti et al. 2004; Hogg et al. 2019; Jacobsson et al. 2018; Manning et al. 2010, 2020). In each of these cases, simulations are a valuable tool that can inform the allocation of time, energy, and money while simultaneously increasing the potential utility, representativeness, and robustness of new chronological models. Simulations may demonstrate that the number of determinations needed, or the kind of a priori information that would be ideal, for building representative models is not feasible. For instance, simulations may reveal that given a set of model criteria, more dates are needed than the archaeologist can afford, or that more stratigraphic information would be useful to constrain determinations and estimated ages.
As Krus and Cobb (2018) have recently demonstrated, such knowledge should not hinder a study. In fact, such information allows archaeologists to propose future research, anticipate changes to their models, and to recognize potential shortcomings of their interpretations. In this way, simulations allow for more productive and more transparent chronology-building practices.

HOW TO RUN SIMULATIONS
As noted at the outset, and as illustrated by the many citations of publications employing simulations to address chronological research design, none of the issues outlined here are necessarily new. Yet no simple guide to using simulations exists, and there are no established or published best practices for the use of simulations in building Bayesian chronological models. You may be asking, How do I generate simulated dates? How many simulated models should I run? How do I run a simulated model? How do I interpret the results of simulations? The rest of this article endeavors to address these questions and reveal the mechanisms inside the "black box" of Bayesian simulations. Running simulations does not require any advanced computational or statistical knowledge, and it can be accomplished using widely available software. Here, we provide four examples that represent very common archaeological situations. For each example, we generate dates for simulations and interpret simulation results using Microsoft Excel (this can also be accomplished in Google Sheets, a free and readily available online software). We build and run simulated models using the freely available software OxCal v4.4 (Bronk Ramsey 2009a) and the built-in IntCal20 calibration curve (Reimer et al. 2020). We note that other free software options, such as R Statistical Software, could also be used to undertake the analyses we present below. Although we focus on the use of Excel as an easier and more accessible option to most, code-based programs such as R have the added benefits of producing more easily shareable code, outlining specific procedures, easing reproducibility, and allowing users to create random samples following more complex, multimodal underlying distributions.
The general steps and considerations necessary for designing effective simulation studies are presented as a flow chart in Figure 2. Before beginning, it is good to know at least something about the kind of model you wish to build. Ideally, you will build a model that reflects the specific archaeological context(s) being studied and that is designed to answer a specific question (e.g., occupation spans, spans of hiatuses/abandonments, or age estimations of particular features/events). Although you may not know the precise temporal range of your context or archaeological phenomena, simulations are built from expected results or informed hypotheticals (educated guesses!).
FIGURE 1. A simulated date (AD 1425 ± 20) with a single intercept along the radiocarbon calibration curve (left) and a simulated date (AD 1550 ± 20) along part of the curve producing multiple intercepts and wide probability ranges (right).
Once you have built your model (or models, if you are evaluating the effects of different model criteria) in OxCal, you can begin adding simulated radiocarbon dates. For this, we recommend Microsoft Excel as an easily accessible and user-friendly option.
For example, if you estimate that your determinations should fall between AD 1300 and 1500, you will use Microsoft Excel to generate random numbers between these two ages. To do this, you can enter the following equation into a cell in Excel: =RANDBETWEEN(1300,1500). Note that for BC dates, you simply use a negative sign in front of the age. Importantly, simulated dates are input as actual calendrical dates (AD/BC), not radiocarbon ages or BP (although the results can be visualized and presented as BP). You can now drag this cell down the column to generate a series of random numbers. Because RANDBETWEEN recalculates whenever the sheet changes, paste the results as values if you need to keep a particular set of dates. To the right of this column, insert an error range (e.g., 20, 35, 50 years). This will depend on the lab you are using, the period you are working in, or the material you are dating. Choose whatever is the most appropriate for your anticipated dates (in fact, this may be a factor that you wish to assess: how do models built using legacy determinations with large error ranges compare to models built using simulated dates with much smaller error ranges?). To the left of your age column, insert an arbitrary identifier (e.g., Date_1, Sim_1, etc.).
FIGURE 2. Suggested workflow for running effective simulations.
Once you have these three columns (Sample ID, Date, Error), you can place your simulated data into your OxCal model as demonstrated in the examples below. To input dates, use the R_Simulate command in OxCal, which is where you will enter the ID, date, and error. As you will likely be inputting many dates across many different models, you may opt to use the Import tool in which you can copy and paste your three columns from Excel into OxCal. If you are specifically interested in the effects of sample sizes, you may start with a few dates and continuously add more to subsequent models (e.g., 10 dates, 20 dates, 30 dates, etc.). In our examples below, we make jumps of five dates for each iteration. Smaller jumps of even a single date could be made, which may be necessary when funds for radiocarbon dates are limited. The outputs of these models will be the basis for interpreting your simulations and assessing your sampling strategies and modeling criteria.
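For readers who prefer a scripted alternative to the spreadsheet steps above, the following is a minimal sketch in Python rather than the Excel workflow used in this article. It builds the same three columns (Sample ID, Date, Error) and formats each row as an R_Simulate statement. The function names, the `Sim` prefix, the seed, and the 20-year error are illustrative assumptions, and the statement layout (name, calendar date, uncertainty) should be checked against the OxCal documentation before use.

```python
import random


def simulate_dates(n, start, end, error, seed=None, prefix="Sim"):
    """Return n rows of (sample ID, calendar year, error).

    Years are drawn uniformly from start..end inclusive, mirroring
    Excel's RANDBETWEEN; negative years stand for BC.
    """
    rng = random.Random(seed)
    return [(f"{prefix}_{i}", rng.randint(start, end), error)
            for i in range(1, n + 1)]


def r_simulate_lines(rows):
    """Format each (ID, year, error) row as an OxCal R_Simulate statement."""
    return [f'R_Simulate("{sid}", {year}, {err});' for sid, year, err in rows]


# Ten hypothetical dates between AD 1300 and 1500 with an assumed 20-year error.
rows = simulate_dates(10, 1300, 1500, 20, seed=1)
print("\n".join(r_simulate_lines(rows)))
```

Fixing the seed makes a given set of simulated dates repeatable, which the spreadsheet approach only achieves by pasting values.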
You must now establish your criteria for interpreting simulated data. There are no established rules for determining when a minimum acceptable number of radiocarbon dates has been met. Krus and Cobb (2018) defined a few criteria specifically designed to estimate dates of abandonments at a number of archaeological sites. They decided that when their estimated start and end boundaries fell within the range of their simulated start and end boundaries, and when these simulated boundaries were constrained to 50-year periods or less, they had found an appropriate sample size. For example, if, based on current archaeological knowledge, you estimate that the end of an occupation should fall between AD 1300 and 1450, an acceptable simulation might produce an end boundary that (1) falls within that range and (2) is limited to a 50-year range (e.g., AD 1335-1385). Or put another way, if you estimate the end of an occupation to date to AD 1350, AD 1335-1385 would signify an appropriate simulated solution. Using this method, you can simply keep track of your simulated outputs and stop when these criteria are met or when they begin repeating in subsequent models as more dates are added. This would be the point of diminishing returns in regard to sample size, and it would indicate that the addition of more dates would likely have little effect on model results.
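The two-part check in the worked example above can be written down explicitly. This is a minimal sketch of that example only, not Krus and Cobb's own procedure; the function name and the tuple layout are assumptions made for illustration.

```python
def meets_criteria(boundary, expected, max_span=50):
    """Check a modeled boundary range against the two example criteria.

    boundary and expected are (earliest, latest) calendar years AD.
    Returns True when the boundary lies inside the expected range and
    is no wider than max_span years.
    """
    lo, hi = boundary
    exp_lo, exp_hi = expected
    within_expected = exp_lo <= lo and hi <= exp_hi
    narrow_enough = (hi - lo) <= max_span
    return within_expected and narrow_enough


# The example from the text: AD 1335-1385 against an expected AD 1300-1450.
print(meets_criteria((1335, 1385), (1300, 1450)))  # True
# A range inside the expectation but 130 years wide fails the second test.
print(meets_criteria((1310, 1440), (1300, 1450)))  # False
```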
In our models below, we are primarily interested in how modeled results are affected by the addition of more radiocarbon dates. We focus explicitly on the increasing precision of modeled boundaries. This is not, however, the only method of assessing the effects of radiocarbon sample sizes on the representativeness of a dataset or the confidence one could place in a dataset. For instance, the addition of more dates will, generally, also decrease the expected variance of a given dataset (see Buck and Christen 1998; Christen and Buck 1998). In Figure 3, we show that if you were to run 10 models, each with a different set of five randomly simulated dates between 15,200 and 14,700 BP, the models would yield drastically different ranges for the start boundary of that phase. As more dates are added, however, this variance decreases (Figure 4), indicating a decreasing likelihood that a new set of randomly selected dates will contradict an extant dataset. Although we have provided an example of this type of analysis, for the sake of simplicity and the "how-to" goals of this article, we do not apply these procedures to the scenarios presented below. That said, such an analysis of variance would certainly provide supplemental information to any chronological simulation.
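The intuition behind this decrease in variance can be illustrated without running any chronological model. The sketch below is a deliberately crude stand-in and an assumption of ours: it does not call OxCal or calibrate anything, and it uses the oldest date in each random draw as a rough proxy for a start boundary. It shows only that repeated draws disagree less as the number of dates grows; the function name, seed, and sample sizes are illustrative.

```python
import random
from statistics import pvariance


def oldest_date_variance(n_dates, runs=10, older=15200, younger=14700, seed=0):
    """Variance, across repeated runs, of the oldest simulated date (in BP).

    Each run draws a fresh set of n_dates uniformly between younger and
    older (inclusive). This is a proxy only; it is not a Bayesian model.
    """
    rng = random.Random(seed)
    oldest = [max(rng.randint(younger, older) for _ in range(n_dates))
              for _ in range(runs)]
    return pvariance(oldest)


# Ten runs per sample size, echoing the design described above.
for n in (5, 10, 20, 40, 80):
    print(n, round(oldest_date_variance(n), 1))
```

With real model outputs, the same `pvariance` call could be applied to the boundary limits copied from each repeated OxCal run.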
In the examples below, we search for the minimum number of dates needed before the model no longer changes with the addition of more radiocarbon determinations. As such, in the graphs below, where we plot model outputs by number of samples, we are looking for the point at which our data "levels out." This leveling point, associated with a particular number of radiocarbon dates, represents the best solution given a particular set of modeling criteria. All OxCal code used to generate the model frameworks for each of the scenarios below can be found in Supplemental Text 1 and is archived online using Zenodo (Holland-Lulewicz and Ritchison 2021).

Scenario 1: Simple Single-Phase Model, No Calibration Curve Reversals or Plateaus
Scenario 1 is the simplest of the four scenarios presented here. In Scenario 1, you are interested in the start range, end range, and span of a particular phenomenon, such as the occupation span of a single-component, non-stratified site or the use of a particular ceramic vessel form within a region. Fortunately for you, in this scenario, the estimated range for the occupation of the site or the use of the vessel form is located along a steep slope of the calibration curve that produces posterior distributions with small ranges (e.g., Figure 1a).
Previous research, including a handful of dates with moderately sized error ranges, indicates that the range for the phenomenon may span 200 years, between approximately AD 1000 and 1200. To estimate the minimum number of dates needed to model precise, high-resolution start and end boundaries (at a resolution of ca. 50 years), and to determine the point at which the model is no longer sensitive to new dates that might fall within this range, you build a simple model and run multiple iterations with increasing numbers of determinations. The model consists of a single phase with start and end boundaries. To reproduce our results, you simply copy the code for this scenario into OxCal and generate random dates between AD 1000 and 1200 in Excel, adding five more dates to each model iteration up to 100 dates. We used a new set of random dates for each iteration (as opposed to cumulatively adding to each set of dates). Start and end boundary ranges are then copied into Excel to produce plots.
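The iteration scheme described above (a fresh random set for each model, in steps of five up to 100 dates) could also be scripted. The following is a minimal sketch under stated assumptions: the OxCal model text is a generic single-phase layout written by us, not the authors' code from Supplemental Text 1; the 20-year error is an assumed value, because the error term used in this scenario is not given here; and the function name and seed are illustrative.

```python
import random


def single_phase_model(n, start=1000, end=1200, error=20, rng=random):
    """Return OxCal model text: one phase of n simulated dates with boundaries."""
    dates = "\n".join(
        f'   R_Simulate("Sim_{i}", {rng.randint(start, end)}, {error});'
        for i in range(1, n + 1)
    )
    return (
        "Plot()\n{\n Sequence()\n {\n"
        '  Boundary("Start");\n'
        '  Phase("Phase 1")\n  {\n'
        f"{dates}\n"
        "  };\n"
        '  Boundary("End");\n'
        " };\n};\n"
    )


# One model per sample size (5, 10, ..., 100), each with newly drawn dates.
rng = random.Random(42)
models = {n: single_phase_model(n, rng=rng) for n in range(5, 101, 5)}
print(models[5])
```

Each entry in `models` can be pasted into OxCal and run separately; the modeled start and end boundary ranges are then recorded by hand, as in the spreadsheet workflow.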
The results of these simulations are graphically depicted as a series of plots at both the 68% and 95% confidence intervals (Figure 5). For the boundary ranges, note how the minimums and maximums for the ranges move closer to one another as more dates are added to the simulation, indicating increasing precision. Both the boundary ranges and spans begin to "level out" at around 50 dates, indicating increasing precision up to roughly 50 dates. This would be the point of rapidly diminishing returns. In this scenario, roughly 50 determinations would be an appropriate choice of sample size and would produce a representative and robust model for determining the chronology of this phenomenon, one that will be more resilient to the addition of new data as they become available.

Scenario 2: Simple Single-Phase Model, Calibration Curve Reversal
Code for the Scenario 2 model structure is the same as code for the Scenario 1 model structure. In this scenario, however, the estimated end dates for the archaeological phenomenon in question fall along a less straightforward section of the calibration curve (i.e., at a "reversal"; Figure 1b). The phase is estimated to fall between approximately AD 1440 and 1640.
FIGURE 4. Plot illustrating variance in the estimated maximum age (blue) and minimum age (yellow) for a modeled start boundary across model iterations with increasing numbers of simulated dates between 15,200 and 14,700 BP. Each iteration of the model was run 10 times, each with a new set of randomly simulated dates, to calculate variance (e.g., the model of 10 dates was run 10 times with the same 10 dates, the model of 20 dates was run 10 times with the same 20 dates, etc.).
The results of these simulations are included as an OxCal plot showing the posterior probability estimates (modeled date ranges) for each boundary (Figure 6), and they are graphically depicted as a series of plots (Figure 7). The top row of plots in this figure represents the minimum and maximum ages for both start and end boundaries (consequently, each iteration has two points-a minimum and maximum age for the range). The bottom row of plots represents estimated lengths or spans for each simulated boundary (in number of years). Our start boundary begins to stabilize quickly, with no significant variation as more dates are added, after approximately 40 dates. Our start boundary ranges for over 40 dates remain within our 50-year criterion.
For our end boundary, however, only a single iteration produced a boundary with a span of 50 years or less. When 80 determinations are included, the end boundary span falls to 45 years. Although this fits our criterion of a 50-year span for boundary ranges, subsequent iterations of 85, 90, 95, and 100 determinations produce modeled end boundaries that once again exceed this criterion, at 95, 55, 70, and 55 years, respectively. Consequently, given our model parameters and sample sizes, we cannot produce a confident, robust estimate (i.e., unyielding to new data) for an end boundary with 100 dates or fewer. Although sample sizes of over 100 dates may yield more precise ranges, when money and resources are limited, increasing the sample size may not be the best solution. Instead, we might consider a research design that prioritizes the generation of more informative a priori information over the generation of new radiocarbon dates (e.g., excavations that yield stratigraphically ordered deposits or the incorporation of terminus ante quem [TAQ, date before which] estimates from historical information, such as known production ages for glass beads or European-produced metals). This may also be a case where effort might be taken to identify charcoal samples from which more than one ring can be dated and used for wiggle-matching to produce more constrained probability estimates. In such cases, you could simulate the results of wiggle-matching to determine the efficacy of the method using the D_Sequence command in OxCal and a set of simulated dates. Although we do not demonstrate these procedures here, we point readers to a number of published examples of such procedures (e.g., Bronk Ramsey et al. 2001; Galimberti et al. 2004; Hogg et al. 2019; Jacobsson et al. 2018; Manning et al. 2010, 2020).
FIGURE 5. Results from Scenario 1 at the 68% (blue) and 95% (red) confidence intervals. The top row of plots presents the minimum (circles) and maximum (triangles) ages for start (left) and end (right) boundaries. The bottom row of plots presents estimated lengths or spans for each simulated boundary (in number of years).
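The difference between meeting the 50-year criterion once and meeting it stably can be made explicit with the end boundary spans reported above for Scenario 2. This is a minimal sketch; the function name and the rule it encodes (every later iteration must also meet the criterion) are our assumptions about how "leveling out" might be operationalized.

```python
def first_stable_size(spans, max_span=50):
    """Smallest sample size from which all larger iterations meet max_span.

    spans maps number of dates -> boundary span in years.
    Returns None when the largest iteration itself fails the criterion.
    """
    stable = None
    for n in sorted(spans, reverse=True):
        if spans[n] <= max_span:
            stable = n
        else:
            break
    return stable


# End boundary spans (years) reported in the text for 80-100 determinations.
end_spans = {80: 45, 85: 95, 90: 55, 95: 70, 100: 55}

print([n for n, span in end_spans.items() if span <= 50])  # [80]
print(first_stable_size(end_spans))  # None
```

Only the 80-date iteration passes, and no sample size passes stably, which matches the conclusion that 100 dates or fewer cannot yield a robust end boundary here.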

Scenario 3: Sequential Phases and Transitions
In Scenarios 1 and 2, we used simulations to design research aimed at estimating start and end boundaries on some archaeological phenomena. For Scenario 3, we are interested in estimating the length of a transition between two phases. This could represent the number of years passed between two occupations, the length of a hiatus or abandonment, or the length of a transition between two ceramic traditions. To these ends, we have incorporated two simple phases into a sequential model and included start and end boundaries for each phase. The effect of these commands is to indicate that one phase comes after another, but that the interval between the two phases is unknown. The phases may directly follow one another in time, or there may be a significant gap in years between the two. In either case, using a sequential model includes the prior assumption that the two phases do not overlap in time. In this case, we are interested in the end boundary for the first phase and the start boundary for the second phase. Dates were iteratively added to both the early and later phases. The early phase is estimated to date to between approximately AD 900 and 1000, whereas the second phase is estimated to date to between roughly AD 1100 and 1200. The question is whether this hypothesized 100-year gap can be modeled and supported using radiocarbon dates and a limited set of prior information about a hypothesized sequence. The "Difference" command was used in OxCal to calculate the difference between the start boundary of Phase 2 and the end boundary of Phase 1. The "Difference" command produces a posterior distribution that yields a modeled range of number of years that includes estimates at the 68% and 95% confidence intervals. This calculated difference represents the modeled estimate for the gap between Phases 1 and 2.
FIGURE 6. OxCal plot of start and end boundaries for Scenario 2 simulations. The numbers along the side are the number of simulated dates included in each iteration of the simulation. Bars beneath each posterior probability distribution represent the 68% and 95% confidence intervals. These boundary ranges are graphically represented in Figure 7.
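As a rough illustration of the sequential model and gap calculation described for Scenario 3, the sketch below writes out one possible OxCal model as text. It is our own assumption of a typical layout, not the authors' code from Supplemental Text 1: the boundary and phase names, the `P1`/`P2` prefixes, the 20-year error, and the argument order passed to Difference (later boundary first, so that a positive value is a gap) are all illustrative and should be checked against the OxCal documentation.

```python
import random


def phase_lines(label, prefix, n, start, end, error, rng):
    """OxCal lines for one bounded phase holding n simulated dates."""
    lines = [f'  Boundary("Start {label}");', f'  Phase("{label}")', "  {"]
    lines += [
        f'   R_Simulate("{prefix}_{i}", {rng.randint(start, end)}, {error});'
        for i in range(1, n + 1)
    ]
    lines += ["  };", f'  Boundary("End {label}");']
    return lines


def two_phase_model(n_per_phase, error=20, seed=None):
    """Sequential two-phase model with the gap computed between the phases."""
    rng = random.Random(seed)
    lines = ["Plot()", "{", " Sequence()", " {"]
    lines += phase_lines("Phase 1", "P1", n_per_phase, 900, 1000, error, rng)
    lines += phase_lines("Phase 2", "P2", n_per_phase, 1100, 1200, error, rng)
    lines += [
        " };",
        ' Difference("Gap", "Start Phase 2", "End Phase 1");',
        "};",
    ]
    return "\n".join(lines)


print(two_phase_model(5, seed=1))
```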
The results of these simulations are graphically depicted in Figure 8. The plot illustrates minimum and maximum lengths (in years) at both the 68% and 95% ranges for the simulated gaps between the early and late phases. Depending on the number of radiocarbon dates included, simulations produced gap lengths that span up to 220 years. At the very minimum, the two phases are modeled to have a potential overlap of up to 20 years. This means that some sample sizes of radiocarbon data would not be robust enough to identify or model an expected or known hiatus, even when the appropriate sequential model is used. This stresses the importance of sample sizes and simulations in designing research that will be able to address your hypotheses and expected outcomes. Although our plots never completely level off as they did in Scenarios 1 and 2, the iterations with over roughly 60 dates seem to be the most acceptable solution. After 60 dates, the median gap length does seem to level out around the 100-year estimate, although the maximum and minimum ranges cannot be ruled out. As such, it does not seem as if we can necessarily achieve the resolution necessary to either support or reject the hypothesis that a 100-year gap exists between these two phases given the available prior information. This is the point at which you can begin to design research that aims to remedy these deficiencies. With the use of simulations, you have determined that more information is necessary if the question of abandonment is going to be addressed effectively and empirically. In this case, you may begin to propose new excavations that target stratified deposits or deposits with built-in termini post quem or ante quem (TPQs/TAQs; e.g., climatic events such as floods or volcanic activity). If you are working in a period or location in which written information is available (e.g., ethnohistoric documentation, stelae), you may target sites or deposits that can be linked to historically known date ranges (or, at the very least, use such documentation to derive TPQs/TAQs that can be used to increase the precision of radiocarbon determinations). Beyond stratigraphy and independent chronological datasets, you may draw on detailed ceramic or lithic seriations (either site based or region based) to model a radiocarbon dataset effectively. At this point, your research design will of course be partially determined by your region, period, and methodological specialization, all of which can be creatively leveraged to build chronological models, and the effects of which can be explored through preliminary simulations.
FIGURE 7. Results from Scenario 2 at the 68% confidence interval. The top row of plots presents the minimum (circles) and maximum (triangles) ages for start (left) and end (right) boundaries. The bottom row of plots presents estimated lengths or spans for each simulated boundary (in number of years).
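The reasoning used to read the Scenario 3 gap estimates can be summarized in a small helper. This is a minimal sketch under our own assumptions: the function name and the returned keys are illustrative, and the example treats the widest extremes reported above (a 20-year overlap and a 220-year gap) as a single range purely for illustration, although they come from different iterations.

```python
def assess_gap(gap_range, hypothesized=100):
    """Summarize a modeled gap range given as (min_years, max_years).

    Negative values mean the two phases may overlap.
    """
    lo, hi = gap_range
    return {
        # A hiatus is only demonstrated if even the minimum gap is positive.
        "hiatus_demonstrated": lo > 0,
        # The hypothesis survives if the hypothesized length lies in the range.
        "hypothesis_consistent": lo <= hypothesized <= hi,
        "width_years": hi - lo,
    }


print(assess_gap((-20, 220)))
```

A range like this is consistent with a 100-year gap but cannot demonstrate that any gap exists, which is why more informative priors are recommended above rather than simply more dates.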

Scenario 4: An Event within a Stratigraphic Unit
In each of the three previous scenarios, stratigraphic information was not a major factor in building models. Even so, stratigraphic information provides the most informative, useful constraints when modeling radiocarbon dates from archaeological contexts. In this fourth scenario, we simulate a situation in which you are attempting to date a single event or stratum (e.g., flooding event, burning event, capping event, house floor, midden deposit) where stratigraphic relationships between dated materials can be determined and incorporated into modeling efforts. Our hypothetical stratigraphic profile is depicted in Figure 9.
In these simulated models, we set up a simple sequence with start and end boundaries as well as a boundary that is used to model the date of our event of interest (Stratum IV). In this case, Stratum IV is a layer of sterile clay. It is located above Stratum V and below Stratum III. Based on a regional ceramic chronology, Stratum V is estimated to date to AD 975-1000. Stratum III is estimated to date to AD 1100-1125. The question becomes, How many dates and what stratigraphic data are needed to model the age of this sterile layer effectively? That is, how many dates and what kinds of model parameters are needed to produce a robust age estimation of this layer that is not sensitive to further changes to sample size or additions of stratigraphic information? You will note that there are also two more strata above Stratum III and two more strata below Stratum V from which we could pull datable materials.
FIGURE 8. Results from Scenario 3 at the 68% (blue) and 95% (red) confidence intervals. The plot illustrates the minimum (circles) and maximum (triangles) lengths of these spans and the medians of the posterior probability distributions produced using the "Difference" command in OxCal. The difference was calculated between the start boundary of Phase 2 and the end boundary of Phase 1, representing the modeled gap between the two phases. The dark bar indicates the known/hypothesized true gap length of 100 years.
Dates added to strata for each iteration are illustrated in Table 1. Our first iteration includes a single date from Stratum V and a single date from Stratum III, just above and below a boundary command representing Stratum IV. For the next five iterations, we add two dates for each subsequent iteration, one more to Stratum V and one more to Stratum III. At the seventh iteration, we add dates from Stratum II and Stratum VI (estimated ages for randomly simulated dates are included in Figure 9). Through the tenth iteration, we continue to add dates to these two strata (II and VI). At the eleventh iteration, we add dates for Strata I and VII. Consequently, through 15 iterations, we increase both the number of radiocarbon determinations and the amount of stratigraphic information built into the model.
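The iteration scheme for Scenario 4 can be sketched as a small generator of OxCal model text. This is one reading of the prose description only, and Table 1 remains the authoritative schedule. Several things here are assumptions: one date is added per stratum per iteration, each stratum is modeled as a `Phase`, dates are spaced evenly within a stratum's estimated range, and the measurement error is 20 years. Only the age ranges for Strata III and V come from the text; the ranges for the other strata are placeholders standing in for the values shown in Figure 9.

```python
ORDER = ["VII", "VI", "V", "III", "II", "I"]  # deepest (oldest) to uppermost
AGES = {
    "V": (975, 1000),     # from the regional ceramic chronology in the text
    "III": (1100, 1125),  # from the regional ceramic chronology in the text
    # Placeholders only: the published estimates appear in Figure 9.
    "VII": (875, 900), "VI": (925, 950), "II": (1150, 1175), "I": (1200, 1225),
}


def dates_per_stratum(iteration):
    """Number of simulated dates in each stratum at a given iteration (1-15)."""
    if not 1 <= iteration <= 15:
        raise ValueError("iterations run from 1 to 15")
    inner = min(iteration, 6)                # Strata III and V, iterations 1-6
    middle = max(0, min(iteration, 10) - 6)  # Strata II and VI, iterations 7-10
    outer = max(0, iteration - 10)           # Strata I and VII, iterations 11-15
    return {"VII": outer, "VI": middle, "V": inner,
            "III": inner, "II": middle, "I": outer}


def stratum_phase(name, count, error=20):
    """OxCal Phase block holding `count` simulated dates for one stratum."""
    start, end = AGES[name]
    lines = [f'  Phase("Stratum {name}")', "  {"]
    for i in range(count):
        year = round(start + (end - start) * (i + 1) / (count + 1))
        lines.append(f'   R_Simulate("{name}-{i + 1}", {year}, {error});')
    lines.append("  };")
    return lines


def stratigraphic_model(iteration):
    """Sequence with start/end boundaries and a boundary for Stratum IV."""
    counts = dates_per_stratum(iteration)
    lines = ["Plot()", "{", " Sequence()", " {", '  Boundary("Start");']
    for name in ORDER:
        if name == "III":
            # The sterile clay layer sits between Strata V and III.
            lines.append('  Boundary("Stratum IV");')
        if counts[name]:
            lines += stratum_phase(name, counts[name])
    lines += ['  Boundary("End");', " };", "};"]
    return "\n".join(lines)


print(stratigraphic_model(1))
```

Running the 15 generated models in OxCal and recording the range of the `Stratum IV` boundary for each gives the series that is then inspected for leveling off.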

November 2021 | Advances in Archaeological Practice | A Journal of the Society for American Archaeology

The results of these iterative simulations are graphically represented in Figure 10. The top plot represents the minimum and maximum age ranges for the modeled boundary (Stratum IV). The bottom plot represents the overall modeled span of this boundary (Stratum IV). At both the 68% and 95% confidence intervals, the data level off and cease to exhibit extreme variability at roughly 12 radiocarbon determinations, or the sixth iteration. The sixth iteration included five radiocarbon determinations from Stratum III and five radiocarbon determinations from Stratum V, as well as one date each from Strata II and VI. In this case, for this particular location along the calibration curve, adding more dates or more stratigraphic information does not necessarily increase the precision of the model. Although conventional wisdom may tell us to run dates from each available layer, we need to take into account the specific questions we are asking. In this case, we are specifically interested in the age of Stratum IV. As such, instead of getting a small number of dates (or even an individual date) from each available stratum, we focus our energy on intensively dating the strata immediately adjacent to Stratum IV. It is likely the case that five dates each from Strata III and V were unnecessary. This scenario illustrates the utility of simulations to design effective, problem-oriented sampling strategies.
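The judgment that results "level off" is made visually from the plots, but a simple numerical check can support it. The sketch below is an assumption-laden helper, not part of the published workflow: the function name, the tolerance (in years), the minimum run length, and the example span values are all hypothetical. It returns the first iteration after which the modeled span no longer changes by more than the tolerance from one iteration to the next.

```python
def first_stable_iteration(spans, tolerance=5, min_tail=3):
    """Return the 1-based iteration from which every step-to-step change
    in span stays within `tolerance` years, or None if that never happens.

    `min_tail` is the minimum number of iterations that must remain, so a
    single quiet step at the very end is not mistaken for stability.
    """
    for i in range(len(spans) - min_tail + 1):
        tail = spans[i:]
        if all(abs(b - a) <= tolerance for a, b in zip(tail, tail[1:])):
            return i + 1
    return None


# Hypothetical 68% span widths (years) read off ten model runs.
spans_68 = [180, 140, 95, 70, 62, 58, 57, 59, 56, 58]
print(first_stable_iteration(spans_68))
```

The tolerance should reflect the resolution your research question requires; a 5-year tolerance is far stricter than most archaeological questions demand.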

OTHER POTENTIAL APPLICATIONS OF SIMULATIONS
The scenarios presented here represent situations in which archaeologists commonly find themselves. The first three scenarios were primarily concerned with determining appropriate sample sizes and whether sample size alone could be used to produce robust age estimations. Scenario 4 was slightly more complex, evaluating the effects of both sample size and available archaeological information on the ability to produce robust age estimations within a stratigraphic context. These scenarios barely scratch the surface of the kinds of modeling decisions that can be evaluated using simulations. Although space limits us from illustrating many of the kinds of situations archaeologists may encounter and address with simulations, we briefly highlight some potential expanded uses below. The procedures outlined above should provide the appropriate skills and background to be able to use simulations in these diverse ways and toward the particular goals and situations of the archaeologist. These expanded uses of simulations may include questioning long-held chronological assumptions, accounting for outliers, exploring the effects of boundary choices, and incorporating temporal information from non-radiocarbon data.

Challenging Chronological Assumptions
FIGURE 9. The hypothetical stratigraphic unit referenced in Scenario 4. Estimated age ranges for each stratum are derived from a regional ceramic chronology. The goal of the simulations in Scenario 4 is to determine the appropriate modeling criteria and sample size of radiocarbon determinations to be able to effectively date Stratum IV, a sterile clay layer. Iterations of Scenario 4 models, including the number of dates iteratively added to each layer, can be found in Table 1.
In the scenarios presented here, we employ previous observations to build a framework for our simulations; that is, we use previous knowledge about chronology (i.e., an age range or a gap length) to build hypothetical radiocarbon datasets and to evaluate potential model parameters. What we have not discussed, however, is the possibility that such ranges are inaccurate or imprecise to begin with. For each of the scenarios here, especially in North America, the prior chronological information may represent extant culture-historic frameworks based on previous excavations, ceramic analyses, lithic analyses, settlement pattern analyses, and a handful of pre-AMS radiocarbon dates (often with large standard deviations). In many cases, the chronological frameworks attached to culture-historic sequences represent educated guesses (or "eyeballing") based on informal, unstandardized interpretations of calibrated radiocarbon dates and associations with particular kinds of artifacts. In cases worldwide, these culture-historic phases can appear quite precise (i.e., phases of 50-100 years). With simulations, however, we can explicitly test whether or not such resolutions are possible given a particular set of data. That is, given the available priors and radiocarbon dates, can we differentiate chronological hypotheses? For instance, we might use simulations to assess how many radiocarbon dates would be needed to determine whether or not a particular ceramic style or lithic form was used for 75, 100, or 200 years. In this way, we can begin to alter our hypothetical datasets and build in alternative model parameters to design ways to evaluate and test chronological hypotheses empirically.
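The comparison of 75-, 100-, and 200-year hypotheses could be set up along the following lines. This is a sketch under stated assumptions: the start year of AD 1000, the 20-year measurement error, the sample size of 20, and the even spacing of simulated dates are all hypothetical, and Python serves only to write out OxCal model text. The `Span` query is a standard OxCal command placed inside the phase to report the modeled duration.

```python
def span_model(start, span, n_dates, error=20):
    """OxCal model text for one phase of a given true duration."""
    if n_dates < 2:
        raise ValueError("use at least two dates")
    lines = ["Plot()", "{", " Sequence()", " {",
             '  Boundary("Start");', '  Phase("Use")', "  {"]
    for i in range(n_dates):
        # Spread simulated calendar ages evenly over the hypothesized span.
        year = round(start + span * i / (n_dates - 1))
        lines.append(f'   R_Simulate("sim-{i + 1}", {year}, {error});')
    lines += ['   Span("Use span");', "  };",
              '  Boundary("End");', " };", "};"]
    return "\n".join(lines)


# One model per competing hypothesis about how long the style was in use.
models = {span: span_model(1000, span, n_dates=20) for span in (75, 100, 200)}
print(models[75])
```

If the modeled spans from the three runs overlap heavily, the chosen sample size cannot distinguish the hypotheses, and the exercise can be repeated with more dates or additional prior information.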

Accounting for Outliers
Outlier models are widely used by archaeologists engaged in Bayesian chronological modeling (see Bronk Ramsey 2009b). Such modeling helps to account for the effects of outliers within a given dataset on the outputs of a particular model. For instance, a Charcoal Outlier model may be applied to all of the charcoal dates included in a model to weight these dates differentially based on their fit in relation to all other dates included within the model, as well as their fit with the prior information included. The purpose of this would be to account for the potential offset between the actual date of the event of interest and the date of the piece of charcoal (which could match the event of interest or, more likely, date to earlier than the event of interest). Similarly, General and Simple outlier models can be used to account for expected variation among dated materials and their modeled fit. Indeed, Lulewicz (2018), using a series of sensitivity analyses on an extant radiocarbon dataset, has shown that the use of an outlier model can significantly affect the model output and alter long-held chronological assumptions. In this way, simulations may be used to assess the potential effects of different kinds of outliers. For example, you may have a handful of extant charcoal dates, but you are interested to know how these dates might fit with a batch of new non-charcoal radiocarbon determinations. In this case, you might use simulations to explore the effects of outlier modeling and the addition of more determinations from short-lived samples such as animal bone or charred seeds. When an extant set of radiocarbon dates is available, these dates can simply be added to models as R_Dates alongside simulated dates.
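A model that mixes extant charcoal dates with simulated short-lived samples might be assembled as sketched below. The laboratory codes, radiocarbon ages, calendar years, and 20-year error are hypothetical placeholders. The `Outlier_Model` line uses the Charcoal parameterization given in the OxCal documentation and Bronk Ramsey (2009b); whether that parameterization suits a given dataset is a separate judgment.

```python
# Charcoal outlier model as parameterized in the OxCal documentation.
CHARCOAL_MODEL = 'Outlier_Model("Charcoal", Exp(1,-10,0), U(0,3), "t");'


def mixed_model(charcoal_dates, short_lived_years, error=20):
    """Combine real charcoal R_Dates with simulated short-lived samples."""
    lines = ["Plot()", "{", " " + CHARCOAL_MODEL, " Sequence()", " {",
             '  Boundary("Start");', '  Phase("Occupation")', "  {"]
    for lab_id, bp, sd in charcoal_dates:
        # Each extant charcoal date is tagged for the Charcoal outlier model.
        lines += [f'   R_Date("{lab_id}", {bp}, {sd})', "   {",
                  '    Outlier("Charcoal", 1);', "   };"]
    for i, year in enumerate(short_lived_years, start=1):
        lines.append(f'   R_Simulate("short-lived-{i}", {year}, {error});')
    lines += ["  };", '  Boundary("End");', " };", "};"]
    return "\n".join(lines)


# Hypothetical extant dates: (laboratory code, radiocarbon age BP, error).
extant = [("Lab-0001", 950, 40), ("Lab-0002", 910, 50)]
model = mixed_model(extant, [1050, 1075, 1100, 1125])
print(model)
```

Running the same model with and without the `Outlier` tags shows how much the outlier treatment alone shifts the boundaries.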

Exploring the Effects of Boundary Choices
An important modeling decision not often discussed by archaeologists (see Bronk Ramsey 2009a) is the decision of which kinds of boundaries are most appropriate for the archaeological phenomena being modeled. Different kinds of boundaries are used to alter the shape of the expected distribution of radiocarbon determinations within a model. For example, the default boundary commands applied to a single phase in OxCal force the dates within that phase to be uniformly distributed across the span of the phase. The trapezium option, on the other hand, assumes that at the beginning and end of the phase, dated samples will be rare (see Lee and Bronk Ramsey 2012). Through time, the number of samples increases, eventually plateaus, and then decreases again toward the end of the phase. This may be more appropriate when modeling something such as the occupation span of a particular settlement or regional models for the use of a particular lithic technology. In the case of a lithic technology or use of ceramic decorative styles, we might assume that use increases, plateaus, and then decreases rather than starting abruptly in full force and then ending abruptly (as would be represented using the default boundary commands). Alternatively, a settlement may be gradually populated and grow, but then be abruptly abandoned, in which case you may combine different kinds of start and end boundaries to account for this possibility. In fact, recent studies have used varying kinds of boundaries to model different kinds of archaeological phenomena, including the occupation spans of settlements (e.g., Barrier 2017) or the use of particular ceramic styles and length of regional cultural traditions (e.g., Lulewicz 2018, 2019; Quinn et al. 2020). These decisions must be made in close consideration of both the archaeological data and the middle-range theory relevant to the phenomena you are modeling.
In this case, simulations can be used to model the effects of different assumptions you might have about the temporality of particular phenomena and the nature of their emergence or decline.
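The alternative boundary assumptions can be compared by generating otherwise identical models that differ only in their boundaries. In the sketch below, a gradual (trapezium) boundary is written as a `Boundary` containing a `Transition`, following the syntax described for trapezium models in the OxCal documentation; the simulated calendar years, the 20-year error, and the variant names are hypothetical. Whether a particular mix of boundary types is appropriate should be checked against the OxCal manual and the archaeology.

```python
def boundary(name, gradual):
    """Plain (abrupt) boundary, or a trapezium boundary with a transition."""
    if gradual:
        return [f'  Boundary("{name}")', "  {",
                f'   Transition("{name} transition");', "  };"]
    return [f'  Boundary("{name}");']


def phase_model(years, gradual_start, gradual_end, error=20):
    """Single-phase model whose start and end boundaries can differ in kind."""
    lines = ["Plot()", "{", " Sequence()", " {"]
    lines += boundary("Start", gradual_start)
    lines += ['  Phase("Occupation")', "  {"]
    for i, year in enumerate(years, start=1):
        lines.append(f'   R_Simulate("sim-{i}", {year}, {error});')
    lines.append("  };")
    lines += boundary("End", gradual_end)
    lines += [" };", "};"]
    return "\n".join(lines)


years = [1000, 1020, 1040, 1060, 1080, 1100]  # hypothetical calendar ages
variants = {
    "uniform": phase_model(years, False, False),
    "trapezium": phase_model(years, True, True),
    "gradual start, abrupt end": phase_model(years, True, False),
}
print(variants["gradual start, abrupt end"])
```

Because the simulated dates are held constant, any difference in the modeled start, end, or span across the three variants is attributable to the boundary choice.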

Incorporating Non-radiocarbon Dates
In some cases, you may have dates derived from other sources, such as historical documentation or OSL determinations, that you expect to include in your chronological models alongside radiocarbon data. Simulations provide a way to assess the effects of these alternative temporal datasets on a set of radiocarbon dates. On the other hand, a simulated set of radiocarbon dates may be used to assess the reliability of other scientifically derived date ranges (e.g., OSL or TL dating). It has been demonstrated that OSL dates, with their long age estimations and potentially unreliable results, can significantly affect models produced using primarily radiocarbon data (Su et al. 2020). Alternatively, OSL dates may increase the fit of model expectations and a radiocarbon dataset as an independently derived temporal estimation (e.g., Pluckhahn and Thompson 2017).
Likewise, historical information such as documentation (e.g., town abandonments, population movements, fort constructions) can greatly increase the precision and resolution of model outputs, especially in situations such as Scenario 2 where the location along the calibration curve produces less than helpful results (Thompson et al. 2019). Historical information is useful even when modeled as TPQs or TAQs. For instance, a coin of known age might serve as a useful terminus post quem (TPQ, or the date after which) for modeling dates from a particular feature. Or, a journal entry noting abandoned towns along a river may serve as a useful TAQ when attempting to model village abandonments. Simulations can be used to assess the effects of this kind of information on a set of radiocarbon dates and the utility of this information in addressing a particular research question (e.g., the date of a pit feature or the abandonment of a town) during the research design or interpretation phases.
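One way to explore the effect of a TPQ or TAQ is to place a calendar date before or after the phase within a sequence. The sketch below is an assumption: it uses OxCal's `C_Date` command and relies on the ordering constraint of `Sequence`, which is only one of several ways such information can be encoded. The coin and journal dates, their 1-year uncertainty, and the simulated calendar years are hypothetical.

```python
def constrained_model(years, tpq=None, taq=None, error=20):
    """Single-phase sequence with optional calendar-date constraints.

    `tpq` and `taq` are (name, calendar year AD) pairs. Placing a C_Date
    before the start boundary makes it a terminus post quem; placing one
    after the end boundary makes it a terminus ante quem.
    """
    lines = ["Plot()", "{", " Sequence()", " {"]
    if tpq:
        lines.append(f'  C_Date("{tpq[0]}", {tpq[1]}, 1);')
    lines += ['  Boundary("Start");', '  Phase("Occupation")', "  {"]
    for i, year in enumerate(years, start=1):
        lines.append(f'   R_Simulate("sim-{i}", {year}, {error});')
    lines += ["  };", '  Boundary("End");']
    if taq:
        lines.append(f'  C_Date("{taq[0]}", {taq[1]}, 1);')
    lines += [" };", "};"]
    return "\n".join(lines)


years = [1560, 1590, 1620, 1650, 1680]  # hypothetical calendar ages
unconstrained = constrained_model(years)
constrained = constrained_model(years, tpq=("Coin", 1540), taq=("Journal entry", 1700))
print(constrained)
```

Comparing the boundaries from the constrained and unconstrained runs indicates how much precision the historical information is likely to contribute before any new samples are submitted.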

CONCLUSION
As the use of large-scale radiocarbon datasets increases and the use of Bayesian chronological modeling becomes more commonplace, it is imperative that we develop a community of practice within the field of archaeology. We must employ frameworks that provide the tools needed for evaluating the potential of a given research design or the robustness of published results. Using simulations is an effective way of standardizing approaches to sampling strategies. Even when criteria for evaluating simulations vary, the practice of running simulations, thinking through criteria, and evaluating expected model outputs creates a more transparent science. Transparency is key if we are going to build models that are reproducible and that can be held to standards of representativeness and robustness. Simulations force us to think through the explicit relationships between our modeling decisions and radiocarbon datasets. We have all heard the "garbage in, garbage out" adage relevant to all models. Simulations provide archaeologists a powerful way to keep garbage out of their chronometric models.

Acknowledgments
Thanks to both Victor Thompson and Jennifer Birch for productive discussions about these issues. Thanks also to four anonymous reviewers whose comments greatly improved the original manuscript.

Data Availability Statement
No new or published data have been used or presented in this manuscript. All OxCal code for replicating the simulations presented here can be found as Supplemental Text 1 and is archived at Zenodo (Holland-Lulewicz and Ritchison 2021; DOI:10.5281/zenodo.4647930).