DEVELOPMENT OF THE INTCAL DATABASE

The IntCal family of radiocarbon (14C) calibration curves is based on research spanning more than three decades. The IntCal group have collated the 14C and calendar age data (mostly derived from primary publications with other types of data and metadata) and, since 2010, made them available for other sorts of analysis through an open-access database. This has ensured transparency in terms of the data used in the construction of the ratified calibration curves. As the IntCal database expands, work is underway to facilitate best practice for new data submissions, make more of the associated metadata available in a structured form, and help those wishing to process the data with programming languages such as R, Python, and MATLAB. The data and metadata are complex because of the range of different types of archives. A restructured interface, based on the “IntChron” open-access data model, includes tools which allow the data to be plotted and compared without the need for export. The intention is to include complementary information which can be used alongside the main 14C series to provide new insights into the global carbon cycle, as well as facilitating access to the data for other research applications. Overall, this work aims to streamline the generation of new calibration curves.


INTRODUCTION
The generation of the IntCal, Marine, and SHCal radiocarbon (14C) age calibration curves (Heaton et al. 2020a; Hogg et al. 2020; Reimer et al. 2020) is only possible because of the research and care which goes into generating and checking the underlying 14C datasets used in their construction. In particular, the assessment of data quality relative to set criteria and the collection of associated metadata (Reimer et al. 2002, 2013) are key elements in the ongoing compilation of these datasets. In addition to their value for 14C calibration, these 14C datasets also have an importance in their own right for a wide spectrum of different areas of research (Heaton et al. 2021) and have routinely been made available to the international research community through the IntCal database (previously available from the Queen's University Belfast webserver). The developments reported here build upon that initiative with the intention of facilitating access to, and enhancing transparency of, the broad range of information used in the construction of the calibration curves.
At present, the data collected and discussed here relate only to the construction of the pre-1950 IntCal, Marine, and SHCal curves that span from 55,000-0 cal yr BP. However, the intention is to broaden this to include the data for calibration in the post-1950 period (Hua et al. 2022).

AIMS OF THE DEVELOPMENT WORK
The main aims of the new database structures are to facilitate the sharing of not only the primary, raw, 14C data used for calibration purposes but also the related data (such as tree ring measurements) and metadata (including method statements and images), and to do so in a way that enables the data to be manipulated programmatically using tools such as R (R Core Team 2022), Python (e.g., Van Rossum and Drake 2009), and MATLAB (e.g., Higham and Higham 2016). The specific objectives are to:
1. Make all the primary data (the 14C measurements and accompanying calendar age information with their associated uncertainties) that are used in construction of the IntCal family of calibration curves available in a digitally readable form.
2. Organize these data, as far as possible, by primary record rather than measurement initiative, in order to facilitate comparison of results from multiple laboratories and to put the data into a geographic context.
3. Particularly in relation to dendrochronology, provide supporting data such as ring-width series, and metadata detailing the methods that have been applied (Reimer et al. 2013).
4. Include associated numerical data such as calendar age correlation or covariance matrices for any record where the approach used to generate the timescale introduces dependencies between the estimates of its calendar ages, for example, the annually resolved records such as the Lake Suigetsu record (Bronk Ramsey et al. 2020) with its modified varve-counting chronology, and ancient New Zealand kauri trees wiggle-matched onto 14C variability (Cooper et al. 2021). In a similar way, the Cariaco Basin (Hughen and Heaton 2020) and the Pakistan and Iberian margin records (Bard et al. 2013), which have calendar ages obtained by tuning to closely related climate markers (Heaton et al. 2013), use the proxy data and derived relationships in the construction of the calibration curve (Heaton et al. 2020b). The dating and proxy information for the speleothem records is also fundamental to the timescale for the curves.
2 C Bronk Ramsey et al.

5. Associate (or link) the data and metadata with the relevant publications, providing complete DOI information and URL links to the publications. This includes links to complementary data archives, particularly those relating to the supporting dendrochronological data.
6. Provide methods for visualization of the 14C and dendrochronological data.
7. Include tools for checking and assessing data.
8. Allow for the easy import and export of both data and references to a range of other software.
These aims are intended to help different groups of researchers working with the data: those wishing to use or independently evaluate the data, those working on the preparation of new datasets, and the members of the IntCal group working on curve construction. It is important to remember that research on the underlying records is an ongoing process and that, for example, the timescales for the speleothems and their derived chronologies will change in future iterations (Cheng et al. 2021).

OVERALL DATA MODEL
In order to achieve these aims, use has been made of the pre-existing IntChron framework (Bronk Ramsey et al. 2019). This is specifically designed for sharing linked data and includes elements relevant to chronological data, particularly the ability to handle different timescale units and tools for data visualization. The data schema and associated tools have been updated to include elements required for the IntCal datasets, but the overall data model has been found to cover all the core IntCal requirements.

Data Organization
The data structure is essentially hierarchical but with the ability to link and associate information by searches. At the top level there are three main classes of information, with all other information organized within this structure:
1. Records: These contain all information relating to individual sites or records; each record has a unique short name, or site code, which is used as a key for accessing information. Within the record, there are three types of information:
   a. Information on the location (latitude, longitude, elevation) and type (e.g., Marine, Terrestrial, Speleothem, ...) of the record.
   b. Data series lists which are used to hold data for the record. For this application the main data series types are:
      i. IntCal data: the primary 14C calibration data (containing 14C measurements and quoted uncertainties, as well as the accompanying calendar age information).
      ii. Dendrochronological sample data: the ring-width or oxygen isotope data associated with the measurement series (if appropriate).
      iii. Metadata: other descriptive information about the record.
      iv. Attachments: files (typically images) cited in the metadata.
      v. Other data, such as ages and correlation or covariance files for records with a level of dependence between the calendar age estimates, can be added with specific series types.
   c. Reference links for the record as a whole and for the series within it.
2. Project data: which refers to data series that do not have a specific link to single records or sites. In the case of the IntCal database, information held at this level includes the calibration curves themselves and an index of data series organized by the IntCal dataset number. This also includes information on the time-variant relationship between the calibration curve and other (e.g., ice-core) timescales (Adolphi and Muscheler 2016; Adolphi et al. 2018).
3. References: which holds full bibliographic details for all references referred to in the database (typically listed under records or data series). These references are usually directly linked to the relevant journal articles via doi.org and can be exported in BibTeX format for use in bibliographic tools.
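The hierarchy described above can be sketched in Python as a set of nested containers. This is only an illustration of the three top-level classes (records, project-level series, references); the class and attribute names are assumptions made for this sketch and are not the parameter names of the IntChron schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Series:
    """One data series, e.g. of type IntCal_Data or Dendro_Sample."""
    name: str
    series_type: str
    rows: list = field(default_factory=list)        # one dict per data row
    references: list = field(default_factory=list)  # keys into Project.references


@dataclass
class Record:
    """A site or record, identified by its unique short name (site code)."""
    name: str
    record_type: str                                # e.g. "Marine", "Terrestrial"
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    elevation: Optional[float] = None
    series: list = field(default_factory=list)
    references: list = field(default_factory=list)


@dataclass
class Project:
    """Top level: records, project-level series, and the bibliography."""
    records: dict = field(default_factory=dict)     # keyed by record name
    series: list = field(default_factory=list)      # e.g. calibration curves
    references: dict = field(default_factory=dict)  # key -> bibliographic entry

    def add_record(self, record: Record) -> None:
        # Record names act as keys, so they must be unique within a project.
        if record.name in self.records:
            raise ValueError(f"duplicate record name: {record.name}")
        self.records[record.name] = record

    def series_of_type(self, series_type: str) -> list:
        """Return (record name, series) pairs for every series of one type."""
        return [(rec.name, ser)
                for rec in self.records.values()
                for ser in rec.series
                if ser.series_type == series_type]
```

Keeping the references at project level and storing only keys in records and series mirrors the linked-data approach: a publication is described once and cited wherever it is relevant.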

Data Storage
The underlying storage format within IntChron is JSON because it facilitates easy interfacing with programming tools (most easily using tools such as R, Python, and MATLAB, but also in principle with languages such as C and C#) and static archiving/storing of data without the need for software installation (Bronk Ramsey et al. 2019), and because it is commonly used in web-based applications.
The data will be stored and used in three different ways:
i. There is an active database accessible to members of the IntCal working group which is intended to help develop new calibration curves. This allows for the addition of new records and the updating of existing ones.
ii. There is a static archive of the IntCal20 datasets at https://intchron.org/archive/IntCal/IntCal20/index.json, which holds all the information accumulated for IntCal20 as it was when the curve was constructed. This is open access.
iii. It is possible for users to make their own copies of this archive (or parts of it) for their research and to prepare new data for inclusion in IntCal. These can be stored on the IntChron server in users' own areas, or on users' own computers.
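As a minimal sketch of programmatic access, the static archive (or a local copy of it) can be read with the Python standard library alone. The helper names load_json and outline are hypothetical, and no assumption is made here about the keys inside the file: outline simply reports whatever nesting it finds.

```python
import json
import urllib.request

ARCHIVE_URL = "https://intchron.org/archive/IntCal/IntCal20/index.json"


def load_json(source):
    """Load JSON from an http(s) URL or from a local file path."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            return json.load(response)
    with open(source, encoding="utf-8") as handle:
        return json.load(handle)


def outline(node, max_depth=2, _depth=0):
    """Return text lines sketching the nesting of a JSON value."""
    lines = []
    pad = "  " * _depth
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if _depth < max_depth:
                lines.extend(outline(value, max_depth, _depth + 1))
    elif isinstance(node, list) and node:
        # Only the first element is inspected, as a representative.
        lines.append(f"{pad}[{len(node)} items]")
        if _depth < max_depth:
            lines.extend(outline(node[0], max_depth, _depth + 1))
    return lines
```

Calling print("\n".join(outline(load_json(ARCHIVE_URL)))) would then list the top levels of the archive; the same call works on a downloaded copy by passing its file path instead of the URL.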
In the case of the IntCal database itself (point "i" above), a hybrid approach has been adopted. The user interface, archives and file transfer all use JSON, but for the primary IntCal data, an underlying MySQL data table has been retained to minimise the risks of unintended changes.
In addition, the JSON data records are stored in a database rather than as files. This has some advantages for a multi-user system and enables the database maintainers to use an automated archiving system on the computer server which holds the database. However, from the perspective of a user of the database or of the derived archives, the data organization is effectively in the form of JSON objects.
Reflecting the data organization, there are only three types of JSON files that are used for data exchange and presentation. These are:
1. Project data: these files hold links to all of the relevant records and project-level data series. In addition, the project data file contains all of the relevant publication information and details of parameter characteristics.
2. Record data: these contain all the information relevant to a specific record, including the data series contained within it.
3. Series data: these are for project-level data series which do not relate to specific records (such as calibration curves).
The overall database model is based on linked data, so it can also include references to files held outside the JSON data structure via attachments (typically images). In general, however, the aim has been to avoid putting key information in such attachments because it makes for more difficult data distribution.
As a variant of the above model, it is possible to have all record and series data embedded into a single project data file. We have used this option for the data archive because it means that all of the data (other than attachments) can be retrieved from a single file rather than having to make multiple requests for each record and series. This approach would not be efficient for very large projects but poses no problems for the present IntCal datasets in terms of data handling (the entire IntCal20 archive is < 10 MB).
Attachment files are organized in a hierarchical file structure based on record or series name.
A full archive of the IntCal20 data files is included as supplementary online information in this publication, allowing the complete archive to be reproduced without access to the current site. This is an important element in the long-term availability of the data.

Software Overview
In principle, the archive can be worked with entirely using software tools such as R, Python, and MATLAB. However, the IntChron integration tool (Bronk Ramsey et al. 2019) has been further developed to facilitate work with the IntCal database and provide a user interface for some pre-defined data manipulations. This tool can be accessed and preloaded with the open access static IntCal20 data using the URL:
https://intchron.org/tools/integrate/integrate.html?filename=https://intchron.org/archive/IntCal/IntCal20/index.json
The integration tool has been specifically designed to work with data organized in the format described above, and the user is presented with a list of records and data series. The data can be explored by following links from this level. There are also search facilities built into the system which enable lists of relevant data series or primary data to be extracted. Records can be displayed on a map and the associated information retrieved by selecting the individual location points. Figure 1 shows a screenshot of the application in use.
This tool is available on the IntChron server and the full interface code will also be distributed with future releases of OxCal so it can be used without access to the IntChron site should that be necessary.

DATA ELEMENTS
The overall data schema for IntChron (including the parameters used for the IntCal dataset) is given at https://intchron.org/schema with the current version supplied in the supplementary information for this paper. Here we focus on how the data are used specifically for IntCal, concentrating on elements which might be less intuitive or whose use might not otherwise be obvious. Each parameter has a formal name used in the JSON objects and for searches, and a display-orientated name (see Table 1). The formal names are all defined so that they are valid JavaScript variable names and can also be used as URLs without escaping.
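A check of this naming rule might look like the following Python sketch. The function name is hypothetical and the pattern is deliberately conservative (ASCII letters, digits, and underscores only); it does not test for JavaScript reserved words, and it is not taken from the IntChron code.

```python
import re
from urllib.parse import quote

# Conservative subset of JavaScript identifiers: ASCII only, no "$".
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_parameter_name(name: str) -> bool:
    """True if `name` is an ASCII identifier that URL-quoting leaves unchanged."""
    return bool(_NAME.fullmatch(name)) and quote(name, safe="") == name
```

Names used in this paper such as z_units, calage, and ring_width pass this check, whereas a name containing a space or starting with a digit would not.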

Record Level Information
At the record level, the key information held is about the location. For dendrochronological samples, the genus and species of wood sampled (e.g., Quercus robur, Agathis australis) is also included (see Table 1 and Figure 2 for details). There is an optional comment field at this level which can consist of information not included in the standard parameters. The references at this record level should only be the key papers relevant to the sample set within IntCal. More specific references, for example to the dendrochronology, can be included with the dendrochronological series or metadata. The record name is ideally a formal site code (such as SG06) or failing that a short form of the site name (such as Maraa). In the case of records containing compilations of data (mostly older datasets) the lab code, country of origin of the samples, and taxon are used together (as in QL_DE_Oak).

Data Series Types
There are four main data series types included in the records. These are: IntCal_Data (the primary 14C calibration data used in IntCal), Dendro_Sample (dendrochronological data such as ring-widths for the samples), Data (a generic holder used primarily for metadata) and Files (attachments which are used principally for figures which cannot be included in any other way).
[...] prior age estimate before AD1950 with an associated uncertainty. For those samples/archives which have uncertain calendar ages when entering IntCal, the calage values provide the prior calendar age estimate before curve construction. Consequently, the calage parameter is used as the input for curve construction, while the t parameter provides the posterior estimate of the age after curve construction has been completed and is the measure used for plotting purposes.

Dendrochronological Sample Data Series
These can be imported from various dendrochronological formats directly, or from the NOAA International Tree Ring Data Bank (ITRDB: NOAA NCEI 2022) exported as Tucson .rwl files. The internal format of these files has some unique features needed for this global dataset.
Ring numbering is typically from old to young but can be reversed where this is the case for the primary data. The date (t) parameter is a floating-point astronomical year and should reflect the growing season for the wood. As previously described, NH wood grown in 1950 will be stored as 1950.5, whereas SH wood that starts to grow in 1950 will be stored as 1951.0. In the interface you can choose whether to use the Schulman convention for display purposes. If this is selected, 1951.0 will be shown as AD1950⊣ showing that it is the end of this year, whereas if the convention is not applied it will show as AD1951⊢ indicating the start of the year. Alternatively, all dates can be shown in fractional format. The purpose of the internal format is that samples can be plotted on an absolute timescale that takes account of the different growing seasons between the two hemispheres.
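The storage and display conventions described above can be illustrated with a short Python sketch. The function names are hypothetical, only AD dates are handled, and showing mid-year values such as 1950.5 in plain fractional form is an assumption of this sketch rather than documented interface behaviour.

```python
def growth_start_to_t(year: int, hemisphere: str) -> float:
    """Astronomical year stored for wood that starts growing in `year`."""
    if hemisphere == "NH":
        return year + 0.5   # growing season falls in the middle of the year
    if hemisphere == "SH":
        return year + 1.0   # growing season runs across the turn of the year
    raise ValueError("hemisphere must be 'NH' or 'SH'")


def display_t(t: float, schulman: bool = False) -> str:
    """Format a stored date for display (AD dates only in this sketch)."""
    if t != int(t):
        return str(t)               # assumption: fractional dates shown as-is
    year = int(t)
    if year < 2:
        raise ValueError("this sketch only handles AD dates")
    if schulman:
        return f"AD{year - 1}⊣"     # end of the year in which growth started
    return f"AD{year}⊢"             # start of the following year
```

For example, Southern Hemisphere wood that starts to grow in 1950 is stored as growth_start_to_t(1950, "SH"), which is 1951.0, and is displayed as AD1950⊣ with the Schulman convention or AD1951⊢ without it.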

Metadata
In order to properly understand the background to the dendrochronology underpinning the datasets used in IntCal, the database includes metadata. Such metadata are of greatest importance when the associated data are not already published elsewhere. The data included have been selected to include key information needed as outlined in Reimer et al. (2013). The metadata are usually structured to address key questions (see Table 2) and are in the form of a plain text file. Where tables are required, these can be incorporated within the main notes field using tab characters. A "code" field can also be added if needed (for example, COFECHA output extracts). The metadata should be kept succinct and not include unnecessary detail published or deposited elsewhere. Some of the information in the metadata is also held in a more structured way in the main database structure.

Attachments
In addition to the text information included in the structured database, attached files can also be included. These are mostly intended to be figures referred to in the metadata but can consist of longer datafiles and pdf reports if essential. It is however preferable that such information is published elsewhere and only referenced/linked in the IntCal database itself. Such attachments should not be seen as an alternative to the provision of structured information within the database.

Other Data
The calendar age correlation or covariance matrices, for those records with timescales that have been obtained using approaches that introduce a level of dependence between the calendar age estimates, are the main other data required for the IntCal calibration curve generation. These are stored in a particular IntCal_Correlation dataset which holds the matrix as a simple tab-delimited text matrix that can be read by standard software packages. See for example the SG06 record. This Lake Suigetsu record has an adjusted varve-counted chronology (Bronk Ramsey et al. 2020) whereby calendar age uncertainties at any individual depth within the core are propagated to the other depths (due to both the necessary depth ordering and limitations on changing sedimentation rates).
Age-depth models, proxy data and other types of dating information can also be included as outlined in Bronk Ramsey et al. (2019).
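Because the correlation matrices are held as plain tab-delimited text, they can be read without specialised software. The Python sketch below (hypothetical function names, standard library only) parses such a matrix and, given the 1σ calendar age uncertainties, forms the covariance matrix from the standard relation cov_ij = corr_ij σ_i σ_j. The layout assumed here (one matrix row per line, no header) is an assumption and should be checked against the actual series.

```python
import csv
import io


def read_matrix(text: str) -> list:
    """Parse tab-delimited numeric text into a square list of float rows."""
    rows = [[float(cell) for cell in row]
            for row in csv.reader(io.StringIO(text), delimiter="\t")
            if row]                      # skip blank lines
    if any(len(row) != len(rows) for row in rows):
        raise ValueError("matrix is not square")
    return rows


def is_symmetric(matrix, tol=1e-9) -> bool:
    """Check that matrix[i][j] and matrix[j][i] agree within `tol`."""
    n = len(matrix)
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol
               for i in range(n) for j in range(i))


def correlation_to_covariance(corr, sigmas) -> list:
    """Scale a correlation matrix by the 1-sigma uncertainties."""
    n = len(corr)
    if len(sigmas) != n:
        raise ValueError("need one sigma per matrix row")
    return [[corr[i][j] * sigmas[i] * sigmas[j] for j in range(n)]
            for i in range(n)]
```

A symmetry check of this kind is a cheap safeguard before a matrix is passed on to any curve-construction code, since a transposed or truncated file would otherwise go unnoticed.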

TOOLS FOR DATA VISUALIZATION
In addition to allowing the data to be explored, the IntChron integration tool user interface has functions that enable the data to be plotted and visualised in several different ways.

Mapping of Records
The first of the display methods is a mapping interface. This allows the records (either as a whole, or a selected subset) to be shown on a map (as in Figure 1). The map is dynamically linked, so hovering over the points will give their site name, and clicking on them will bring up the relevant record and associated data. A single site can also be mapped to check its location.

Plotting Radiocarbon Data
The 14C calibration data can be plotted in several different ways. To select data to be plotted, you can either navigate to the records and add them to an accumulating plot, or you can use the top-level [Plot] function to select a period and the types of data that are to be included. The plotting can be against any of the main time-scale measures, overlay the appropriate calibration curves, and can use 14C age, F14C or age-corrected Δ14C as the plotted value. All errors are stored, displayed and plotted at 1σ. The plotting routines collate the associated publications, so selecting the [Cite] option for a plot will give a reference listing.
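The three plotted quantities are related by standard conversions, sketched here in Python with first-order propagation of the 1σ measurement uncertainty. The function names are hypothetical; the constants are the conventional Libby mean life (8033 yr) and the mean life corresponding to the 5730 yr half-life (8267 yr). Calendar age uncertainty is ignored in this sketch, and the code is not taken from the IntChron plotting routines.

```python
import math

LIBBY_MEAN_LIFE = 8033.0      # years; defines the conventional 14C age
CAMBRIDGE_MEAN_LIFE = 8267.0  # years; from the 5730-year half-life


def f14c_from_age(age: float, sigma: float):
    """Convert a conventional 14C age (yr BP, 1 sigma) to F14C (value, 1 sigma)."""
    f14c = math.exp(-age / LIBBY_MEAN_LIFE)
    return f14c, f14c * sigma / LIBBY_MEAN_LIFE


def delta14c(f14c: float, f_sigma: float, cal_bp: float):
    """Age-corrected Delta14C in per mil for a sample of calendar age cal_bp."""
    decay_correction = math.exp(cal_bp / CAMBRIDGE_MEAN_LIFE)
    value = (f14c * decay_correction - 1.0) * 1000.0
    return value, f_sigma * decay_correction * 1000.0
```

A sample whose 14C age equals zero at a calendar age of 0 cal BP gives F14C = 1 and Δ14C = 0 ‰, which is a convenient sanity check on any implementation.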

Plotting of Dendrochronological Data
It is also possible to plot the tree ring data included in the database using either the raw ring widths or filtered values. This is most useful when there are multiple sets from the same chronology or the user imports additional series either from the ITRDB or by uploading files.

TOOLS FOR DATA PREPARATION AND ASSESSMENT
One of the main reasons for the choice of the underlying data model for the IntCal database was to make the task of preparing, adding, and assessing data easier for both submitters of data and those involved in the compilation. Clearly, the central database itself cannot be open for modification, but enabling users to make their own copies of the database allows additions and changes to be tested with full access to the associated tools. It is also hoped that this will allow data providers (who normally understand their data best) to get the data ready for submission themselves.
There is a help facility within the software, available through the [Help] menu, and this contains specific information for IntCal data which will be kept up-to-date with any developments in the interface. Dedicated functions to extract values from the intcal20.json file have also been added to the rintcal R package, available on R's CRAN repository (through the R command install.packages('rintcal')).
This paper aims to indicate what is possible without being a complete guide as to how it is done. Workshops and recorded videos will be made available to explain this in further detail.

Creation of New Records
Within the IntChron framework, creating a new record is simply done by selecting the [New record] option; this will prompt the user for a record name and then allow all the main record data fields to be filled in (see Table 1 or examples in the existing archive). The location of the site can be checked on a map using the [Map] option.

Creation of New Data Series
Once the record has been created, new data series can be added by selecting "Add to data series"; again the user will be prompted for a series name and can then choose the type of data (typically IntCal_Data or Data for this purpose). An [Import/Export] function allows data to be imported from or exported to a spreadsheet (see Figure 2).

Working with Dendrochronological Data
Dendrochronological data can be most easily added in one of two ways. Within the record, using [File > Import] will bring up the option of importing "Dendro" data (which will allow Heidelberg, Tucson or some other formats to be imported) or "NOAA NCEI study", which allows importing of a study already available within the ITRDB. The latter option will first add a link to the study, and then it is possible to import the raw ring data for the relevant sample from that link.
The main advantages of having raw ring width data within the database are twofold. Firstly, it allows the 14C data to be directly related to the ring width data by the shared number of the rings; if the dendrochronology is ever revised, this will allow for the correction of the dataset. Secondly, it enables the chronology of the dendrochronological series to be directly related to the 14C dataset. If the dated tree-ring series is included in the database and the 14C samples have the ring numbers listed, it is possible to set the calage and t parameters of the 14C series directly without having to work these out independently. This should avoid problems arising with different conventions, timescales and growing seasons, all of which can be handled within the interface. Even if this is not used directly, it is still possible to check the dendrochronological and 14C series against one another, as in Figure 3. Such a plot has proved to be very useful for checking for internal consistency between dendrochronological reports and 14C datasets.
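The link between ring numbers and ages can be sketched as follows in Python. Sample names are assumed to follow the pattern given in Table 1 (tree sample code, then _r and a ring number or range, as in ABCD01_r3 or ABCD01_r11-19); the function names and the ring-number-to-date mapping are hypothetical, and taking the midpoint of a block of rings as its date is an assumption of this sketch rather than the documented behaviour of the interface.

```python
import re

# Tree sample code, "_r", first ring, and an optional "-last ring".
_SAMPLE = re.compile(r"(?P<tree>.+)_r(?P<first>\d+)(?:-(?P<last>\d+))?")


def parse_sample_name(name: str):
    """Split a sample name into (tree code, first ring, last ring)."""
    match = _SAMPLE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised sample name: {name}")
    first = int(match["first"])
    last = int(match["last"]) if match["last"] else first
    return match["tree"], first, last


def sample_date(name: str, ring_dates: dict):
    """Return (midpoint date, number of rings) for a 14C sample.

    ring_dates maps ring number -> stored date t (astronomical year)
    for the dated tree-ring series of the same tree.
    """
    _, first, last = parse_sample_name(name)
    low, high = sorted((first, last))
    dates = [ring_dates[ring] for ring in range(low, high + 1)]
    midpoint = (min(dates) + max(dates)) / 2.0
    return midpoint, len(dates)
```

Because the date is looked up from the ring series rather than typed in, a later revision of the dendrochronology only requires the ring_dates mapping to be updated; the sample names stay the same.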

INTCAL WORKFLOW
Figure 4 shows the intended workflow for using the IntCal database. Users will typically start with the static archive, add their own data and check them using the visualization tools provided. Once checked, they would then send their data for possible inclusion in the main IntCal database and use for calibration curve generation.

Making Partial Copies of the Archive
Starting with the online archive (see above), users can save the whole archive or parts of it. Assuming they only wish to save part of it, they can select records either one by one or, most easily, by using the plotting function described above:
1. Use the [Plot] option to select the time range and sample type.
2. Use [Edit > Deselect all] to deselect the whole archive.
3. Use the [Select] option in the plotter to select all plotted records.
The user will then have the subset of records required. They can then save this in one of two ways. They can create their own project on the IntChron server by using [File > Save as] and then give a project name, or they can use [File > Download] to download the data (choosing only the selected data). The downloaded file can be uploaded into another project running on the server or the user's computer.
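For users working outside the interface, an equivalent selection could be made on a downloaded file. The Python sketch below assumes, purely for illustration, that the project is a JSON object with a top-level "records" mapping keyed by record name; the key names and function names are hypothetical and should be checked against the schema.

```python
import copy
import json


def subset_project(project: dict, keep) -> dict:
    """Return a copy of `project` holding only records where keep(name, record) is true.

    Assumes a hypothetical layout with a top-level "records" mapping.
    All other project-level entries are copied unchanged.
    """
    subset = copy.deepcopy(project)
    records = subset.get("records", {})
    subset["records"] = {name: record
                         for name, record in records.items()
                         if keep(name, record)}
    return subset


def save_project(project: dict, path: str) -> None:
    """Write the project to a JSON file that can be uploaded or archived."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(project, handle, indent=2)
```

The selection rule is passed in as a function so that the same helper serves for selection by name, by record type, or by any other criterion; the original project object is left untouched.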

Addition of New Data
Adding new data involves the creation of the record (as described above) and then the input of the key data. This will always include IntCal_Data and if relevant Dendro_Sample and

Submission of Data to the IntCal Group
The new record and dataset can then be sent to the relevant members of the IntCal group by using the [File > Download] option under the new record. This will download a JSON file for that record which can be considered for inclusion in the central database once all the appropriate checks have been made.

CONCLUSIONS
It is hoped that the provision of this comprehensive database for the IntCal project will facilitate the work of the 14C research community wishing to use the IntCal data and those wishing to contribute to it. The intention is to make the tools and all of the data used within the group open for all researchers. We consider it particularly important that all of the extensive work which underlies the datasets is fully referenced and that these references are easy to use for those accessing the data. Work is underway to prepare for the next update of the IntCal calibration curves, and loading data into this updatable format is intended to be the first stage in that process.

Figure 1 Example screenshot of the database interface showing (from top left, clockwise) windows containing: the list of records, a map of the records in part of Europe, the project data series, and the data series for the "Binz" record (showing the different data types included in this case).

Figure 3 Example of a plot of ring number against age for sample HKN-1 from Hakone town, Kanagawa, Japan. The x-axis is the dendrochronological age (error bars are the span) and the y-axis is the ring number (error bars are the range). The blue line shows the dendrochronological sequence and the Int_65_2 and Int_65_3 datasets are two radiocarbon datasets measured at different resolutions. In this case ring number goes from young to old and the coincidence of the line with the datapoints shows that the radiocarbon and dendrochronological datasets are consistent. (Please see online version for color figures.)
Development of the IntCal Database. https://doi.org/10.1017/RDC.2023.53 Published online by Cambridge University Press

Table 1 Main parameter names with special meaning within the IntCal database, organized by data table.
[...] this should typically be the tree sample code and ring number as in ABCD01_r3 or ABCD01_r11-19. In all cases the sample name should be unique to the sample itself within that record and shared by samples measured by different labs. The ages of samples with the same name in the same record are defined to be the same age.
[...] parameters are used for this record. Typically here, "rings" for dendrochronological samples, "depth" for sediments, or "height" for speleothems.
z_units (display name: zunits): Most relevant for depths and heights where the data is stored in m but can be displayed in "m", "cm", or "mm". Set to "rings" for dendrochronology.
taxon (display name: Taxon): Genus and where possible species of tree used for dendrochronological samples.
[...] 18O used instead of or in addition to ring_width.

Table 2
Main topics covered in dendrochronological metadata included where this information is not otherwise available in linked publications, and where this information is duplicated elsewhere in the database.