The SISAL webApp: exploring the speleothem climate and environmental archives of the world

Abstract We present the ‘SISAL webApp’—a web-based tool to query the Speleothem Isotope Synthesis and AnaLysis (SISAL) database. The software provides an easy-to-use front-end interface to mine data from the SISAL database while providing the SQL code alongside as a learning tool. It allows for simple and increasingly complex querying of the SISAL database based on various data and metadata fields. The SISAL webApp version currently hosts SISALv2 of the database with 691 records from 294 sites, 512 of which have standardized chronologies. The SISAL webApp has sufficient flexibility to host future versions of the SISAL database, which may include allied speleothem information such as trace elements and cave-monitoring records. The SISAL webApp will increase accessibility to the SISAL database while also functioning as a learning tool for more advanced ways of querying paleoclimate databases. The SISAL webApp is available at http://geochem.hu/SISAL_webApp.


INTRODUCTION
Natural sedimentary archives are globally distributed and are valuable for local-to continental-scale assessment of past climate and environmental changes because they record the environmental conditions that prevailed during their formation (Williamson et al., 2009;Birks et al., 2014).Speleothems are natural sedimentary archives formed of calcium carbonate and preserved in caves.Speleothems often have excellent age control (Fairchild and Baker, 2012), are widely distributed in terrestrial regions around the world, and provide high-resolution records of past changes in climate and environment that are encoded mainly in carbon and oxygen isotopes (McDermott, 2004;Wong and Breecker, 2015) and trace elements (Fairchild and Treble, 2009).SISAL (Speleothem Isotope Synthesis and AnaLysis) is a working group of the Past Global Changes (PAGES) project with the goal to provide a comprehensive compilation of speleothem records for climate reconstruction and model evaluation (Comas-Bru and Harrison, 2019).
The first version of the SISAL database, SISALv1 (Atsawawaranunt et al., 2018), contained 381 speleothem records from 174 cave sites and was supported by publications on ways to use these data for data-model comparisons (Comas-Bru and Harrison, 2019) and regional interpretations of the isotopic records in the database (Kaushal et al., 2018;Lechleitner et al., 2018;Braun et al., 2019;Burstyn et al., 2019;Deininger et al., 2019;Kern et al., 2019;Oster et al., 2019).The last published version of the database, SISALv2, encompassed 691 speleothem records from 294 sites (Comas-Bru et al., 2020a) and provided additional standardized chronologies that are essential for better age control, which is required for analysis based on multiple speleothem records (Bühler et al., 2022).Regional and/or temporal subsets of the SISALv2 database provided essential data for evaluating environmental response to climate events in time (Kukla et al., 2021;Parker and Harrison, 2022), evolution of climate phenomena in a spatial domain (Parker et al., 2021;Gorenstein et al., 2022), datamodel comparisons (Bühler et al., 2021(Bühler et al., , 2022)), improved interpretations of speleothem data (Treble et al., 2022), or improved assessment of the robustness of spectral analysis of unequally spaced sedimentary proxies with chronological uncertainty (Hatvani et al., 2022).A third version of the database, SISALv3, is currently being compiled and will be made available to the public in 2023.The new database will also contain Mg/Ca, Sr/Ca, Ba/Ca, U/Ca, P/Ca, and Sr isotope records (Kaushal et al., 2023).With this new version of the database, it will be possible to explore the global significance of trace-element signatures in speleothems systematically, and to refine climatic interpretations gained from the stable isotope records.
While the SISAL database is clearly useful for the climate community, additional applications and new scientific results can be achieved by effectively facilitating broader access to the database (Kaushal et al., 2021).At the moment, the SISAL database is hosted as an SQL file or multiple *.csv files that are linked together by identification numbers.These formats require the use of software such as MySQL, R, Python, or MATLAB to query the database.In addition, there has been a tremendous increase in the number of paleoclimate databases and in their use for research purposes over the last 10 years (Sundqvist et al., 2014;PAGES2k Consortium, 2017;Konecky et al., 2020;Kukla et al., 2021).Because researchers need to learn how to use such databases to address scientific questions at hand, and further how to appropriately query databases to get accurate datasets for analysis, we created the SISAL webApp in order to fill these knowledge gaps.
The paper aims to (1) describe the architecture of the SISAL webApp, (2) provide instructions on the logic by which databases can be mined for required data, and (3) provide accompanying SQL output code the user can build on the basic functionalities of the SISAL webApp by using a tool like MySQL to directly mine the database.

DESIGN
The SISAL webApp's architecture is based on the SISALv2 SQL database (Comas-Bru et al., 2020b).The SISAL Query Server, written in JavaScript using the node.jscode library, is responsible for the database operations.The web interface accessible to users is served by the SISAL Query Client, which also is written in JavaScript and relies on the React.jscode library.This architecture allows users to query the SISALv2 database using the most popular web browsers supporting ES5 methods (e.g., Google Chrome, Safari, Microsoft Edge, Mozilla Firefox) by specifying a few parameters.The graphical user interface created allows this without the user having to generate SQL code (Fig. 1).The SISALv2 database, the SISAL Query Server, and the SISAL Query Client are hosted on the severs of the Research Centre for Astronomy and Earth Sciences, Eötvös Loránd Research Network, Budapest, Hungary.We plan an update of the SISAL webApp to the SISALv3 database after its release.

FEATURES
The SISAL webApp offers two types of querying.Basic querying, in which the SISALv2 database can be explored based on site name, geographical information, and/or temporal constrains.
Basic querying provides the user with the metadata of the queried speleothem records, their sample data (δ 18 O and δ 13 C), and dating information, both the original author-generated chronology and seven SISAL-generated standardized chronologies (lin_interp_age, lin_reg_age, Bchron_age, Bacon_age, OxCal_age, copRa_age, StalAge_age) derived by different approaches for age-depth modeling (Amirnezhad-Mozhdehi and Comas-Bru, 2019;Comas-Bru et al., 2019;Rehfeld et al., 2020).The second type of querying supported by the SISAL webApp is the Advanced querying option through which database information can be extracted based on number of available radiometric ages and sample data resolution.
As an additional feature, the SISAL webApp provides SQL codes to help the user get a deeper insight into how the database is queried.It is our intention to make the webApp a steppingstone in the usage of the SISAL database and other databases like it.

Basic querying
Step 1: determining the basic spatiotemporal constraints Basic querying provides the tool to extract SISAL database information based on the most fundamental filters.After providing an email address (recommended for query logging purposes) the user can choose to query based on the name of the cave site or within spatial (e.g., latitude and longitude limits) and/or within temporal constraints (interp_age).At least one of the following "Filter types" must be correctly filled out.
Filter type 2. Latitude and longitude (from-to; default is global coverage values from −90°to 90°and from −180°to 180°).In the first column, the southern and western boundaries should be provided for latitude and longitude, respectively.Latitude in degrees decimal (N: +; S: −) and longitude in degrees decimal (E: +; W: −).Other formats are not accepted, in which case the SISAL webApp will return no results.If asking for global coverage, only the metadata are made available through the SISAL webApp.A secure feature does not allow the user to overload the server, therefore download is limited to 30,000 rows.
Filter type 3. interp_age (interpolated age from youngerolder) according to the original author-generated age model expressed in years BP, where BP refers to "before present," (present = AD 1950).For details see Atsawawaranunt et al. (2018, table S9) and the SISAL repository at University of Reading.
This first querying step will return a list of speleothem records, their site and entity metadata fulfilling the query criteria provided.The sites with the queried entities are shown on a map (OpenStreetMap contributors, 2023); clicking on the markers shows information on site_name, site_id, geology, and rock_age.
In addition, the user can choose from querying (1) only non-composite records (tick 'Non-Composite' checkbox), (2) only composite records (tick 'Composite' check-box), or (3) both non-composite and composite records (tick both 'Non-Composite' and 'Composite' check-boxes).The default is set to only non-composite records.
Step 2: selection based on the metadata All records fulfilling the criteria set in Step 1 can subsequently be selected by checking the box next to the doi column header.Alternatively, the user can specify a subset of data to be extracted, based on criteria specified in the metadata (e.g., mineralogy; Fig. 3).
The original author-generated chronology is a default output, and the user of the SISAL webApp has to choose at least one SISAL chronology to be extracted for the queried record(s) under the 'Select chronos' section (Fig. 3).The alternative agedepth models (SISAL chronologies with corresponding uncertainties) were provided by SISAL (Amirnezhad-Mozhdehi and Comas-Bru, 2019; Rehfeld et al., 2020) for records that are not composites (i.e., time series based on more than one speleothem record) and which are 230 Th/U dated (see Comas-Bru et al., 2020b).All the SISAL chronologies can be selected with the 'Select all' checkbox, or in any combination, for example asking only for the Bacon_age SISAL chronology (Fig. 3).
Step 3: data extraction By pressing the blue download buttons at the bottom of the page the user can download the (1) metadata of the selected records (called EntityList.xlsx), (2) their dating information (called DatingInfo.xlsx), and/or (3) selected chronologies and the sample data (called SampleData.xlsx) in three separate files.In addition, the SQL codes are provided in a worksheet called 'SQL query' in each output file to help the user get a deeper insight into how the database is queried.
The extracted sample data are trimmed according to the temporal constraints if applied in 'Filter type 3 (interp_age)', but the complete dating information table is given for the selected record (s).Note that interp_age (original author-generated age model expressed in years BP, where BP refers to "before present" with the present being AD 1950) is provided by default in the fifth column of the output file SampleData.xlsx.

Advanced querying
Advanced querying provides tools to extract SISAL database information based on the number of available radiometric ages and sample data resolution.In descriptive words, this option would be suitable for a query like "extract all data from Asia, covering the last 2,000 years BP, where each record has at least two radiometric age measurements over the 2,000 year period, and the sampling interval for isotopes is less than ten years between successive samples."This option is available after Basic filtering when a corresponding list of records are received and selected.Note that if no temporal constraint is applied in the Basic querying, the Advanced query will search within full temporal coverage of the selected records.
In advanced querying, two filters can be applied and combined: (1) minimum number of radiometric ages for the chosen record(s) regarding the whole available time interval, or shorter if a filter is applied in the Basic querying part: first step (note that ages excluded by the original authors to develop the agedepth models [i.e., where date_used = no in the database] are not considered); and (2) maximum allowed 'age gap' given as number of years in the original chronology (interp_age), or in another chosen sisal_chronology, considering the whole available time interval, or shorter if a filter is applied in the Basic querying part: first step (i.e., a large age gap means either the sample resolution is coarse and/or the estimated duration of any hiatus in the record exceeds the given maximum allowed 'age gap' in the queried interval).
In an example, the Middle Holocene (5500-6500 yr BP) was queried globally (Fig. 4A), which provided 178 records.When the 'Advanced query filter 1' was chosen and at least three radiometric ages were required (Fig. 4B) from each entity from within the Middle Holocene, the number of records decreased to 30, with 6065 lines of sample data altogether.This querying took 23 seconds for the server to finish.When the advanced constraints were made stricter with only a maximum of 100 years allowed between consecutive sample dates in interp_age (interpolated age based on original_chronology) to exclude the coarse resolution records (Fig. 4C), the number of obtained records dropped to 15, with the querying taking 24 seconds.When only 'Advanced query filter 2' is used (Fig. 4D) the output is 66 entities with 6332 lines of sample data in 25 seconds.
In all cases the output table (called advancedRes.xlsx)consists of six worksheets: (1) reportInfo with the SQL code of the selection of the records (Step 2; Fig. 3); (2) entityAdvFiltered with the list of records meeting the advanced criteria provided; (3) datingAdvFiltered with the corresponding radiometric ages; (4) chronoAdvFiltered with the sample data and the chosen SISAL chronology (in this case copRa_age; Fig. 4B-D); (5) SQLsAdvFiltered with the SQL code of the advanced querying; and (6) 'entity,' 'dating,' 'chrono,' and 'SQLs' worksheets with the list of entities, radiometric ages, sample data, and SQL code, respectively, provided by the basic querying (Fig. 4A).

COMMON ERROR MESSAGES AND THEIR BACKGROUND/ SOLUTION
Most common errors are associated with basic querying.If no field is completed, the SISAL webApp will return: "None of the query's filter parameters are specified correctly!Please specify the site_name and/or Lat-Lon coordinates and/or interp_age interval and try again!".
In general, the user should pay attention to using the proper formatting of the spatial constraints and use decimal degree units, otherwise the SISAL webApp will return the message: "The coordinates are incorrect or some are missing!Please revise the coordinates, and try again!Default is global coverage from −90°to 90°and from −180°to 180°." Secondly, the temporal constraints should follow the instructions given in this paper and the user manual (http://geochem.hu/SISAL_webApp).If the interp_age is provided incorrectly, the following error message is given: "The interp_age interval is incorrect or incomplete!Please revise the beginning (younger) and end (older) of the interval, and try again!".
Advanced querying does not work without the Basic querying part being used and at least one entity selected in Step 2; otherwise, the following error message is returned: "Download request denied!Please select at least one entity!".
Please note, in case of large output tables (maximum 30,000 rows), querying may take up to minutes.

SOFTWARE AVAILABILITY, FUTURE UPDATES, AND TERMS OF USE
The SISAL webApp is available at the Research Centre for Astronomy and Earth Sciences (http://geochem.hu/SISAL_webApp), and has been tested to work in all major browsers (e.g., Google Chrome, Safari, Microsoft Edge, Mozilla Firefox).SISAL is continuing to expand the global database by including new records and extended sets of data and metadata information.The SISAL webApp is intended to be able to perform more advanced querying on the most updated version of the SISAL database.
Planned updates include (1) the option to allow the user to pick the annual laminated samples; (2) replacing the current map output with an interactive alternative, that would allow the selection of the area of interest, while showing the cave sites and records; and (3) making the SISAL webApp capable of providing information on the mean sample resolution and mean chronological uncertainty, providing input data for immediate verification of inherent constraints for spectral analyses (e.g., CUSP (https://geochem.hu/CUSP/);Hatvani et al., 2022) of the considered speleothem record(s).
If using individual sites, the literature citations for published work provided in the database also should be cited.Contact information of data contributors of unpublished data is also provided, and these data contributors should be contacted when unpublished records are used on an individual basis.In addition, users are advised to verify which queried reference reports the particular record downloaded, because for example, oxygen and carbon stable isotope records from the same stalagmite may be published in different papers.

Figure 1 .
Figure 1.Architecture of the SISAL webApp.

Figure 2 .
Figure 2. Basic querying first step illustrating the query options.Only Filter type 2 is used determining the geographical constraints as Lat: −10°to 10°and Lon: −50°to 15°.Note that temporal constraints (Filter type 3) must be used if one aims to use the advanced query options later.

Figure 3 .
Figure 3. Basic querying output information and selection of SISAL chronology/chronologies.The example shows the output using the setting shown in Figure 2. The entity_name Abissal (entity_id = 79) and the entity_name RN4 (entity_id = 220) speleothem records are selected with the Bacon_age SISAL chronology.

Figure 4 .
Figure 4. Advanced querying examples.Temporal constraints are applied in the basic querying (A).Advanced query filter 1 (B), filter 2 (C) and both advanced query filters (D) are applied with the copRa_age SISAL chronology chosen to accompany the sample data in the advanced query output.