PaleoRec: A sequential recommender system for the annotation of paleoclimate datasets

Published online by Cambridge University Press:  13 April 2022

Shravya Manety
Affiliation:
Information Sciences Institute, University of Southern California, Marina Del Rey, California, USA
Deborah Khider*
Affiliation:
Information Sciences Institute, University of Southern California, Marina Del Rey, California, USA
Christopher Heiser
Affiliation:
School of Earth and Sustainability, Northern Arizona University, Flagstaff, Arizona, USA
Nicholas McKay
Affiliation:
School of Earth and Sustainability, Northern Arizona University, Flagstaff, Arizona, USA
Julien Emile-Geay
Affiliation:
Department of Earth Sciences, University of Southern California, Los Angeles, California, USA
Cody Routson
Affiliation:
School of Earth and Sustainability, Northern Arizona University, Flagstaff, Arizona, USA
*Corresponding author. E-mail: khider@usc.edu

Abstract

Studying past climate variability is fundamental to our understanding of current changes. In the era of Big Data, the value of paleoclimate information critically depends on our ability to analyze large volumes of data, which itself hinges on standardization. Standardization also ensures that these datasets are more Findable, Accessible, Interoperable, and Reusable. Building upon efforts from the paleoclimate community to standardize the format, terminology, and reporting of paleoclimate data, this article describes PaleoRec, a recommender system for the annotation of such datasets. The goal is to assist scientists in the annotation task by reducing and ranking the relevant entries in a drop-down menu. Scientists can either choose the best option for their metadata or enter the appropriate information manually. PaleoRec aims to reduce the time to science while ensuring adherence to community standards. PaleoRec is a sequential recommender system based on a recurrent neural network that takes into consideration the short-term interest of a user in a particular dataset. The model was developed using 1,996 expert-annotated datasets, resulting in 6,512 sequences. The performance of the algorithm, as measured by the Hit Ratio, varies between 0.7 and 1.0. PaleoRec is currently deployed on a web interface used for the annotation of paleoclimate datasets using emerging community standards.

Information

Type
Data Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Figure 1. Behavior sequences for PaleoRec. PaleoRec is an example of a transaction-based recommender system in which we model the same type of user behavior (select) for different items along a chain defined by the LinkedEarth Ontology.
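
The idea of a behavior sequence along a metadata chain can be illustrated with a minimal sketch. It assumes a three-field chain (archiveType, proxyObservationType, inferredVariableType), using only the field names that appear in the captions here; the actual chain defined by the LinkedEarth Ontology may contain more fields. The recommender shown is a simple frequency-count baseline, not PaleoRec's recurrent neural network, and the function names and example records are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed field ordering, for illustration only.
CHAIN = ("archiveType", "proxyObservationType", "inferredVariableType")


def build_sequences(record):
    """Turn one annotated record into (context, next_item) training pairs."""
    values = []
    for field in CHAIN:
        if field not in record:
            break  # stop at the first missing field in the chain
        values.append(record[field])
    # Each prefix of the chain is a context; the following value is the target.
    return [(tuple(values[:i]), values[i]) for i in range(1, len(values))]


def fit_counts(records):
    """Count how often each item follows each context (frequency baseline)."""
    counts = defaultdict(Counter)
    for record in records:
        for context, next_item in build_sequences(record):
            counts[context][next_item] += 1
    return counts


def recommend(counts, context, k=3):
    """Return up to k items ranked by how often they followed the context."""
    ranked = counts.get(tuple(context), Counter()).most_common(k)
    return [item for item, _ in ranked]


# Hypothetical annotated records, not taken from the paper's compilations.
records = [
    {"archiveType": "Coral", "proxyObservationType": "d18O",
     "inferredVariableType": "Temperature"},
    {"archiveType": "Coral", "proxyObservationType": "d18O",
     "inferredVariableType": "Temperature"},
    {"archiveType": "Coral", "proxyObservationType": "d18O",
     "inferredVariableType": "Salinity"},
    {"archiveType": "Coral", "proxyObservationType": "Sr/Ca",
     "inferredVariableType": "Temperature"},
]

counts = fit_counts(records)
after_archive = recommend(counts, ("Coral",))
after_proxy = recommend(counts, ("Coral", "d18O"))
```

In this sketch, each selection narrows and reorders the candidates offered for the next field, which is the behavior the drop-down menu is meant to expose; a recurrent model replaces the count table with a learned representation of the context.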

Figure 2. Architecture of PaleoRec.

Figure 3. Number of datasets for each archiveType present in the compilations. The category "others" groups every archiveType represented fewer than six times in the datasets.

Figure 4. Number of datasets for each proxyObservationType present in the compilations. The category "others" groups every proxyObservationType represented fewer than six times in the datasets.

Figure 5. Number of datasets for each inferredVariableType present in the compilations. The category "others" groups every inferredVariableType represented fewer than six times in the datasets.

Figure 6. Evaluation metrics: (a) hit ratio and (b) mean reciprocal rank (MRR) for recommendation set sizes $k = \{3, 5, 7, 10, 12, 14, 16\}$ for the various metadata fields.
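
For readers unfamiliar with the two metrics, the following is a minimal sketch of their standard definitions: the hit ratio at k is the fraction of test cases whose true item appears in the top k recommendations, and the MRR at k averages the reciprocal of the true item's rank, counting 0 when the item is not in the top k. The function names and the example data are hypothetical, and the paper's own evaluation code may differ in detail.

```python
def hit_ratio_at_k(ranked_lists, targets, k):
    """Fraction of cases where the target is among the top-k recommendations."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)


def mrr_at_k(ranked_lists, targets, k):
    """Mean of 1/rank of the target within the top k (0 if it is absent)."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = list(ranked[:k])
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)  # ranks start at 1
    return total / len(targets)


# Hypothetical ranked recommendations and the items users actually selected.
ranked_lists = [
    ["Temperature", "Salinity", "Precipitation"],  # target ranked 1st
    ["d18O", "Mg/Ca", "TRW"],                      # target ranked 2nd
    ["Coral", "Wood", "Lake sediment"],            # target not recommended
]
targets = ["Temperature", "Mg/Ca", "Ice core"]

hr3 = hit_ratio_at_k(ranked_lists, targets, k=3)   # 2 of 3 cases are hits
mrr3 = mrr_at_k(ranked_lists, targets, k=3)        # (1 + 1/2 + 0) / 3
```

The hit ratio only records whether the correct entry made it into the menu, whereas the MRR also rewards placing it near the top, which is why the two are reported side by side.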

Figure 7. Performance comparison between a gated recurrent unit (GRU) layer and a long short-term memory (LSTM) layer. Left: hit ratio for recommendation set sizes $k = \{3, 5, 7, 10, 12, 14, 16\}$ for a model using an LSTM layer (top) and a GRU layer (bottom). Right: same as left for the mean reciprocal rank (MRR).

Figure 8. Performance comparison between PaleoRec with and without user information. Left: hit ratio for recommendation set sizes $k = \{3, 5, 7, 10, 12, 14, 16\}$ for a model without user representation (top) and with user representation (bottom). Right: same as left for the mean reciprocal rank (MRR).