Storing mass-spectrometry data in simple databases enables flexible and intuitive exploration without time or space penalties

24 November 2025, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Mass spectrometry (MS) generates large datasets that are stored in increasingly optimized and complex file formats, so extracting information rapidly and easily demands technical expertise. We asked whether a simple structured query language (SQL) database could hold raw MS data and support easily readable queries without incurring major penalties in read time or disk space relative to other popular MS formats. Here, we describe a basic MS schema with intuitive database tables and fields that can outperform other formats for exploratory and interactive analysis across six commonly extracted data subsets: single MS1 scans, single MS2 scans, ion chromatograms, retention time ranges, precursor searches, and fragment searches. Additionally, we compare SQLite, DuckDB, and Parquet implementations and find that they can perform these tasks in under a second even when the files occupy over a gigabyte on disk. We believe that this tidy data schema extends naturally to most forms of MS data and offers a way to transparently query datasets while preserving computational performance.
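To make the idea of a tidy MS schema concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The table names (ms1, ms2), column names, example values, and helper functions are illustrative assumptions and are not taken from the manuscript; the actual schema may differ. The sketch stores one row per data point and shows how two of the six subsets, an ion chromatogram and a fragment search, might be written as short, readable queries.

```python
import sqlite3

# Hypothetical tidy layout: one row per (scan, m/z, intensity) data point.
# All table and column names are assumptions made for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ms1 (scan_id INTEGER, rt REAL, mz REAL, intensity REAL);
CREATE TABLE ms2 (scan_id INTEGER, rt REAL, precursor_mz REAL,
                  fragment_mz REAL, intensity REAL);
CREATE INDEX idx_ms1_mz ON ms1 (mz);
CREATE INDEX idx_ms2_fragment ON ms2 (fragment_mz);
""")

# Made-up example data, not real measurements.
conn.executemany("INSERT INTO ms1 VALUES (?, ?, ?, ?)", [
    (1, 60.0, 118.0865, 1000.0),
    (1, 60.0, 200.1000, 50.0),
    (2, 61.0, 118.0866, 2500.0),
    (3, 62.0, 118.0864, 1800.0),
    (3, 62.0, 300.2000, 75.0),
])
conn.executemany("INSERT INTO ms2 VALUES (?, ?, ?, ?, ?)", [
    (10, 60.5, 118.0865, 59.0735, 400.0),
    (10, 60.5, 118.0865, 58.0657, 300.0),
    (11, 70.0, 104.1070, 60.0813, 900.0),
])


def mz_window(target_mz, ppm):
    """Return the (low, high) m/z bounds for a parts-per-million tolerance."""
    tol = target_mz * ppm / 1e6
    return target_mz - tol, target_mz + tol


def extract_chromatogram(conn, target_mz, ppm=10.0):
    """Sum MS1 intensity within the m/z window at each retention time."""
    low, high = mz_window(target_mz, ppm)
    return conn.execute(
        "SELECT rt, SUM(intensity) FROM ms1 "
        "WHERE mz BETWEEN ? AND ? GROUP BY rt ORDER BY rt",
        (low, high),
    ).fetchall()


def fragment_search(conn, fragment_mz, ppm=10.0):
    """Find MS2 scans (and their precursors) containing a given fragment."""
    low, high = mz_window(fragment_mz, ppm)
    return conn.execute(
        "SELECT DISTINCT scan_id, precursor_mz FROM ms2 "
        "WHERE fragment_mz BETWEEN ? AND ? ORDER BY scan_id",
        (low, high),
    ).fetchall()


chromatogram = extract_chromatogram(conn, 118.0865)
matching_scans = fragment_search(conn, 59.0735)
```

In this sketch the indexes on the m/z columns are a guess at what would keep range queries fast on larger files; whether they are needed, and how the same queries translate to DuckDB or Parquet, would depend on the implementation described in the full text.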

Keywords

mass spectrometry
data storage
SQL
benchmarking
liquid chromatography
human-centered design
exploratory data analysis

Supplementary materials

Combined SI: Combined supplemental information for the manuscript "Storing mass-spectrometry data in simple databases enables flexible and intuitive exploration without time or space penalties"
