Storing mass-spectrometry data in simple databases enables flexible and intuitive exploration without time or space penalties

William Kumler; Sam LaRue; Anitra Ingalls

doi:10.26434/chemrxiv-2025-3kzff

Theoretical and Computational Chemistry

24 November 2025, Version 1

Working Paper

This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Mass spectrometry (MS) generates large datasets that are stored in increasingly optimized and complex file types, so extracting information rapidly and easily demands technical expertise. We wondered whether a simple structured query language (SQL) database could hold raw MS data and allow for easily readable queries without incurring major penalties in read time or disk space relative to other popular MS formats. Here, we describe a basic MS schema with intuitive database tables and fields that can outperform other formats for exploratory and interactive analysis across six commonly extracted data subsets: single scans (both MS1 and MS2), ion chromatograms, retention time ranges, and fragmentation searches (both precursor and fragment). Additionally, we compare SQLite, DuckDB, and Parquet implementations and find that they can perform these tasks in under a second even when the files occupy over a gigabyte on disk. We believe that this tidy data schema extends naturally to most forms of MS data and offers a way to transparently query datasets while preserving computational performance.
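To make the six data subsets named in the abstract concrete, the following is a minimal sketch using Python's built-in `sqlite3` module. It is not the authors' schema: the table names (`ms1`, `ms2`), the column names (`scan_id`, `rt`, `mz`, `premz`, `fragmz`, `intensity`), the helper functions, the 10 ppm default tolerance, and the toy data are all assumptions chosen for illustration. The sketch only shows how a tidy, one-row-per-data-point layout might let each subset be expressed as a short, readable query.

```python
import sqlite3

# Hypothetical tidy schema: one row per (scan, m/z, intensity) data point.
# Table and column names are illustrative assumptions, not the manuscript's.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ms1 (filename TEXT, scan_id INTEGER, rt REAL,
                      mz REAL, intensity REAL);
    CREATE TABLE ms2 (filename TEXT, scan_id INTEGER, rt REAL,
                      premz REAL, fragmz REAL, intensity REAL);
    """
)

# Toy data, invented purely for demonstration.
conn.executemany(
    "INSERT INTO ms1 VALUES (?, ?, ?, ?, ?)",
    [
        ("a.mzML", 1, 1.0, 118.0865, 1000.0),
        ("a.mzML", 1, 1.0, 150.0000, 50.0),
        ("a.mzML", 2, 2.0, 118.0866, 2500.0),
        ("a.mzML", 3, 3.0, 118.0864, 800.0),
    ],
)
conn.executemany(
    "INSERT INTO ms2 VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("a.mzML", 4, 2.1, 118.0865, 58.0660, 300.0),
        ("a.mzML", 4, 2.1, 118.0865, 59.0730, 700.0),
        ("a.mzML", 5, 2.2, 150.0000, 58.0660, 20.0),
    ],
)


def ppm_window(mz, ppm=10.0):
    """Return the (low, high) m/z bounds for a parts-per-million tolerance."""
    delta = mz * ppm / 1e6
    return (mz - delta, mz + delta)


def single_scan(table, scan_id):
    """All data points of one MS1 or MS2 scan."""
    if table not in ("ms1", "ms2"):  # table names cannot be bound as parameters
        raise ValueError("table must be 'ms1' or 'ms2'")
    query = f"SELECT * FROM {table} WHERE scan_id = ?"
    return conn.execute(query, (scan_id,)).fetchall()


def ion_chromatogram(mz, ppm=10.0):
    """Retention time and intensity for MS1 points within an m/z window."""
    query = ("SELECT rt, intensity FROM ms1 "
             "WHERE mz BETWEEN ? AND ? ORDER BY rt")
    return conn.execute(query, ppm_window(mz, ppm)).fetchall()


def rt_range(rt_min, rt_max):
    """All MS1 points whose retention time falls in [rt_min, rt_max]."""
    query = ("SELECT scan_id, rt, mz, intensity FROM ms1 "
             "WHERE rt BETWEEN ? AND ? ORDER BY rt, mz")
    return conn.execute(query, (rt_min, rt_max)).fetchall()


def precursor_search(premz, ppm=10.0):
    """Fragments recorded for MS2 scans with a matching precursor m/z."""
    query = ("SELECT scan_id, fragmz, intensity FROM ms2 "
             "WHERE premz BETWEEN ? AND ? ORDER BY scan_id, fragmz")
    return conn.execute(query, ppm_window(premz, ppm)).fetchall()


def fragment_search(fragmz, ppm=10.0):
    """Precursors of MS2 scans that contain a matching fragment m/z."""
    query = ("SELECT scan_id, premz, intensity FROM ms2 "
             "WHERE fragmz BETWEEN ? AND ? ORDER BY scan_id")
    return conn.execute(query, ppm_window(fragmz, ppm)).fetchall()
```

For example, `ion_chromatogram(118.0865)` would return the retention time and intensity pairs for the three toy MS1 points near that mass. On real data, indexes on the filtered columns would likely matter for query speed; they are omitted here to keep the sketch short.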

Keywords

liquid chromatography

human-centered design

exploratory data analysis

Supplementary materials

Title: Combined SI

Description: Combined supplemental information for the manuscript "Storing mass-spectrometry data in simple databases enables flexible and intuitive exploration without time or space penalties"

Supplementary weblinks

Title: Github repository (mzsql)

Description: Github repository containing data and code necessary for the manuscript "Storing mass-spectrometry data in simple databases enables flexible and intuitive exploration without time or space penalties"

License

The content is available under CC BY 4.0

Funding

Simons Foundation

385428

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
