Abstract
Mass spectrometry (MS) generates large datasets that are stored in increasingly optimized and complex file types, so extracting information rapidly and easily demands technical expertise. We wondered whether a simple structured query language (SQL) database could hold raw MS data and allow for easily readable queries without incurring major penalties in read time or disk space relative to other popular MS formats. Here, we describe a basic MS schema with intuitive database tables and fields that can outperform other formats for exploratory and interactive analysis across six commonly extracted data subsets: single scans (both MS1 and MS2), ion chromatograms, retention time ranges, and fragmentation searches (both precursor and fragment searches). Additionally, we compare SQLite, DuckDB, and Parquet implementations and find that they can perform these tasks in under a second even when the files occupy over a gigabyte on disk. We believe that this tidy data schema extends naturally to most forms of MS data and offers a way to transparently query datasets while preserving computational performance.
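To make the idea of "easily readable queries" concrete, the following is a minimal sketch of how an ion chromatogram and a retention time range might be pulled from a SQLite table holding one row per MS1 data point. The table name `MS1`, its columns (`filename`, `scan_id`, `rt`, `mz`, `intensity`), the toy values, and the helper functions are all illustrative assumptions; they are not taken from the manuscript or the mzsql repository, whose actual schema may differ.

```python
import sqlite3

# Hypothetical tidy layout: one row per (scan, m/z, intensity) data point.
# Table and column names are assumptions made for this sketch only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MS1 ("
    "filename TEXT, scan_id INTEGER, rt REAL, mz REAL, intensity REAL)"
)

# Toy data points (invented values, not real measurements).
rows = [
    ("sample_a.mzML", 1, 60.0, 118.0865, 1000.0),
    ("sample_a.mzML", 1, 60.0, 200.1000, 50.0),
    ("sample_a.mzML", 2, 61.0, 118.0866, 1500.0),
    ("sample_a.mzML", 3, 62.0, 118.0864, 1200.0),
    ("sample_a.mzML", 3, 62.0, 300.2000, 75.0),
]
conn.executemany("INSERT INTO MS1 VALUES (?, ?, ?, ?, ?)", rows)

# Indexes on the filtered columns let range queries avoid a full table scan.
conn.execute("CREATE INDEX idx_ms1_mz ON MS1 (mz)")
conn.execute("CREATE INDEX idx_ms1_rt ON MS1 (rt)")


def extract_chromatogram(conn, target_mz, ppm=10.0):
    """Return (rt, intensity) pairs within +/- ppm of target_mz, ordered by rt."""
    tol = target_mz * ppm / 1e6
    query = (
        "SELECT rt, intensity FROM MS1 "
        "WHERE mz BETWEEN ? AND ? ORDER BY rt"
    )
    return conn.execute(query, (target_mz - tol, target_mz + tol)).fetchall()


def extract_rt_range(conn, rt_start, rt_end):
    """Return all (rt, mz, intensity) points with rt in [rt_start, rt_end]."""
    query = (
        "SELECT rt, mz, intensity FROM MS1 "
        "WHERE rt BETWEEN ? AND ? ORDER BY rt, mz"
    )
    return conn.execute(query, (rt_start, rt_end)).fetchall()


chrom = extract_chromatogram(conn, 118.0865)
window = extract_rt_range(conn, 60.5, 62.0)
```

Here `chrom` holds the three points near m/z 118.0865 and `window` holds the points from the scans at 61 and 62 seconds. The same `SELECT ... WHERE ... BETWEEN` statements could in principle be run against DuckDB or Parquet-backed tables, although the connection code would differ.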
Supplementary materials
Title
Combined SI
Description
Combined supplemental information for the manuscript "Storing mass-spectrometry data in simple databases enables flexible and intuitive exploration without time or space penalties"
Supplementary weblinks
Title
Github repository (mzsql)
Description
Github repository containing data and code necessary for the manuscript "Storing mass-spectrometry data in simple databases enables flexible and intuitive exploration without time or space penalties"