Protein Data Bank (PDB): Fifty-three years young and having a transformative impact on science and society

Published online by Cambridge University Press:  20 February 2025

Helen M. Berman*
Affiliation:
Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Stephen K. Burley*
Affiliation:
Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA; Rutgers Cancer Institute, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA; Rutgers Artificial Intelligence and Data Science (RAD) Collaboratory, Rutgers, The State University of New Jersey, Piscataway, NJ, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA
* Corresponding authors: Helen M. Berman and Stephen K. Burley; Emails: berman@rcsb.rutgers.edu; stephen.burley@rcsb.org

Abstract

This review article describes the co-evolution of structural biology as a discipline and the Protein Data Bank (PDB), established in 1971 as the first open-access data resource in biology by like-minded structural scientists. As the PDB archive grew in size and scope to encompass macromolecular crystallography, NMR spectroscopy, and cryo-electron microscopy, new technologies were developed to ingest, validate, curate, store, and distribute the information. Community engagement ensured that the needs of structural biologists (data depositors) and data consumers were met. Today, the archive houses more than 230,000 experimentally determined structures of proteins, nucleic acids, and macromolecular machines, and their complexes with one another and with small-molecule ligands. Aggregate costs of PDB data preservation are ~1% of the cost of structure determination. The enormous impact of PDB data on basic and applied research and education across the natural and medical sciences is presented and highlighted with illustrative examples. Enablement of de novo protein structure prediction (AlphaFold2, RoseTTAFold, OpenFold, etc.) is the most widely appreciated benefit of having a corpus of rigorously validated, expertly curated 3D biostructure data.

Information

Type
Review
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Early structures in the PDB: (a) oxygen carrying; (b) enzymes; (c) electron transport. Images from Molecule of the Month: PDB Pioneers (Goodsell, 2011).

Figure 2. Overall growth of structures released in the PDB archive (https://www.rcsb.org/stats).

Figure 3. Structure determination pipelines for (a) MX, (b) NMR, and (c) 3DEM. Figure from https://pdb101.rcsb.org/learn/pdb-and-data-archiving-curriculum/about/ (Lawson et al., 2018).

Figure 4. MX structure of the nucleosome core particle, PDB ID 1aoi (Luger et al., 1997). Image from the Molecule of the Month (Goodsell, 2000).

Figure 5. SARS-CoV-2 Genome and Proteome Organization. Near-complete 3D knowledge of the SARS-CoV-2 proteome derives from >4,600 SARS-CoV-2 related PDB structures and computed structure models (CSMs) based on SARS-CoV-1 related structures archived in the PDB. Figure adapted from (Lubin et al., 2022) and available from PDB-101 (https://pdb101.rcsb.org/learn/flyers-posters-and-calendars/flyer/sars-cov-2-genome-and-proteins). Color coding: shades of blue: non-structural proteins; shades of green: structural proteins and proteins encoded by various open reading frames; yellow/orange/red: duplex RNA; orange: S-adenosyl methionine; shades of red: enzyme inhibitors.

Figure 6. Ribbon representation of the co-crystal structure of sotorasib covalently bound to the G12C KRAS (pink)/GDP complex (PDB ID 6oim (Canon et al., 2019)). Inset highlights a zoomed-in view of the sotorasib binding site, showing the covalent bond (half green/half yellow) between the drug and Cysteine 12 (yellow atomic ball-and-stick figure). Images generated using the Mol* Viewer (Sehnal et al., 2021). Image adapted from (Burley et al., 2024).

Figure 7. RCSB.org delivers PDB experimental structures (identified with an Erlenmeyer flask icon in dark blue) and CSMs (computer screen icon in cyan) from AI/ML that can be searched, analyzed, visualized, and explored using custom tools and features. Image from (Burley et al., 2023).