Moment-based metrics for molecules computable from cryogenic electron microscopy images

Published online by Cambridge University Press:  23 February 2024

Andy Zhang*, Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
Oscar Mickelin, Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
Joe Kileel, Department of Mathematics and Oden Institute, University of Texas at Austin, Austin, TX, USA
Eric J. Verbeke, Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
Nicholas F. Marshall, Department of Mathematics, Oregon State University, Corvallis, OR, USA
Marc Aurèle Gilles, Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
Amit Singer, Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA; Department of Mathematics, Princeton University, Princeton, NJ, USA

*Corresponding author: Andy Zhang; Email: az8940@princeton.edu

Abstract

Single-particle cryogenic electron microscopy (cryo-EM) is an imaging technique capable of recovering the high-resolution three-dimensional (3D) structure of biological macromolecules from many noisy and randomly oriented projection images. One notable approach to 3D reconstruction, known as Kam’s method, relies on the moments of the two-dimensional (2D) images. Inspired by Kam’s method, we introduce a rotationally invariant metric between two molecular structures, which does not require 3D alignment. Further, we introduce a metric between a stack of projection images and a molecular structure, which is invariant to rotations and reflections and does not require performing 3D reconstruction. Additionally, the latter metric does not assume a uniform distribution of viewing angles. We demonstrate the uses of the new metrics on synthetic and experimental datasets, highlighting their ability to measure structural similarity.
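Kam's method works with low-order statistics of the image stack rather than with individual images. As a minimal sketch of what "moments of the 2D images" means, assume each image is flattened into a vector of pixel values, and ignore the CTF, noise-bias correction, masking, and the viewing-angle distribution (all of which a full implementation must handle). Under these assumptions, the first moment is the mean image and the second moment is the average outer product of the images. The function names below are illustrative and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def first_moment(images: List[Vector]) -> Vector:
    """Mean image: the average of each pixel over the stack."""
    n = len(images)
    return [sum(img[p] for img in images) / n for p in range(len(images[0]))]


def second_moment(images: List[Vector]) -> Matrix:
    """Average outer product: M2[p][q] = (1/n) * sum_k img_k[p] * img_k[q]."""
    n = len(images)
    d = len(images[0])
    m2 = [[0.0] * d for _ in range(d)]
    for img in images:
        for p in range(d):
            for q in range(d):
                m2[p][q] += img[p] * img[q]
    return [[v / n for v in row] for row in m2]


def diagonal(m: Matrix) -> Vector:
    """Diagonal entries of a square matrix, e.g. for plots like Figure 7(b)."""
    return [m[i][i] for i in range(len(m))]


# Usage: a stack of two tiny 2-pixel "images".
stack = [[1.0, 2.0], [3.0, 0.0]]
print(first_moment(stack))   # [2.0, 1.0]
print(second_moment(stack))  # [[5.0, 1.0], [1.0, 2.0]]
```

For real images, the second moment has (number of pixels)² entries, which is why the images are typically downsampled before moment estimation.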

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Comparison between $ {d}_{\mathrm{vKam}} $, the Zernike metric, and Euclidean alignment. (a)–(c) A random subset of 100 structures is selected from the database, and pairwise similarity metrics are calculated and plotted; each point represents a pair of structures. The NDCG score is calculated using the metric on the $ y $-axis as the predicted metric and the metric on the $ x $-axis as the true metric. (d) The procedure is repeated on 10 randomly selected subsets of size 100, and the mean ($ \mu $) and standard deviation ($ \sigma $) of the NDCG scores are calculated. The points and error bars visualize $ \mu \pm \sigma $.
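The NDCG (normalized discounted cumulative gain) score measures how well the ordering produced by a predicted metric agrees with the ordering produced by a reference ("true") metric. It equals 1 when the two orderings coincide. The caption does not specify how relevance is derived from the true metric, so the sketch below assumes a hypothetical gain of 1/(1 + d_true) for each item. Under this assumption, items that are close under the reference metric count the most. The function names are illustrative.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: sum of gain_i / log2(i + 1), ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(d_pred: Sequence[float], d_true: Sequence[float]) -> float:
    """NDCG of the ordering induced by d_pred, scored against d_true.

    Assumption (hypothetical): the gain of item i is 1 / (1 + d_true[i]).
    """
    gains = [1.0 / (1.0 + d) for d in d_true]
    # Order items from most to least similar according to the predicted metric.
    predicted_order = sorted(range(len(d_pred)), key=lambda i: d_pred[i])
    actual = dcg([gains[i] for i in predicted_order])
    # The ideal ordering places the largest gains first.
    ideal = dcg(sorted(gains, reverse=True))
    return actual / ideal


true_dist = [0.1, 0.5, 2.0, 3.0]
print(ndcg(true_dist, true_dist))                   # 1.0 (identical ordering)
print(ndcg([3.0, 2.0, 0.5, 0.1], true_dist) < 1.0)  # True (reversed ordering)
```

Any monotone decreasing map from distance to gain preserves the ideal ordering. The choice still changes the numerical score, so values from this sketch are not directly comparable with those in the figure.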

Figure 2. 2D embedding of protein structures based on their similarity under $ {d}_{\mathrm{vKam}} $. The analytical moments of 1420 proteins were computed and compared using Equation (6), and t-SNE was applied for visualization. Each node represents a single structure and is colored by its number of atoms. Distinct clusters containing homologous or similarly shaped structures suggest that $ {d}_{\mathrm{vKam}} $ provides interpretable results.

Figure 3. Visualization of the generation of simulated images. (a) Protein structure of PDB-7VV3. (b) Clean projection images of PDB-7VV3 generated with a nonuniform viewing-angle distribution. (c) Projection images corrupted by a contrast transfer function (CTF) and white noise at $ \mathrm{SNR}=0.1 $. (d) Distribution of the nonuniform viewing angles.
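Corrupting clean images with white noise at a prescribed SNR amounts to choosing the noise level from the signal power. A minimal sketch follows, assuming SNR is defined as the ratio of the signal variance to the noise variance and omitting the CTF step. The paper's exact convention may differ; for example, a mean-squared-amplitude definition would change the formula for sigma. The function name `add_white_noise` is hypothetical.

```python
import math
import random
from typing import List, Tuple


def add_white_noise(signal: List[float], snr: float, seed: int = 0) -> Tuple[List[float], float]:
    """Add i.i.d. Gaussian noise so that var(signal) / var(noise) == snr.

    Assumption: SNR is a ratio of variances. Returns the noisy signal and
    the noise standard deviation that was used.
    """
    n = len(signal)
    mean = sum(signal) / n
    signal_var = sum((x - mean) ** 2 for x in signal) / n
    sigma = math.sqrt(signal_var / snr)
    rng = random.Random(seed)  # seeded generator for reproducible noise
    noisy = [x + rng.gauss(0.0, sigma) for x in signal]
    return noisy, sigma


clean = [0.0, 1.0, 0.0, 1.0]  # pixel values with variance 0.25
noisy, sigma = add_white_noise(clean, snr=0.1)
print(sigma)  # sqrt(0.25 / 0.1) = sqrt(2.5), about 1.5811
```

At $ \mathrm{SNR}=0.1 $, the noise variance is ten times the signal variance. This is why individual projections in panel (c) are hard to interpret by eye, while statistics averaged over many images remain informative.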

Figure 4. Histogram of the dissimilarities computed using $ {d}_{\mathrm{iKam}} $ from simulated noisy projection images generated from PDB-7VV3, showing the resulting ranking.

Figure 5. Comparison between the rankings given by $ {d}_{\mathrm{iKam}} $ (computed from simulated images) and by the minimum Euclidean distance after alignment (computed from volumes). Panels (a)–(d) show the indicated structures superimposed on the ground truth after alignment, and the corresponding points on the graph are colored and labeled. The ground truth corresponds to the green cross in the lower left. Note that the Euclidean alignment metric stagnates, whereas Kam’s metric does not.

Figure 6. Comparison between $ {d}_{\mathrm{iKam}} $, computed from simulated images, and the Zernike metric, computed from volumes. The simulated experiment is repeated 100 times, and the size of the intersection between the top ten structures returned by $ {d}_{\mathrm{iKam}} $ and the top ten returned by the Zernike metric is plotted as a histogram.
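The overlap statistic in Figure 6 can be computed by ranking the candidate structures under each metric and counting the identifiers shared by the two top-k lists. A minimal sketch follows. It assumes each metric is given as a mapping from a structure identifier to its dissimilarity with the query, where smaller means more similar. The identifiers and values in the usage lines are made up for illustration.

```python
from typing import Dict, Hashable, Set


def top_k(dissimilarity: Dict[Hashable, float], k: int) -> Set[Hashable]:
    """Identifiers of the k structures with the smallest dissimilarity."""
    ranked = sorted(dissimilarity, key=dissimilarity.get)
    return set(ranked[:k])


def top_k_overlap(metric_a: Dict[Hashable, float],
                  metric_b: Dict[Hashable, float],
                  k: int = 10) -> int:
    """Size of the intersection of the top-k lists returned by two metrics."""
    return len(top_k(metric_a, k) & top_k(metric_b, k))


# Hypothetical dissimilarities of four candidate structures to one query.
d_ikam = {"s1": 0.2, "s2": 0.9, "s3": 0.1, "s4": 0.5}
zernike = {"s1": 1.0, "s2": 0.3, "s3": 0.2, "s4": 4.0}
print(top_k_overlap(d_ikam, zernike, k=2))  # 1 (only "s3" is in both top-2 lists)
```

Repeating this for each simulated experiment and histogramming the resulting counts (from 0 to k) gives a plot of the kind shown in the figure.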

Figure 7. $ {d}_{\mathrm{iKam}} $ visualization and ranking results for experimental data corresponding to structure 001. (a) Experimental images from EMPIAR-10076 corresponding to structure 001, downsampled to $ 64\times 64 $ pixels, centered, and with a binary mask applied. (b) Comparison of the diagonal entries of the second moments computed from reconstructed volumes 001 and 002 with those of the second moment estimated from the experimental images corresponding to structure 001. (c) The same comparison for reconstructed volumes 003 and 004. (d) The five reconstructions (000–004) and two baseline structures (EMD-8457 and EMD-2660) ranked using $ {d}_{\mathrm{iKam}} $, ordered from left to right.

Figure 8. Visualization of $ \log \left({d}_{\mathrm{vKam}}\right) $ and $ \log \left({d}_{\mathrm{iKam}}\right) $ values for the seven candidate structures. For brevity, EMD-8457 and EMD-2660 are labeled 8457 and 2660. There are five ground-truth structures but seven candidate structures, because EMD-2660 and EMD-8457 are baseline structures with no corresponding images in the experimental dataset.

Table B1. Effect of the value of the hyperparameter $ \lambda $ on the ranking induced by $ {d}_{\mathrm{iKam}} $

Figure B1. Additional t-SNE plots. (a) t-SNE plot of pairwise Euclidean alignment distances on a subset of size 100. (b) t-SNE plot of pairwise Zernike distances on the entire database. (c) t-SNE plot of pairwise $ {d}_{\mathrm{vKam}} $ distances on the entire database.

Table B2. Effect of the number of projection images used for moment estimation on the ranking induced by $ {d}_{\mathrm{iKam}} $

Table B3. Performance of $ {d}_{\mathrm{iKam}} $ on structures 001, 002, 003, 004, and 005 of EMPIAR-10076