Fast principal component analysis for cryo-electron microscopy images

Published online by Cambridge University Press:  03 February 2023

Nicholas F. Marshall
Affiliation:
Department of Mathematics, Oregon State University, Corvallis, Oregon 97331, USA
Oscar Mickelin
Affiliation:
Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544, USA
Yunpeng Shi*
Affiliation:
Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544, USA
Amit Singer
Affiliation:
Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544, USA; Department of Mathematics, Princeton University, Princeton, New Jersey 08544, USA
*Corresponding author. E-mail: yunpengs@princeton.edu

Abstract

Principal component analysis (PCA) plays an important role in the analysis of cryo-electron microscopy (cryo-EM) images for various tasks such as classification, denoising, compression, and ab initio modeling. We introduce a fast method for estimating a compressed representation of the 2-D covariance matrix of noisy cryo-EM projection images affected by radial point spread functions that enables fast PCA computation. Our method is based on a new algorithm for expanding images in the Fourier–Bessel basis (the harmonics on the disk), which provides a convenient way to handle the effect of the contrast transfer functions. For $N$ images of size $L \times L$, our method has time complexity $O(NL^3 + L^4)$ and space complexity $O(NL^2 + L^3)$. In contrast to previous work, these complexities are independent of the number of different contrast transfer functions of the images. We demonstrate our approach on synthetic and experimental data and show acceleration by factors of up to two orders of magnitude.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. We visualize $\log_{10}\left(\sum_{i=1}^N |\hat{h}_i(|\xi|)|^2 \, |\hat{h}_i(|\eta|)|^2\right)$ for each pair of radial frequencies $|\xi|, |\eta|$ for the experimental dataset EMPIAR-10028 (18) obtained from the Electron Microscopy Public Image Archive (19). All values are greater than $-1$ in the log scale.
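
The quantity plotted in Figure 1 can be evaluated with a single matrix product once each radial function $\hat{h}_i$ is sampled on a grid of radial frequencies. The sketch below is only illustrative: `toy_radial_ctf`, its parameter values, and the defocus range are hypothetical stand-ins chosen for this example, not the contrast transfer functions of EMPIAR-10028.

```python
import numpy as np

def toy_radial_ctf(k, defocus, wavelength=0.0197, phase=0.1):
    """Hypothetical radial CTF model (an assumption for illustration only):
    sin(pi * wavelength * defocus * k^2 + phase)."""
    return np.sin(np.pi * wavelength * defocus * k**2 + phase)

def log_ctf_pair_energy(ctfs):
    """Given ctfs of shape (N, R), with row i holding h_i sampled at R radial
    frequencies, return the (R, R) array
    log10( sum_i |h_i(xi)|^2 * |h_i(eta)|^2 )."""
    power = np.abs(ctfs) ** 2      # |h_i(k)|^2, shape (N, R)
    pair_sum = power.T @ power     # entry (a, b) sums power[i, a] * power[i, b] over i
    return np.log10(pair_sum)

rng = np.random.default_rng(0)
k = np.linspace(0.0, 0.3, 64)                     # radial frequencies (assumed units: 1/angstrom)
defoci = rng.uniform(1.0e4, 4.0e4, size=500)      # assumed defocus values (angstrom)
ctfs = toy_radial_ctf(k[None, :], defoci[:, None])
log_map = log_ctf_pair_energy(ctfs)               # symmetric (64, 64) map
```

Large values across the whole map indicate that every pair of radial frequencies is well sampled by at least some of the functions $\hat{h}_i$.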

Table 1. Summary of desirable properties of a few different basis candidates.

Figure 2. Timing comparison for covariance matrix estimation of 10,000 images of size $L \times L$. The timing of the old method (15) for $L = 512$ and defocus count $10^4$ is extrapolated because of time and memory constraints.

Figure 3. Top six eigenimages computed by traditional PCA on $10^6$ clean images (top panel), our new method on $10^4$ raw images (middle panel), and traditional PCA on $10^4$ phase-flipped images (bottom panel). The noisy images used for the new method and for traditional PCA had a signal-to-noise ratio of 0.1.
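
For orientation, the "traditional PCA" baseline in Figure 3 can be sketched as a plain singular value decomposition of the centered, flattened image stack. This is a minimal sketch of standard PCA only; it is not the fast Fourier–Bessel method of the article, and the function name `top_eigenimages` and the random test data are assumptions made for this example.

```python
import numpy as np

def top_eigenimages(images, n_components=6):
    """Standard PCA on a stack of images of shape (N, L, L).
    Returns the leading eigenimages, shape (n_components, L, L), and the
    corresponding eigenvalues of the sample covariance (1/N normalization)."""
    n, L, _ = images.shape
    flat = images.reshape(n, L * L)
    centered = flat - flat.mean(axis=0)            # subtract the mean image
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s[:n_components] ** 2 / n
    return vt[:n_components].reshape(n_components, L, L), eigvals

rng = np.random.default_rng(0)
images = rng.standard_normal((200, 16, 16))        # placeholder data, not cryo-EM images
eigenimages, eigvals = top_eigenimages(images)
```

Forming and decomposing the full $L^2 \times L^2$ covariance this way becomes expensive as $L$ grows, which is the cost the compressed representation described in the abstract is meant to avoid.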

Figure 4. Relative estimation error of the covariance matrix (left) and the Fourier ring correlation between the denoised and clean images (right).
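
The Fourier ring correlation reported in the right panel of Figure 4 compares two images ring by ring in the frequency domain. The sketch below uses the standard definition (real part of the cross-spectrum summed over a ring, normalized by the two ring energies). The ring binning by rounded integer radius is an assumption of this example; the caption does not specify the article's exact discretization.

```python
import numpy as np

def fourier_ring_correlation(img_a, img_b):
    """FRC between two square images of equal size L x L.
    Returns an array of length L // 2, one value per integer-radius ring."""
    assert img_a.shape == img_b.shape and img_a.shape[0] == img_a.shape[1]
    L = img_a.shape[0]
    fa = np.fft.fftshift(np.fft.fft2(img_a))
    fb = np.fft.fftshift(np.fft.fft2(img_b))
    # Integer frequency coordinates matching the shifted FFT layout.
    freqs = np.fft.fftshift(np.fft.fftfreq(L)) * L
    ky, kx = np.meshgrid(freqs, freqs, indexing="ij")
    ring = np.rint(np.hypot(kx, ky)).astype(int)   # ring index = rounded radius
    n_rings = L // 2
    mask = ring < n_rings                          # drop corners beyond Nyquist
    idx = ring[mask]
    cross = np.bincount(idx, weights=(fa * np.conj(fb)).real[mask], minlength=n_rings)
    pow_a = np.bincount(idx, weights=(np.abs(fa) ** 2)[mask], minlength=n_rings)
    pow_b = np.bincount(idx, weights=(np.abs(fb) ** 2)[mask], minlength=n_rings)
    return cross / np.sqrt(pow_a * pow_b)

rng = np.random.default_rng(0)
clean = rng.standard_normal((32, 32))              # placeholder "clean" image
noisy = clean + 0.5 * rng.standard_normal((32, 32))
frc_self = fourier_ring_correlation(clean, clean)  # identical images
frc_noisy = fourier_ring_correlation(clean, noisy)
```

An FRC near 1 at a given ring means the two images agree at that spatial frequency, so a curve that stays high out to larger radii indicates that denoising preserved finer detail.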

Figure 5. Clean, noisy and denoised images. The covariance estimation used $N = 10{,}000$ images, and parameters $L = 128$ and $M = 100$.

Table 2. Timing comparison in seconds for EMPIAR-10028 (top) and EMPIAR-10081 (bottom).

Figure 6. Denoised images (EMPIAR-10028). The method used $N = 105{,}247$ images, from $M = 1081$ defocus groups, of size $360 \times 360$. The clean images are obtained by aligning 1000 clean projection images (from uniformly distributed viewing directions) with phase-flipped raw images.

Figure 7. Denoised images (EMPIAR-10081). The method used $N = 55{,}870$ images, from $M = 53{,}384$ defocus groups, of size $256 \times 256$. The clean images are obtained by aligning 1000 clean projection images (from uniformly distributed viewing directions) with phase-flipped raw images.