
Scipion3: A workflow engine for cryo-electron microscopy image processing and structural biology

Published online by Cambridge University Press:  29 June 2023

Pablo Conesa*
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Yunior C. Fonseca
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Jorge Jiménez de la Morena
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Grigory Sharov
Affiliation:
Structural Studies Division, MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
Jose Miguel de la Rosa-Trevín
Affiliation:
St. Jude Children’s Research Hospital, Memphis, TN, USA
Ana Cuervo
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Alberto García Mena
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Borja Rodríguez de Francisco
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Daniel del Hoyo
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
David Herreros
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Daniel Marchan
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
David Strelak
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain; Masaryk University, Brno, Czech Republic
Estrella Fernández-Giménez
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Erney Ramírez-Aportela
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Federico Pedro de Isidro-Gómez
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Irene Sánchez
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
James Krieger
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
José Luis Vilas
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Laura del Cano
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Marcos Gragera
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Mikel Iceta
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Marta Martínez
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Patricia Losana
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Roberto Melero
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Roberto Marabini
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain; Superior Polytechnic School, Autonomous University of Madrid, Madrid, Spain
José María Carazo
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
Carlos Oscar Sánchez Sorzano
Affiliation:
National Center of Biotechnology (CNB-CSIC), Madrid, Spain
*
Corresponding author: P. Conesa; Email: pconesa@cnb.csic.es

Abstract

Image-processing pipelines require the design of complex workflows that combine many different steps to bring the raw acquired data to a final result with biological meaning. In the image-processing domain of cryo-electron microscopy single-particle analysis (cryo-EM SPA), hundreds of steps must be performed to obtain the three-dimensional structure of a biological macromolecule by integrating data spread over thousands of micrographs containing millions of copies of allegedly the same macromolecule. Executing such complicated workflows demands a specific tool to keep track of all the steps performed. Additionally, due to the extremely low signal-to-noise ratio (SNR), the estimation of any image parameter is heavily affected by noise, resulting in a significant fraction of incorrect estimates. Although the combination of low SNR and millions of images processed through hundreds of sequential steps requiring substantial computational resources is characteristic of cryo-EM, these features may be shared by other biological imaging domains. Here, we present Scipion, a generic open-source workflow engine written in Python and specifically adapted for image processing. Its main characteristics are: (a) interoperability, (b) smart object model, (c) gluing operations, (d) comparison operations, (e) wide set of domain-specific operations, (f) execution in streaming, (g) smooth integration in high-performance computing environments, (h) execution with and without graphical capabilities, (i) flexible visualization, (j) user authentication and private access to private data, (k) scripting capabilities, (l) high performance, (m) traceability, (n) reproducibility, (o) self-reporting, (p) reusability, (q) extensibility, (r) software updates, and (s) non-restrictive software licensing.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. A set of 2D particles in the standard representation must be converted into the internal representation of software X. The output of software X is then collected by the workflow engine and converted back to the standard representation to serve as input for subsequent processes.
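
The round trip described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not Scipion's actual object model or API: the `Particle` class, the row format of "software X", and all function names are hypothetical and exist only to show the convert, run, and convert-back pattern.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Particle:
    """Hypothetical standard representation of one 2D particle."""
    particle_id: int
    micrograph: str
    x: int
    y: int


def to_software_x(particles: List[Particle]) -> List[Dict[str, str]]:
    """Convert standard particles into the made-up row format of software X."""
    return [
        {"id": str(p.particle_id), "mic": p.micrograph, "coord": f"{p.x},{p.y}"}
        for p in particles
    ]


def run_software_x(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Stand-in for the external program: shifts each coordinate by (+1, -1)."""
    shifted = []
    for row in rows:
        x, y = (int(v) for v in row["coord"].split(","))
        shifted.append({**row, "coord": f"{x + 1},{y - 1}"})
    return shifted


def from_software_x(rows: List[Dict[str, str]]) -> List[Particle]:
    """Convert the output of software X back to the standard representation."""
    particles = []
    for row in rows:
        x, y = (int(v) for v in row["coord"].split(","))
        particles.append(Particle(int(row["id"]), row["mic"], x, y))
    return particles


def run_step(particles: List[Particle]) -> List[Particle]:
    """Wrap the external program so callers only ever see standard objects."""
    return from_software_x(run_software_x(to_software_x(particles)))
```

Because `run_step` both accepts and returns standard objects, its output can be passed directly to a step that wraps a different program, which is the interoperability idea the caption describes.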


Table 1. Comparison of how the main functionality covered by Scipion is fulfilled, to the best of our knowledge, by some of the workflow engines available for similar tasks.


Figure 2. Some images produced during SPA image processing. (a) An example of a section of a movie frame from EMPIAR-10579. (b) Micrograph with hundreds of apoferritin particles with 2D coordinates (green boxes) marked, from the same EMPIAR dataset. (c) 2D averages of the spikes of SARS-CoV-2. (d) Refined 3D map in which the structure of an apoferritin protein can be appreciated.


Figure 3. Partial UML diagram of some of the classes defined for SPA image processing. On the left, the set’s hierarchy for images, particles, micrographs, and movies. On the right, single-item classes for the same concepts. Only attributes are shown for clarity.


Figure 4. Some of the visualization tools available in Scipion. (a) Context menu offered when right-clicking on a set of tomograms, displaying three different viewers: Imod, Deepfinder, and Dataviewer (Xmipp). (b) Local resolution results produced by Xmipp monores (42) and shown in ChimeraX (40). (c) 3D coordinates picked on a tomogram generated by pySeg (43) and shown in the Tomoviz plugin. (d) 3D coordinates projected on the corresponding tilt series and shown as fiducials in Imod’s fiducial viewer, as a way to visually verify that all metadata and data look correct after a “relion4 prepare” process before entering subtomogram averaging.


Figure 5. Simplified visualization of some possible first steps of a tomography workflow. On the left, a use case involving four different software packages. On the right, an example of the “aretomo - tilt-series align and reconstruct” protocol detailing its parameters and its link to the “tomo - import tilt-series” output called “outputTiltSeries.”


Figure 6. Details of the dynamic templates. (a) List of templates found in a Scipion installation. Those starting with “local-” are user templates found in a dedicated templates folder; the rest are provided by plugins such as the relion or tomo plugins. (b) Dynamic window shown after selecting the “Tomocourse-Dec22-D1-Reconstruction” template. It lets the user choose the name of the project to be created, cancel scheduling, or avoid showing the project once created. For this particular template, it additionally prompts for the value of one dynamic field: the “EMPIAR-10164 mdocs folder” field. (c) Excerpt of the same template opened in a text editor, showing the part where the dynamic field is defined (filesPath).
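
The idea of a dynamic field in a workflow template can be sketched in Python as follows. This is an illustration only: the `<<label|default>>` placeholder syntax, the JSON layout, and the function names are assumptions made for this example and are not taken from Scipion's real template format, which the caption does not spell out.

```python
import json
import re
from typing import Dict, List

# Hypothetical placeholder syntax: <<label|default value>>
FIELD = re.compile(r"<<([^<>|]+)\|([^<>|]*)>>")

# Made-up template with one dynamic field in the "filesPath" parameter.
TEMPLATE = (
    '[{"protocol": "import tilt-series", '
    '"filesPath": "<<mdocs folder|/tmp/mdocs>>"}]'
)


def find_fields(template_text: str) -> List[str]:
    """Return the labels of all dynamic fields, in order of first appearance."""
    labels: List[str] = []
    for label, _default in FIELD.findall(template_text):
        if label not in labels:
            labels.append(label)
    return labels


def fill_template(template_text: str, values: Dict[str, str]) -> str:
    """Replace each placeholder with the user's value, or its default."""
    def substitute(match: "re.Match[str]") -> str:
        label, default = match.group(1), match.group(2)
        if label in values:
            # Escape the value so the result stays valid JSON.
            return json.dumps(values[label])[1:-1]
        return default
    return FIELD.sub(substitute, template_text)
```

A graphical front end would call something like `find_fields` to decide which inputs to show, then `fill_template` with the values the user entered, and finally load the resulting JSON as the workflow to run.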