
Protein structures unravel the signatures and patterns of deep time evolution

Published online by Cambridge University Press:  29 January 2024

Ajith Harish*
Affiliation:
Independent Researcher, Uppsala, Sweden
*Corresponding author: Ajith Harish; Email: ajith.harish@gmail.com

Abstract

The formulation and testing of hypotheses using ‘big biology data’ often lie at the interface of computational biology and structural biology. The Protein Data Bank (PDB), established about 50 years ago, catalogs the three-dimensional (3D) shapes of organic macromolecules and showcases a structural view of biology. The comparative analysis of the structures of homologs, particularly of proteins, from different species has substantially deepened the analysis of questions in molecular and cell biology. In addition, computational tools developed to analyze the ‘protein universe’ are providing efficient means to resolve longstanding debates in cell and molecular evolution. In celebration of the PDB's golden jubilee, much has been written about its transformative impact on a broad range of fields of scientific inquiry and about how structural biology transformed the study of the fundamental processes of life. Yet its transforming influence on one field of inquiry of fundamental interest, the reconstruction of the distant biological past, has gone almost unnoticed. Here, I discuss recent advances to highlight how the insights and tools of structural biology bear on the data required to empirically resolve vigorously debated and apparently contradictory hypotheses in evolutionary biology. Specifically, I show that evolutionary characters defined by protein structure are superior to conventional sequence characters for the reliable, data-driven resolution of competing hypotheses about the origins of the major clades of life and the evolutionary relationships among those clades. Since these better-quality data unequivocally support two primary domains of life, it is imperative that the primary classification of life be revised accordingly.

Information

Type
Perspective
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2024. Published by Cambridge University Press
Figure 1. The molecular componentry of the cellular machinery. Most proteins fold into specific 3D shapes and form diverse supramolecular complexes to perform their biological functions (a). In addition to carrying out biochemical reactions, proteins build and maintain the morphological features of cells. Cells are membranous ensembles studded with proteins, inside and out. The two basic cell types, eukaryotic (nucleated) and akaryotic (anucleate) cells, and the extent of membrane-bound compartmentation are shown as sections of the ultrastructure (b) and as the overall structure of an average eukaryotic cell and an average akaryotic cell (c).

Figure 2. Protein domains are unique ‘molecular phenotypes’ for mapping patterns of species diversification. Structural domains are distinct homologous units that form complex protein morphologies (a; left), comparable to complex morphological structures (a; right). Since domains usually have characteristic functions, they are ‘functional genomic signatures’ (Harish, 2018). Measures of the compositional variation of domains are useful metrics of divergence among groups of organismal species (b; left). The number of unique domain occurrences defines a measure of ‘intrinsic proteomic complexity’ (Harish and Kurland, 2017b). Principal components analysis (PCA) projections show that the covariation of domain composition corresponds to the two basic cell types (c; left): PC1 separates groups of eukaryote species from those of akaryote species, while PC2 separates species groups within eukaryotes and within akaryotes. Domain-based metrics are comparable to measures of variance in fossil jaw and dental homologs (b; right), and the covariations of these features correspond to clades of dinosaur species (c; right panel taken from Nordén et al., 2018).
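
A minimal sketch of the kind of analysis summarized in panels (b) and (c, left), assuming a species-by-domain presence/absence matrix. The matrix values and species labels are hypothetical, and the plain count of distinct domain families is only an assumed, simplified stand-in for ‘intrinsic proteomic complexity’, not the author's data or exact metric. PCA is computed here with NumPy's SVD.

```python
import numpy as np

# Hypothetical species-by-domain matrix (1 = domain family present).
# Rows: three made-up eukaryote-like and three akaryote-like proteomes.
# The values are invented for illustration and are not real data.
species = ["euk_1", "euk_2", "euk_3", "aka_1", "aka_2", "aka_3"]
domains = np.array([
    [1, 1, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0],
], dtype=float)

# Number of distinct domain families per proteome (an assumed, simplified
# stand-in for a repertoire-based complexity measure).
repertoire_size = domains.sum(axis=1)


def pca_scores(X, n_components=2):
    """Project the rows of X onto its leading principal components."""
    Xc = X - X.mean(axis=0)                      # center each domain column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores on PC1, PC2, ...


scores = pca_scores(domains)
for name, size, (pc1, pc2) in zip(species, repertoire_size, scores):
    print(f"{name}: {int(size)} domains, PC1 = {pc1:+.2f}, PC2 = {pc2:+.2f}")
```

With these toy values the two groups fall on opposite sides of PC1. The sign of each principal component is arbitrary in SVD, so only the separation, not the sign, carries meaning.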

Table 1. Results of model selection tests for sequence-based and structure-based characters

Figure 3. Evolutionary telescopes: protein structure telescopes can look further back in time; sequence telescopes cannot. Optical telescopes are used to look at the distant ‘galactic fossils’ of the cosmological universe, estimated to have originated ≈14 BYa (a). Phylogenies are the ‘evolutionary telescopes’ used to look back into the distant biological past, depicted as the ‘universal tree’ of life. The universal tree shown in (b) is a schematic of the phylogeny inferred from patterns of inheritance of ‘functional genomic signatures’ defined by unique protein domains (Harish, 2018). Protein structure telescopes can look further back in time because of their superior resolving power (b) compared to the commonly used sequence telescopes (c). Ancestral nodes, including the root node of the universal tree (UCA) and the root node of the archaea tree (ACA), cannot be identified using sequence telescopes (e.g., Williams et al., 2020); hence, the reconstructed picture of evolution is poorly resolved and incomplete (c).

Box–Figure 1. Examples of different rearrangements of the branching order following an outgroup rooting. Depending on the position of the root node, different degrees of relatedness among species groups can be inferred. The nearest neighbor in an unrooted tree may not be the closest evolutionary relative (Harish, 2018). The degree of relatedness can only be determined with rooted trees. Trees are drawn as cladograms with emphasis on branching order and the relative age of common ancestors of contemporary species; branch lengths have no evolutionary implications.
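
The caption's point that a nearest neighbor in an unrooted tree need not be the closest relative can be checked on a toy example. The sketch below assumes a hypothetical four-taxon unrooted tree with the split AB | CD (taxon and node names are invented), places a root on a chosen edge, as an outgroup rooting would, and reports a taxon's sister group. It is a minimal illustration, not a phylogenetics library.

```python
from collections import defaultdict

# Hypothetical unrooted tree with the split AB | CD:
#   A         C
#    \       /
#     x --- y
#    /       \
#   B         D
# x and y are unlabeled internal (ancestral) nodes.
EDGES = [("A", "x"), ("B", "x"), ("x", "y"), ("y", "C"), ("y", "D")]
LEAVES = {"A", "B", "C", "D"}


def root_on_edge(edges, u, v):
    """Place a new 'root' node on edge (u, v); return a child -> parent map."""
    adj = defaultdict(set)
    for a, b in edges:
        if {a, b} != {u, v}:          # the chosen edge is split by the root
            adj[a].add(b)
            adj[b].add(a)
    for end in (u, v):
        adj["root"].add(end)
        adj[end].add("root")
    parent, stack = {"root": None}, ["root"]
    while stack:                      # orient every edge away from the root
        node = stack.pop()
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                stack.append(nb)
    return parent


def sister_group(parent, leaf):
    """Leaves descending from the sibling lineage(s) of `leaf`."""
    children = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            children[par].append(child)

    def leaves_under(node):
        if node in LEAVES:
            return {node}
        return set().union(*(leaves_under(c) for c in children[node]))

    siblings = [c for c in children[parent[leaf]] if c != leaf]
    return set().union(*(leaves_under(s) for s in siblings))


for edge in [("x", "y"), ("A", "x")]:
    parent = root_on_edge(EDGES, *edge)
    print("root on", edge,
          "| sister of A:", sorted(sister_group(parent, "A")),
          "| sister of B:", sorted(sister_group(parent, "B")))
```

Rooting on the internal edge keeps A and B as sisters, but rooting on A's terminal branch makes B more closely related to C and D than to A, even though A and B are neighbors in the unrooted tree.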

Figure 4. Assessment of empirical evidence for or against alternative universal trees and for identifying the primary clades of life. The identification of the primary clades in the universal tree is fundamentally linked to the identity of the root node (UCA), which has been implicitly assumed ever since the earliest universal trees were put forward (a–e). The most popular assumption (Woese et al., 1990) is that the UCA is positioned on the stem branch leading to (Eu)bacteria, as in (c) and in the textbook universal tree (f). Several other phylogenetic positions for the UCA, and the resulting phylogenies, are shown in (d, g–k). Assessment of empirical evidence for or against these proposals (l) shows that the universal tree in which Eukarya and Akarya are the primary clades (h) is most likely to be correct (Harish, 2018).
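
The link between root position and primary clades can be made concrete: placing the root on an edge of an unrooted tree makes the two leaf sets on either side of that edge the primary clades. The sketch below assumes a coarse, hypothetical unrooted tree with only three groups joined at one internal node `c` (a simplification; the panels also consider root positions inside groups) and enumerates every candidate root edge. A binary unrooted tree with n leaves has 2n − 3 edges, so three groups give three candidate rootings.

```python
# Hypothetical, coarse unrooted tree: three groups joined at internal node "c".
EDGES = [("Bacteria", "c"), ("Archaea", "c"), ("Eukarya", "c")]


def primary_clades(edges, root_edge):
    """Return the two leaf sets separated by `root_edge` (the primary clades)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    def side(start, blocked):
        # Leaves reachable from `start` without crossing back over the root edge.
        seen, stack, leaves = {start, blocked}, [start], set()
        while stack:
            node = stack.pop()
            if len(adj[node]) == 1:   # degree-1 nodes are leaves
                leaves.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        return frozenset(leaves)

    u, v = root_edge
    return side(u, v), side(v, u)


for edge in EDGES:   # every edge is a candidate root position
    left, right = primary_clades(EDGES, edge)
    print(sorted(left), "|", sorted(right))
```

Rooting on the bacterial branch gives the textbook split Bacteria | (Archaea + Eukarya), whereas rooting on the eukaryote branch gives Eukarya | Akarya (Archaea + Bacteria). The code only enumerates the alternatives; choosing among them requires the empirical tests the caption refers to.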

Supplementary material: File

Harish supplementary material
File 385.5 KB

Author comment: Protein structures unravel the signatures and patterns of deep time evolution — R0/PR1

Comments

I request Professor Bengt Nordén to be the handling editor for my article.

Review: Protein structures unravel the signatures and patterns of deep time evolution — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

The review manuscript by Harish is a valuable update to a series of papers concerning the evolutionary relationships between species of different groupings. Classically, sequence comparisons have primarily been used, but Harish and coworkers have focused on the structural identity of domains. Clearly, structure is a more conserved parameter than sequence. Significant efforts have been made to find the possible root in the evolution of species, and a well-supported case is presented. In an overview like this one, it would be of interest to include a brief mention of the possibility of panspermia: maybe life on Earth began elsewhere, and Tellus could have been “infected” from planets elsewhere once or maybe several times. In such a case, it could be an overstatement to call some species on Earth the universal common ancestor.

In the details, it is mentioned that the ribosomal proteins form an especially interesting group. I would suggest that these proteins speak against placing archaea and bacteria close together with eukaryotes further apart. Looking at the identity of ribosomal proteins, many are found in all three groups, and Bacteria have a range of unique proteins, but all archaeal ribosomal proteins have eukaryal counterparts (Appendix 1, Liljas & Ehrenberg 2013. Structural Aspects of Protein Synthesis. 2nd edition. World Scientific). It might be interesting to go a few steps further, from the comparison of structural domains to the comparison of organelle composition.

The review is well written and concerns a central topic that needs to be discussed. I recommend the publication of the review after consideration of my comments.

Review: Protein structures unravel the signatures and patterns of deep time evolution — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

This is a mini-review on the evolution of protein structures. It is generally difficult to review a review paper, but for this paper it is not clear what message the author wants to convey and the paper does not provide much novelty to the field. I can therefore not recommend this paper for further consideration.

Review: Protein structures unravel the signatures and patterns of deep time evolution — R0/PR4

Conflict of interest statement

Reviewer declares no conflict of interest

Comments

This manuscript proposes to discuss the value of the Protein Data Bank to the field of phylogenetics. The first pages provide a nice introduction in this direction; however, beginning with the second page, the manuscript goes off on tangents that distract from the goal. For example, a biologically informed reader would be expected to know what proteins and cells are, as well as eukaryotes and a/prokaryotes. Similarly, the section Farsightedness and Nearsightedness is overly verbose in its analogy. It could effectively make the point without distracting the reader with a tangent to this extent on astronomy. The manuscript then goes on to discuss phylogenetic rooting for six pages. Given that the goal of the paper is to discuss the use of protein structure, the criticism of rooting methods goes rather far off topic and needs better focus on how protein structure is beneficial. In sum, the manuscript is missing much of the detailed explanation (with evidence!) of how the PDB and protein structures have been used and may be used for phylogenetics that was promised in the abstract.

This manuscript should be adapted to the requested format for a Perspective.

“generally articles should not exceed 4,000 words”

“Perspectives should end with a short outlook section, of no more than a paragraph or two.”

Box 1 - Fig 1 is confusing. I believe the point is to explain that different roots give different estimated relationships. However, (1) in both rooted cases for this example the ingroup relationships are the same, and (2) the right rooted case is so odd as to be distracting: why would one create a “marine” outgroup including both invertebrates and vertebrates sharing a common ancestor? Additionally, this box describes rooting with an outgroup as a pseudo-root. While this wouldn’t be possible for the full Tree of Life (the discussion starting at line 380 is reasonable), it seems odd to have such a harsh critique of a well-accepted and non-controversial practice for subsets of Life.

The critique of the SARS-CoV-2 phylogeny seems off topic. Just because one paper used MJN and possibly made incorrect inferences doesn’t mean that rooting with sequence data isn’t possible, especially for recent evolution.

The discussion in line 411+ is not consistent with any modern evolutionary biology. No modern work is cited. Phylogeneticists in fact make major efforts to insist that extant forms are not primitive or ancestral.

The advantages of protein structure are not brought up until the second-to-last page. Here there are only four references. While the bullets claim to summarize these references, further evidence is required for many general statements.

Figure 1: This figure should be removed. Because this material is background, there is no need to provide an illustration of this basic topic. Further, it’s unclear whether the images are legally usable in this context. See https://www-nature-com.uri.idm.oclc.org/articles/s41594-021-00587-5, which specifies “image is available under Creative Commons”; however, multiple CC licenses are available with different specifications, some of which do not allow adaptation. Also see https://images.cad.rit.edu/exhibit_36.html, which is copyrighted. No images should be included that are copyrighted and not under a license allowing for appropriate reuse.

Figure 2: See comment from Figure 1. This figure should be removed as unnecessary. Not all image credits are provided.

There are some minor phrasing / grammatical issues that should be checked.

Decision: Protein structures unravel the signatures and patterns of deep time evolution — R0/PR5

Comments

I also want him to comment on, and if possible explain, why ribosomal proteins in archaea are fully congruent with those in eukaryotes but not with those in bacteria (Editor's question).

Author comment: Protein structures unravel the signatures and patterns of deep time evolution — R1/PR6

Comments

Dear Prof. Norden,

I am sorry for the slow response; my computer's hard disk failed suddenly. Diagnosing the problem and restoring the computer and files took some time.

I thank you and the reviewers (Reviewer-1 and Reviewer-3) for the comments and suggestions, which helped improve the presentation. Apart from revising the manuscript, I have also addressed the reviewers’ comments in detail; my responses appear inline with the reviewers’ comments, in blue text. In the revised manuscript, I have reorganized some sections and included a new section to address the comments. I have also provided a file with changes highlighted: text that was moved but otherwise kept as-is is colored green, and all other changes are colored red.

Unfortunately, Reviewer-2 has thoroughly misunderstood the paper. My article is, quite evidently, not “a mini-review on the evolution of protein structures” but a critical analysis of the advantages and disadvantages of employing protein structures, in place of sequences, as evolutionary features to reconstruct the evolution of cellular life. This is a rather glaring misreading of the article by Reviewer-2. Therefore, the two-sentence review written by Reviewer-2 is not helpful at all. Such a poor review, in my view, does not contribute anything useful to the peer-review process. Hence, I will have to ignore Reviewer-2’s comments.

I also hope that you find the revisions to be satisfactory and that the article can proceed toward publication.

Best regards,

Ajith

Decision: Protein structures unravel the signatures and patterns of deep time evolution — R1/PR7

Comments

No accompanying comment.