Hostname: page-component-6766d58669-6mz5d Total loading time: 0 Render date: 2026-05-19T11:14:28.606Z Has data issue: false hasContentIssue false

MetaFunc: taxonomic and functional analyses of high throughput sequencing for microbiomes

Published online by Cambridge University Press:  12 January 2023

Arielle Kae Sulit*
Affiliation:
Department of Surgery, University of Otago, Christchurch, New Zealand School of Natural Sciences, Massey University, Auckland, New Zealand
Tyler Kolisnik
Affiliation:
School of Natural Sciences, Massey University, Auckland, New Zealand
Frank Antony Frizelle
Affiliation:
Department of Surgery, University of Otago, Christchurch, New Zealand
Rachel Purcell
Affiliation:
Department of Surgery, University of Otago, Christchurch, New Zealand
Sebastian Schmeier
Affiliation:
School of Natural Sciences, Massey University, Auckland, New Zealand
*
*Corresponding author. Email: iel_sulit@yahoo.com

Abstract

The identification of functional processes taking place in microbiome communities augment traditional microbiome taxonomic studies, giving a more complete picture of interactions taking place within the community. While there are applications that perform functional annotation on metagenomes or metatranscriptomes, very few of these are able to link taxonomic identity to function or are limited by their input types or databases used. Here we present MetaFunc, a workflow which takes RNA sequences as input reads, and from these (1) identifies species present in the microbiome sample and (2) provides gene ontology annotations associated with the species identified. In addition, MetaFunc allows for host gene analysis, mapping the reads to a host genome, and separating these reads, prior to microbiome analyses. Differential abundance analysis for microbe taxonomies, and differential gene expression analysis and gene set enrichment analysis may then be carried out through the pipeline. A final correlation analysis between microbial species and host genes can also be performed. Finally, MetaFunc builds an R shiny application that allows users to view and interact with the microbiome results. In this paper, we showed how MetaFunc can be applied to metatranscriptomic datasets of colorectal cancer.

Information

Type
Methods Paper
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of The Nutrition Society
Figure 0

Figure 1 MetaFunc Workflow. The workflow uses FASTQ or FASTA as input and processes reads through the microbiome pipeline to give microbial abundance and function (a) and/or host gene analysis (b) which will first map reads to a host before sending unmapped reads to the microbiome pipeline. Applying host read analysis will give gene expression analysis results as well as host gene-microbial species correlation. Solid boxes indicate steps with an output while dotted boxes indicate intermediate steps in the pipeline. NR: NCBI Blast nr database.

Figure 1

Figure 2 MetaFunc Microbiome and Host Analyses of Dataset PRJNA413956. (a)Average percent abundance of selected bacterial species in CRC tissue compared to matchednon-tumour(normal) samples. From MetaFunc tabulated results, we plotted the percent abundances of selected bacteria in CRC and matched normal samples. Raw values were first log2 transformed, with prior addition of 1 as a pseudocount to account for 0 values. Individual points represent per sample transformed values in red (CRC) and blue (Normal). Per group means are represented by the horizontal lines. Dotted lines connect matched CRC and normal samples. (b)Percent abundance of specific polyamine biosynthetic process GO terms among all biological process GOs in a sample/group compared between CRC (red) and normal (blue) samples. Values were calculated as described in section GO: protein annotation” and output in MetaFunc tables or in the R Shiny application. These values were plotted, overlaying group means (horizontal lines) and individual values (data points). (c)Screenshot from MetaFunc R shiny application.This view shows the first 10 species with proteins contributing to the GO polyamine biosynthetic process. The R Shiny application columns include a URL (not shown in screenshot), which is linked to the NCBI’s Taxonomy Browser, the Species Taxonomy ID, Lineage (indicated as “” in screenshot), Root Taxon, and percent abundances of the species in the two groups being compared: CRC and normal samples. Note that percent abundances refer to the total abundance of the species in question, not just the proteins contributing to the GO term. Results shown are sorted from highest to lowest percent abundance in the colon cancer cohort. (d)Fold change of representative upregulated and downregulated human genes (Li et al., 2018) between CRC and matched normal samples in this study. Fold change values were obtained from the edgeR results of the pipeline. All these genes are significant (FDR < 0.05) in both this study and the source publication.

Figure 2

Table 1. Spearman correlation between DA microbes and DGEs in CRC.

Figure 3

Figure 3 MetaFunc Microbiome Analysis of Dataset PRNJA4040030. (a)Microbes that are significantly more abundant (FDR < 0.05) in CMS1 compared to CMS2 (purple) and CMS3 (yellow). Microbes are considered DA in CMS1 if it is identified through edgeR as DA in both CMS1 versus CMS2 and CMS1 versus CMS3 comparisons. Log2FC (y-axis) is the log2 of the fold-change between CMS1 and the other subtypes (eg. CMS1/CMS2); FDR (point sizes) is the false discovery rate adjusted p-values. (b)Percent abundance of specific PAMPs biosynthetic process GO terms among all biological process GOs in a sample/group compared between CRC subtypes, CMS1 (red), CMS2 (purple), and CMS3 (yellow). Values were calculated as described in section “GO: protein annotation” and output in MetaFunc tables or in the R Shiny application. These values were plotted, overlaying group means (horizontal lines) and individual values (data points). (c)Screenshot of R shiny application showing the relative abundances of species associated with PAMPs biosynthetic processes compared among CMS1, CMS2, and CMS3. This view shows the first 10 species, with the highest abundances in CMS1, with proteins contributing to any of the PAMPs biosynthetic processes described above. The application columns show a URL (not shown in screenshot), which is linked to the NCBI’s Taxonomy Browser, the Species Taxonomy ID, Lineage (shown as “…” in screenshot), Root Taxon, and percent abundances of the species in the three groups being compared: CMS1, CMS2, and CMS3. Note that percent abundances refer to the total abundance of the species in question, not just the proteins contributing to the GO term. Results shown are sorted from highest to lowest percent abundance in the CMS1 group.

Figure 4

Table 2. Spearman correlation between DA microbes in CMS1 and DGEs in CMS1.

Figure 5

Table 3. Gene Information of DEGs correlated with DA Microbes in CMS1.

Supplementary material: File

Sulit et al. supplementary material

Figures S1-S3 and Tables S1-S7

Download Sulit et al. supplementary material(File)
File 1.2 MB