Greengenes2 unifies microbial data in a single reference tree

McDonald, D., Jiang, Y., Balaban, M. et al. Greengenes2 unifies microbial data in a single reference tree. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01845-1

This is a summary of our August 10th, 2023 DalMUG journal club discussion, written by Monica Alvaro Fuss.

Summary

Greengenes2 aims to address the lack of reproducibility between 16S and shotgun metagenomic studies by unifying 16S and genomic databases into a single, cohesive reference tree. Until now, the two types of phylogeny existed separately, with 16S reference trees such as SILVA and the original Greengenes generally being more comprehensive than existing whole-genome trees. The new Greengenes2 reference tree contains 21,074,422 bacterial and archaeal sequences drawn from whole-genome sources and from both full-length and short-fragment 16S rRNA gene sources across 31 different EMP Ontology 3 environments. These were consolidated using the new uDance workflow and deep-learning-enabled phylogenetic placement (DEPP), making the tree much larger than previous ones. Taxonomy from both GTDB and the Living Tree Project (LTP) was harmonized by prioritizing GTDB and, according to the authors, will be updated every six months. The authors further state that Greengenes2 provides concordance between 16S and whole-genome data for UniFrac-based ordination (but not Bray-Curtis-based ordination), for species-level relative abundance profiles, and for rankings of variables associated with changes to the human microbiome.
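To make the UniFrac versus Bray-Curtis point more concrete, here is a toy sketch of why a shared tree can bring 16S and shotgun profiles together. It is not the authors' pipeline: the tree, the tip names (a1, a2, b1), the branch lengths, and the abundances are all made up for illustration, and the UniFrac function is a simplified unweighted version written from the textbook definition. The assumption is that the same organism shows up as one tip in the 16S data (a1) and as a very closely related tip in the shotgun data (a2).

```python
# Toy comparison of Bray-Curtis and unweighted UniFrac (illustrative only).
# Hypothetical tree stored as child -> (parent, branch length to parent).
tree = {
    "a1": ("A", 0.01),    # tip observed only in the 16S profile
    "a2": ("A", 0.01),    # near-identical tip observed only in the shotgun profile
    "A": ("root", 0.5),
    "b1": ("root", 0.5),  # tip observed in both profiles
}

# Made-up relative abundances for the same community profiled two ways.
amplicon = {"a1": 0.6, "b1": 0.4}
shotgun = {"a2": 0.6, "b1": 0.4}


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: features are compared by name only."""
    features = set(x) | set(y)
    diff = sum(abs(x.get(f, 0.0) - y.get(f, 0.0)) for f in features)
    total = sum(x.get(f, 0.0) + y.get(f, 0.0) for f in features)
    return diff / total


def branches_covered(sample, tree):
    """Collect every branch on the path from an observed tip to the root."""
    covered = set()
    for tip, abundance in sample.items():
        if abundance <= 0:
            continue
        node = tip
        while node in tree:  # stops at the root, which has no parent entry
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(x, y, tree):
    """Fraction of observed branch length that is unique to one sample."""
    bx = branches_covered(x, tree)
    by = branches_covered(y, tree)
    unique = sum(tree[n][1] for n in bx ^ by)
    observed = sum(tree[n][1] for n in bx | by)
    return unique / observed


bc = bray_curtis(amplicon, shotgun)
uf = unweighted_unifrac(amplicon, shotgun, tree)
print(f"Bray-Curtis: {bc:.3f}")
print(f"Unweighted UniFrac: {uf:.3f}")
```

For these made-up profiles Bray-Curtis comes out at 0.6, because a1 and a2 are treated as unrelated features, while unweighted UniFrac is about 0.02, because the two tips share almost all of their branch length. This is only meant to show the intuition; a real analysis would use an established UniFrac implementation and the actual Greengenes2 tree.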

Below are the key points that came up during our discussion.

Points of discussion

  • Sources of bias that may account for the differences between 16S and metagenomic studies are mentioned in the abstract but not discussed further.
  • We compared features of existing 16S and whole-genome trees, highlighting differences in taxonomy.
  • We briefly discussed the phylogenetic methods that make this a novel approach, aside from the sheer size of the project.
  • Does the separation between 16S and whole-genome samples (seen with Bray-Curtis but not with UniFrac ordination) also occur when using other reference trees?
  • Synthetic communities could have been used for a more accurate comparison between 16S and metagenomic profiles.
  • This is the first such approach to reconciling amplicon and metagenomic sequencing pipelines, and it does appear to provide an attractive alternative to current reference trees.
Written on August 10, 2023