
Abstract

Microbiome profiling tools rely on reference catalogues, which strongly affect their performance. Comparing these tools is nevertheless challenging, mainly because their native catalogues differ. In this study, we present a novel standardized benchmarking framework that makes such comparisons more accurate. Rather than building customized databases, we ran each tool in its native environment and translated its results to a common reference. Specifically, we conducted two realistic simulations of gut microbiome samples, each based on a specific taxonomic profiler, and projected the profilers' results onto two different taxonomic references: the Genome Taxonomy Database (GTDB) and the Unified Human Gastrointestinal Genome (UHGG) catalogue. To demonstrate the importance of such a framework, we evaluated four established profilers, as well as the impact of the simulations and of the common taxonomic references on the perceived performance of these profilers. Finally, we provide guidelines to improve future profiler comparisons for human microbiome ecosystems: (i) use or create realistic simulations tailored to your biological context (BC); (ii) identify a common feature space suited to your BC and independent of the catalogues used by the profilers; and (iii) apply a comprehensive set of metrics covering accuracy (sensitivity/precision), overall representativeness (richness/Shannon) and quantification (UniFrac and/or Aitchison distance).
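To make guidelines (ii) and (iii) concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes that each profile is a dictionary of relative abundances keyed by taxon, and that a mapping from a profiler's native taxa to a common feature space (for example GTDB species) is already available. The mapping, the taxon names and the choice to drop unmapped taxa and renormalize are all hypothetical. Zeros are handled with a fixed pseudocount before the centred log-ratio (CLR) transform, which is a simplification; dedicated zero-imputation methods exist. UniFrac is omitted because it additionally requires a phylogenetic tree.

```python
import math
from collections import defaultdict
from typing import Dict, Set, Tuple

Profile = Dict[str, float]  # taxon -> relative abundance


def project(profile: Profile, mapping: Dict[str, str]) -> Profile:
    """Project a native profile onto a common feature space (hypothetical mapping)."""
    out: Dict[str, float] = defaultdict(float)
    for taxon, abundance in profile.items():
        target = mapping.get(taxon)
        if target is not None:  # assumption: unmapped native taxa are dropped
            out[target] += abundance  # several native taxa may merge into one feature
    total = sum(out.values())
    return {t: a / total for t, a in out.items()} if total > 0 else {}


def detected(profile: Profile) -> Set[str]:
    """Taxa with a non-zero abundance."""
    return {t for t, a in profile.items() if a > 0}


def sensitivity_precision(truth: Profile, pred: Profile) -> Tuple[float, float]:
    """Presence/absence accuracy of a predicted profile against the ground truth."""
    true_set, pred_set = detected(truth), detected(pred)
    tp = len(true_set & pred_set)
    sensitivity = tp / len(true_set) if true_set else 0.0
    precision = tp / len(pred_set) if pred_set else 0.0
    return sensitivity, precision


def richness(profile: Profile) -> int:
    """Number of detected taxa."""
    return len(detected(profile))


def shannon(profile: Profile) -> float:
    """Shannon index (natural log) of the renormalized profile."""
    total = sum(profile.values())
    return -sum((a / total) * math.log(a / total) for a in profile.values() if a > 0)


def aitchison(x: Profile, y: Profile, pseudocount: float = 1e-6) -> float:
    """Euclidean distance between CLR-transformed profiles over the union of features."""
    features = sorted(set(x) | set(y))

    def clr(p: Profile) -> list:
        logs = [math.log(p.get(f, 0.0) + pseudocount) for f in features]
        mean = sum(logs) / len(logs)
        return [v - mean for v in logs]

    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Usage with made-up taxa: two native taxa collapse onto one common feature,
# and "native_X" has no counterpart in the common reference.
mapping = {"native_A": "GTDB_sp1", "native_B": "GTDB_sp1", "native_C": "GTDB_sp2"}
pred = project({"native_A": 0.2, "native_B": 0.3, "native_C": 0.5, "native_X": 0.1}, mapping)
truth = {"GTDB_sp1": 0.4, "GTDB_sp2": 0.4, "GTDB_sp3": 0.2}

sens, prec = sensitivity_precision(truth, pred)
print(sens, prec, richness(pred), shannon(pred), aitchison(truth, pred))
```

Because the CLR transform is scale invariant, adding the pseudocount without renormalizing gives the same distance as renormalizing first. In this toy case the projected profile misses one true taxon, so sensitivity drops while precision stays perfect, which illustrates why the guidelines pair presence/absence metrics with diversity and quantification metrics.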

Funding
This study was supported by:
  • Conseil Régional de Haute Normandie (Award DOS0171566/00)
    • Principal Award Recipient: Not applicable
  • Bpifrance (Award DOS0171565/00)
    • Principal Award Recipient: Not applicable

This is an open-access article distributed under the terms of the Creative Commons Attribution License.
DOI: 10.1099/mgen.0.001330
2025-01-13
2025-01-14

Supplements

Supplementary material 1 (PDF)