HumGut: a comprehensive human gut prokaryotic genomes collection filtered by metagenome data

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_doaj_primary_oai_doaj_org_article_60c7bbe0bebc4a56a8a8e62ac3b9e2c6

About this item

Author / Creator

Hiseni, Pranvera; Rudi, Knut; Wilson, Robert C.; Hegge, Finn Terje; Snipen, Lars

Publisher

London: BioMed Central

Journal title

Microbiome, 2021-07, Vol.9 (1), p.1-12, Article 165

Language

English

Formats

Articles

More information

Scope and Contents

Contents

Background: A major bottleneck in the use of metagenome sequencing for human gut microbiome studies has been the lack of a comprehensive genome collection to be used as a reference database. Several recent efforts have been made to reconstruct genomes from human gut metagenome data, resulting in a huge increase in the number of relevant genomes. In this work, we aimed to create a collection of the most prevalent healthy human gut prokaryotic genomes, to be used as a reference database, including both metagenome-assembled genomes (MAGs) from the human gut and ordinary RefSeq genomes.

Results: We screened > 5,700 healthy human gut metagenomes for the containment of > 490,000 publicly available prokaryotic genomes sourced from RefSeq and the recently announced UHGG collection. This resulted in a pool of > 381,000 genomes that were subsequently scored and ranked based on their prevalence in the healthy human metagenomes. The genomes were then clustered at a 97.5% sequence identity resolution, and cluster representatives (30,691 in total) were retained to comprise the HumGut collection. Using the Kraken2 software for classification, we find superior performance in the assignment of metagenomic reads, classifying on average 94.5% of the reads in a metagenome, as opposed to 86% with UHGG and 44% with the standard Kraken2 database. A coarser HumGut collection, consisting of genomes dereplicated at 95% sequence identity (similar to UHGG), classified 88.25% of the reads. HumGut, at half the size of the standard Kraken2 database and directly comparable in size to UHGG, outperforms both.

Conclusions: The HumGut collection contains > 30,000 genomes clustered at a 97.5% sequence identity resolution and ranked by human gut prevalence. We demonstrate that metagenomes from IBD patients map equally well to this collection, indicating that this reference is also relevant for studies well outside the metagenome reference set used to obtain HumGut. All data and metadata, as well as helpful code, are available at http://arken.nmbu.no/~larssn/humgut/.

Alternative Titles

Full title

HumGut: a comprehensive human gut prokaryotic genomes collection filtered by metagenome data

Authors, Artists and Contributors

Author / Creator

Hiseni, Pranvera
Rudi, Knut
Wilson, Robert C.
Hegge, Finn Terje
Snipen, Lars

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_doaj_primary_oai_doaj_org_article_60c7bbe0bebc4a56a8a8e62ac3b9e2c6

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_doaj_primary_oai_doaj_org_article_60c7bbe0bebc4a56a8a8e62ac3b9e2c6

Other Identifiers

ISSN

2049-2618

E-ISSN

2049-2618

DOI

10.1186/s40168-021-01114-w

How to access this item

Full text available
