HiFiBGC: an ensemble approach for improved biosynthetic gene cluster detection in PacBio HiFi-read metagenomes
About this item
Publisher
England: BioMed Central Ltd
Language
English
More information
Scope and Contents
Contents
Microbes produce diverse bioactive natural products with applications in fields such as medicine and agriculture. In their genomes, these natural products are encoded by physically clustered genes known as biosynthetic gene clusters (BGCs). Advances in genome and metagenome sequencing have enabled high-throughput identification of BGCs as a promising avenue for natural product discovery. BGC mining from (meta)genomes using in silico tools has allowed access to a vast diversity of potentially novel natural products. However, a fundamental limitation has been the difficulty of assembling complete BGCs, especially from complex metagenomes. With their fragmented assemblies, short-read technologies struggle to recover complete BGCs, such as the long and repetitive nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) clusters. Recent advances in long-read sequencing, such as the High Fidelity (HiFi) technology from PacBio, have reduced this limitation and can help retrieve both accurate and complete BGCs from metagenomes, which warrants improving existing BGC identification approaches to make better use of HiFi data.
Here, we present HiFiBGC, a command-line-based workflow to identify BGCs in PacBio HiFi metagenomes. HiFiBGC leverages an ensemble of assemblies from three HiFi-tailored metagenome assemblers and the reads not represented in these assemblies. Based on our analyses of four HiFi metagenomic datasets from four different environments, we show that HiFiBGC identifies, on average, 78% more BGCs than the top-performing single-assembler-based method. This increase is due to HiFiBGC's ensemble assembly approach, which improves recovery by 25%, as well as to the inclusion of mostly fragmented BGCs identified in the unmapped reads.
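The idea of combining several assemblies with the reads that none of them represent can be illustrated with a minimal Python sketch. This is not HiFiBGC's code: the function names, the assembler labels (`assembler_a`, `assembler_b`, `assembler_c`), the read IDs, and the BGC counts are all hypothetical and chosen only for illustration. The sketch assumes that read-to-assembly mapping has already been done and is available as one set of mapped read IDs per assembler.

```python
from typing import Dict, Iterable, List, Set


def unmapped_reads(all_reads: Iterable[str],
                   mapped_by_assembler: Dict[str, Set[str]]) -> List[str]:
    """Return reads not represented in any assembly, preserving input order.

    `mapped_by_assembler` maps a (hypothetical) assembler label to the set
    of read IDs that mapped to that assembler's contigs.
    """
    # A read counts as represented if it mapped to at least one assembly.
    mapped_anywhere: Set[str] = set().union(*mapped_by_assembler.values())
    return [read for read in all_reads if read not in mapped_anywhere]


def percent_increase(ensemble_count: int, baseline_count: int) -> float:
    """Percentage by which `ensemble_count` exceeds `baseline_count`."""
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    return 100.0 * (ensemble_count - baseline_count) / baseline_count


if __name__ == "__main__":
    # Toy data, invented for this example only.
    reads = ["r1", "r2", "r3", "r4", "r5"]
    mapped = {
        "assembler_a": {"r1", "r2"},
        "assembler_b": {"r2", "r3"},
        "assembler_c": {"r1"},
    }
    # These reads would be searched for BGCs alongside the three assemblies.
    print(unmapped_reads(reads, mapped))
    # Made-up counts: 178 BGCs from the ensemble vs. 100 from one assembler.
    print(percent_increase(178, 100))
```

Under these assumptions, the input to BGC detection would be the contigs of all three assemblies plus the reads returned by `unmapped_reads`; `percent_increase` shows how a relative gain such as the reported average is computed from two counts.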
HiFiBGC is a computational workflow for identifying BGCs in long-read HiFi metagenomes, implemented mainly in the Python programming language with the Snakemake workflow manager. HiFiBGC is available on GitHub at https://github.com/ay-amityadav/HiFiBGC under the MIT license. The code related to the figures and analyses presented in the manuscript is available at https://github.com/ay-amityadav/HiFiBGC_analyses ....
Alternative Titles
Full title
HiFiBGC: an ensemble approach for improved biosynthetic gene cluster detection in PacBio HiFi-read metagenomes
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_doaj_primary_oai_doaj_org_article_15600ea1715842b783bab5e72771b9d1
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_doaj_primary_oai_doaj_org_article_15600ea1715842b783bab5e72771b9d1
Other Identifiers
ISSN
1471-2164
E-ISSN
1471-2164
DOI
10.1186/s12864-024-10950-7