Improving contig binning of metagenomic data using $d_2^S$ oligonucleotide frequency dissimilarity

About this item

Full title

Improving contig binning of metagenomic data using $d_2^S$ oligonucleotide frequency dissimilarity

Publisher

BioMed Central Ltd

Journal title

BMC Bioinformatics, 2017-09, Vol. 18 (1)

Language

English

More information

Scope and Contents

Contents

Metagenomic sequencing provides deep insights into microbial communities. Binning assembled contigs into discrete clusters is critical for investigating the taxonomic structure of these communities. Many binning algorithms have been developed, but their performance is not always satisfactory, especially for complex microbial communities, which calls for further development. Previous studies have shown that relative sequence composition is similar across different regions of the same genome but differs between distinct genomes. Current tools generally use the normalized frequencies of k-tuples directly, but these represent absolute rather than relative sequence composition. We therefore modeled contigs using relative k-tuple composition and measured the dissimilarity between contigs using $d_2^S$. The $d_2^S$ statistic was designed to measure the dissimilarity between two long sequences, or between Next-Generation Sequencing datasets, using Markov models of the background genomes. It has been effective in revealing group and gradient relationships between genomes, metagenomes and metatranscriptomes. Because many binning tools are already available, we did not attempt to bin contigs from scratch.
Instead, we developed d2SBin, which adjusts the assignment of contigs among bins based on the output of existing binning tools for a single metagenomic sample. The tool is taxonomy-free and depends only on k-tuple composition. To evaluate d2SBin, five widely used binning tools, relying either on sequence composition alone or on a hybrid of sequence composition and abundance, were used to bin six synthetic and real datasets, and d2SBin was then applied to adjust their binning results. Our experiments showed that d2SBin consistently achieves its best performance with tuple length k = 6 under the independent and identically distributed (i.i.d.) background model.
Measured by recall, precision and the Adjusted Rand Index (ARI), d2SBin improved binning performance in 28 of 30 test experiments (6 datasets × 5 binning tools). The d2SBin software is available at https://github.com/kunWangkun/d2SBin. Experiments showed that $d_2^S$ accurately measures the dissimilarity between contigs of meta...
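
The $d_2^S$ comparison described in this abstract can be illustrated with a short Python sketch. It assumes the commonly published form of $d_2^S$: each sequence's k-tuple counts are centered by subtracting their expected counts under a background model (here the i.i.d., order-0 model that the abstract reports as best), and the two centered vectors are compared with per-word weights $1/\sqrt{\tilde{X}_w^2+\tilde{Y}_w^2}$. The function names `kmer_counts`, `centered_counts` and `d2s` are hypothetical; this is not the d2SBin code itself.

```python
import itertools
import math
from collections import Counter

ALPHABET = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-tuples, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def centered_counts(seq: str, k: int) -> dict:
    """Observed minus expected k-tuple counts under an i.i.d. (order-0 Markov) background."""
    seq = seq.upper()
    letters = Counter(c for c in seq if c in ALPHABET)
    total = sum(letters.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T symbols")
    probs = {c: letters[c] / total for c in ALPHABET}
    counts = kmer_counts(seq, k)
    n_windows = sum(counts.values())  # assumption: expectation uses valid windows only
    centered = {}
    for letters_tuple in itertools.product(ALPHABET, repeat=k):
        word = "".join(letters_tuple)
        expected = n_windows * math.prod(probs[c] for c in word)
        centered[word] = counts[word] - expected
    return centered


def d2s(seq_a: str, seq_b: str, k: int = 6) -> float:
    """d2S dissimilarity in [0, 1]; 0 when the centered count vectors are proportional."""
    x = centered_counts(seq_a, k)
    y = centered_counts(seq_b, k)
    num = norm_x = norm_y = 0.0
    for word in x:  # both dicts share the same 4**k keys
        weight = math.sqrt(x[word] ** 2 + y[word] ** 2)
        if weight == 0.0:
            continue  # word contributes nothing when both centered counts are zero
        num += x[word] * y[word] / weight
        norm_x += x[word] ** 2 / weight
        norm_y += y[word] ** 2 / weight
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("d2S is undefined when a centered count vector is all zero")
    return 0.5 * (1.0 - num / math.sqrt(norm_x * norm_y))
```

Calling `d2s(contig_1, contig_2)` with the default k = 6 compares two contig strings. Enumerating all 4**k tuples (4,096 for k = 6) keeps words that are absent from a contig in the sum, because they still carry a nonzero expected count under the background model.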
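
How contigs might be moved between the bins produced by an existing tool can be sketched as follows. This is a hypothetical, k-means-style refinement written for illustration, not d2SBin's published procedure: it pools the k-tuple and letter counts of each input bin into one profile, computes $d_2^S$ (i.i.d. background) between every contig and every bin profile, and moves a contig only when another bin is strictly closer, repeating until no contig moves. The names `profile`, `centered`, `d2s_vectors` and `refine_bins` are invented for this sketch, and the default k = 4 is chosen only to keep the example fast (the abstract reports k = 6 as best).

```python
import itertools
import math
from collections import Counter

ALPHABET = "ACGT"


def profile(seq, k):
    """Return (k-tuple counts, letter counts) for one contig, ignoring non-ACGT windows."""
    seq = seq.upper()
    valid = set(ALPHABET)
    kmers = Counter(seq[i:i + k] for i in range(len(seq) - k + 1)
                    if set(seq[i:i + k]) <= valid)
    letters = Counter(c for c in seq if c in valid)
    return kmers, letters


def centered(kmers, letters, k):
    """Centered count vector (observed - expected) under an i.i.d. background."""
    words = ["".join(t) for t in itertools.product(ALPHABET, repeat=k)]
    total = sum(letters.values())
    if total == 0:
        return [0.0] * len(words)
    probs = {c: letters[c] / total for c in ALPHABET}
    n_windows = sum(kmers.values())
    return [kmers[w] - n_windows * math.prod(probs[c] for c in w) for w in words]


def d2s_vectors(x, y):
    """d2S between two centered vectors; inf when undefined (an all-zero vector)."""
    num = norm_x = norm_y = 0.0
    for xi, yi in zip(x, y):
        weight = math.sqrt(xi * xi + yi * yi)
        if weight > 0.0:
            num += xi * yi / weight
            norm_x += xi * xi / weight
            norm_y += yi * yi / weight
    if norm_x == 0.0 or norm_y == 0.0:
        return math.inf
    return 0.5 * (1.0 - num / math.sqrt(norm_x * norm_y))


def refine_bins(contigs, labels, k=4, max_iter=10):
    """Hypothetical refinement: move each contig to the bin profile with the lowest d2S."""
    labels = dict(labels)
    profiles = {cid: profile(seq, k) for cid, seq in contigs.items()}
    contig_vecs = {cid: centered(km, le, k) for cid, (km, le) in profiles.items()}
    for _ in range(max_iter):
        # Pool counts per current bin (the bin profile includes the contig itself).
        pooled = {}
        for cid, b in labels.items():
            km, le = pooled.setdefault(b, (Counter(), Counter()))
            km.update(profiles[cid][0])
            le.update(profiles[cid][1])
        bin_vecs = {b: centered(km, le, k) for b, (km, le) in pooled.items()}
        changed = False
        for cid in contigs:
            dists = {b: d2s_vectors(contig_vecs[cid], v) for b, v in bin_vecs.items()}
            best = min(dists, key=dists.get)
            if dists[best] < dists[labels[cid]]:
                labels[cid] = best
                changed = True
        if not changed:
            break
    return labels
```

Pooling counts, rather than concatenating contig sequences, avoids creating spurious k-tuples at the junctions between contigs. Requiring a strictly smaller distance keeps a contig in its original bin on ties, so the input binning is changed only where the composition signal argues for it.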
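
The evaluation metrics named in this abstract (recall, precision and ARI) can all be computed from a contingency table of true genomes versus predicted bins. The sketch below assumes the contig-count definitions common in the binning literature (precision: for each bin, the largest number of contigs from a single genome, summed over bins and divided by the total; recall: the same with genomes and bins swapped) together with the standard Adjusted Rand Index formula. The paper may weight contigs by length instead, so treat this as illustrative; the function names are hypothetical.

```python
from collections import Counter
from math import comb


def precision_recall(truth, predicted):
    """Contig-count precision and recall from true genome labels and predicted bin labels."""
    table = Counter(zip(truth, predicted))
    best_per_bin, best_per_genome = {}, {}
    for (genome, bin_id), count in table.items():
        best_per_bin[bin_id] = max(best_per_bin.get(bin_id, 0), count)
        best_per_genome[genome] = max(best_per_genome.get(genome, 0), count)
    n = len(truth)
    return sum(best_per_bin.values()) / n, sum(best_per_genome.values()) / n


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand Index: 1 for identical partitions, about 0 for random labeling."""
    n = len(truth)
    if comb(n, 2) == 0:
        return 1.0  # fewer than two items: treat the trivial partitions as identical
    pairs_same_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = pairs_truth * pairs_pred / comb(n, 2)
    max_index = (pairs_truth + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are all singletons
    return (pairs_same_both - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` gives 4/7 (about 0.571), while `precision_recall` on the same labels gives `(1.0, 0.75)`: splitting one genome across two bins lowers recall but leaves precision untouched.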

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_gale_infotracmisc_A507109471

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_gale_infotracmisc_A507109471

Other Identifiers

ISSN

1471-2105

E-ISSN

1471-2105

DOI

10.1186/s12859-017-1835-1
