Two-Stage Clustering (TSC): A Pipeline for Selecting Operational Taxonomic Units for the High-Throughput Sequencing of PCR Amplicons

About this item

Full title

Two-Stage Clustering (TSC): A Pipeline for Selecting Operational Taxonomic Units for the High-Throughput Sequencing of PCR Amplicons

Publisher

United States: Public Library of Science

Journal title

PLoS ONE, 2012-01, Vol. 7 (1), p. e30230

Language

English

More information

Scope and Contents

Contents

Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs) is a critical step for the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC) because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve the computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into an OTU depending on their pairwise Needleman-Wunsch distance. This method not only improved the computational efficiency but also mitigated the spurious OTU estimates caused by 'noise' sequences. In addition, OTUs clustered using TSC showed comparable or improved performance in beta-diversity comparisons relative to existing OTU selection methods. This study suggests that the distribution of sequencing datasets is a useful property for improving the computational efficiency and increasing the clustering accuracy of the high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/....
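
The two-stage idea described above can be illustrated with a minimal Python sketch. This is not the TSC software and makes several assumptions that the abstract does not specify: the function names, the abundance cutoff, the use of average linkage in the first stage, the greedy nearest-seed rule in the second stage, and a unit-cost global alignment normalised by the longer sequence as a stand-in for the Needleman-Wunsch distance. The two preclustering steps are omitted.

```python
from collections import Counter


def nw_distance(a, b):
    """Unit-cost global alignment distance, divided by the longer length.

    Assumption: mismatch and gap both cost 1 (equivalent to edit distance);
    the scoring scheme used by TSC is not given in the abstract.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # gap in b
                           cur[j - 1] + 1,             # gap in a
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1] / max(len(a), len(b), 1)


def hierarchical(tags, threshold):
    """Stage 1: naive agglomerative clustering (average linkage assumed)."""
    clusters = [[t] for t in tags]

    def linkage(c1, c2):
        total = sum(nw_distance(x, y) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > threshold:
            break
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def two_stage_cluster(reads, abundance_cutoff=2, threshold=0.03):
    """Split unique tags by abundance, then cluster the two groups in turn."""
    counts = Counter(reads)
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    abundant = [t for t in ordered if counts[t] >= abundance_cutoff]
    rare = [t for t in ordered if counts[t] < abundance_cutoff]

    otus = hierarchical(abundant, threshold)

    # Stage 2: greedy heuristic. Each rare tag joins the OTU whose seed
    # (first, most abundant member) is nearest, if within the threshold.
    for tag in rare:
        best = min(otus, key=lambda o: nw_distance(tag, o[0]), default=None)
        if best is not None and nw_distance(tag, best[0]) <= threshold:
            best.append(tag)
        else:
            otus.append([tag])
    return otus


# Toy data: a loose threshold (0.1) is used because the tags are only 20 nt.
reads = (["ACGTACGTACGTACGTACGT"] * 5 + ["ACGTACGTACGTACGTACGA"] * 3
         + ["TTTTGGGGCCCCAAAATTTT"] * 4
         + ["ACGTACGTACGTACGTACCT", "GGGGGGGGGGCCCCCCCCCC"])
otus = two_stage_cluster(reads, abundance_cutoff=2, threshold=0.1)
for n, otu in enumerate(otus, 1):
    print(f"OTU {n}: {len(otu)} unique tag(s)")
```

In this sketch only the abundant tags pay the quadratic cost of all-against-all comparison. Each rare tag is compared against one seed per existing OTU, which is the kind of trade-off between accuracy and efficiency the abstract describes.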

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_plos_journals_1322492383

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_plos_journals_1322492383

Other Identifiers

ISSN

1932-6203

E-ISSN

1932-6203

DOI

10.1371/journal.pone.0030230
