Sequence embedding for fast construction of guide trees for multiple sequence alignment

About this item

Full title

Sequence embedding for fast construction of guide trees for multiple sequence alignment

Publisher

England: BioMed Central Ltd

Journal title

Algorithms for molecular biology, 2010-05, Vol.5 (1), p.21-21, Article 21

Language

English

More information

Scope and Contents

Contents

The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments.
In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances.
We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz....
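
The embedding idea summarised above can be illustrated with a minimal sketch. It is not the authors' implementation (their source code is at the URL given above). The function names, the k-mer distance, and the random choice of reference sequences are all assumptions made for illustration. Each sequence is represented by its distances to a small set of t reference sequences, so only N × t expensive distance calculations are needed rather than one for every pair, and the resulting vectors can then be compared cheaply.

```python
import random
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=2):
    """Count the overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=2):
    """Illustrative distance: 1 minus the fraction of shared k-mers.

    Assumption: a stand-in for an 'expensive' sequence distance,
    not the measure used in the paper.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    shared = sum((pa & pb).values())          # multiset intersection
    smaller = min(sum(pa.values()), sum(pb.values()))
    return 1.0 - shared / smaller if smaller else 1.0


def embed(seqs, n_refs, seed=0):
    """Map each sequence to a vector of distances to n_refs references.

    Assumption: references are chosen at random; other selection
    schemes are possible.
    """
    rng = random.Random(seed)
    refs = rng.sample(seqs, n_refs)
    return [[kmer_distance(s, r, k=2) for r in refs] for s in seqs]


def euclidean(u, v):
    """Cheap distance between two embedded vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical toy input; real inputs would contain thousands of sequences.
seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
vectors = embed(seqs, n_refs=2)
```

With a toy input this small there is no saving. The benefit appears only when N is large and t is kept much smaller than N. The embedded vectors could then be passed to any standard clustering method to build a guide tree.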

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_doaj_primary_oai_doaj_org_article_358f17028cfc4e16ac3ae0c8e53bbc33

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_doaj_primary_oai_doaj_org_article_358f17028cfc4e16ac3ae0c8e53bbc33

Other Identifiers

ISSN

1748-7188

E-ISSN

1748-7188

DOI

10.1186/1748-7188-5-21
