Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank

About this item

Full title

Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank

Publisher

England: BioMed Central Ltd

Journal title

Genome Biology, 2020-05, Vol.21 (1), p.115-115, Article 115

Language

English

More information

Scope and Contents

Contents

Genomic analyses are sensitive to contamination in public databases caused by incorrectly labeled reference sequences. Here, we describe Conterminator, an efficient method to detect and remove incorrectly labeled sequences by an exhaustive all-against-all sequence comparison. Our analysis reports contamination of 2,161,746, 114,035, and 14,148 sequences in the GenBank, RefSeq, and NR databases, respectively, spanning the whole range from draft to “complete” model organism genomes. Our method scales linearly with input size and can process 3.3 TB in 12 days on a 32-core computer. Conterminator can help ensure the quality of reference databases. Source code (GPLv3):
https://github.com/martin-steinegger/conterminator...

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_doaj_primary_oai_doaj_org_article_0c58db75fd8a487eb02d1ef019ea16a4

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_doaj_primary_oai_doaj_org_article_0c58db75fd8a487eb02d1ef019ea16a4

Other Identifiers

ISSN

1474-760X, 1474-7596

E-ISSN

1474-760X

DOI

10.1186/s13059-020-02023-1
