https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2070823308

Correcting bias from stochastic insert size in read pair data applications to structural variation detection and genome assembly

About this item

Full title

Correcting bias from stochastic insert size in read pair data applications to structural variation detection and genome assembly

Publisher

Cold Spring Harbor: Cold Spring Harbor Laboratory Press

Journal title

bioRxiv, 2015

Language

English

More information

Scope and Contents

Contents

Insert size distributions from paired read protocols are used for inference in bioinformatic applications such as genome assembly and structural variation detection. However, many of the models in use are subject to bias. This bias arises when we assume that all insert sizes within a distribution are equally likely to be observed, when in fact, size matters. These systematic errors exist in popular software even when the assumptions made about the data are true. We have previously shown that bias occurs for scaffolders in genome assembly. Here, we generalize the theory and demonstrate that it is applicable in other contexts. We provide examples of bias in state-of-the-art software and improve them using our model. One key application of our theory is structural variation detection using read pairs. We show that an incorrect null hypothesis is commonly used in popular tools and can be corrected using our theory. Furthermore, we approximate the smallest size of indels that are possible to discover given an insert size distribution. Two other applications are inference of insert size distribution on de novo genome assemblies and error correction of genome assemblies using mated reads. Our theory is implemented in a tool called GetDistr (https://github.com/ksahlin/GetDistr)....
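
The claim that not all insert sizes are equally likely to be observed can be illustrated with a toy calculation. The sketch below is not taken from GetDistr; it assumes fragments start at uniformly random genome positions, ignores read length and contig ends, and uses hypothetical names (spanning_size_distribution, span). Under those assumptions, a fragment of size x can fully cover a region of length span from x - span + 1 start positions, so larger fragments are over-represented among the pairs that span the region.

```python
def spanning_size_distribution(size_probs, span):
    """Insert size distribution among fragments that fully cover `span` bases.

    size_probs: dict mapping insert size -> probability in the library.
    span: length of the region a fragment must cover (hypothetical parameter).
    Assumes uniformly random fragment start positions (toy model).
    """
    # A fragment of size x covers the region from (x - span + 1) start positions.
    weights = {x: p * max(0, x - span + 1) for x, p in size_probs.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no insert size in the library can span this length")
    return {x: w / total for x, w in weights.items() if w > 0}


def mean(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


# Made-up library with mean insert size 400.
library = {300: 0.25, 400: 0.5, 500: 0.25}
observed = spanning_size_distribution(library, span=201)

print(mean(library))   # mean of the full library
print(mean(observed))  # mean among fragments spanning the region
```

With this made-up library the spanning fragments have a larger mean than the library as a whole, so a test that uses the library mean as its null expectation would be shifted even when the data contain no variant.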

Authors, Artists and Contributors

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_2070823308

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2070823308

Other Identifiers

ISSN

2692-8205

E-ISSN

2692-8205

DOI

10.1101/023929
