Correcting bias from stochastic insert size in read pair data applications to structural variation detection and genome assembly
About this item
Full title
Author / Creator
Publisher
Cold Spring Harbor: Cold Spring Harbor Laboratory Press
Journal title
Language
English
Formats
Subjects
More information
Scope and Contents
Contents
Insert size distributions from paired read protocols are used for inference in bioinformatic applications such as genome assembly and structural variation detection. However, many of the models that are being used are subject to bias. This bias arises when we assume that all insert sizes within a distribution are equally likely to be observed, when in fact, size matters. These systematic errors exist in popular software even when the assumptions made about data are true. We have previously shown that bias occurs for scaffolders in genome assembly. Here, we generalize the theory and demonstrate that it is applicable in other contexts. We provide examples of bias in state-of-the-art software and improve them using our model. One key application of our theory is structural variation detection using read pairs. We show that an incorrect null hypothesis is commonly used in popular tools and can be corrected using our theory. Furthermore, we approximate the smallest size of indels that are possible to discover given an insert size distribution. Two other applications are inference of insert size distribution on de novo genome assemblies and error correction of genome assemblies using mated reads. Our theory is implemented in a tool called GetDistr (https://github.com/ksahlin/GetDistr)....
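The claim that "size matters" can be illustrated with a small numerical sketch. It is not taken from the paper or from GetDistr; it assumes a simplified model in which a read pair with insert size x and read length r can be placed across a gap of length d in max(0, x - d - 2r + 1) ways, so longer inserts are more likely to be observed spanning the gap. The function names, the toy library, and the weighting formula are all illustrative assumptions.

```python
def spanning_distribution(library, gap, read_len):
    """Reweight a library insert-size distribution by the number of
    placements in which a pair of that size spans a gap.

    Assumed toy model (not the paper's derivation): a pair of insert
    size x spans a gap of length `gap` in max(0, x - gap - 2*read_len + 1)
    positions, with both reads fully outside the gap.
    """
    weighted = {}
    for size, prob in library.items():
        placements = max(0, size - gap - 2 * read_len + 1)
        if placements > 0:
            weighted[size] = prob * placements
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no insert size in the library can span this gap")
    # Normalize so the reweighted values form a probability distribution.
    return {size: w / total for size, w in weighted.items()}


def mean(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(size * prob for size, prob in dist.items())


# Hypothetical library: insert sizes and their probabilities.
library = {300: 0.25, 400: 0.5, 500: 0.25}
observed = spanning_distribution(library, gap=100, read_len=100)

print("library mean:", mean(library))
print("mean among pairs spanning the gap:", mean(observed))
```

Under these assumptions the mean insert size among pairs that span the gap is larger than the library mean, which is the kind of systematic shift a model would miss if it treated every insert size as equally likely to be observed.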
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_2070823308
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2070823308
Other Identifiers
ISSN
2692-8205
E-ISSN
2692-8205
DOI
10.1101/023929