Integrative approaches to improve the informativeness of deep learning models for human complex diseases
About this item
Publisher
Cold Spring Harbor: Cold Spring Harbor Laboratory Press
Language
English
More information
Scope and Contents
Contents
Deep learning models have achieved great success in predicting genome-wide regulatory effects from DNA sequence, but recent work has reported that SNP annotations derived from these predictions contribute limited unique information for human complex disease. Here, we explore three integrative approaches to improve the disease informativeness of allelic-effect annotations (predicted difference between reference and variant alleles) constructed using several previously trained deep learning models: DeepSEA, Basenji and DeepBind (and a related machine learning model, deltaSVM). First, we employ gradient boosting to learn optimal combinations of deep learning annotations, using fine-mapped SNPs and matched control SNPs (on held-out chromosomes) for training. Second, we improve the specificity of these annotations by restricting them to SNPs implicated by (proximal and distal) SNP-to-gene (S2G) linking strategies, e.g. prioritizing SNPs involved in gene regulation. Third, we predict gene expression (and derive allelic-effect annotations) from deep learning annotations at SNPs implicated by S2G linking strategies, generalizing the previously proposed ExPecto approach, which incorporates deep learning annotations based on distance to TSS. We evaluated these approaches using stratified LD score regression, using functional data in blood and focusing on 11 autoimmune diseases and blood-related traits (average N=306K). We determined that the three approaches produced SNP annotations that were uniquely informative for these diseases/traits, despite the fact that linear combinations of the underlying DeepSEA, Basenji, DeepBind and deltaSVM blood annotations were not uniquely informative for these diseases/traits. Our results highlight the benefits of integrating SNP annotations produced by deep learning models with other types of data, including data linking SNPs to genes.
Competing Interest Statement
The authors have declared no competing interest.
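As an illustration of the "allelic-effect annotation" mentioned in the abstract (the predicted difference between reference and variant alleles), the following is a minimal sketch and not the authors' implementation. The scoring function `toy_model`, the helper `allelic_effect`, and the sign convention (variant minus reference) are all assumptions made for this example; a trained sequence model such as DeepSEA would take the place of `toy_model`.

```python
from typing import Callable, List


def toy_model(seq: str) -> List[float]:
    """Hypothetical stand-in for a trained sequence model.

    Returns two made-up feature scores: the GC fraction of the sequence
    and the number of occurrences of a toy motif ("GATA").
    """
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    motif_hits = float(seq.count("GATA"))
    return [gc_fraction, motif_hits]


def allelic_effect(
    ref_seq: str,
    pos: int,
    alt_allele: str,
    model: Callable[[str], List[float]],
) -> List[float]:
    """Per-feature difference in model predictions between alleles.

    Assumption: the effect is reported as (variant prediction) minus
    (reference prediction); the source does not state a sign convention.
    """
    if not 0 <= pos < len(ref_seq):
        raise IndexError("pos must index a base inside ref_seq")
    # Build the variant sequence by substituting a single base at pos.
    alt_seq = ref_seq[:pos] + alt_allele + ref_seq[pos + 1:]
    ref_pred = model(ref_seq)
    alt_pred = model(alt_seq)
    return [a - r for a, r in zip(alt_pred, ref_pred)]


# Toy SNP: T -> C at 0-based position 4, which disrupts the "GATA" motif.
effect = allelic_effect("AAGATAAC", 4, "C", toy_model)
print(effect)
```

In this toy case the substitution raises the GC fraction and removes the single motif hit, so the two features move in opposite directions. Substituting the reference base itself yields an effect of zero for every feature.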
Footnotes
* Following reviewer response, we have expanded the set of models from 2 deep learning models (DeepSEA and Basenji) to 4 deep learning/machine learning-based sequence models (DeepSEA, Basenji, DeepBind, deltaSVM). We have also updated the text to clarify the comparisons across methods and the features underlying the performance of these methods in greater detail.
* https://github.com/kkdey/Imperio
* https://alkesgroup.broadinstitute.org/LDSCORE/DeepLearning/Dey_DeepBoost_Imperio/...
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_2508591880
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2508591880
Other Identifiers
E-ISSN
2692-8205
DOI
10.1101/2020.09.08.288563
How to access this item
https://www.proquest.com/docview/2508591880?pq-origsite=primo&accountid=13902