https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2504969643

Evaluation of critical data processing steps for reliable prediction of gene co-expression from large collections of RNA-seq data

About this item

Full title

Evaluation of critical data processing steps for reliable prediction of gene co-expression from large collections of RNA-seq data

Author / Creator

Publisher

Cold Spring Harbor: Cold Spring Harbor Laboratory Press

Journal title

bioRxiv, 2021-08

Language

English

Formats

More information

Scope and Contents

Contents

Motivation: Gene co-expression analysis is an attractive tool for leveraging the enormous number of public RNA-seq datasets to predict gene functions and regulatory mechanisms. However, the optimal data processing steps for accurately predicting gene co-expression from such large datasets remain unclear. In particular, the importance of batch effect correction is understudied.

Results: We processed RNA-seq data of 68 human and 76 mouse cell types and tissues using 50 different workflows into 7,200 genome-wide gene co-expression networks. We then conducted a systematic analysis of the factors that result in high-quality co-expression predictions, focusing on normalization, batch effect correction, and measure of correlation. We confirmed the key importance of high sample counts for high-quality predictions. However, choosing a suitable normalization approach and applying batch effect correction can further improve the quality of co-expression estimates, equivalent to a >80% and >40% increase in samples. In larger datasets, batch effect removal was equivalent to more than doubling the sample size. Finally, Pearson correlation appears more suitable than Spearman correlation, except for smaller datasets.

Conclusion: A key point for accurate prediction of gene co-expression is the collection of many samples. However, paying attention to data normalization, batch effects, and the measure of correlation can significantly improve the quality of co-expression estimates.

Competing Interest Statement: The authors have declared no competing interest.

Footnotes
* Revision notes:
- Made the goal of this study clearer
- Separated data into a training set and validation set
- Used paired t-test for validation (before: unpaired)
- All figures and the table have been revised, although there are no large changes
* https://doi.org/10.6084/m9.figshare.14178446.v1
* https://doi.org/10.6084/m9.figshare.14178425.v1...
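The abstract compares Pearson and Spearman correlation as measures of co-expression. As a rough illustration of how the two measures can differ for a single gene pair, the following is a minimal sketch in plain Python. It is not the authors' workflow: the function names and the toy expression values are hypothetical, and normalization and batch effect correction are omitted entirely.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values of two genes across six samples.
# gene_b is a monotonic but non-linear function of gene_a.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]

r_pearson = pearson(gene_a, gene_b)
r_spearman = spearman(gene_a, gene_b)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

In this toy case the relationship is monotonic but not linear, so the rank-based Spearman coefficient is 1 while the Pearson coefficient is slightly lower. A genome-wide network would repeat such a calculation for every gene pair, typically with a numerical library rather than pure Python.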

Authors, Artists and Contributors

Author / Creator

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_2504969643

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2504969643

Other Identifiers

ISSN

2692-8205

E-ISSN

2692-8205

DOI

10.1101/2021.03.11.435043
