Convolutions are competitive with transformers for protein sequence pretraining
About this item
Full title
Convolutions are competitive with transformers for protein sequence pretraining
Publisher
Cold Spring Harbor: Cold Spring Harbor Laboratory Press
Language
English
Scope and Contents
Pretrained protein sequence language models have been shown to improve the performance of many prediction tasks and are now routinely integrated into bioinformatics tools. However, these models largely rely on the Transformer architecture, which scales quadratically with sequence length in both run time and memory, so state-of-the-art models impose limits on sequence length. To address this limitation, we investigated whether convolutional neural network (CNN) architectures, which scale linearly with sequence length, could be as effective as Transformers in protein language models. With masked language model pretraining, CNNs are competitive with, and occasionally superior to, Transformers across downstream applications, while maintaining strong performance on sequences longer than those allowed in current state-of-the-art Transformer models. Our work suggests that computational efficiency can be improved without sacrificing performance simply by using a CNN architecture instead of a Transformer, and it emphasizes the importance of disentangling pretraining task and model architecture.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* Add more experiments; restructure sections.
* https://doi.org/10.5281/zenodo.6368483...
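The abstract contrasts the quadratic run-time and memory scaling of Transformer self-attention with the linear scaling of convolutions, and describes masked language model pretraining. The sketch below, in generic PyTorch, shows what such a setup can look like: a small dilated 1D CNN trained with BERT-style token masking over an amino-acid vocabulary. The architecture, layer sizes, masking rate, and token ids are illustrative assumptions, not the model or hyperparameters used in the paper.

# Illustrative sketch only: a dilated 1D-CNN masked language model for protein
# sequences. All sizes, token ids, and the masking rate are assumptions made
# for demonstration; they are not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, MASK = 20, 21                      # special token ids (assumed)
VOCAB_SIZE = 22

class ConvLM(nn.Module):
    """Stack of residual dilated convolutions; cost grows linearly with length."""
    def __init__(self, d_model=128, n_layers=6, kernel_size=5):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD)
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            dilation = 2 ** (i % 4)     # widen the receptive field cheaply
            padding = (kernel_size - 1) // 2 * dilation
            self.layers.append(
                nn.Conv1d(d_model, d_model, kernel_size,
                          padding=padding, dilation=dilation)
            )
        self.out = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, tokens):          # tokens: (batch, length)
        x = self.embed(tokens).transpose(1, 2)   # -> (batch, d_model, length)
        for conv in self.layers:
            x = x + F.gelu(conv(x))              # residual connection
        return self.out(x.transpose(1, 2))       # -> (batch, length, vocab)

def mask_tokens(tokens, mask_rate=0.15):
    """BERT-style masking: corrupt a random subset, predict the originals."""
    targets = tokens.clone()
    masked = torch.rand_like(tokens, dtype=torch.float) < mask_rate
    targets[~masked] = -100             # ignored by cross-entropy by default
    corrupted = tokens.clone()
    corrupted[masked] = MASK
    return corrupted, targets

# Toy training step on random sequences (stand-ins for real proteins).
model = ConvLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randint(0, 20, (8, 512))  # 8 sequences of length 512
inputs, targets = mask_tokens(batch)
logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
loss.backward()
optimizer.step()

Because the convolutional stack touches each position only through fixed-width (dilated) kernels, doubling the sequence length roughly doubles compute and memory, whereas full self-attention would quadruple them; that is the efficiency argument the abstract makes.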
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_2667083068
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2667083068
Other Identifiers
DOI
10.1101/2022.05.19.492714
How to access this item
https://www.proquest.com/docview/2667083068?pq-origsite=primo&accountid=13902