A Training-free Sub-quadratic Cost Transformer Model Serving Framework With Hierarchically Pruned Attention
About this item
Publisher
Ithaca: Cornell University Library, arXiv.org
Journal title
arXiv.org
Language
English
More information
Scope and Contents
In modern large language models (LLMs), increasing the context length is crucial for improving comprehension and coherence in long-context, multi-modal, and retrieval-augmented language generation. While many recent transformer models attempt to extend their context length to over a million tokens, they remain impractical due to the quadratic time and...
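The record preserves only this truncated abstract, but the idea named in the title, replacing dense quadratic attention with a hierarchically pruned variant, can be sketched. The Python below is a generic two-level top-k pruning, not the paper's actual HiP algorithm (which the snippet does not describe): each query first scores coarse per-block key summaries, then attends densely only inside the k best-scoring blocks, cutting per-query cost from O(n) to roughly O(n/block + k*block). The function name and parameters (pruned_attention, block, k) are illustrative assumptions.

# Minimal sketch of hierarchically pruned attention for ONE query vector.
# Illustrative assumption only; not the paper's HiP algorithm.
import numpy as np

def pruned_attention(q, K, V, k=4, block=8):
    """q: (d,) query; K, V: (n, d) keys/values.

    Stage 1 scores cheap per-block key means (the coarse level of the
    hierarchy); stage 2 runs exact softmax attention only inside the
    k selected blocks, so most keys are never touched.
    """
    n, d = K.shape
    n_blocks = n // block
    K_blocks = K[: n_blocks * block].reshape(n_blocks, block, d)
    coarse = K_blocks.mean(axis=1) @ q            # (n_blocks,) coarse scores
    top = np.argsort(coarse)[-k:]                 # keep the k best blocks
    idx = (top[:, None] * block + np.arange(block)).ravel()
    scores = K[idx] @ q / np.sqrt(d)              # exact scores on pruned set
    w = np.exp(scores - scores.max())
    w /= w.sum()                                  # softmax over kept keys only
    return w @ V[idx]

rng = np.random.default_rng(0)
n, d = 1024, 64
out = pruned_attention(rng.normal(size=d),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
print(out.shape)  # (64,): the query attended to 32 of 1024 keys

Applied to every query, such pruning makes the overall attention cost sub-quadratic in sequence length; "training-free" in the title presumably means the pruning is applied to a pretrained model without fine-tuning.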
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_3068911541
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_3068911541
Other Identifiers
E-ISSN
2331-8422