A Training-free Sub-quadratic Cost Transformer Model Serving Framework With Hierarchically Pruned Attention

About this item

Full title

A Training-free Sub-quadratic Cost Transformer Model Serving Framework With Hierarchically Pruned Attention

Publisher

Ithaca: Cornell University Library, arXiv.org

Journal title

arXiv.org, 2024-10

Language

English

More information

Scope and Contents

In modern large language models (LLMs), increasing the context length is crucial for improving comprehension and coherence in long-context, multi-modal, and retrieval-augmented language generation. While many recent transformer models attempt to extend their context length to over a million tokens, they remain impractical due to the quadratic time and...
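
A brief illustration (not part of the catalogue record or the paper itself): the quadratic cost mentioned in the abstract comes from dense self-attention, which forms an n x n score matrix over a length-n sequence. The NumPy sketch below shows plain dense attention, not the paper's hierarchically pruned attention; all names, shapes, and values are illustrative assumptions.

# Minimal single-head dense attention sketch illustrating the O(n^2) cost.
# This is ordinary dense attention, NOT the paper's hierarchically pruned
# method; dimensions and names are illustrative assumptions.
import numpy as np

def dense_attention(q, k, v):
    """q, k, v: arrays of shape (n, d). Returns an (n, d) output.

    The score matrix has shape (n, n), so both time and memory grow
    quadratically with the sequence length n.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (n, n) -- the quadratic term
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over each row
    return weights @ v                             # (n, d)

n, d = 2048, 64
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = dense_attention(q, k, v)
print(out.shape)                                   # (2048, 64)
print(f"score matrix holds {n * n:,} entries")     # 4,194,304 for n = 2048

Doubling the sequence length quadruples the score matrix, which is the scaling behaviour the paper's sub-quadratic serving framework is designed to avoid.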

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_3068911541

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_3068911541

Other Identifiers

E-ISSN

2331-8422
