Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering
About this item
Author / Creator
Yu, Ting; Fu, Kunhao; Wang, Shuhui; Huang, Qingming; Yu, Jun
Publisher
Ithaca: Cornell University Library, arXiv.org
Language
English
More information
Scope and Contents
Contents
Video Question Answering (VideoQA) represents a crucial intersection between video understanding and language processing, requiring both discriminative unimodal comprehension and sophisticated cross-modal interaction for accurate inference. Despite advancements in multi-modal pre-trained models and video-language foundation models, these systems of...
Alternative Titles
Full title
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_3116749997
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_3116749997
Other Identifiers
E-ISSN
2331-8422
DOI
10.48550/arxiv.2410.09380