Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models
About this item
Publisher
Ithaca: Cornell University Library, arXiv.org
Language
English
More information
Scope and Contents
Contents
Multimodal large language models (MLLMs) have shown great potential in perception and interpretation tasks, but their capabilities in predictive reasoning remain under-explored. To address this gap, we introduce a novel benchmark that assesses the predictive reasoning capabilities of MLLMs across diverse scenarios. Our benchmark targets three impor...
Alternative Titles
Full title
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_2880592668
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2880592668
Other Identifiers
E-ISSN
2331-8422