PaLI-3 Vision Language Models: Smaller, Faster, Stronger
About this item
Author / Creator
Chen, Xi; Wang, Xiao; Beyer, Lucas; Kolesnikov, Alexander; Wu, Jialin; Voigtlaender, Paul; Mustafa, Basil; Goodman, Sebastian; Alabdulmohsin, Ibrahim; Padlewski, Piotr; Salz, Daniel; Xiong, Xi; Vlasic, Daniel; Pavetic, Filip; Rong, Keran; Yu, Tianli; Keysers, Daniel; Zhai, Xiaohua; and Soricut, Radu
Publisher
Ithaca: Cornell University Library, arXiv.org
Language
English
More information
Scope and Contents
Contents
This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong performance, we compare Vision Transformer (ViT) models pretrained using classification objectives to contrastively (SigLIP) pretrained ones. We find that, while sl...
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_2878323455
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2878323455
Other Identifiers
E-ISSN
2331-8422