Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

About this item

Full title

Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

Publisher

Ithaca: Cornell University Library, arXiv.org

Journal title

arXiv.org, 2024-01

Language

English

More information

Scope and Contents

Contents

Composed image retrieval aims to find the image that best matches a multi-modal user query consisting of a reference image paired with text. Existing methods commonly pre-compute image embeddings over the entire corpus and, at test time, compare these to a reference image embedding modified by the query text. Such a pipeline is very efficient at test time since fast vector distances can be used to evaluate candidates, but modifying the reference image embedding guided only by a short textual description can be difficult, especially when done independently of potential candidates. An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set. Though this approach is more discriminative, for large-scale datasets the computational cost is prohibitive since pre-computation of candidate embeddings is no longer possible. We propose to combine the merits of both schemes using a two-stage model. Our first stage adopts the conventional vector distance metric and quickly prunes the candidates. Our second stage employs a dual-encoder architecture, which effectively attends to the input triplet of reference-text-candidate and re-ranks the candidates. Both stages utilize a vision-and-language pre-trained network, which has proven beneficial for various downstream tasks. Our method consistently outperforms state-of-the-art approaches on standard benchmarks for the task. Our implementation is available at https://github.com/Cuberick-Orion/Candidate-Reranking-CIR....
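
The two-stage scheme summarised above (prune with a fast vector distance, then re-rank the survivors with a more expensive scorer that sees each candidate) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy embeddings, and `toy_triplet_score` (a stand-in for the dual-encoder second stage) are all hypothetical, and cosine similarity is assumed as the first-stage metric.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune(query_vec: Vector, corpus: List[Vector], k: int) -> List[int]:
    """Stage 1: keep the indices of the k corpus vectors closest to the query."""
    order = sorted(
        range(len(corpus)),
        key=lambda i: cosine(query_vec, corpus[i]),
        reverse=True,
    )
    return order[:k]


def rerank(candidates: List[int], triplet_score: Callable[[int], float]) -> List[int]:
    """Stage 2: re-order the surviving candidates with a costlier scorer."""
    return sorted(candidates, key=triplet_score, reverse=True)


def two_stage_retrieve(
    query_vec: Vector,
    corpus: List[Vector],
    k: int,
    triplet_score: Callable[[int], float],
) -> List[int]:
    """Prune the corpus to k candidates, then re-rank only those k."""
    return rerank(prune(query_vec, corpus, k), triplet_score)


# Toy data (hypothetical): pre-computed candidate embeddings and a query
# embedding that already combines the reference image and the text.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.0]

# Hypothetical stand-in for the second-stage scorer. In the described method
# this role is played by a model attending to (reference, text, candidate).
toy_scores = {0: 0.2, 1: 0.9, 2: 1.0, 3: 0.5}


def toy_triplet_score(candidate_index: int) -> float:
    return toy_scores[candidate_index]


shortlist = prune(query_vec, corpus, k=2)
ranked = two_stage_retrieve(query_vec, corpus, k=2, triplet_score=toy_triplet_score)
```

In this toy run the expensive scorer is called only on the two shortlisted candidates rather than on all four. Candidate 2 has the highest second-stage score but never reaches that stage, which shows the trade-off of pruning: the final ranking can only be as good as the first-stage shortlist.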

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_2819552158

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2819552158

Other Identifiers

E-ISSN

2331-8422
