Abstract: As a pioneering vision-language model, CLIP (Contrastive Language-Image Pre-training) has achieved significant success across various domains and a wide range of downstream vision-language ...
Abstract: Text-to-Image Person Retrieval (TIPR) aims to utilize natural language descriptions as queries to retrieve pedestrian images. However, existing methods only concentrated on aligning ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results