Expressiveness is Effectiveness: Self-supervised Fashion-aware CLIP for Video-to-Shop Retrieval

Title:
Expressiveness is Effectiveness: Self-supervised Fashion-aware CLIP for Video-to-Shop Retrieval
Journal Title:
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Publication Date:
26 July 2024
Citation:
Tian, L., Yang, Z., Hu, Z., Li, H., Yin, Y., & Wang, Z. (2024). Expressiveness is Effectiveness: Self-supervised Fashion-aware CLIP for Video-to-Shop Retrieval. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 1335–1343. https://doi.org/10.24963/ijcai.2024/148
Abstract:
The rise of online shopping and social media has spurred the Video-to-Shop Retrieval (VSR) task, which involves identifying fashion items (e.g., clothing) in videos and matching them with identical products offered by stores. In real-world scenarios, human movement in dynamic video scenes can cause substantial morphological alterations of fashion items through occlusion, shifting viewpoints (parallax), and partial visibility (truncation). As a result, the few high-quality frames are overwhelmed by a vast number of redundant ones, making retrieval less effective. To this end, this paper introduces a framework named Self-supervised Fashion-aware CLIP (SF-CLIP) for effective VSR. SF-CLIP discovers salient frames with high fashion expressiveness by generating pseudo-labels that assess the three key aspects of fashion expressiveness: occlusion, parallax, and truncation. With these pseudo-labels, the capability of CLIP is extended to facilitate the discovery of salient frames. Furthermore, to capture comprehensive representations across the salient frames, a dual-branch graph-based fusion module is proposed to extract and integrate inter-frame features. Extensive experiments demonstrate the superiority of SF-CLIP over state-of-the-art methods.
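The pipeline described above (expressiveness pseudo-labels, salient-frame selection, graph-based fusion) can be pictured with a minimal PyTorch sketch. The module shapes, the three-logit scoring head, and the affinity-graph fusion below are illustrative assumptions made for this sketch, not the authors' released implementation; the frame features stand in for a real CLIP image encoder.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpressivenessHead(nn.Module):
    """Hypothetical head: predicts pseudo-label scores for occlusion,
    parallax, and truncation from a frame embedding (one logit each)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.head = nn.Linear(dim, 3)  # [occlusion, parallax, truncation]

    def forward(self, frame_emb: torch.Tensor) -> torch.Tensor:
        # Higher score = more expressive (less occluded/truncated, better view).
        return self.head(frame_emb).sigmoid()


class GraphFusion(nn.Module):
    """Dual-branch graph-style fusion over salient frames: one branch passes
    messages over a cosine-similarity affinity graph, the other is a per-frame
    MLP; this design is an assumption, not the paper's exact module."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_salient_frames, dim)
        sim = F.normalize(x, dim=-1) @ F.normalize(x, dim=-1).T  # frame affinity
        adj = sim.softmax(dim=-1)                                # row-normalized graph
        graph_branch = adj @ self.proj(x)                        # inter-frame message passing
        mlp_branch = self.mlp(x)                                 # per-frame branch
        return (graph_branch + mlp_branch).mean(dim=0)           # video-level embedding


def retrieve_embedding(frame_embs: torch.Tensor, k: int = 4) -> torch.Tensor:
    """frame_embs: (num_frames, dim) CLIP-style frame features (placeholder
    for a real CLIP image encoder). Keeps the k most expressive frames and
    fuses them into a single query embedding for shop retrieval."""
    scorer = ExpressivenessHead(frame_embs.size(-1))
    fusion = GraphFusion(frame_embs.size(-1))
    scores = scorer(frame_embs).mean(dim=-1)      # average the three aspect scores
    salient = frame_embs[scores.topk(k).indices]  # salient frames only
    return fusion(salient)


if __name__ == "__main__":
    video = torch.randn(16, 512)  # 16 frames of dummy CLIP-style features
    query = retrieve_embedding(video)
    print(query.shape)            # torch.Size([512])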
License type:
Publisher Copyright
Funding Info:
This research was supported by the National Natural Science Foundation of China (Grant Reference No. 62171325) and the Hubei Key R&D Project (Grant Reference No. 2022BAA033). The Supercomputing Center of Wuhan University provided the supercomputing resources.
Description:
© 2024 International Joint Conferences on Artificial Intelligence All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
ISBN:
978-1-956792-04-1