Hu P, Peng X, Zhu H, et al. Cross-modal Discriminant Adversarial Network[J]. Pattern Recognition, 2021
Cross-modal retrieval aims to retrieve relevant data points across different modalities, such as retrieving images via text queries. One key challenge of cross-modal retrieval is narrowing the heterogeneous gap across diverse modalities. To overcome this challenge, we propose a novel method termed Cross-modal discriminant Adversarial Network (CAN). Taking bi-modal data as a showcase, CAN consists of two parallel modality-specific generators, two modality-specific discriminators, and a Cross-modal Discriminant Mechanism (CDM). Specifically, the generators project the diverse modalities into a latent cross-modal discriminant space. Meanwhile, the discriminators compete against the generators to alleviate the heterogeneous discrepancy in this space, i.e., the generators try to generate unified features to confuse the discriminators, and the discriminators aim to classify the generated results. To further remove redundancy and preserve discrimination, we propose CDM to project the generated results into a single common space, accompanied by a novel eigenvalue-based loss. Thanks to the eigenvalue-based loss, CDM can push as much discriminative power as possible into all latent directions. To demonstrate the effectiveness of CAN, comprehensive experiments are conducted on four multimedia datasets in comparison with 15 state-of-the-art approaches.
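The competition between generators and discriminators described above can be illustrated with a minimal sketch. It assumes a standard binary cross-entropy objective over modality labels, which the abstract does not specify. The function names and the single probability output per feature are hypothetical simplifications, not the paper's formulation. The inputs stand for discriminator outputs on already-projected image and text features.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def discriminator_loss(p_image, p_text):
    """Loss for a (hypothetical) modality discriminator.

    p_image / p_text hold the predicted probability "this feature came from
    the image branch" for projected image / text features. The discriminator
    wants image features labeled 1 and text features labeled 0.
    """
    losses = [bce(p, 1) for p in p_image] + [bce(p, 0) for p in p_text]
    return sum(losses) / len(losses)


def generator_loss(p_image, p_text):
    """Loss for the generators: the same predictions scored against flipped
    labels, so it is low when the discriminator is fooled (assumed form)."""
    losses = [bce(p, 0) for p in p_image] + [bce(p, 1) for p in p_text]
    return sum(losses) / len(losses)


# A discriminator that separates the modalities well versus one that cannot.
confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

Under this assumed objective, a discriminator output of 0.5 for every feature means it cannot tell which modality a feature came from, which is the outcome the generators aim for when they produce unified features.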
This work was supported in part by the National Natural Science Foundation of China under Grants U19A2078, 61971296, 61625204, 61836011, and 61806135; Sichuan Science and Technology Planning Projects under Grant 2020YFG0319, 2020YFH0186, and 2019YFG0495; the Fundamental Research Funds for the Central Universities under Grant YJ201949; and the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funds (Project no. A1892b0026).