Learning Cross-Modal Representations for Language-Based Image Manipulation

Learning Cross-Modal Representations for Language-Based Image Manipulation
Learning Cross-Modal Representations for Language-Based Image Manipulation
Other Titles:
IEEE International Conference on Image Processing
Publication Date:
30 September 2020
In this paper, we propose a generative architecture for manipulating images/scenes with natural language descriptions. This is a challenging task as the generative network is expected to perform the given text instruction without changing the non-affiliating contents of the input image. Two main drawbacks of the existing methods are their limitation of performing changes that would affect only a limited region and the inability of handling complex instructions. The proposed approach, designed to address these limitations initially uses two sets of networks to extract the image and text features respectively. Rather than a simple combination of these two modalities during the image manipulation process, we use an improved technique to compose image and text features. Additionally, the generative network utilizes similarity learning to improve text manipulation which also enforces only the text-relevant changes on the input image. Our experiments on CSS and Fashion Synthesis datasets show that the proposed approach performs remarkably well and outperforms the baseline frameworks in terms of R-precision and FID.
License type:
Funding Info:
This research is supported by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funding Scheme (Project A18A2b0046).
“© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.”
Files uploaded:

File Size Format Action
icip-2020-text-based-image-manipulation.pdf 504.62 KB PDF Open