Semantically consistent text to fashion image synthesis with an enhanced attentional generative adversarial network

Title:
Semantically consistent text to fashion image synthesis with an enhanced attentional generative adversarial network
Journal Title:
Pattern Recognition Letters
Publication Date:
27 March 2020
Citation:
Kenan E. Ak, Joo Hwee Lim, Jo Yew Tham, Ashraf A. Kassim, Semantically consistent text to fashion image synthesis with an enhanced attentional generative adversarial network, Pattern Recognition Letters, Volume 135, 2020, Pages 22-29, ISSN 0167-8655, https://doi.org/10.1016/j.patrec.2020.02.030.
Abstract:
Recent advancements in Generative Adversarial Networks (GANs) have led to significant improvements in various image generation tasks, including image synthesis based on text descriptions. In this paper, we present an enhanced Attentional Generative Adversarial Network (e-AttnGAN) with improved training stability for text-to-image synthesis. e-AttnGAN's integrated attention module utilizes both sentence and word context features and performs feature-wise linear modulation (FiLM) to fuse visual and natural language representations. In addition to the multimodal similarity learning for text and image features of AttnGAN [1], similarity and feature matching losses between real and generated images are included, while classification losses are employed for "significant attributes". To improve training stability and mitigate mode collapse, spectral normalization and the two time-scale update rule are used for the discriminator, together with instance noise. Our experiments show that e-AttnGAN outperforms state-of-the-art methods on the FashionGen and DeepFashion-Synthesis datasets in terms of inception score, R-precision and classification accuracy. A detailed ablation study was conducted to observe the effect of each component.
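The FiLM fusion mentioned in the abstract conditions visual features on text by predicting a per-channel scale and shift. A minimal sketch of the modulation step, using NumPy with hypothetical shapes and names (the paper's actual network predicts gamma and beta from sentence/word context features):

```python
import numpy as np

def film(visual, gamma, beta):
    """Feature-wise linear modulation (FiLM): scale and shift each
    channel of a visual feature map by text-conditioned parameters."""
    # visual: (C, H, W) feature map; gamma, beta: (C,) per-channel
    # parameters, in practice predicted from language features.
    return gamma[:, None, None] * visual + beta[:, None, None]

rng = np.random.default_rng(0)
visual = rng.standard_normal((4, 8, 8))  # toy 4-channel feature map
gamma = rng.standard_normal(4)           # text-derived scale (assumed)
beta = rng.standard_normal(4)            # text-derived shift (assumed)

out = film(visual, gamma, beta)
print(out.shape)  # (4, 8, 8)
```

The modulation leaves spatial dimensions untouched; only the channel statistics are adjusted, which is why FiLM is a lightweight way to inject language information at multiple generator stages.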
License type:
http://creativecommons.org/licenses/by-nc-nd/4.0/
Funding Info:
There was no specific funding for this research.
ISSN:
0167-8655
1872-7344
Files uploaded:

post-print.pdf (1.47 MB, PDF)