Lu, S., Liu, Y., & Kong, A. W.-K. (2023, October 1). TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition. 2023 IEEE/CVF International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv51070.2023.00218
Abstract:
Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or fine-tuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, fine-tuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains. Code is available at https://github.com/Shilin-LU/TF-ICON
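
To make the inversion step described in the abstract concrete, below is a minimal sketch of deterministic DDIM inversion with a content-free prompt, in the spirit of the paper's approach. This is not the authors' implementation: it assumes the Hugging Face diffusers library, uses the empty string as a stand-in for the paper's exceptional prompt (which the paper constructs differently and shows to outperform the plain null prompt; see the linked repository for the actual construction), and the checkpoint name and `photo.png` path are illustrative placeholders.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import DDIMScheduler, StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
# Any Stable Diffusion 1.x checkpoint works here; v1-4 is used as an example.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to(device)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# Encode a content-free prompt once. The empty string is only a placeholder
# for TF-ICON's exceptional prompt, which is built differently in the paper.
tok = pipe.tokenizer(
    "", padding="max_length",
    max_length=pipe.tokenizer.model_max_length, return_tensors="pt",
)
text_emb = pipe.text_encoder(tok.input_ids.to(device))[0]

@torch.no_grad()
def encode_image(path: str) -> torch.Tensor:
    """Map an RGB image into the VAE latent space (the clean latent x_0)."""
    img = Image.open(path).convert("RGB").resize((512, 512))
    x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0
    x = x.permute(2, 0, 1).unsqueeze(0).to(device)
    return pipe.vae.encode(x).latent_dist.mean * pipe.vae.config.scaling_factor

@torch.no_grad()
def ddim_invert(latents: torch.Tensor, num_steps: int = 50) -> torch.Tensor:
    """Deterministic DDIM inversion: walk x_0 -> x_T, re-predicting noise at
    each step under the content-free prompt (no classifier-free guidance)."""
    pipe.scheduler.set_timesteps(num_steps, device=device)
    timesteps = list(reversed(pipe.scheduler.timesteps.tolist()))  # ascending t
    for i, t in enumerate(timesteps):
        eps = pipe.unet(latents, t, encoder_hidden_states=text_emb).sample
        # alpha-bar at the latent's current (less noisy) level and at target t.
        t_prev = timesteps[i - 1] if i > 0 else -1
        alpha_prev = (pipe.scheduler.alphas_cumprod[t_prev] if t_prev >= 0
                      else pipe.scheduler.final_alpha_cumprod)
        alpha_t = pipe.scheduler.alphas_cumprod[t]
        # Inverted DDIM update: estimate x_0, then step up the noise schedule.
        pred_x0 = (latents - (1 - alpha_prev) ** 0.5 * eps) / alpha_prev ** 0.5
        latents = alpha_t ** 0.5 * pred_x0 + (1 - alpha_t) ** 0.5 * eps
    return latents

x_T = ddim_invert(encode_image("photo.png"))  # noise latent for compositing
```

The inversion is run without classifier-free guidance and with the VAE's mean latent rather than a sample, since both choices keep the forward walk deterministic; a faithful inversion of the real image is exactly what the abstract identifies as "the basis for compositing."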
License type:
Publisher Copyright
Funding Info:
This research/project is supported by the National Research Foundation - Strategic Capability Research Centres Funding Initiative.
Grant Reference no.: N.A