Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation

Title:
Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation
Journal Title:
European Conference on Computer Vision (ECCV) 2024
Keywords:
Publication Date:
05 October 2024
Citation:
Yang, X. et al. (2025). Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15102. Springer, Cham. https://doi.org/10.1007/978-3-031-72784-9_8
Abstract:
In this paper, we propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks. Despite the critical importance of these tasks, existing methodologies often struggle to generate high-quality results. We begin by examining the inherent limitations in previous diffusion priors. We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation. To address this issue, we propose a novel, unified framework that iteratively optimizes both the 3D model and the diffusion prior. Leveraging the different learnable parameters of the diffusion prior, our approach offers multiple configurations, affording various trade-offs between performance and implementation complexity. Notably, our experimental results demonstrate that our method markedly surpasses existing techniques, establishing a new state of the art in text-to-3D generation. Additionally, our framework yields insightful contributions to the understanding of recent score distillation methods, such as the VSD loss and CSD loss. Code: https://yangxiaofeng.github.io/demo_diffusion_prior
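The abstract describes an alternating procedure in which a 3D model and a partially learnable diffusion prior are optimized in turn. The following is a minimal, hypothetical sketch of that alternating-optimization idea, assuming a score-distillation-style setup; the module names (Renderer3D, LearnablePrior), the toy noising, and the loss forms are illustrative placeholders, not the authors' implementation.

```python
# Hypothetical sketch: alternate between adapting a learnable diffusion prior
# to the current renderings and updating the 3D model with the adapted prior.
import torch
import torch.nn as nn

class Renderer3D(nn.Module):
    """Stand-in for a differentiable 3D representation (e.g., NeRF or Gaussians)."""
    def __init__(self, res=64):
        super().__init__()
        self.texture = nn.Parameter(torch.rand(1, 3, res, res))
    def forward(self):
        return torch.sigmoid(self.texture)  # pretend-rendered RGB image

class LearnablePrior(nn.Module):
    """Stand-in for the learnable part of the diffusion prior (e.g., LoRA layers)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, x_noisy, t):
        return self.net(x_noisy)  # predicted noise

renderer, prior = Renderer3D(), LearnablePrior()
opt_theta = torch.optim.Adam(renderer.parameters(), lr=1e-2)
opt_phi = torch.optim.Adam(prior.parameters(), lr=1e-3)

for step in range(100):
    # Step 1: adapt the diffusion prior to renderings of the current 3D model.
    x = renderer().detach()
    t = torch.randint(0, 1000, (1,))
    noise = torch.randn_like(x)
    x_noisy = x + 0.1 * noise            # toy noising; real code uses a noise scheduler
    loss_phi = ((prior(x_noisy, t) - noise) ** 2).mean()
    opt_phi.zero_grad(); loss_phi.backward(); opt_phi.step()

    # Step 2: update the 3D model using the adapted prior's denoising score.
    x = renderer()
    noise = torch.randn_like(x)
    x_noisy = x + 0.1 * noise
    with torch.no_grad():
        eps_pred = prior(x_noisy, t)
    # Score-distillation-style gradient: (eps_pred - noise) acts as the per-pixel
    # direction, propagated into the 3D parameters through the rendering.
    loss_theta = ((eps_pred - noise) * x).mean()
    opt_theta.zero_grad(); loss_theta.backward(); opt_theta.step()
```

In this reading, fixing the prior and running only Step 2 recovers an SDS-like update, while letting the prior's learnable parameters track the renderings is what the abstract refers to as optimizing the diffusion prior alongside the 3D model.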
License type:
Publisher Copyright
Funding Info:
This research/project is supported by the Agency for Science, Technology and Research (A*STAR) - MTC Programmatic Funds (Grant Reference no.: M23L7b0021).

This research is also partly supported by an OPPO research grant.
Description:
This is a post-peer-review, pre-copyedit version of an article published in Computer Vision – ECCV 2024. The final authenticated version is available online at: http://dx.doi.org/10.1007/978-3-031-72784-9_8
ISBN: