It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models

Published: arXiv:2601.00090v1
Authors

Anne Harrington, A. Sophia Koepke, Shyamgopal Karthik, Trevor Darrell, Alexei A. Efros

Abstract

Contemporary text-to-image models exhibit a surprising degree of mode collapse, as can be seen when sampling several images given the same text prompt. While previous work has attempted to address this issue by steering the model using guidance mechanisms, or by generating a large pool of candidates and refining them, in this work we take a different direction and aim for diversity in generations via noise optimization. Specifically, we show that a simple noise optimization objective can mitigate mode collapse while preserving the fidelity of the base model. We also analyze the frequency characteristics of the noise and show that alternative noise initializations with different frequency profiles can improve both optimization and search. Our experiments demonstrate that noise optimization yields superior results in terms of generation quality and variety.

Paper Summary

Problem
Mode collapse is a common issue in text-to-image models: a model produces nearly identical images when sampled repeatedly with the same text prompt. This limits the diversity of the generated images and makes the model less useful for tasks that require a range of possible outputs.
Key Innovation
The authors propose a new approach to mitigating mode collapse, called noise optimization. Instead of steering the model with guidance mechanisms or generating a large pool of candidates and refining them, they directly optimize the initial noise inputs to satisfy desired properties, which allows them to maximize the diversity of a set of generated images while preserving the fidelity of the base model.
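To make the idea concrete, here is a minimal toy sketch of noise optimization, not the paper's actual method. It stands in for the diffusion model with a fixed linear "generator", ascends the gradient of a simple pairwise-distance diversity objective with respect to the batch of noise vectors, and projects each vector back onto the sphere of radius sqrt(d) so the optimized noise stays statistically close to a standard Gaussian draw. All names (`optimize_noise`, `pairwise_spread`) and the choice of objective are illustrative assumptions.

```python
import numpy as np

def optimize_noise(z, generator_matrix, steps=200, lr=0.01):
    """Toy noise optimization: nudge a batch of noise vectors z (shape [n, d])
    so their "generated" outputs x_i = W z_i spread apart, maximizing the
    pairwise-distance diversity objective sum_{i<j} ||x_i - x_j||^2.
    NOTE: the real paper backpropagates through a diffusion sampler; a fixed
    linear map W is used here only so the gradient has a closed form."""
    n, d = z.shape
    W = generator_matrix
    for _ in range(steps):
        x = z @ W.T  # stand-in "generation" step
        # d/dx_i of sum_{i<j} ||x_i - x_j||^2 is 2*(n*x_i - sum_j x_j);
        # chain rule through x = z W^T gives the gradient w.r.t. z below.
        grad = 2.0 * (n * x - x.sum(axis=0, keepdims=True)) @ W
        z = z + lr * grad / n
        # project each z_i back to the Gaussian "shell" of radius sqrt(d)
        z = z * (np.sqrt(d) / np.linalg.norm(z, axis=1, keepdims=True))
    return z

def pairwise_spread(x):
    """Sum of squared pairwise distances between rows of x."""
    diffs = x[:, None, :] - x[None, :, :]
    return 0.5 * (diffs ** 2).sum()

rng = np.random.default_rng(0)
d = 32
W = rng.normal(size=(16, d)) / np.sqrt(d)
z0 = rng.normal(size=(8, d))
z1 = optimize_noise(z0, W)
before = pairwise_spread(z0 @ W.T)
after = pairwise_spread(z1 @ W.T)
```

The sphere projection is one common way to keep optimized noise "in distribution" for a diffusion model; the paper's actual regularization may differ.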
Practical Impact
The practical impact of this research is that it can improve the diversity of images generated by text-to-image models across a wide range of applications, such as art, advertising, or education. By producing a more varied set of images for a single prompt, these models become more useful for tasks that require exploring many possible outputs.
Analogy / Intuitive Explanation
Think of noise optimization like a game of "find the best combination" where the goal is to find the combination of initial noise inputs that will produce the most diverse set of images. Just as a musician might experiment with different chord progressions to create a unique sound, the noise optimization algorithm searches through different combinations of noise inputs to find the one that produces the most diverse set of images.
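The abstract also notes that noise initializations with different frequency profiles can improve both optimization and search. The sketch below, a toy assumption rather than the paper's recipe, shows one standard way to construct such initializations: shape white Gaussian noise in the Fourier domain with a 1/f^alpha power-law filter, then renormalize so the result can stand in for a diffusion noise seed. The function name `power_law_noise` is illustrative.

```python
import numpy as np

def power_law_noise(shape, alpha, rng):
    """Sample 2D Gaussian noise whose power spectrum follows ~1/f^alpha.
    alpha = 0 recovers white noise; larger alpha shifts energy toward low
    frequencies. The output is renormalized to zero mean and unit variance."""
    h, w = shape
    white = rng.normal(size=shape)
    # radial frequency magnitude at each FFT bin
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.sqrt(fy ** 2 + fx ** 2)
    f[0, 0] = 1.0                       # avoid division by zero at DC
    spectrum = np.fft.fft2(white) / (f ** (alpha / 2.0))
    spectrum[0, 0] = 0.0                # drop the DC component
    noise = np.fft.ifft2(spectrum).real
    return (noise - noise.mean()) / noise.std()

rng = np.random.default_rng(0)
white = power_law_noise((64, 64), alpha=0.0, rng=rng)  # flat spectrum
pink = power_law_noise((64, 64), alpha=2.0, rng=rng)   # low-frequency heavy
```

Sweeping `alpha` gives a family of initializations with different frequency profiles that can be compared during optimization or search.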
Paper Information
Categories: cs.CV cs.LG
Published Date:
arXiv ID: 2601.00090v1
