Frequency-Aware Flow Matching for High-Quality Image Generation

Published on arXiv: 2604.15521v1
Authors

Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen

Abstract

Flow matching models have emerged as a powerful framework for realistic image generation by learning to reverse a corruption process that progressively adds Gaussian noise. However, because noise is injected in the latent domain, its impact on different frequency components is non-uniform. As a result, during inference, flow matching models tend to generate low-frequency components (global structure) in the early stages, while high-frequency components (fine details) emerge only later in the reverse process. Building on this insight, we propose Frequency-Aware Flow Matching (FreqFlow), a novel approach that explicitly incorporates frequency-aware conditioning into the flow matching framework via time-dependent adaptive weighting. We introduce a two-branch architecture: (1) a frequency branch that separately processes low- and high-frequency components to capture global structure and refine textures and edges, and (2) a spatial branch that synthesizes images in the latent domain, guided by the frequency branch's output. By explicitly integrating frequency information into the generation process, FreqFlow ensures that both large-scale coherence and fine-grained details are effectively modeled: low-frequency conditioning reinforces global structure, while high-frequency conditioning enhances texture fidelity and detail sharpness. On the class-conditional ImageNet-256 generation benchmark, our method achieves state-of-the-art performance with an FID of 1.38, surpassing the diffusion model DiT and the flow matching model SiT by 0.79 and 0.58 FID, respectively. Code is available at https://github.com/OliverRensu/FreqFlow.
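To make "learning to reverse a corruption process" concrete, here is a minimal Python sketch of plain flow matching on a toy vector. It assumes the common linear path x_t = (1 - t)·x0 + t·ε, with t = 0 as clean data and t = 1 as pure noise, and the velocity target ε - x0; the abstract does not say which path FreqFlow uses, and all function names are illustrative. The sampler uses an oracle velocity for a single (x0, ε) pair, so Euler integration from noise recovers x0 exactly; a trained network only approximates this velocity.

```python
import random


def interpolate(x0, eps, t):
    """Assumed linear path: t = 0 is clean data, t = 1 is pure Gaussian noise."""
    return [(1.0 - t) * a + t * e for a, e in zip(x0, eps)]


def velocity_target(x0, eps):
    """Time derivative of the linear path, the regression target for v_theta(x_t, t)."""
    return [e - a for a, e in zip(x0, eps)]


def flow_matching_loss(v_pred, v_true):
    """Mean squared error between a predicted and the target velocity."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(velocity_fn, x1, steps):
    """Reverse process: integrate dx/dt = v(x, t) from t = 1 (noise) down to t = 0."""
    x = list(x1)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]  # step from t to t - dt
    return x


# Usage with a toy "latent" and an oracle velocity for this single (x0, eps) pair.
random.seed(0)
x0 = [0.5, -1.0, 2.0]
eps = [random.gauss(0.0, 1.0) for _ in x0]
target = velocity_target(x0, eps)
x_mid = interpolate(x0, eps, 0.5)                   # a half-corrupted sample
loss = flow_matching_loss([0.0] * len(x0), target)  # loss of a trivial zero predictor
recovered = euler_sample(lambda x, t: target, eps, steps=10)
```

In a real model the oracle is replaced by a network trained to minimize `flow_matching_loss` over random (x0, ε, t); per the abstract, FreqFlow changes how that network is conditioned and structured rather than this basic recipe.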

Paper Summary

Problem
The paper addresses a limitation of existing flow matching models for image generation. These models inject Gaussian noise uniformly across the latent (spatial) domain, but the noise does not affect all frequency components equally, so frequency information is preserved unevenly along the generation trajectory and image quality suffers. Concretely, the paper observes that flow matching models tend to generate low-frequency components (global structure) in the early stages of the reverse process, while high-frequency components (fine details) emerge only later.
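The frequency imbalance can be illustrated with a small calculation. The sketch below assumes the same linear path x_t = (1 - t)·x0 + t·ε and a toy 1-D signal whose amplitude falls off as 1/k with frequency k, a stand-in (our assumption) for the roughly 1/f spectra of natural images. Unit-variance white noise spreads its power evenly over all frequency bins, so weak high-frequency bins fall below an SNR of 1 at much lower noise levels than strong low-frequency bins. The helper names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stdlib only, for illustration)."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def toy_signal(n=64, max_freq=8):
    """Assumed image-like toy signal: a cosine of amplitude 1/k at each frequency k."""
    return [
        sum(math.cos(2 * math.pi * k * m / n) / k for k in range(1, max_freq + 1))
        for m in range(n)
    ]


def band_snr(signal_power, n, t):
    """Expected per-bin SNR on the path x_t = (1 - t) * x0 + t * eps, for 0 < t < 1.

    Unit-variance white noise has expected power n in every DFT bin of a
    length-n input, so the noise is flat across frequencies while the signal is not.
    """
    return ((1.0 - t) ** 2 * signal_power) / (t ** 2 * n)


def crossover_time(signal_power, n):
    """Noise level t at which band_snr falls to exactly 1."""
    amp = math.sqrt(signal_power)
    return amp / (amp + math.sqrt(n))


n = 64
spectrum = dft(toy_signal(n))
for k in (1, 2, 4, 8):
    power = abs(spectrum[k]) ** 2
    print(
        f"freq {k}: SNR at t=0.5 is {band_snr(power, n, 0.5):.2f}, "
        f"SNR drops to 1 at t = {crossover_time(power, n):.3f}"
    )
```

For this toy signal the lowest frequency keeps an SNR above 1 until t = 0.8, while frequency 8 is already buried at t ≈ 0.333. Read in the reverse direction (t going from 1 to 0), global structure becomes recoverable well before fine detail, which matches the behaviour the paper describes.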
Key Innovation
The key innovation of this work is Frequency-Aware Flow Matching (FreqFlow), which explicitly incorporates frequency-aware conditioning into the flow matching framework via time-dependent adaptive weighting. FreqFlow uses a two-branch architecture: a frequency branch that processes low- and high-frequency components separately, capturing global structure and refining textures and edges, and a spatial branch that synthesizes the image in the latent domain, guided by the frequency branch's output.
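A rough sketch of the two ingredients named above, a low/high frequency split and a time-dependent weighting, is shown below on a 1-D signal. Everything here is an assumption for illustration: the ideal DFT low-pass mask, the `cutoff` parameter, and the fixed linear weights `t` and `1 - t` (favouring low frequencies near the noise end, t = 1) are hypothetical stand-ins, not the paper's adaptive weighting or its actual network branches.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stdlib only, for illustration)."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part, valid for conjugate-symmetric spectra."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n).real
        for m in range(n)
    ]


def split_frequencies(x, cutoff):
    """Split a real 1-D signal into low- and high-frequency parts (ideal low-pass mask)."""
    n = len(x)
    spectrum = dft(x)
    # For a real signal, bins k and n - k carry the same physical frequency,
    # so the mask is applied symmetrically to keep the low-pass output real.
    low_spectrum = [
        s if min(k, n - k) <= cutoff else 0j for k, s in enumerate(spectrum)
    ]
    low = idft(low_spectrum)
    high = [a - b for a, b in zip(x, low)]  # residual holds everything above the cutoff
    return low, high


def frequency_condition(low, high, t):
    """Hypothetical time-dependent weighting (t = 1 is noise, t = 0 is data).

    Early in sampling (t near 1) the low-frequency part dominates; late in
    sampling (t near 0) the high-frequency part dominates.
    """
    w_low, w_high = t, 1.0 - t
    return [w_low * a + w_high * b for a, b in zip(low, high)]


# Usage: a coarse wave (frequency 1) plus a fine ripple (frequency 6).
n = 16
x = [math.cos(2 * math.pi * m / n) + 0.5 * math.cos(2 * math.pi * 6 * m / n) for m in range(n)]
low, high = split_frequencies(x, cutoff=2)
early = frequency_condition(low, high, t=0.9)  # mostly global structure
late = frequency_condition(low, high, t=0.1)   # mostly fine detail
```

A real latent would be split with a 2-D transform, and in FreqFlow the resulting components feed a frequency branch whose output guides the spatial branch; this sketch stops at producing a weighted conditioning signal.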
Practical Impact
The paper demonstrates its gains on class-conditional ImageNet-256 generation, where FreqFlow reaches an FID of 1.38, improving on DiT and SiT by 0.79 and 0.58 FID, respectively. Because the method targets a general weakness of flow matching, modeling large-scale coherence and fine-grained detail together, the same frequency-aware conditioning could plausibly carry over to other generative settings such as video or 3D content, although the abstract reports results only for ImageNet-256 image generation.
Analogy / Intuitive Explanation
Imagine painting a landscape. If you add fine details too early, the painting may look messy and incoherent; if you first block in the overall shapes and colors and only then add detail, the result looks far more realistic. Flow matching models already tend to work this way, producing global structure (low-frequency components) early and fine textures and edges (high-frequency components) late. FreqFlow makes this explicit: it conditions generation on both low- and high-frequency information and weights them according to how far generation has progressed, so that structure and detail each receive appropriate emphasis along the way.
Paper Information
Categories: cs.CV
arXiv ID: 2604.15521v1
