DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline

Generative AI & LLMs
Published: arXiv: 2511.23377v1
Authors

Rui Zhang, Hongxia Wang, Hangqing Liu, Yang Zhou, Qiang Zeng

Abstract

Diffusion-based image editing has made semantic-level image manipulation easy for general users, but it also enables realistic local forgeries that are hard to localize. Existing benchmarks mainly focus on the binary detection of generated images or the localization of manually edited regions, and do not reflect the properties of diffusion-based edits, which often blend smoothly into the original content. We present the Diffusion-Based Image Editing Area Localization Dataset (DEAL-300K), a large-scale dataset for diffusion-based image manipulation localization (DIML) with more than 300,000 annotated images. We build DEAL-300K by using a multi-modal large language model to generate editing instructions, a mask-free diffusion editor to produce manipulated images, and an active-learning change detection pipeline to obtain pixel-level annotations. On top of this dataset, we propose a localization framework that uses a frozen Visual Foundation Model (VFM) together with Multi-Frequency Prompt Tuning (MFPT) to capture both semantic and frequency-domain cues of edited regions. Trained on DEAL-300K, our method reaches a pixel-level F1 score of 82.56% on our test split and 80.97% on the external CoCoGlide benchmark, providing strong baselines and a practical foundation for future DIML research. The dataset can be accessed via https://github.com/ymhzyj/DEAL-300K.
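To make the annotation step concrete: the pipeline pairs each original image with its edited counterpart and derives a pixel-level mask via change detection. The sketch below is a deliberately simplified stand-in, not the paper's active-learning pipeline; the function name and threshold value are assumptions for illustration.

```python
import numpy as np
from PIL import Image

def change_mask(original_path: str, edited_path: str, thresh: float = 0.1) -> np.ndarray:
    """Binary edit mask from an (original, edited) image pair.

    Crude stand-in for the paper's active-learning change-detection pipeline:
    per-pixel L2 distance in normalized RGB, thresholded. Assumes both images
    share the same resolution; the threshold value is an assumption.
    """
    a = np.asarray(Image.open(original_path).convert("RGB"), dtype=np.float32) / 255.0
    b = np.asarray(Image.open(edited_path).convert("RGB"), dtype=np.float32) / 255.0
    diff = np.linalg.norm(a - b, axis=-1)    # (H, W) per-pixel change magnitude
    return (diff > thresh).astype(np.uint8)  # 1 = edited pixel, 0 = untouched

# Example:
# mask = change_mask("original.png", "edited.png")
# Image.fromarray(mask * 255).save("mask.png")
```

A fixed threshold like this would produce noisy masks in practice, which is presumably why the paper refines annotations with an active-learning loop instead.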

Paper Summary

Problem
The rise of Artificial Intelligence Generated Content (AIGC) technologies has made it easy for anyone to edit images with simple language instructions. This accessibility, however, raises concerns about content authenticity, particularly in the context of misinformation and digital forgery. Because diffusion-based edits blend smoothly into the surrounding content, detecting and localizing the edited regions is difficult.
Key Innovation
The researchers present DEAL-300K, a large-scale dataset for diffusion-based image manipulation localization (DIML) with over 300,000 annotated images. They also propose a localization framework that pairs a frozen Visual Foundation Model (VFM) with Multi-Frequency Prompt Tuning (MFPT) to capture both semantic and frequency-domain cues of edited regions. Trained on DEAL-300K, the framework reaches a pixel-level F1 score of 82.56% on the DEAL-300K test split and 80.97% on the external CoCoGlide benchmark.
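To illustrate the frequency-prompt idea, here is a minimal, hypothetical sketch of how multi-frequency prompt tokens could be computed and prepended to a frozen backbone's patch sequence. The module name, radial band cut-offs, and pooling scheme are all assumptions; the paper's actual MFPT design may differ.

```python
import torch
import torch.nn as nn

class MultiFrequencyPrompts(nn.Module):
    """Illustrative sketch of multi-frequency prompt tokens (not the paper's exact MFPT).

    The image spectrum is split into radial low/mid/high bands; each band's
    mean spectral magnitude (per RGB channel) is projected to one prompt
    token that can be prepended to a frozen VFM's patch tokens.
    """

    def __init__(self, embed_dim: int = 768, bands=(0.15, 0.4)):  # cut-offs are assumptions
        super().__init__()
        self.bands = bands
        self.proj = nn.Linear(3, embed_dim)  # per-band RGB energy -> prompt embedding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape  # x: (B, 3, H, W) image batch
        spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1)).abs()
        yy, xx = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device),
            indexing="ij",
        )
        r = (xx ** 2 + yy ** 2).sqrt()  # normalized radial frequency
        lo, hi = self.bands
        masks = [r < lo, (r >= lo) & (r < hi), r >= hi]  # low / mid / high bands
        tokens = []
        for m in masks:
            # Mean spectral magnitude inside the band, per channel: (B, 3)
            energy = (spec * m).sum(dim=(-2, -1)) / m.sum()
            tokens.append(self.proj(energy))
        return torch.stack(tokens, dim=1)  # (B, num_bands, embed_dim)

# Usage: prepend the prompts to a frozen backbone's patch tokens.
# prompts = MultiFrequencyPrompts()(images)            # (B, 3, D)
# tokens = torch.cat([prompts, patch_tokens], dim=1)   # frozen VFM sees prompted sequence
```

The design intuition is that diffusion edits leave traces in the frequency spectrum that a purely semantic backbone may miss; injecting band-wise spectral summaries as learnable prompts lets the frozen VFM attend to those cues without fine-tuning its weights.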
Practical Impact
The DEAL-300K dataset and the proposed localization framework have several practical implications. Firstly, they provide a benchmark for evaluating the performance of image manipulation localization models. Secondly, they can be used to detect and prevent digital forgeries and misinformation. Finally, they can be extended to include video manipulations, making them more applicable to real-world scenarios.
Analogy / Intuitive Explanation
Imagine trying to find a small, seamless edit in a picture: it's like searching for a needle in a haystack. DEAL-300K and the proposed localization framework act like a powerful magnifying glass that helps you find that needle. By combining the visual knowledge of a frozen foundation model with frequency-domain information, the framework accurately locates edited areas, making it easier to detect digital forgeries and counter misinformation.
Paper Information
Categories: cs.CV
arXiv ID: 2511.23377v1