Detecting Data Contamination in LLMs via In-Context Learning

Generative AI & LLMs
Published: arXiv:2510.27055v1
Authors

Michał Zawalski, Meriem Boubdir, Klaudia Bałazy, Besmira Nushi, Pablo Ribalta

Abstract

We present Contamination Detection via Context (CoDeC), a practical and accurate method to detect and quantify training data contamination in large language models. CoDeC distinguishes between data memorized during training and data outside the training distribution by measuring how in-context learning affects model performance. We find that in-context examples typically boost confidence for unseen datasets but may reduce it when the dataset was part of training, due to disrupted memorization patterns. Experiments show that CoDeC produces interpretable contamination scores that clearly separate seen and unseen datasets, and reveals strong evidence of memorization in open-weight models with undisclosed training corpora. The method is simple, automated, and both model- and dataset-agnostic, making it easy to integrate with benchmark evaluations.

Paper Summary

Problem
Data contamination, where benchmark or evaluation data leaks into an LLM's training corpus, is a significant challenge. Because LLMs are trained on vast web-scale corpora, test sets frequently end up in the training data, often without the model developers' knowledge. If undetected, this contamination inflates benchmark scores, makes models appear more capable than they actually are, and undermines trust in reported evaluation results. Existing detection methods are often complex, require access to the training data (which is frequently undisclosed), and may not produce reliable results.
Key Innovation
The researchers introduce a new method called Contamination Detection via Context (CoDeC), which detects data contamination in LLMs using in-context learning. CoDeC measures how in-context examples from a dataset affect the model's confidence in its predictions: such examples typically boost confidence on unseen data, but can reduce it on memorized data by disrupting memorization patterns. This yields an interpretable contamination score that separates datasets the model has memorized from those it has not. The approach is simple, automated, and model- and dataset-agnostic, making it easy to integrate with benchmark evaluations.
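To make the mechanism concrete, below is a minimal sketch of the idea in Python, not the paper's exact scoring formula: it compares the average log-probability a causal LM assigns to each gold answer with and without a few in-context examples from the same dataset, and reports the mean change in confidence as a rough contamination signal. The model name, prompt template, and helper functions are illustrative assumptions.

```python
# Minimal sketch of the CoDeC idea (not the paper's exact formula): compare the
# model's confidence in each gold answer with and without in-context examples
# drawn from the same dataset. A drop in confidence under in-context examples
# is treated as a hint of memorization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def answer_logprob(prompt: str, answer: str) -> float:
    """Average log-probability the model assigns to `answer` given `prompt`.

    Assumes the tokenization of `prompt` is a prefix of the tokenization of
    `prompt + answer`, which is good enough for a sketch.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probs of each token, conditioned on everything before it.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_ids = full_ids[0, 1:]
    # Positions in the shifted arrays that predict the answer tokens.
    answer_positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    scores = [log_probs[pos, token_ids[pos]].item() for pos in answer_positions]
    return sum(scores) / len(scores)


def contamination_score(examples, k=4):
    """Positive score: confidence drops with in-context examples (possible memorization)."""
    deltas = []
    for i, (question, answer) in enumerate(examples):
        # Build a k-shot context from *other* examples of the same dataset.
        shots = [e for j, e in enumerate(examples) if j != i][:k]
        context = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in shots)
        zero_shot = answer_logprob(f"Q: {question}\nA: ", answer)
        few_shot = answer_logprob(context + f"Q: {question}\nA: ", answer)
        deltas.append(zero_shot - few_shot)
    return sum(deltas) / len(deltas)
```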
Practical Impact
CoDeC has the potential to significantly improve the integrity of LLM evaluations. By flagging benchmarks a model has already memorized, researchers and developers can discount inflated scores and report results that reflect genuine generalization rather than recall of training data. Because the method is automated and model- and dataset-agnostic, it can be folded into standard benchmark pipelines, and it can surface memorization even in open-weight models whose training corpora are undisclosed. This supports more trustworthy comparisons between models, which is essential for their responsible and widespread adoption.
Analogy / Intuitive Explanation
Imagine you're trying to recognize a friend in a crowded room. If you already know their face well, extra hints such as a description of their clothes add little and can even distract you from your instinctive recognition. If the person is a stranger, those same hints genuinely help you pick them out. LLMs behave similarly: in-context examples from a dataset typically boost the model's confidence when the data is new to it, but can disrupt memorized patterns and reduce confidence when the dataset was part of training. CoDeC turns this into a test: if the model's confidence improves when given in-context examples from a dataset, the data was likely unseen; if confidence drops or fails to improve, the model has probably memorized the data, indicating contamination.
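To make that decision rule explicit, here is a hypothetical usage of the sketch above; the tiny QA set and the zero threshold are purely illustrative, not the paper's calibration.

```python
# Hypothetical usage of the contamination_score sketch defined earlier.
examples = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("Who wrote Hamlet?", "William Shakespeare"),
    ("What planet is known as the Red Planet?", "Mars"),
    ("What is the chemical symbol for water?", "H2O"),
]
score = contamination_score(examples, k=4)
# Interpretation under this sketch's sign convention (threshold is illustrative):
#   score > 0  -> in-context examples hurt confidence, consistent with memorization
#   score <= 0 -> in-context examples help, consistent with unseen data
print(f"contamination score: {score:.3f}")
```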
Paper Information
Categories:
cs.CL cs.AI I.2.7
arXiv ID: 2510.27055v1
