Bootstrapping Fuzzers for Compilers of Low-Resource Language Dialects Using Language Models

Published: arXiv: 2512.05887v1
Authors

Sairam Vaidya, Marcel Böhme, Loris D'Antoni

Abstract

Modern extensible compiler frameworks, such as MLIR, enable rapid creation of domain-specific language dialects. This flexibility, however, makes correctness harder to ensure, as the same extensibility that accelerates development also complicates maintaining the testing infrastructure. Extensible languages require automated test generation that is both dialect-agnostic (works across dialects without manual adaptation) and dialect-effective (targets dialect-specific features to find bugs). Existing approaches typically sacrifice one of these goals by either requiring manually constructed seed corpora for each dialect, or by failing to be effective. We present a dialect-agnostic and dialect-effective grammar-based and coverage-guided fuzzing approach for extensible compilers that combines two key insights from existing work: (i) the grammars of dialects, which already encode the structural and type constraints, can often be extracted automatically from the dialect specification; and (ii) these grammars can be used in combination with pre-trained large language models to automatically generate representative and diverse seed inputs from the full dialect space without requiring any manual input or training data. These seeds can then be used to bootstrap coverage-guided fuzzers. We built this approach into a tool, Germinator. When evaluated on six MLIR projects spanning 91 dialects, Germinator-generated seeds improve line coverage by 10-120% over grammar-based baselines. We compare against grammar-based baselines because they are the only class of existing automatic seed generators that can be applied uniformly across MLIR's heterogeneous dialect ecosystem. Germinator discovers 88 previously unknown bugs (40 confirmed), including 23 in dialects with no prior automated test generators, demonstrating effective and controllable testing of low-resource dialects at scale.

Paper Summary

Problem
The paper addresses the difficulty of testing compilers for custom language dialects. Extensible compiler frameworks such as MLIR make it easy to create new dialects, but those dialects often ship with limited and incomplete test suites, leaving dialect-specific features largely untested. The result can be correctness bugs and failures that propagate silently into production systems.
Key Innovation
The paper presents a grammar-based, coverage-guided fuzzing approach that is both dialect-agnostic (works across dialects without manual adaptation) and dialect-effective (targets dialect-specific features to find bugs). It combines two insights from existing work: (i) the grammars of dialects, which already encode structural and type constraints, can often be extracted automatically from the dialect specification, and (ii) these grammars can be combined with pre-trained large language models to generate representative and diverse seed inputs from the full dialect space, without manual input or training data. The seeds then bootstrap coverage-guided fuzzers. The approach is implemented in a tool called Germinator.
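The pipeline described above (grammar, then seeds, then coverage-guided fuzzing) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `toy` dialect grammar is hand-written rather than extracted from a dialect specification, random grammar sampling stands in for the language model that produces seeds, and `toy_compiler_coverage` is a hypothetical stand-in for an instrumented compiler. It is not Germinator's implementation.

```python
import random

# Hypothetical toy grammar for an MLIR-like arithmetic dialect.
# Assumption: in the paper the grammar comes from the dialect specification;
# here it is written by hand.
GRAMMAR = {
    "<op>": [["<result>", " = ", "<opname>", " ",
              "<operand>", ", ", "<operand>", " : ", "<type>"]],
    "<opname>": [["toy.add"], ["toy.mul"], ["toy.sub"]],
    "<result>": [["%r"]],
    "<operand>": [["%a"], ["%b"], ["%c"]],
    "<type>": [["i32"], ["i64"], ["f32"]],
}


def generate(symbol, rng):
    """Expand a grammar symbol into a string by random derivation.

    Assumption: this replaces the LLM-based seed generator with plain
    random sampling from the grammar.
    """
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    production = rng.choice(GRAMMAR[symbol])
    return "".join(generate(part, rng) for part in production)


def toy_compiler_coverage(program):
    """Hypothetical instrumented compiler: returns the 'branches' an input hits."""
    opname = program.split(" = ")[1].split(" ")[0]
    type_name = program.rsplit(" : ", 1)[1]
    return {("op", opname), ("type", type_name)}


def mutate(program, rng):
    """Grammar-aware mutation: swap the op name or the type for an alternative."""
    symbol = rng.choice(["<opname>", "<type>"])
    alternatives = [alt[0] for alt in GRAMMAR[symbol]]
    for current in alternatives:
        if current in program:
            return program.replace(current, rng.choice(alternatives), 1)
    return program


def fuzz(seeds, iterations, rng):
    """Coverage-guided loop: keep a mutant only if it reaches new coverage."""
    corpus = list(seeds)
    covered = set()
    for seed in corpus:
        covered |= toy_compiler_coverage(seed)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        new_coverage = toy_compiler_coverage(candidate)
        if not new_coverage <= covered:
            corpus.append(candidate)
            covered |= new_coverage
    return corpus, covered


if __name__ == "__main__":
    rng = random.Random(0)
    seeds = [generate("<op>", rng) for _ in range(3)]
    corpus, covered = fuzz(seeds, iterations=200, rng=rng)
    print(f"{len(seeds)} seeds, {len(corpus)} corpus entries, "
          f"{len(covered)} branches covered")
```

The point of the sketch is the division of labour: seeds determine where the fuzzer starts, and the coverage check decides which mutants are worth keeping. A more representative and diverse seed set gives the loop a better starting corpus, which is the role the summary attributes to the language-model-generated seeds.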
Practical Impact
Evaluated on six MLIR projects spanning 91 dialects, Germinator-generated seeds improve line coverage by 10-120% over grammar-based baselines, giving developers a more effective way to test their compilers without hand-built seed corpora. The tool has uncovered 88 previously unknown bugs (40 confirmed) across MLIR dialects, including 23 in dialects with no prior automated test generators.
Analogy / Intuitive Explanation
Imagine building a house from custom architectural plans. A compiler is like a construction worker who follows the plans to build the house, and with extensible compiler frameworks the plans can be modified and extended on the fly. The problem is that modified plans may not be thoroughly tested, which can lead to errors and defects in the final product. Germinator is like a quality control inspector who reads the plans and generates a representative set of inputs to exercise the construction process, so that defects surface before the house is occupied.
Paper Information
Categories: cs.SE, cs.LG, cs.PL