Agentic Jackal: Live Execution and Semantic Value Grounding for Text-to-JQL

Agentic AI
arXiv: 2604.09470v1
Authors

Vishnu Murali, Anmol Gulati, Elias Lumer, Kevin Frank, Sindy Campagna, Vamse Kumar Subbiah

Abstract

Translating natural language into Jira Query Language (JQL) requires resolving ambiguous field references, instance-specific categorical values, and complex Boolean predicates. Single-pass LLMs cannot discover which categorical values (e.g., component names or fix versions) actually exist in a given Jira instance, nor can they verify generated queries against a live data source, limiting accuracy on paraphrased or ambiguous requests. No open, execution-based benchmark exists for mapping natural language to JQL. We introduce Jackal, the first large-scale, execution-based text-to-JQL benchmark comprising 100,000 validated NL-JQL pairs on a live Jira instance with over 200,000 issues. To establish baselines on Jackal, we propose Agentic Jackal, a tool-augmented agent that equips LLMs with live query execution via the Jira MCP server and JiraAnchor, a semantic retrieval tool that resolves natural-language mentions of categorical values through embedding-based similarity search. Among 9 frontier LLMs evaluated, single-pass models average only 43.4% execution accuracy on short natural-language queries, highlighting that text-to-JQL remains an open challenge. The agentic approach improves 7 of 9 models, with a 9.0% relative gain on the most linguistically challenging variant; in a controlled ablation isolating JiraAnchor, categorical-value accuracy rises from 48.7% to 71.7%, with component-field accuracy jumping from 16.9% to 66.2%. Our analysis identifies inherent semantic ambiguities, such as issue-type disambiguation and text-field selection, as the dominant failure modes rather than value-resolution errors, pointing to concrete directions for future work. We publicly release the benchmark, all agent transcripts, and evaluation code to support reproducibility.

Paper Summary

Problem
The paper addresses the challenge of translating natural language into Jira Query Language (JQL), the query language of the Jira project-management platform. Current large language models (LLMs) struggle to generate accurate JQL, especially for ambiguous or paraphrased requests: a request such as "open payment bugs" must be mapped to something like `component = "Payments Gateway" AND issuetype = Bug AND status = Open` (an illustrative query), where the exact component name exists only in that specific Jira instance. This limitation hinders the effectiveness of natural language interfaces for non-expert Jira users.
Key Innovation
The researchers introduce Agentic Jackal, a tool-augmented multi-step agent that equips LLMs with live JQL execution and iterative query refinement. Agentic Jackal uses JiraAnchor, a novel semantic field-value retrieval tool, to resolve natural language mentions of categorical values against a live Jira instance. This approach improves the accuracy of LLMs in generating JQL queries, especially on the most linguistically challenging variants.
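The paper does not spell out JiraAnchor's internals here, but its described behavior (resolving a natural-language mention to a categorical value that actually exists in the instance via embedding similarity) can be sketched as follows. This toy uses a bag-of-words vector and cosine similarity as a stand-in for a learned embedding model; the component names are hypothetical.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real JiraAnchor-style tool
    would use a learned sentence-embedding model instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def resolve_value(mention: str, catalog: list[str]) -> str:
    """Map a natural-language mention to the closest categorical
    value that actually exists in the Jira instance."""
    m = embed(mention)
    return max(catalog, key=lambda v: cosine(m, embed(v)))

# Hypothetical component names pulled from a live Jira instance.
components = ["Payments Gateway", "User Onboarding", "Search Indexer"]
print(resolve_value("issues in the payments component", components))
# prints "Payments Gateway"
```

The key design point is that the candidate set comes from the live instance, so the agent can only ever emit values that will actually match when the JQL is executed.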
Practical Impact
The Agentic Jackal approach has significant practical implications for the development of natural language interfaces for Jira and other project management platforms. By improving the accuracy of LLMs in generating JQL queries, Agentic Jackal enables users to more effectively query and analyze their projects, leading to better decision-making and productivity. The approach also highlights the importance of integrating feedback loops and semantic value grounding in LLMs to improve their performance on complex tasks like text-to-JQL.
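The feedback loop mentioned above can be sketched as a generate-execute-refine cycle. Everything below is illustrative: in Agentic Jackal, execution goes through the Jira MCP server and refinement re-prompts the LLM with the error, whereas this sketch uses a stub executor and a hard-coded repair rule.

```python
def run_jql(jql: str) -> tuple[bool, str]:
    """Stub for live execution; Agentic Jackal would instead call
    the Jira MCP server. Here we only validate the first field name."""
    known_fields = {"project", "issuetype", "status", "component"}
    field = jql.split()[0]
    if field not in known_fields:
        return False, f"Field '{field}' does not exist."
    return True, "ok"

def refine(jql: str, error: str) -> str:
    """Hypothetical repair step; a real agent would feed the
    execution error back to the LLM. Here: fix one known alias."""
    return jql.replace("issue_type", "issuetype")

def agentic_loop(jql: str, max_steps: int = 3) -> str:
    """Execute the query, and on failure refine and retry."""
    for _ in range(max_steps):
        ok, msg = run_jql(jql)
        if ok:
            return jql
        jql = refine(jql, msg)
    return jql

print(agentic_loop("issue_type = Bug"))
# prints "issuetype = Bug"
```

The loop is what single-pass generation lacks: an invalid field name is caught by execution rather than silently returned to the user.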
Analogy / Intuitive Explanation
Imagine trying to describe a recipe to a friend using vague terms like "spicy" or "sweet" instead of specific ingredients. In the same way, Jira users may describe a query in natural language, but LLMs struggle to translate that language accurately into JQL. Agentic Jackal is like a personal chef who takes your vague recipe description and refines it into a precise set of instructions, combining live query execution with semantic value grounding to ensure accuracy.
Paper Information
Categories: cs.CL
Published Date:
arXiv ID: 2604.09470v1
