Weekly AI Research Roundup - September 22, 2025

Published on 2025-09-22

Discover the latest breakthroughs in artificial intelligence with our curated selection of this week's top research papers.

15 Papers
4 Categories
63 Researchers

Agentic AI

Autonomous agents, multi-agent systems, and intelligent decision-making

1

AI Methods for Permutation Circuit Synthesis Across Generic Topologies

By Victor Villar, Juan Cruz-Benito, Ismael Faro et al. (4 authors)

Agentic AI 2025-09-19
IBM Quantum, IBM Research

Problem

The main problem addressed in this research paper is quantum circuit transpilation: the process of transforming abstract quantum algorithms into equivalent circuits that satisfy the physical constraints of a specific quantum processor. Optimal transpilation involves NP-hard optimization problems, making exact approaches impractical for large-scale quantum devices.

Analogy

Imagine trying to build a complex puzzle with many interconnected pieces. Traditional approaches to quantum circuit transpilation are like trying to solve the puzzle by looking at each piece individually, without considering how they fit together as a whole. The generalist approach developed in this research is like having a "puzzle solver" that can look at the entire puzzle and figure out the best way to assemble it, taking into account the connections between each piece. This allows for more efficient and optimal solutions, even for complex puzzles (or quantum circuits).

Key Innovation

The key innovation of this work is the development of a generalist approach to Reinforcement Learning (RL)-based quantum circuit transpilation for permutation circuits. Rather than training separate models for different device topologies, this approach trains a single model that can synthesize circuits across diverse topologies, allowing for efficient integration into transpilation workflows.
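
To make the task concrete, here is a minimal sketch of permutation-circuit synthesis on a generic topology: realizing a permutation as SWAP gates restricted to a device's coupling graph. A greedy distance heuristic stands in for the paper's trained RL policy, and all names are illustrative.

```python
from collections import deque

def all_pairs_dist(n, edges):
    """BFS shortest-path distances between physical qubits on the coupling graph."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[n] * n for _ in range(n)]  # n acts as "unreached"
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == n:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist

def route_permutation(perm, edges, max_steps=200):
    """Realize `perm` (perm[i] = logical qubit currently at physical qubit i;
    target: qubit q ends at position q) using SWAPs restricted to `edges`."""
    n = len(perm)
    dist = all_pairs_dist(n, edges)
    state, circuit = list(perm), []

    def cost(s):  # total distance of every qubit from its target position
        return sum(dist[i][s[i]] for i in range(n))

    for _ in range(max_steps):
        if cost(state) == 0:
            return circuit
        # Greedy stand-in for the learned policy: apply the edge SWAP that
        # most reduces the total distance-to-target.
        best_edge, best_cost = None, cost(state)
        for a, b in edges:
            state[a], state[b] = state[b], state[a]
            c = cost(state)
            state[a], state[b] = state[b], state[a]
            if c < best_cost:
                best_edge, best_cost = (a, b), c
        if best_edge is None:
            break  # greedy stalled on a plateau; an RL policy learns to escape these
        a, b = best_edge
        state[a], state[b] = state[b], state[a]
        circuit.append(("SWAP", a, b))
    return circuit

# Example: route a permutation on a 4-qubit line topology 0-1-2-3.
print(route_permutation([2, 0, 3, 1], [(0, 1), (1, 2), (2, 3)]))
```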

Practical Impact

This research has significant practical implications for the field of quantum computing. By enabling the synthesis of permutation circuits across generic topologies, this approach can be applied to a wide range of quantum devices, including those with complex connectivity constraints. This can lead to more efficient and scalable quantum computing architectures, which are essential for tackling complex problems in fields like chemistry, materials science, and cryptography.

2

Reward Evolution with Graph-of-Thoughts: A Bi-Level Language Model Framework for Reinforcement Learning

By Changwei Yao, Xinzi Liu, Chen Li et al. (4 authors)

Agentic AI 2025-09-19

Problem

Designing effective reward functions for reinforcement learning (RL) is a major challenge. It requires human expertise and iterative refinement, making it time-consuming and prone to overfitting. Current approaches using Large Language Models (LLMs) have limitations, such as hallucinations, reliance on human feedback, and challenges with handling complex tasks.

Analogy

Imagine you're trying to teach a child to tie their shoes. A traditional approach would be to write out step-by-step instructions once and then personally grade every attempt. RE-GoT instead works like a coach who first breaks the task into smaller, manageable steps, then watches recordings of the attempts and revises the instructions accordingly, with no parent needed in the loop. In the same way, the framework decomposes a task into a graph, generates a reward, and refines it using visual feedback on the agent's rollouts.

Key Innovation

The researchers introduce Reward Evolution with Graph-of-Thoughts (RE-GoT), a novel bi-level framework that enhances LLMs with structured graph-based reasoning and integrates Visual Language Models (VLMs) for automated rollout evaluation. RE-GoT decomposes tasks into text-attributed graphs, enabling comprehensive analysis and reward function generation, and then iteratively refines rewards using visual feedback from VLMs without human intervention.
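
The bi-level loop can be sketched in a few lines. Here `llm`, `vlm`, and `train_policy` are placeholder callables standing in for the paper's models and RL trainer; the prompts and control flow are illustrative only.

```python
def evolve_reward(task, llm, vlm, train_policy, rounds=5):
    """Illustrative RE-GoT-style loop: graph-based task decomposition,
    reward generation, then VLM-guided refinement with no human feedback."""
    # Upper level: decompose the task into a text-attributed subgoal graph.
    graph = llm(f"Decompose this task into a graph of subgoals: {task}")
    reward_code = llm(f"Write a reward function covering every node of: {graph}")
    for _ in range(rounds):
        # Lower level: train a policy under the current reward, collect rollouts.
        policy, rollouts = train_policy(reward_code)
        critique = vlm(rollouts, f"Judge how well these rollouts achieve: {graph}")
        reward_code = llm(f"Revise the reward given this critique: {critique}\n"
                          f"Current code: {reward_code}")
    return reward_code

# Dummy stand-ins to show the control flow end to end:
llm = lambda prompt: f"<generated from: {prompt[:40]}...>"
vlm = lambda videos, prompt: "subgoal 2 is rarely achieved"
trainer = lambda code: ("policy", ["rollout.mp4"])
print(evolve_reward("stack two blocks", llm, vlm, trainer, rounds=1))
```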

Practical Impact

RE-GoT has the potential to revolutionize the field of RL by providing a scalable and effective solution for autonomous reward evolution. It can be applied to various robotic manipulation tasks, such as picking and placing objects, and can be used to improve the performance and generalization of RL systems in complex environments. The approach can also be used to reduce the reliance on human supervision, making it a more efficient and effective way to design reward functions.

3

Accelerating Atomic Fine Structure Determination with Graph Reinforcement Learning

By M. Ding, V.-A. Darvariu, A. N. Ryabtsev et al. (5 authors)

Agentic AI 2025-09-19

Problem

Atomic data, such as atomic energy levels, is crucial for understanding and predicting the behavior of matter in fields like astronomy, fusion science, and the lighting industry. However, determining these level energies is a complex and time-consuming task that requires extensive human labor and expertise in atomic spectroscopy. The current process, known as term analysis, involves analyzing observed atomic spectra to extract level energies and transition wavenumbers.

Analogy

Imagine trying to solve a complex puzzle with many interconnected pieces. Each piece represents a spectral line, and the completed puzzle represents the level energies and transition wavenumbers that need to be determined. The AI agent in TAG-DQN is like a super-smart puzzle solver that uses machine learning to figure out the correct connections between the pieces, allowing it to quickly and accurately determine those energies and wavenumbers.

Key Innovation

Researchers have proposed a new artificial intelligence (AI) approach to automate term analysis using graph reinforcement learning. This approach, called Term Analysis with Graph Deep Q-Network (TAG-DQN), involves casting the analysis procedure as a Markov decision process and solving it using a variant of the Deep Q-network algorithm. The AI agent learns to choose valid actions that maximize a reward function trained on human preferences from past analyses.
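
A toy version of the MDP casting can clarify the setup: the state is a partial assignment of observed lines to level pairs, and each action assigns one more line. Everything below, including the stub reward, is illustrative rather than the paper's implementation.

```python
class TermAnalysisEnv:
    """Toy sketch of term analysis as a Markov decision process: the state is
    a partial assignment of observed spectral lines to (upper, lower) level
    pairs, and each action assigns one more line. The paper solves its MDP
    with a graph-based Deep Q-network and a reward trained on human
    preferences; the stub reward here is purely illustrative."""

    def __init__(self, lines, candidate_pairs, reward_fn):
        self.lines = lines                  # observed line wavenumbers
        self.candidates = candidate_pairs   # wavenumber -> plausible level pairs
        self.reward_fn = reward_fn
        self.assignment = {}

    def valid_actions(self):
        return [(line, pair)
                for line in self.lines if line not in self.assignment
                for pair in self.candidates[line]]

    def step(self, action):
        line, pair = action
        self.assignment[line] = pair
        reward = self.reward_fn(self.assignment, line, pair)
        done = len(self.assignment) == len(self.lines)
        return dict(self.assignment), reward, done

env = TermAnalysisEnv(
    lines=[15873.2, 16021.7],
    candidate_pairs={15873.2: [("4p", "4s")], 16021.7: [("5s", "4p")]},
    reward_fn=lambda assignment, line, pair: 0.0)  # stub for the learned reward
print(env.step(env.valid_actions()[0]))
```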

Practical Impact

The new AI approach has the potential to accelerate the determination of atomic fine structure, which is essential for various applications. By automating the term analysis process, researchers can rapidly develop fundamental atomic data that would otherwise take decades to obtain. This can lead to breakthroughs in atomic physics, astronomy, and fusion technology, and can help address the growing demands for atomic data from these fields.

4

Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences

By Andrew Kyle Lampinen, Martin Engelcke, Yuxuan Li et al. (5 authors)

Agentic AI 2025-09-19
Google DeepMind

Problem

Artificial intelligence (AI) systems, particularly those using deep learning, often struggle to generalize like natural intelligence. They fail to apply knowledge learned in one context to new, related situations. This problem is evident in language models that can't make simple generalizations, like reversing the relationships between people mentioned in a sentence.

Analogy

Imagine you're learning a new language by listening to a podcast. You might not understand every word or phrase, but you pick up on the rhythm and structure of the language. This is similar to latent learning, where you learn information that's not directly relevant to the current task (in this case, understanding the podcast) but might be useful for future tasks (like having a deeper understanding of the language). By exposing AI systems to diverse experiences and allowing them to learn from them in a flexible way, we can help them develop this type of latent learning.

Key Innovation

Researchers propose that the key to bridging this gap lies in the concept of "latent learning": the ability to learn information that is not directly relevant to the current task but might be useful for future tasks. They argue that parametric AI systems largely lack this ability, learning only what the current task demands, and that episodic memory, by storing specific experiences for flexible reuse later, can complement parametric learning to close the gap.
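
As a minimal sketch of the complementary mechanism, consider a nearest-neighbor episodic store sitting alongside a parametric model: experiences written during training can be retrieved verbatim later, even if no weight update ever made them task-relevant. The implementation details here are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

class EpisodicMemory:
    """Store raw experiences; retrieve the nearest ones at decision time so
    information that was incidental during training can be reused later."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key_vec, experience):
        self.keys.append(np.asarray(key_vec, dtype=float))
        self.values.append(experience)

    def retrieve(self, query_vec, k=3):
        q = np.asarray(query_vec, dtype=float)
        sims = [q @ key / (np.linalg.norm(q) * np.linalg.norm(key) + 1e-9)
                for key in self.keys]
        top = np.argsort(sims)[::-1][:k]
        return [self.values[i] for i in top]

mem = EpisodicMemory()
mem.write([1.0, 0.0], "Alice is Bob's teacher")   # stored incidentally
print(mem.retrieve([0.9, 0.1], k=1))              # reused for a later, different task
```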

Practical Impact

If AI systems can be made to exhibit latent learning, they could become more flexible and adaptable in real-world applications. This could lead to significant improvements in areas like language understanding, decision-making, and problem-solving. For example, a language model that can learn from diverse experiences could better understand nuances of human language and make more accurate predictions.

Generative AI & LLMs

Breakthroughs in language models, text generation, and creative AI systems

1

Rethinking Molecule Synthesizability with Chain-of-Reaction

By Seul Lee, Karsten Kreis, Srimukh Prasad Veccham et al. (8 authors)

Generative AI & LLMs 2025-09-19

Problem

The main problem addressed in this paper is the limitation of molecular generative models in generating synthesizable molecules. These models often produce molecules that are not easily accessible through chemical synthesis, making them impractical for real-world applications such as drug discovery.

Analogy

Imagine trying to solve a complex puzzle, where each step requires a specific sequence of actions to reach the final solution. In the same way, ReaSyn uses the CoR notation to break down the complex process of chemical synthesis into a series of individual steps, allowing it to reason and generate synthesizable molecules more effectively. This analogy highlights the step-by-step nature of ReaSyn's reasoning process, which is a key innovation of this work.

Key Innovation

The innovation of this work lies in the introduction of ReaSyn, a generative framework for synthesizable projection that views synthetic pathways as chain-of-thought (CoT) reasoning paths. This is achieved through the use of a novel notation called chain-of-reaction (CoR), which explicitly states reactants, reaction types, and intermediate products for each step in a pathway. This allows ReaSyn to learn chemical reaction rules during supervised training and perform step-by-step reasoning.
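
One plausible way to render the CoR idea in code is as an explicit record per reaction step, serialized into a step-by-step string a model can be trained on. The field names and notation below are illustrative, not the paper's exact format.

```python
from dataclasses import dataclass

@dataclass
class ReactionStep:
    reactants: list      # SMILES strings of the inputs to this step
    reaction_type: str   # e.g. a named reaction template
    product: str         # SMILES of the intermediate (or final) product

def to_chain_of_reaction(pathway):
    """Serialize a synthetic pathway as an explicit step-by-step string,
    mirroring how chain-of-thought spells out intermediate reasoning."""
    return " >> ".join(
        f"[{' + '.join(s.reactants)} --{s.reaction_type}--> {s.product}]"
        for s in pathway)

# Acetic acid + ethanol -> ethyl acetate, as a one-step pathway.
pathway = [ReactionStep(["CC(=O)O", "OCC"], "Fischer esterification", "CC(=O)OCC")]
print(to_chain_of_reaction(pathway))
```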

Practical Impact

The practical impact of this research is significant, as it has the potential to accelerate the drug discovery process by generating synthesizable molecules that are more likely to be accessible through chemical synthesis. This could lead to the development of new and more effective treatments for various diseases. Additionally, the framework proposed in this paper can be applied to other fields where molecular generation is relevant, such as materials science and chemistry.

2

MatchFixAgent: Language-Agnostic Autonomous Repository-Level Code Translation Validation and Repair

By Ali Reza Ibrahimzada, Brandon Paulsen, Reyhaneh Jabbarvand et al. (5 authors)

Generative AI & LLMs 2025-09-19
University of Illinois Urbana-Champaign

Problem

Code translation, the process of converting source code from one programming language (PL) to another, is a crucial step in software modernization efforts. However, translating code manually, especially for large codebases, can be tedious, time-consuming, and error-prone. This problem is further complicated by the complexity of code structures and dependencies involved.

Analogy

Think of MatchFixAgent as a quality control system for translated code. Just as a manufacturing quality control system checks for defects and ensures that products meet quality standards, MatchFixAgent checks for semantic bugs and ensures that translated code is functionally equivalent to the original code. This analogy highlights the importance of MatchFixAgent in ensuring the reliability and maintainability of translated code.

Key Innovation

The researchers present MatchFixAgent, a novel language-agnostic framework for equivalence validation and repair of translations. MatchFixAgent combines the power of program analysis and large language model (LLM) agents to systematically generate targeted tests, enabling the demonstration of functional equivalence or detection of semantic bugs. This framework is designed to be cost-effective, scalable, and capable of supporting multiple programming languages.
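
The core validation idea, differential testing of the two versions, can be sketched in a few lines; MatchFixAgent's program analysis and LLM agents generate targeted tests far more systematically than the random generator assumed here.

```python
import random

def differential_test(source_fn, translated_fn, gen_input, trials=1000):
    """Run both versions on the same generated inputs and report the first
    divergence, i.e. a candidate semantic bug in the translation."""
    for _ in range(trials):
        args = gen_input()
        expected, actual = source_fn(*args), translated_fn(*args)
        if expected != actual:
            return {"input": args, "expected": expected, "actual": actual}
    return None  # no divergence found: evidence (not proof) of equivalence

# Toy example: a "translation" that mishandles negative numbers.
src = lambda x: x % 3
bad = lambda x: x - 3 * int(x / 3)   # rounds toward zero, unlike Python's %
print(differential_test(src, bad, lambda: (random.randint(-50, 50),)))
```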

Practical Impact

MatchFixAgent has the potential to revolutionize the code translation process by automating validation and repair tasks. This can save developers a significant amount of time and effort, reduce the likelihood of errors, and improve the overall quality of translated code. Additionally, MatchFixAgent can generate high-quality reports that can be used by end-users to better understand translated programs and the validation process.

3

FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation

By Luca Della Libera, Cem Subakan, Mirco Ravanelli

Generative AI & LLMs 2025-09-19

Problem

The main problem addressed in this research paper is the challenge of creating a neural audio codec (NAC) that can compress speech into a compact discrete representation at low bitrates while supporting real-time streaming inference. Most current NACs are not streamable, limiting their use in applications such as speech assistants, interactive dialogue, and low-latency generation.

Analogy

Imagine the difference between mailing a friend a video only after the entire recording has been compressed, versus live-streaming it as it happens. Most current NACs work like the former: they need to see a large window of audio before encoding, which rules out real-time use. FocalCodec-Stream works like the live stream, compressing speech causally, chunk by chunk, with a lightweight refiner module helping to preserve quality even though the codec never sees future audio.

Key Innovation

The key innovation of this work is the introduction of FocalCodec-Stream, a hybrid codec that combines multi-stage causal distillation of WavLM with targeted architectural improvements, including a lightweight refiner module. This approach enables the codec to compress speech into a single binary codebook at low bitrates (0.55-0.80 kbps) while supporting streaming inference with a theoretical latency of 80 ms.
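
For intuition about where such bitrates come from, the arithmetic is just token rate times bits per token. The codebook size below is a hypothetical chosen for illustration, not the paper's configuration.

```python
import math

def bitrate_kbps(tokens_per_second, codebook_size):
    """Bitrate of a single-codebook codec: one code index per frame."""
    return tokens_per_second * math.log2(codebook_size) / 1000

# Hypothetically assuming an 8192-entry codebook (13 bits per token), the
# stated 0.55-0.80 kbps range corresponds to roughly 42-62 tokens per second:
for tps in (42.5, 50, 61.5):
    print(f"{tps} tok/s -> {bitrate_kbps(tps, 8192):.2f} kbps")
```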

Practical Impact

The practical impact of this research is significant, as it enables the creation of efficient and real-time speech processing systems. FocalCodec-Stream can be used in various applications such as speech assistants, interactive dialogue, and low-latency generation, where fast and accurate speech processing is crucial. The codec's ability to preserve both semantic and acoustic information also makes it suitable for tasks such as speech language models (SLMs).

4

RaceGAN: A Framework for Preserving Individuality while Converting Racial Information for Image-to-Image Translation

By Mst Tasnim Pervin, George Bebis, Fang Jiang et al. (4 authors)

Generative AI & LLMs 2025-09-18

Problem

The main problem addressed by this research paper is the challenge of translating racial traits in images while maintaining individuality and high-level semantics. Current image-to-image translation models, such as CycleGAN and StarGAN, struggle to achieve this balance and often require additional reference images.

Analogy

Imagine you have a photo of a person's face, but you want to change their racial features to make them look like someone from a different ethnic group. Current image-to-image translation models are like a paint-by-numbers kit, where you have to follow a set of rules to achieve the desired result. But RaceGAN is like a skilled artist, who can take the original photo and create a new image that looks like the person from a different racial group, while still maintaining their individuality and high-level features.

Key Innovation

The key innovation of this paper is the introduction of RaceGAN, a novel framework that maps style codes over multiple domains during racial attribute translation. Unlike previous models, RaceGAN does not rely on a reference image and is able to maintain individuality and high-level semantics in the translated images. This is achieved through the use of a style extractor module, which extracts domain-specific low-level style codes from the latent space of multiple domains.
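
A minimal sketch of a multi-domain style extractor, in the spirit of the description above: a shared trunk with one output head per domain, so translation uses only the target domain's style code. The layer sizes are invented for illustration.

```python
import torch
import torch.nn as nn

class StyleExtractor(nn.Module):
    """Map a latent vector to a low-level style code for each domain;
    at translation time only the target domain's head is used."""
    def __init__(self, latent_dim=64, style_dim=16, num_domains=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(128, style_dim) for _ in range(num_domains)])

    def forward(self, z, domain):
        h = self.shared(z)
        return self.heads[domain](h)

z = torch.randn(1, 64)
style_code = StyleExtractor()(z, domain=1)  # style code for target domain 1
print(style_code.shape)                     # torch.Size([1, 16])
```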

Practical Impact

The practical impact of this research is significant, as it has the potential to improve the accuracy and fairness of facial recognition systems, which often show uneven performance across racial groups. By enabling the translation of racial traits in images, RaceGAN could also be used to create more realistic and diverse datasets for training machine learning models, leading to better performance and reduced bias in those models. Additionally, this research could have applications in fields such as entertainment, education, and marketing, where the ability to manipulate facial features could be useful.

5

Generating Part-Based Global Explanations Via Correspondence

By Kunal Rathore, Prasad Tadepalli

Generative AI & LLMs 2025-09-18
Oregon State University

Problem

Deep learning models, such as those used in image classification, are often "black-box" systems that are difficult to understand and trust. While they have achieved impressive results in various fields, their complexity raises concerns about their interpretability. This is particularly important in human-interactive and safety-critical contexts, where understanding the decisions made by these models is crucial.

Analogy

Think of GEPC like a detective trying to solve a mystery. The detective has a limited set of clues (user-defined part labels) that they use to search for other clues (local explanations) that can help explain the case (model decision). By combining these clues and using a greedy set cover approach, the detective can piece together a global explanation that reveals the key parts of the case that led to the solution (model decision). This analogy illustrates how GEPC uses a combination of local and global explanations to provide a comprehensive understanding of complex models.

Key Innovation

Researchers have proposed a new approach called GEPC (Global Explanations via Part Correspondence) to address this problem. GEPC uses a combination of local explanation search, part correspondence, and greedy set cover to generate global symbolic explanations for model decisions. This approach leverages user-defined part labels from a limited set of images and efficiently transfers them to a larger dataset, enabling the generation of human-understandable explanations on a large scale.
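
The greedy set cover step at the heart of this pipeline is classical and easy to sketch. The toy part sets below stand in for parts surfaced by local explanations.

```python
def greedy_set_cover(universe, candidate_sets):
    """Pick explanation parts that cover the most still-unexplained examples
    at each step: the classic greedy approximation to set cover."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(candidate_sets,
                   key=lambda name: len(candidate_sets[name] & uncovered))
        if not candidate_sets[best] & uncovered:
            break  # remaining examples cannot be covered
        chosen.append(best)
        uncovered -= candidate_sets[best]
    return chosen

# Toy example: which part labels jointly explain all images of class "bird"?
images = {1, 2, 3, 4, 5}
parts = {"beak": {1, 2, 3}, "wing": {3, 4}, "tail": {4, 5}, "eye": {2}}
print(greedy_set_cover(images, parts))  # ['beak', 'tail']
```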

Practical Impact

The practical impact of GEPC is significant. By providing global explanations for model decisions, GEPC enables users to understand what parts of an image are responsible for the model's classification, even when the model is complex and difficult to interpret. This is particularly important in safety-critical industries, such as self-driving cars, where understanding the decisions made by these models is crucial to ensuring public safety. Additionally, GEPC can be applied to various tasks, such as gene expression analysis, activity recognition in videos, and question answering from texts, making it a versatile tool for explaining complex models.

6

LoCaL: Countering Surface Bias in Code Evaluation Metrics

By Simantika Bhattacharjee Dristi, Matthew B. Dwyer

Generative AI & LLMs 2025-09-18
University of Virginia

Problem

Code evaluation metrics (CEMs) are used to assess the quality of machine-generated code, but they often have a "surface bias" problem. This means that they prioritize code that looks similar to the original code, even if it doesn't actually work correctly. This can lead to code that is functionally incorrect being rated highly, which can cause problems in software development.

Analogy

Imagine a grader scoring a child's attempt at writing the sentence "hello world". A grader with surface bias would give high marks to "hellos world" because it looks almost identical to the answer key, even though it is wrong. LoCaL is like a test deliberately filled with such look-alikes, built to expose graders that judge appearance rather than meaning, so that CEMs can be measured, and improved, on functional correctness.

Key Innovation

The researchers propose a new benchmark called LoCaL, which is designed to counter surface bias in CEMs. LoCaL uses a differential fuzzing-based strategy to generate thousands of test cases and compute reliable similarity scores for code pairs. This allows LoCaL to identify and highlight code pairs that have a significant gap between their surface similarity and functional similarity.
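
The benchmark's core measurement, a behavior-based similarity score from fuzzing, can be sketched as follows; LoCaL's differential fuzzer generates inputs far more systematically than the random generator assumed here.

```python
import random

def functional_similarity(prog_a, prog_b, gen_input, trials=1000):
    """Fraction of fuzzed inputs on which both programs produce equal outputs.
    A low score paired with high surface similarity is exactly the gap LoCaL
    is designed to surface."""
    agree = 0
    for _ in range(trials):
        args = gen_input()
        if prog_a(*args) == prog_b(*args):
            agree += 1
    return agree / trials

# Surface-similar but functionally different: off-by-one in the loop bound.
def total(n):     return sum(range(n + 1))   # 0 + 1 + ... + n
def total_bug(n): return sum(range(n))       # misses n

score = functional_similarity(total, total_bug,
                              lambda: (random.randint(0, 100),))
print(f"functional similarity: {score:.2f}")  # low despite near-identical code
```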

Practical Impact

LoCaL can be used to evaluate and improve CEMs, which can lead to better software development practices. By reducing the bias towards surface-level features, LoCaL can help developers create code that is more functional and less prone to errors. Additionally, LoCaL provides reusable ground-truth similarity scores for downstream tasks like code optimization, code clone detection, code refactoring, and automated bug repair.

Computer Vision & MultiModal AI

Advances in image recognition, video analysis, and multimodal learning

1

Blind-Spot Guided Diffusion for Self-supervised Real-World Denoising

By Shen Cheng, Haipeng Li, Haibin Huang et al. (5 authors)

Computer Vision & MultiModal AI 2025-09-19

Problem

Image denoising is a fundamental task in computer vision that involves recovering a clean image from a noisy observation. This is a challenging problem because both the image and noise components are unknown and difficult to disentangle. Traditional methods require paired noisy-clean images for training, but acquiring such data at scale is resource-intensive and often impractical. Self-supervised learning methods have emerged as a promising alternative, but they often struggle to preserve local details and introduce pixel discontinuities.

Analogy

Imagine you're trying to clean a dirty window. Traditional methods might use a single cleaning solution that works well on some parts of the window but not others. BSGD is like using a combination of cleaning solutions, one that focuses on removing dirt and grime (the BSN-based branch) and another that captures the underlying texture and pattern of the window (the conventional branch). By combining these two solutions, BSGD can effectively clean the window while preserving its original texture and pattern.

Key Innovation

The researchers propose a novel self-supervised framework called Blind-Spot Guided Diffusion (BSGD) that addresses the limitations of blind-spot networks (BSNs) and adapts diffusion models to self-supervised denoising. BSGD is a dual-branch diffusion framework that combines a BSN-based diffusion branch and a conventional diffusion branch. The BSN-based branch generates semi-clean images, while the conventional branch captures underlying noise distributions. The BSN-based branch is used to guide the sampling process, capturing noise structure while preserving local details.
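
Schematically, each reverse-diffusion step mixes the two branches' predictions, with the blind-spot branch steering the sampler. The toy noise schedule and mixing rule below are assumptions for illustration, not the paper's exact sampler.

```python
import numpy as np

def guided_denoise_step(x_t, t, bsn_branch, conv_branch, guidance=0.5):
    """One schematic reverse-diffusion step. Both branches predict the noise
    in x_t; the blind-spot branch's detail-preserving estimate steers the mix.
    `bsn_branch` and `conv_branch` are placeholder callables."""
    eps_conv = conv_branch(x_t, t)   # conventional branch: noise-structure estimate
    eps_bsn = bsn_branch(x_t, t)     # blind-spot branch: semi-clean, detail-preserving
    eps = (1 - guidance) * eps_conv + guidance * eps_bsn
    alpha_bar = 0.99 ** t            # toy cumulative noise schedule
    # Standard DDPM relation between x_t, the noise estimate, and the clean image.
    return (x_t - np.sqrt(1 - alpha_bar) * eps) / np.sqrt(alpha_bar)
```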

Practical Impact

BSGD has the potential to revolutionize image denoising for real-world applications. By leveraging self-supervised learning, BSGD can be trained on unpaired noisy images, making it a more practical and efficient solution. The framework's ability to preserve local details and avoid pixel discontinuities makes it a highly effective solution for denoising real-world images. This can have significant implications for various applications, such as medical imaging, surveillance, and photography.

2

Analysis Plug-and-Play Methods for Imaging Inverse Problems

By Edward P. Chandler, Shirin Shoushtari, Brendt Wohlberg et al. (4 authors)

Computer Vision & MultiModal AI 2025-09-18

Problem

Imaging inverse problems involve estimating an image from a set of noisy measurements, a common challenge in fields like medical imaging, astronomy, and surveillance. Traditional methods for solving these problems can introduce reconstruction artifacts, such as staircasing, which make the reconstructed image look unnatural.

Analogy

Think of the image as a puzzle with many pieces. Traditional methods for solving imaging inverse problems try to fit the pieces together based on a simple set of rules, which can lead to unnatural-looking reconstructions. The researchers in this paper propose a new way of fitting the pieces together by using a learned prior on the gradient domain. This is like having a more sophisticated set of rules that can capture the complex patterns and structures in the image, leading to more accurate and detailed reconstructions.

Key Innovation

The researchers in this paper propose a new approach to solving imaging inverse problems using a technique called Plug-and-Play Priors (PnP). Specifically, they train a denoiser to operate in the gradient domain, rather than on the image itself. This is an extension of traditional total variation (TV) regularization to learned TV regularization. They develop two analysis PnP algorithms, called APnP-HQS and APnP-ADMM, which incorporate this gradient-domain prior in image reconstruction algorithms.
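
The structure of an analysis PnP-ADMM iteration is compact enough to sketch. Here `D` (the image-gradient operator), `data_solve` (the least-squares data-fidelity step), and `denoiser` (the learned gradient-domain prior) are placeholder callables; this follows the standard ADMM template rather than the paper's exact implementation.

```python
import numpy as np

def apnp_admm(y, D, data_solve, denoiser, iters=50, rho=1.0):
    """Illustrative analysis PnP-ADMM: the learned prior acts on the gradient
    field D(x) rather than on the image itself.
      data_solve(y, v, rho) should return argmin_x f(x) + (rho/2)||D(x) - v||^2,
    a least-squares problem that depends on the forward model. For simplicity,
    x is assumed to share y's shape, as in denoising or deblurring."""
    x = np.zeros_like(y)
    z = D(x)                           # split variable in the gradient domain
    u = np.zeros_like(z)               # scaled dual variable
    for _ in range(iters):
        x = data_solve(y, z - u, rho)  # data-fidelity step
        z = denoiser(D(x) + u)         # prior step: denoise the gradient field
        u = u + D(x) - z               # dual update
    return x
```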

Practical Impact

This research has the potential to improve image reconstruction in a variety of fields, including medical imaging, astronomy, and surveillance. By using a learned prior on the gradient domain, the researchers demonstrate that their approach can achieve performance comparable to traditional PnP algorithms. This could lead to more accurate and detailed images, which could have significant practical impacts in fields like healthcare and environmental monitoring.

Explainable & Ethical AI

Transparency, fairness, and responsible AI development

1

Beyond Pointwise Scores: Decomposed Criteria-Based Evaluation of LLM Responses

By Fangyi Yu, Nabeel Seedat, Dasha Herrmannova et al. (5 authors)

Explainable & Ethical AI 2025-09-19

Problem

Evaluating long-form answers in high-stakes domains such as law, medicine, and finance remains a fundamental challenge. Current evaluation metrics like BLEU and ROUGE fail to capture semantic correctness, and LLM-based evaluators often reduce nuanced aspects of answer quality into a single undifferentiated score. This can lead to inaccurate assessments and real consequences, including tangible harm, legal liability, and erosion of trust in AI systems.

Analogy

Imagine evaluating a lawyer's response to a complex legal question. A standard evaluation metric might give a single score, but DeCE breaks down the evaluation into two dimensions: precision (how accurate and relevant is the answer?) and recall (how well does the answer cover the required concepts?). This decomposition provides a more nuanced understanding of the answer's quality and helps identify areas for improvement.

Key Innovation

Researchers introduce DeCE, a decomposed LLM evaluation framework that separates precision (factual accuracy and relevance) and recall (coverage of required concepts). DeCE is model-agnostic and domain-general, requiring no predefined taxonomies or handcrafted rubrics. It automatically extracts instance-specific, domain-aware criteria from gold-standard answers to perform a structured precision-recall decomposition.
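
The decomposition itself reduces to two ratios once the per-criterion and per-claim judgments are in hand. In DeCE those judgments come from an LLM against automatically extracted criteria; the sketch below takes them as given booleans.

```python
def dece_scores(criteria_judgments, claim_judgments):
    """Decomposed evaluation sketch. `criteria_judgments`: for each criterion
    extracted from the gold answer, did the response cover it (recall side)?
    `claim_judgments`: for each claim in the response, is it accurate and
    relevant (precision side)?"""
    recall = sum(criteria_judgments) / len(criteria_judgments)
    precision = sum(claim_judgments) / len(claim_judgments)
    return {"precision": precision, "recall": recall}

# A response covering 3 of 4 required concepts, with 5 of 6 claims correct:
print(dece_scores([True, True, True, False], [True] * 5 + [False]))
# {'precision': 0.8333..., 'recall': 0.75}
```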

Practical Impact

DeCE offers an interpretable and actionable LLM evaluation framework in expert domains. It achieves substantially stronger correlation with expert judgments (r=0.78) compared to traditional metrics, pointwise LLM scoring, and modern multidimensional evaluators. DeCE's scalability is also demonstrated, with only 11.95% of LLM-generated criteria requiring expert revision. This framework can be applied to various high-stakes domains, such as law, medicine, and finance, to improve the accuracy and reliability of LLM evaluations.

2

Query-Efficient Locally Private Hypothesis Selection via the Scheffe Graph

By Gautam Kamath, Alireza F. Pour, Matthew Regehr et al. (4 authors)

Explainable & Ethical AI 2025-09-19

Problem

The main problem this paper addresses is hypothesis selection under local differential privacy constraints. This means that we want to find the distribution in a set of possible distributions that is closest to the actual distribution, while also ensuring that our method doesn't leak too much information about individual data points.

Analogy

Think of hypothesis selection like trying to find the best recipe for a cake. You have a set of possible ingredients (distributions) and you want to find the one that produces the closest match to the actual cake (the true distribution). The Scheffé graph is like a map of the relationships between the ingredients, showing which ones are similar or different. By using this map, we can navigate the space of possible distributions and find the best one, while also ensuring that we don't reveal too much about the individual ingredients (data points).

Key Innovation

The key innovation of this paper is the introduction of a new object called the Scheffé graph, which captures the structure of the differences between distributions in the set Q. This allows the authors to develop a new algorithm that performs fewer queries to individuals who have samples from a probability distribution p, while still ensuring local differential privacy.
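
The non-private building block, the classical Scheffé comparison between two candidate distributions, is easy to sketch for the discrete case; the paper's contribution is organizing the pairwise Scheffé sets into a graph so that far fewer locally private queries are needed. The example below is illustrative.

```python
def scheffe_test(p, q, samples):
    """Classical Scheffé comparison of two candidate discrete distributions:
    look only at the set A = {x : p(x) > q(x)} and pick whichever candidate
    better matches the empirical mass of A."""
    A = {x for x in p if p.get(x, 0) > q.get(x, 0)}
    emp = sum(1 for s in samples if s in A) / len(samples)
    pA = sum(p[x] for x in A)
    qA = sum(q.get(x, 0) for x in A)
    return "p" if abs(pA - emp) <= abs(qA - emp) else "q"

# Toy example: samples drawn from a distribution matching p.
p = {"a": 0.6, "b": 0.3, "c": 0.1}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
samples = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
print(scheffe_test(p, q, samples))  # "p"
```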

Practical Impact

This research has important practical implications for data analysis and machine learning. By developing an algorithm that can select the best hypothesis under local differential privacy constraints, we can ensure that our methods are more private and secure. This is particularly important in applications where data is sensitive or regulated, such as in healthcare or finance.

3

Where Do I 'Add the Egg'?: Exploring Agency and Ownership in AI Creative Co-Writing Systems

By Dashiel Carrera, Jeb Thomas-Mitchell, Daniel Wigdor

Explainable & Ethical AI 2025-09-18

Problem

The main problem this paper addresses is the mixed perception of AI creative co-writing systems among writers. While AI has opened new artistic possibilities, many writers are skeptical about whether AI can yield creative results or disrupt their creative integrity. To support broader adoption of AI co-writing systems, it's crucial to consider the factors that influence writers' long-term satisfaction with these systems.

Analogy

Imagine writing a story with a co-author who has a different style and voice. You want to feel in control of the narrative, but you also want to collaborate and create something new. The interface metaphors used in AI co-writing systems can either enhance or undermine this sense of collaboration and control. The authors' research shows that different metaphors can lead to different perceptions of agency and ownership, and that designers can use these insights to create more effective and satisfying AI co-writing systems.

Key Innovation

This research introduces a new approach to understanding agency and ownership in AI creative co-writing systems. The authors use interface metaphors as a design probe to explore how writers conceptualize agency and ownership during interactions with AI co-writing systems. They identify three interface metaphor archetypes (tool-like, agentic, and magic) and create prototypes to study how these metaphors affect writers' perceptions and behaviors.

Practical Impact

This research has significant practical implications for the design of AI co-writing systems. By understanding how interface metaphors shape writers' perceptions of agency and ownership, designers can create more effective, satisfying, and considered AI co-writing systems. This can lead to increased adoption and satisfaction among writers, enabling them to explore new artistic possibilities and collaborate more effectively with AI.