PACE: Two-Timescale Self-Evolution for Small Language Model Agents

Published: arXiv: 2605.23019v1

Authors

Chen Ling, Pei Chen, Albert Guan, Jiaming Qu, Shayan Ali Akbar, Madhu Gopinathan, Erwin Cornejo

Abstract

Deploying language-model agents in production often requires substantial compute and human effort to tune prompts, parsers, validators, and other components of the agent pipeline. Self-evolution offers a promising alternative, but most existing frameworks assume access to frontier models that can reliably diagnose failures, propose revisions, and judge their own updates. We study whether frozen small language models (SLMs) can serve as effective self-evolving agents under resource constraints. We propose PACE (Prompt And Control Logic Evolution), a two-timescale framework that coordinates low-risk prompt refinement with higher-risk control-logic updates. PACE evolves prompts under fixed control logic until prompt-level gains saturate, then considers constrained control-logic updates that are accepted through held-out validation. Across three frozen SLM backbones ranging from 4B to 14B parameters and four controlled benchmarks, PACE achieves the best performance on all 12 backbone-benchmark combinations, improving over vanilla SLM agents by up to 9.2% (relative) and over the stronger single-mode evolution baseline by up to 5.4% (relative). A tau-bench case study further shows that PACE improves multi-turn tool-use success over vanilla and prompt-only evolution. These results suggest that reliable SLM agent self-evolution is possible without updating model weights or relying on frontier-model teachers, and that the key benefit is not any single final solver pattern but autonomous, validated discovery of task-appropriate inference strategies.

Paper Summary

Problem

Deploying language-model agents in production requires substantial compute and human effort to tune prompts, parsers, validators, and other components of the agent pipeline, which makes it slow and expensive. Self-evolution is a promising alternative, but most existing frameworks assume access to frontier models that can reliably diagnose failures, propose revisions, and judge whether those revisions should be accepted. That assumption does not hold when the agent runs on a frozen small language model (SLM) under resource constraints.

Key Innovation

The researchers propose a two-timescale framework called PACE (Prompt And Control Logic Evolution) that coordinates low-risk prompt refinement with higher-risk control-logic updates. PACE evolves prompts under fixed control logic until prompt-level gains saturate, then considers constrained control-logic updates that are accepted through held-out validation. This approach enables a frozen SLM to autonomously discover, select, and validate task-appropriate inference strategies.
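The two-timescale schedule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `pace_loop`, `propose_prompt`, `propose_logic`, `score_dev`, and `score_heldout`, the `patience` rule used to decide that prompt gains have saturated, and the toy scoring function are all hypothetical. In a real system the proposal functions would be calls to the frozen SLM and the scores would come from running the agent on task data.

```python
from typing import Callable, Tuple

State = Tuple[str, str]  # (prompt, control_logic); plain strings in this toy sketch


def pace_loop(
    state: State,
    propose_prompt: Callable[[str], str],
    propose_logic: Callable[[str], str],
    score_dev: Callable[[str, str], float],
    score_heldout: Callable[[str, str], float],
    rounds: int = 20,
    patience: int = 2,
) -> State:
    """Fast prompt edits; slow control-logic edits gated by a held-out score."""
    prompt, logic = state
    stale = 0  # consecutive prompt proposals that failed to improve
    for _ in range(rounds):
        if stale < patience:
            # Fast timescale: refine the prompt while control logic stays fixed.
            candidate = propose_prompt(prompt)
            if score_dev(candidate, logic) > score_dev(prompt, logic):
                prompt, stale = candidate, 0
            else:
                stale += 1
        else:
            # Slow timescale: prompt gains look saturated (assumed rule), so try
            # one control-logic update and keep it only if held-out score improves.
            candidate = propose_logic(logic)
            if score_heldout(prompt, candidate) > score_heldout(prompt, logic):
                logic = candidate
            stale = 0  # resume prompt refinement under the current logic
    return prompt, logic


# Toy stand-ins (assumptions for illustration only).
def toy_score(prompt: str, logic: str) -> float:
    bonus = {"single": 0.0, "retry": 2.0}.get(logic, -1.0)
    return min(len(prompt), 3) + bonus  # prompt gains saturate at length 3


def toy_propose_prompt(prompt: str) -> str:
    return prompt + "x"


def toy_propose_logic(logic: str) -> str:
    return "retry" if logic == "single" else "unvalidated"


result = pace_loop(
    ("", "single"), toy_propose_prompt, toy_propose_logic, toy_score, toy_score
)
print(result)
```

In this toy run the prompt grows until its score stops improving, the loop then accepts the `"retry"` logic because it raises the held-out score, and the later `"unvalidated"` proposal is rejected because it lowers that score. For brevity the sketch uses the same scoring function for both roles; a faithful setup would score control-logic candidates on data kept separate from the data used for prompt refinement.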

Practical Impact

Across three frozen SLM backbones (4B to 14B parameters) and four controlled benchmarks, PACE achieves the best performance on all 12 backbone-benchmark combinations. It improves over vanilla SLM agents by up to 9.2% (relative) and over the stronger single-mode evolution baseline by up to 5.4% (relative). A tau-bench case study also shows better multi-turn tool-use success than vanilla and prompt-only evolution. These gains come without updating model weights or relying on frontier-model teachers, which matters for deploying language-model agents in resource-constrained production settings.
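Because the gains above are reported as relative improvements, it helps to see how a relative figure differs from an absolute one. The snippet below is a small sketch that assumes the usual definition, (improved - baseline) / baseline; the summary does not spell out the paper's exact formula, and the two scores are hypothetical values chosen only to illustrate the arithmetic, not results from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the improvement of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical scores, not taken from the paper.
baseline_score = 50.0
improved_score = 54.6

rel = relative_improvement(baseline_score, improved_score)
abs_gain = improved_score - baseline_score
print(f"relative: {rel:.1f}%  absolute: {abs_gain:.1f} points")
```

With these illustrative numbers, a 9.2% relative improvement corresponds to an absolute gain of 4.6 points, so the same relative figure implies a smaller absolute gain when the baseline score is lower.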

Analogy / Intuitive Explanation

Imagine assembling a jigsaw puzzle. PACE works in two stages. First, you keep your assembly method fixed and try different pieces (prompts) until swapping pieces stops helping. Only then do you try a different way of assembling the puzzle (control logic), and you keep the new method only if it gives a better result on a check you did not use while experimenting (held-out validation). The cheap, low-risk change is explored often, while the riskier change is attempted rarely and has to pass validation before it is accepted.

Paper Information

Categories:

cs.LG

Published Date:

arXiv ID:

2605.23019v1
