Grounded Test-Time Adaptation for LLM Agents

Agentic AI
arXiv: 2511.04847v1
Authors

Arthur Chen, Zuxin Liu, Jianguo Zhang, Akshara Prabhakar, Zhiwei Liu, Shelby Heinecke, Silvio Savarese, Victor Zhong, Caiming Xiong

Abstract

Large language model (LLM)-based agents struggle to generalize to novel and complex environments, such as unseen websites or new sets of functions, due to a fundamental mismatch between their pre-training and test-time conditions. This challenge stems from two distinct failure modes: a syntactic misunderstanding of environment-specific components like observation formats, and a semantic misunderstanding of state-transition dynamics, which are only revealed at test time. To address these issues, we propose two distinct and complementary strategies for adapting LLM agents by leveraging environment-specific information available during deployment. First, an online distributional adaptation method parameterizes environmental nuances by learning a lightweight adaptation vector that biases the model's output distribution, enabling rapid alignment with an environment's response format. Second, a deployment-time dynamics grounding method employs a persona-driven exploration phase to systematically probe and learn the environment's causal dynamics before task execution, equipping the agent with a nonparametric world model. We evaluate these strategies across diverse agentic benchmarks, including function calling and web navigation. Our empirical results show the effectiveness of both strategies across all benchmarks with minimal computational cost. We find that dynamics grounding is particularly effective in complex environments where unpredictable dynamics pose a major obstacle, demonstrating a robust path toward more generalizable and capable LLM-based agents. For example, on the WebArena multi-site split, this method increases the agent's success rate from 2% to 23%.
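To make the first strategy concrete, the following is a minimal sketch (not the paper's implementation) of an output-distribution bias: a single learnable vector added to a frozen model's next-token logits and nudged online toward tokens the environment actually accepts. The names `vocab_size`, `base_logits`, and the plain SGD update rule are illustrative assumptions.

```python
import torch

# Illustrative sketch: a per-environment bias vector over the vocabulary.
# The base model stays frozen; only this vector is updated at test time.
vocab_size = 32_000
adaptation_vector = torch.zeros(vocab_size, requires_grad=True)
optimizer = torch.optim.SGD([adaptation_vector], lr=1e-2)

def adapted_logits(base_logits: torch.Tensor) -> torch.Tensor:
    """Bias the frozen model's next-token distribution toward the environment's format."""
    return base_logits + adaptation_vector

def online_update(base_logits: torch.Tensor, observed_token_id: int) -> None:
    """One lightweight step: make a token the environment accepted more likely."""
    loss = torch.nn.functional.cross_entropy(
        adapted_logits(base_logits).unsqueeze(0),       # shape (1, vocab_size)
        torch.tensor([observed_token_id]),
    )
    optimizer.zero_grad()
    loss.backward()   # gradient flows only into adaptation_vector
    optimizer.step()
```

Because only a vocabulary-sized vector is trained, the adaptation is cheap and can run alongside normal decoding.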

Paper Summary

Problem
Large language model (LLM)-based agents struggle to generalize to new and complex environments, such as unseen websites or new functions, due to a fundamental mismatch between their pre-training and test-time conditions. This challenge arises from two distinct failure modes: a syntactic misunderstanding of environment-specific components and a semantic misunderstanding of state-transition dynamics, which are only revealed at test time.
Key Innovation
The researchers propose two distinct and complementary strategies for adapting LLM agents to novel environments. The first strategy, parametric test-time adaptation, involves learning a lightweight adaptation vector that biases the model's output distribution to align with the environment's syntax. The second strategy, non-parametric test-time adaptation, employs a persona-driven exploration phase to systematically probe and learn the environment's causal dynamics before task execution.
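As an intuition for the second strategy, here is a minimal sketch (assumed interfaces, not the paper's code) of deployment-time dynamics grounding: persona-driven exploration records (state, action, next state) transitions into a plain memory that the agent can later query as a non-parametric world model. The `env` and `persona` APIs below are hypothetical placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class TransitionMemory:
    """Non-parametric world model: a store of observed transitions."""
    transitions: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, state: str, action: str, next_state: str) -> None:
        self.transitions.append((state, action, next_state))

    def lookup(self, state: str, action: str) -> list[str]:
        """Return previously observed outcomes for a (state, action) pair."""
        return [s2 for s1, a, s2 in self.transitions if s1 == state and a == action]

def explore(env, personas, steps_per_persona: int = 20) -> TransitionMemory:
    """Probe the environment with several exploration personas before any task is attempted."""
    memory = TransitionMemory()
    for persona in personas:
        state = env.reset()                            # hypothetical environment API
        for _ in range(steps_per_persona):
            action = persona.propose_action(state)     # hypothetical persona API
            next_state = env.step(action)
            memory.add(state, action, next_state)
            state = next_state
    return memory
```

At task time, the agent would consult `memory.lookup` to anticipate how the environment responds to an action before committing to it.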
Practical Impact
These strategies can be applied in real-world scenarios where LLM-based agents are deployed in novel environments, such as web navigation or function calling. By adapting to the environment's syntax and dynamics, the agents can improve their performance and achieve higher success rates. For example, the non-parametric test-time adaptation strategy increased the agent's success rate from 2% to 23% on the WebArena multi-site split.
Analogy / Intuitive Explanation
Imagine you're trying to order food at a new restaurant, but the menu is unfamiliar. Parametric test-time adaptation is like a personal assistant who learns the restaurant's menu layout and suggests the correct way to phrase your order. Non-parametric test-time adaptation is like a curious friend who explores the restaurant first, learns the menu items, and then explains the ordering process to you. Both strategies help you navigate the unfamiliar environment and achieve your goal.
Paper Information
Categories: cs.LG
Published Date:
arXiv ID: 2511.04847v1
