UniManip: General-Purpose Zero-Shot Robotic Manipulation with Agentic Operational Graph

Agentic AI
Published: arXiv:2602.13086v1
Authors

Haichao Liu, Yuanjiang Xue, Yuheng Zhou, Haoyuan Deng, Yinan Liang, Lihua Xie, Ziwei Wang

Abstract

Achieving general-purpose robotic manipulation requires robots to seamlessly bridge high-level semantic intent with low-level physical interaction in unstructured environments. However, existing approaches falter in zero-shot generalization: end-to-end Vision-Language-Action (VLA) models often lack the precision required for long-horizon tasks, while traditional hierarchical planners suffer from semantic rigidity when facing open-world variations. To address this, we present UniManip, a framework grounded in a Bi-level Agentic Operational Graph (AOG) that unifies semantic reasoning and physical grounding. By coupling a high-level Agentic Layer for task orchestration with a low-level Scene Layer for dynamic state representation, the system continuously aligns abstract planning with geometric constraints, enabling robust zero-shot execution. Unlike static pipelines, UniManip operates as a dynamic agentic loop: it actively instantiates object-centric scene graphs from unstructured perception, parameterizes these representations into collision-free trajectories via a safety-aware local planner, and exploits structured memory to autonomously diagnose and recover from execution failures. Extensive experiments validate the system's robust zero-shot capability on unseen objects and tasks, demonstrating a 22.5% and 25.0% higher success rate compared to state-of-the-art VLA and hierarchical baselines, respectively. Notably, the system enables direct zero-shot transfer from fixed-base setups to mobile manipulation without fine-tuning or reconfiguration. Our open-source project page can be found at https://henryhcliu.github.io/unimanip.

Paper Summary

Problem
The paper addresses general-purpose robotic manipulation: bridging high-level semantic intent and low-level physical interaction in unstructured environments, without task-specific training or fine-tuning. Current systems often fail to generalize to novel objects and layouts. Generalizing requires a reasoning ability that continuously perceives, verifies, and reflects, so that high-level intent stays aligned with an unscripted physical world.
Key Innovation
The key innovation is UniManip, a general-purpose robotic manipulation framework that achieves robust zero-shot generalization across diverse tasks, objects, and robot embodiments without task-specific fine-tuning or reconfiguration. It is grounded in a Bi-level Agentic Operational Graph (AOG) that unifies semantic reasoning and physical grounding: a high-level Agentic Layer orchestrates tasks, while a low-level Scene Layer maintains a dynamic representation of the scene. This pairing supports dynamic task decomposition while keeping the plan synchronized with the physical environment.
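To make the bi-level structure described above more concrete, here is a minimal Python sketch. It is not the authors' implementation: the class names `SceneLayer` and `AgenticLayer`, the `run` function, the `max_retries` parameter, and the stub executor `flaky_execute` are all hypothetical, chosen only to illustrate one way a high-level task list, a low-level scene state, and a failure memory could interact.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SceneObject:
    name: str
    position: tuple[float, float, float]


@dataclass
class SceneLayer:
    """Hypothetical low-level layer: object-centric state of the world."""
    objects: dict[str, SceneObject] = field(default_factory=dict)

    def update(self, name: str, position: tuple[float, float, float]) -> None:
        self.objects[name] = SceneObject(name, position)


@dataclass
class AgenticLayer:
    """Hypothetical high-level layer: ordered subtasks plus a failure memory."""
    subtasks: list[str]
    memory: list[str] = field(default_factory=list)


def run(agentic: AgenticLayer, scene: SceneLayer,
        execute: Callable[[str, SceneLayer, int], bool],
        max_retries: int = 2) -> bool:
    """Execute subtasks in order; record each failure in memory and retry."""
    for subtask in agentic.subtasks:
        for attempt in range(max_retries + 1):
            if execute(subtask, scene, attempt):
                break  # subtask succeeded, move on to the next one
            agentic.memory.append(f"{subtask}: attempt {attempt} failed")
        else:
            return False  # retries exhausted for this subtask
    return True


def flaky_execute(subtask: str, scene: SceneLayer, attempt: int) -> bool:
    """Stub executor (assumption): the first grasp attempt fails."""
    if subtask == "grasp cup" and attempt == 0:
        return False
    if subtask == "place cup on shelf":
        scene.update("cup", (0.5, 0.0, 1.2))  # keep scene state in sync
    return True


if __name__ == "__main__":
    scene = SceneLayer()
    scene.update("cup", (0.3, 0.1, 0.8))
    agent = AgenticLayer(subtasks=["grasp cup", "place cup on shelf"])
    print(run(agent, scene, flaky_execute), agent.memory)
```

In this sketch the executor is the only component that touches the scene state, so the high-level layer always reads an up-to-date scene. A real system would replace the stub with perception, planning, and a diagnosis step that consults the memory before retrying.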
Practical Impact
If the reported results carry over to deployment, robots could perform a wide range of tasks in unstructured environments without extensive training or fine-tuning. This could benefit settings such as manufacturing, logistics, and healthcare, where robots are increasingly expected to be flexible and adaptable. The framework may also improve the safety and efficiency of robotic systems, since it plans collision-free trajectories with a safety-aware local planner and can diagnose and recover from execution failures. The paper reports success rates 22.5% and 25.0% higher than state-of-the-art VLA and hierarchical baselines, respectively.
Analogy / Intuitive Explanation
Imagine assembling a piece of furniture from incomplete instructions. UniManip is like a capable assistant that understands the instructions, works out what is missing, and adjusts as the situation changes. Rather than being retrained for each new situation, it notices when a step has gone wrong and recovers, much as an experienced assembler would.
Paper Information
Categories: cs.RO
arXiv ID: 2602.13086v1
