SimScale: Learning to Drive via Real-World Simulation at Scale

arXiv: 2511.23369v1
Authors

Haochen Tian, Tianyu Li, Haochen Liu, Jiazhi Yang, Yihang Qiu, Guang Li, Junli Wang, Yinfeng Gao, Zhang Zhang, Liang Wang, Hangjun Ye, Tieniu Tan, Long Chen, Hongyang Li

Abstract

Achieving fully autonomous driving systems requires learning rational decisions in a wide span of scenarios, including safety-critical and out-of-distribution ones. However, such cases are underrepresented in real-world corpora collected by human experts. To compensate for the lack of data diversity, we introduce a novel and scalable simulation framework capable of synthesizing massive unseen states upon existing driving logs. Our pipeline utilizes advanced neural rendering with a reactive environment to generate high-fidelity multi-view observations controlled by the perturbed ego trajectory. Furthermore, we develop a pseudo-expert trajectory generation mechanism for these newly simulated states to provide action supervision. Upon the synthesized data, we find that a simple co-training strategy on both real-world and simulated samples can lead to significant improvements in both robustness and generalization for various planning methods on challenging real-world benchmarks, up to +6.8 EPDMS on navhard and +2.9 on navtest. More importantly, such policy improvement scales smoothly by increasing simulation data only, even without extra real-world data streaming in. We further reveal several crucial findings of such a sim-real learning system, which we term SimScale, including the design of pseudo-experts and the scaling properties for different policy architectures. Our simulation data and code will be released.

Paper Summary

Problem
Autonomous driving systems require learning to make rational decisions in a wide range of scenarios, including safety-critical and out-of-distribution ones. However, these cases are underrepresented in real-world data collected by human experts, making it challenging to train planners that can generalize to rare or unseen situations.
Key Innovation
The researchers introduce a novel and scalable simulation framework called SimScale, which synthesizes massive numbers of unseen driving states on top of existing driving logs. The framework uses advanced neural rendering with a reactive environment to produce high-fidelity multi-view observations controlled by a perturbed ego trajectory, and a pseudo-expert trajectory generation mechanism supplies action supervision for these newly simulated states.
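To make the synthesis loop concrete, here is a minimal Python sketch of the idea as described above: perturb a logged ego trajectory to reach an off-log state, re-render multi-view observations for that state, and let a pseudo-expert supply the action label. The functions `render_views` and `pseudo_expert_plan` are hypothetical stand-ins for the paper's neural renderer and pseudo-expert, and the perturbation noise model is an assumption for illustration, not the paper's actual recipe.

```python
import numpy as np

def perturb_ego_trajectory(traj: np.ndarray, lateral_std: float = 0.5,
                           heading_std: float = 0.05, rng=None) -> np.ndarray:
    """Jitter a logged ego trajectory (N x 3 array of x, y, yaw) to reach unseen states."""
    rng = rng if rng is not None else np.random.default_rng()
    perturbed = traj.copy()
    perturbed[:, 1] += rng.normal(0.0, lateral_std, size=len(traj))   # lateral offset
    perturbed[:, 2] += rng.normal(0.0, heading_std, size=len(traj))   # heading noise
    return perturbed

def synthesize_sample(log: dict, render_views, pseudo_expert_plan, rng=None) -> dict:
    """Create one simulated training sample from a real driving log (illustrative only)."""
    new_ego = perturb_ego_trajectory(log["ego_traj"], rng=rng)
    # Hypothetical neural renderer: multi-view camera observations for the new ego state,
    # with surrounding agents reacting to the perturbed ego motion.
    observations = render_views(log, ego_traj=new_ego)
    # Hypothetical pseudo-expert: action supervision for the off-log state.
    target_traj = pseudo_expert_plan(observations, ego_state=new_ego[0])
    return {"obs": observations, "target_traj": target_traj}
```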
Practical Impact
SimScale lets autonomous driving systems learn from a far wider range of scenarios than human-collected logs alone provide, including safety-critical and out-of-distribution ones, which yields planners that are more robust and generalizable in rare or unseen situations. The researchers demonstrate that a simple co-training strategy on both real-world and simulated samples leads to significant improvements in robustness and generalization for various planning methods on challenging real-world benchmarks, and that these gains continue to grow as more simulation data is added, even without additional real-world data.
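The co-training recipe is simple enough to show as a short sketch: keep loaders over both the real logs and the simulated samples, draw each training batch from one or the other, and apply the same imitation loss. The dataset interface, the 50/50 mixing ratio, and the L1 loss below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from torch.utils.data import DataLoader, Dataset

def loop(loader: DataLoader):
    # Re-iterate the DataLoader indefinitely so training is step-based rather than epoch-based.
    while True:
        for batch in loader:
            yield batch

def cotrain(policy: torch.nn.Module, real_ds: Dataset, sim_ds: Dataset,
            steps: int = 10_000, sim_ratio: float = 0.5,
            batch_size: int = 16, lr: float = 1e-4) -> torch.nn.Module:
    """Co-train a planner on real and simulated samples with a shared imitation loss."""
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)
    real_iter = loop(DataLoader(real_ds, batch_size=batch_size, shuffle=True))
    sim_iter = loop(DataLoader(sim_ds, batch_size=batch_size, shuffle=True))
    for _ in range(steps):
        # Draw a simulated batch with probability sim_ratio, otherwise a real one.
        batch = next(sim_iter) if torch.rand(1).item() < sim_ratio else next(real_iter)
        pred_traj = policy(batch["obs"])  # planner predicts a future ego trajectory
        loss = torch.nn.functional.l1_loss(pred_traj, batch["target_traj"])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```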
Analogy / Intuitive Explanation
Imagine you're training a driverless car to navigate a busy city. The car is initially trained on a small dataset of real-world driving scenarios, but it struggles with unexpected situations like a pedestrian stepping into the road. SimScale acts like a simulator that generates a vast number of hypothetical scenarios, including rare and safety-critical ones, letting the car learn from those experiences and become more robust. It is "what-if" training: the car is prepared for a wide range of possibilities, making it safer and more reliable on the road.
Paper Information
Categories: cs.CV, cs.RO
Published Date:
arXiv ID: 2511.23369v1