Less is More Tokens: Efficient Math Reasoning via Difficulty-Aware Chain-of-Thought Distillation

Generative AI & LLMs
Published on arXiv: 2509.05226v1
Authors

Abdul Waheed, Chancharik Mitra, Laurie Z. Wang, Deva Ramanan, Bhiksha Raj

Abstract

Chain-of-thought reasoning, while powerful, can produce unnecessarily verbose output for simpler problems. We present a framework for difficulty-aware reasoning that teaches models to dynamically adjust reasoning depth based on problem complexity. Remarkably, we show that models can be endowed with such dynamic inference pathways without any architectural modifications; we simply post-train on data that is carefully curated to include chain-of-thought traces that are proportional in length to problem difficulty. Our analysis reveals that post-training via supervised fine-tuning (SFT) primarily captures patterns like reasoning length and format, while direct preference optimization (DPO) preserves reasoning accuracy, with their combination reducing length and maintaining or improving performance. Both quantitative metrics and qualitative assessments confirm that models can learn to "think proportionally", reasoning minimally on simple problems while maintaining depth for complex ones.

Paper Summary

Problem
The main problem addressed by this research is that current chain-of-thought (CoT) prompting methods for large language models (LLMs) produce unnecessarily verbose reasoning, even for simple math problems. The extra tokens increase latency and computational cost, and at scale carry a real environmental footprint.
Key Innovation
What is new in this work is a framework for difficulty-aware chain-of-thought distillation that teaches models to adjust their reasoning depth to problem complexity, with no architectural changes: the behavior is induced purely by post-training (SFT followed by DPO) on curated traces whose length is proportional to problem difficulty. Models thereby learn to "think proportionally," reasoning minimally on simple problems while retaining depth for complex ones.
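To make the curation idea concrete, here is a minimal sketch of selecting, for each problem, a correct reasoning trace whose length scales with difficulty. This is an illustrative reconstruction, not the paper's actual code: the function names, the per-level token budget, and the candidate-trace fields are all assumptions.

```python
# Hypothetical sketch: pick, for each problem, a correct chain-of-thought
# trace whose token length is roughly proportional to its difficulty.
# Field names ("candidates", "is_correct", "num_tokens") are illustrative.

def select_trace(candidates, difficulty, budget_per_level=120):
    """Return the correct trace whose length is closest to a
    difficulty-scaled target budget (difficulty * budget_per_level)."""
    target = difficulty * budget_per_level
    correct = [c for c in candidates if c["is_correct"]]
    if not correct:
        return None  # no usable trace for this problem
    return min(correct, key=lambda c: abs(c["num_tokens"] - target))

def build_sft_dataset(problems):
    """Build SFT pairs where CoT length tracks problem difficulty."""
    dataset = []
    for p in problems:
        trace = select_trace(p["candidates"], p["difficulty"])
        if trace is not None:
            dataset.append({"question": p["question"], "cot": trace["text"]})
    return dataset
```

Fine-tuning on such a dataset rewards short traces on easy items and longer ones on hard items; a subsequent DPO stage (preferring the shorter of two correct traces) would then reinforce accuracy while trimming length.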
Practical Impact
By training models to adapt their reasoning verbosity to problem difficulty, this approach cuts unnecessary computation and latency, making chain-of-thought reasoning more practical in real-world deployments where efficiency matters. It also shows that models can be trained to produce reasoning that is concise yet accurate, which benefits human-computer interaction and downstream decision-making.
Analogy / Intuitive Explanation
Think of this research as teaching a model to adjust its "thinking pace" based on the complexity of the problem. Just as humans tend to allocate more cognitive effort for complex tasks and less effort for simple ones, this approach trains models to do the same – producing concise reasoning for simple problems and maintaining depth for complex ones. This flexibility can lead to more efficient and accurate language processing capabilities.
Paper Information
Categories: cs.CL
arXiv ID: 2509.05226v1