DualDistill and Agentic-R1: How AI combines natural language and tools for superior mathematical problem solving

Existing long chain-of-thought (long-CoT) reasoning models generate reasoning trajectories through iterative self-verification and refinement, achieving state-of-the-art performance in mathematical reasoning. However, open-source long-CoT models depend only on natural language reasoning traces, which makes them computationally expensive and prone to errors without a verification mechanism. Tool-aided reasoning, in open frameworks that integrate code interpreters, offers greater efficiency and reliability for large-scale numerical computation, but these agentic approaches struggle with abstract or conceptually complex reasoning problems.

DualDistill framework and Agentic-R1 model

Researchers from Carnegie Mellon University propose DualDistill, a distillation framework that combines trajectories from two complementary teachers to create a unified student model. The framework uses one reasoning-oriented teacher and one tool-augmented teacher to develop Agentic-R1, a model that learns to dynamically select the most appropriate strategy for each problem type. Agentic-R1 executes code for arithmetic and algorithmic tasks and uses natural language reasoning for abstract problems. DualDistill uses trajectory composition to distill knowledge from both complementary teachers, followed by self-distillation. The researchers used OpenHands as the agentic reasoning teacher and DeepSeek-R1 as the text-based reasoning teacher.
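To make the idea of trajectory composition concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Trajectory` type, the `compose` function, the transition sentences, and the rule for handling each case are all hypothetical assumptions chosen for illustration. The sketch assumes that when exactly one teacher solves a problem, the failed attempt, a short transition, and the successful attempt are joined into a single training example, so the student sees when to switch strategy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trajectory:
    strategy: str   # "text" (natural language teacher) or "tool" (code-executing teacher)
    content: str    # the teacher's full solution trace
    correct: bool   # whether the final answer matched the reference


# Hypothetical transition sentences, keyed by the strategy being switched TO.
TRANSITIONS = {
    "text": "Wait, code is not helping here; let me reason about this step by step.",
    "tool": "Wait, this is getting computation-heavy; let me write code to check it.",
}


def compose(text_traj: Trajectory, tool_traj: Trajectory) -> Optional[str]:
    """Build one training example from two teacher trajectories (illustrative rule)."""
    if text_traj.correct and tool_traj.correct:
        # Assumption: when both teachers succeed, keep the shorter trace.
        return min((text_traj, tool_traj), key=lambda t: len(t.content)).content
    if not text_traj.correct and not tool_traj.correct:
        # Assumption: problems neither teacher solves are dropped.
        return None
    # Exactly one teacher succeeded: failed attempt, transition, successful attempt.
    if tool_traj.correct:
        failed, solved = text_traj, tool_traj
    else:
        failed, solved = tool_traj, text_traj
    return "\n".join([failed.content, TRANSITIONS[solved.strategy], solved.content])
```

Under these assumptions, a problem where only the tool teacher succeeds yields an example that starts in natural language and ends with code, which is the kind of mixed trace a student would need in order to learn strategy switching.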

Evaluation and Benchmarks

The proposed method was evaluated on multiple benchmarks, including DeepMath-L and Combinatorics300, to test various aspects of mathematical reasoning. It is compared against the baselines DeepSeek-R1-Distill and Qwen-2.5-Instruct. The student model, Agentic-R1, shows substantial performance improvements, benefiting from both agentic and reasoning strategies. It outperforms two similarly sized models, each specializing in either tool-assisted (Qwen2.5-7B-Instruct) or pure reasoning (DeepSeek-R1-Distill-7B) strategies. By using reasoning strategies when they are needed, Agentic-R1 outperforms tool-based models, while remaining more efficient than pure reasoning models on standard mathematical tasks.

Qualitative analysis and tool usage patterns

Qualitative examples show that Agentic-R1 uses tools selectively: it activates code execution in 79.2% of the computationally demanding Combinatorics300 problems, but in only 52.0% of the simpler AMC dataset problems. Agentic-R1 learns to invoke tools appropriately through supervised fine-tuning alone, without explicit instruction, effectively balancing computational efficiency and reasoning accuracy.
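A tool activation rate like the ones quoted above can be measured by counting how many generated solutions contain at least one code execution call. The Python sketch below shows one way to do that. The function name `tool_activation_rate` and the `<code>` marker are hypothetical; the actual markup the model uses for tool calls is not given in this article.

```python
def tool_activation_rate(trajectories: list[str], marker: str = "<code>") -> float:
    """Return the percentage of trajectories that contain at least one tool call.

    `marker` is an assumed tag that signals a code execution call;
    replace it with whatever delimiter the model actually emits.
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")
    used = sum(marker in t for t in trajectories)
    return 100.0 * used / len(trajectories)
```

Running this once per benchmark gives one rate per dataset, which is how figures such as those for Combinatorics300 and AMC can be compared.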

Robustness to teachers’ imperfections

The framework remains effective even when guided by imperfect teachers. For example, the agentic teacher achieves only 48.4% accuracy on Combinatorics300, yet the student model improves from 44.7% to 50.9%, ultimately surpassing the teacher.

Conclusion

In summary, the DualDistill framework combines the strengths of natural language reasoning and tool-assisted problem solving by distilling complementary knowledge from two specialized teacher models into a single versatile student model, Agentic-R1. Through trajectory composition and self-distillation, Agentic-R1 learns to dynamically select the most appropriate strategy for each problem, balancing accuracy and computational efficiency. Evaluation across multiple mathematical reasoning benchmarks shows that Agentic-R1 outperforms both pure reasoning and tool-based models, even when learning from imperfect teachers. This work highlights a promising approach to building adaptive AI agents that integrate heterogeneous problem-solving strategies for more robust and efficient reasoning.


Check out the paper and GitHub page. All credit for this research goes to the researchers of this project.


Sajjad Ansari is a final year undergraduate student from IIT Kharagpur. As a technology enthusiast, he delves into the practical application of AI, focusing on understanding AI technology and its real-world impact. He aims to express complex AI concepts in a clear and easy way.