Microsoft researchers introduce ARTIST: A reinforcement learning framework that combines LLMs with agentic reasoning and dynamic tool use

LLMs have made impressive gains in complex reasoning, driven mainly by innovations in architecture, scale, and training methods such as reinforcement learning (RL). RL enhances LLMs by using reward signals to guide the model toward more effective inference strategies, resulting in longer, more coherent thinking processes that adapt dynamically to task complexity. Nevertheless, most RL-enhanced LLMs rely heavily on static internal knowledge and text-only reasoning, which makes them poorly suited for tasks that require real-time information, domain-specific expertise, or precise calculation. This limitation is particularly evident in knowledge-intensive or open-ended questions, where reasoning over static internal knowledge leads to inaccuracies and hallucinations.
To overcome these limitations, recent work explores agentic reasoning, in which LLMs dynamically interact with external tools and environments during the reasoning process. These tools include web search, APIs, and code execution platforms, while environments range from simulated browsers to operating systems. Agentic reasoning enables models to plan, adapt, and solve tasks interactively, beyond what static reasoning allows. However, current tool-integration approaches often depend on hand-crafted prompts or supervised fine-tuning, which hinders scalability and generalization. Emerging reinforcement learning techniques such as Group Relative Policy Optimization (GRPO) offer more efficient and adaptive training for tool use without step-level supervision. Still, the intersection of RL, tool use, and agentic decision-making remains underexplored, especially for real-world tasks that require multi-turn reasoning, dynamic planning, and robust external interactions.
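To make the idea of interleaved reasoning and tool use concrete, here is a minimal sketch of such a loop. The names (`agentic_rollout`, `run_tool`, the `<tool>`/`<output>` tags, and the scripted stand-in model) are illustrative assumptions, not the paper's actual API or tag format.

```python
import re

# Matches a tool invocation emitted by the model mid-generation.
# The <tool>...</tool> convention here is a hypothetical example.
TOOL_RE = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)

def run_tool(query: str) -> str:
    """Toy 'code interpreter': evaluate an arithmetic expression."""
    try:
        return str(eval(query, {"__builtins__": {}}, {}))
    except Exception as exc:
        return f"error: {exc}"

def agentic_rollout(generate_step, prompt: str, max_turns: int = 4) -> str:
    """Alternate between model-generated text and tool execution
    until the model stops calling tools or the turn budget runs out."""
    context = prompt
    for _ in range(max_turns):
        chunk = generate_step(context)       # model produces the next segment
        context += chunk
        match = TOOL_RE.search(chunk)
        if match is None:                    # no tool call -> final answer
            break
        output = run_tool(match.group(1).strip())
        context += f"\n<output>{output}</output>\n"  # feed the result back
    return context

# Scripted stand-in for an LLM, for demonstration only.
steps = iter(["Let me compute. <tool>17 * 23</tool>", "Final answer: 391"])
trace = agentic_rollout(lambda ctx: next(steps), "What is 17 * 23? ")
print(trace)
```

The key design point the sketch illustrates is that tool outputs are appended to the context, so each subsequent reasoning step conditions on real execution results rather than on the model's guess.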
Microsoft Research introduces ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a framework that combines agentic reasoning, reinforcement learning, and dynamic tool use to enhance LLMs. ARTIST enables models to decide independently when, how, and which tools to use during multi-step reasoning, learning robust strategies without step-level supervision. The model improves its reasoning by interleaving tool queries and tool outputs with its own inference. ARTIST outperforms top models such as GPT-4o, with gains of up to 22% on challenging mathematical and function-calling benchmarks. It exhibits emergent agentic behavior, setting a new standard for generalizable and interpretable problem solving.
ARTIST is a flexible framework that trains LLMs to interact with external tools and environments using reinforcement learning. It alternates between reasoning and tool use, letting the model choose when and how to call tools such as code interpreters or APIs. Training uses GRPO, which avoids a learned value function and relies on outcome-based group rewards. ARTIST structures rollouts into reasoning, tool queries, tool outputs, and final answers, and a composite reward system encourages correct answers, proper formatting, and successful tool use, enabling adaptive, multi-step problem solving.
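The group-based reward signal described above can be sketched as follows. This is a hedged illustration, assuming a composite reward over correctness, formatting, and tool success; the specific weights and the helper names (`composite_reward`, `group_relative_advantages`) are assumptions for demonstration, not the paper's exact formulation.

```python
def composite_reward(correct: bool, well_formatted: bool, tool_ok: bool) -> float:
    """Composite outcome reward. The weights 1.0/0.2/0.2 are illustrative."""
    return 1.0 * correct + 0.2 * well_formatted + 0.2 * tool_ok

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize each rollout's reward against the
    mean and std of its group, with no learned value function (critic)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0          # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Four sampled rollouts for one prompt: only some succeed on each criterion.
group = [
    composite_reward(True, True, True),    # 1.4: correct, clean, tool worked
    composite_reward(False, True, True),   # 0.4: wrong answer
    composite_reward(False, True, False),  # 0.2: wrong answer, tool failed
    composite_reward(True, False, True),   # 1.2: correct but badly formatted
]
adv = group_relative_advantages(group)     # positive for above-average rollouts
```

Because advantages are computed relative to the group, rollouts that use tools successfully and answer correctly are reinforced, while weaker rollouts in the same group are pushed down, all without step-level labels.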
ARTIST outperforms baselines, including GPT-4o and tool-augmented LLMs, on complex mathematical benchmarks such as AMC, AIME, and Olympiad problems. It achieves higher pass@1 accuracy, with gains of up to 22% over base models and more than 35% over other tool-integrated methods. ARTIST's advantage comes from its agentic reinforcement learning, which lets it use external tools strategically and refine multi-step solutions. Compared with prompt-based tool use, it shows superior tool invocation, response quality, and reasoning depth. Although its benefits are most pronounced on complex tasks, ARTIST also delivers improvements on simpler datasets such as MATH-500 through selective tool use.
In short, ARTIST is a framework that combines agentic reasoning, reinforcement learning, and dynamic tool use to enhance the capabilities of LLMs. Unlike traditional prompt-based approaches, ARTIST adapts and solves complex tasks by interacting with external tools and environments. It learns effective tool-use strategies without step-by-step supervision, improving accuracy and enabling deeper reasoning. Evaluations on mathematical and function-calling benchmarks show significant performance improvements. ARTIST also produces more interpretable reasoning paths and more robust behavior. This work highlights agentic RL as a promising direction for building more adaptable and capable AI systems.
Check out the Paper. Also, don’t forget to follow us on Twitter.
Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.