
Optimizing LLM Reasoning: Balancing Internal Knowledge and Tool Use with SMART


Recent advances in LLMs have greatly improved their reasoning capabilities, enabling them to perform tasks such as text composition, code generation, and logical deduction. However, these models often struggle to balance their internal knowledge with external tool use, which leads to tool overuse: LLMs unnecessarily rely on external tools for tasks that their parametric knowledge can already handle, increasing computational costs and sometimes degrading performance. Studies show that LLMs invoke tools more than 30% of the time even when doing so is unnecessary, highlighting their limited self-awareness of their own knowledge boundaries. Addressing this problem requires better calibration mechanisms that let LLM-driven agents determine when to rely on internal knowledge and when to turn to external resources, ultimately improving efficiency, scalability, and the user experience.

Research on the knowledge boundaries of LLMs shows that, while these models perform well on structured tasks, they often fail to recognize their own limitations, leading to hallucinations or tool misuse. Efforts to address these challenges include retrieval-augmented generation, confidence calibration, and explicit training on knowledge boundaries. Work on tool integration has likewise explored adaptive tool use driven by internal uncertainty, external module integration, and dynamic invocation strategies. Despite these advances, existing benchmarks show that LLMs still struggle to judge when tool use is necessary and appropriate.

Inspired by human metacognition, researchers at the University of Illinois Urbana-Champaign and IBM Research AI have developed SMART (Strategic Model-Aware Reasoning with Tools) to enhance LLMs' self-awareness and optimize their tool use. They introduce SMART-ER, a dataset spanning the math, time, and intention domains, in which reasoning chains carry explicit justifications for balancing internal reasoning with external tools. Trained on this dataset, SmartAgent reduces tool overuse by 24% while improving performance by 37%, allowing smaller models to match GPT-4 and 70B-scale models. SmartAgent also generalizes well to out-of-distribution tasks, exhibiting more confident decision-making and more efficient reliance on tools.
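To make the dataset design concrete, here is a minimal sketch of what a SMART-ER-style training record might look like. The schema and field names are illustrative assumptions, not the released format:

```python
# Hypothetical SMART-ER-style training record (field names are illustrative,
# not the released schema). Each reasoning step is labeled as resolvable from
# parametric knowledge or as requiring an external tool, with a justification.
example_record = {
    "domain": "math",
    "query": "What is 15% of the population of France, roughly?",
    "steps": [
        {
            "thought": "France's approximate population (~68 million) is common knowledge.",
            "source": "internal",   # answerable from parametric knowledge
            "rationale": "Stable, widely known fact; no tool needed.",
        },
        {
            "thought": "Compute 0.15 * 68_000_000.",
            "source": "tool",       # delegate exact arithmetic to a calculator
            "tool": "calculator",
            "rationale": "Exact multiplication is error-prone for an LLM.",
        },
    ],
    "answer": "About 10.2 million",
}
```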

SMART enhances agent metacognition by balancing internal knowledge with external tools, thereby mitigating tool overuse. SMART-ER, spanning the math, time, and intention domains, helps models distinguish knowledge-driven from tool-dependent reasoning: each query is decomposed into structured steps, and the model determines at each step whether a tool is needed. Reasoning chains incorporate explicit justifications that refine these decisions and improve interpretability. SmartAgent, trained on SMART-ER using models such as Llama-3.1 and Mistral, optimizes tool use while maintaining accuracy. This approach enables dynamic, context-aware reasoning, reducing dependence on external tools while improving overall performance and decision-making confidence in language models.
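The per-step routing described above can be pictured with a small Python sketch. This is not the paper's implementation: the `Step` class, the confidence scores, and the threshold are hypothetical stand-ins for the agent's calibrated self-assessment.

```python
from dataclasses import dataclass

@dataclass
class Step:
    description: str
    confidence: float  # toy stand-in for the model's calibrated self-assessment

def answer_internally(step: Step) -> str:
    # Resolve the step from parametric knowledge (stubbed for illustration).
    return f"[internal answer: {step.description}]"

def call_tool(step: Step) -> str:
    # Delegate the step to an external tool (stubbed for illustration).
    return f"[tool result: {step.description}]"

def run_chain(steps: list[Step], threshold: float = 0.8) -> list[str]:
    """SMART-style routing: prefer parametric knowledge when confidence
    clears the threshold; otherwise fall back to an external tool."""
    return [
        answer_internally(s) if s.confidence >= threshold else call_tool(s)
        for s in steps
    ]

if __name__ == "__main__":
    chain = [
        Step("Recall France's approximate population", confidence=0.95),
        Step("Compute 0.15 * 68_000_000 exactly", confidence=0.40),
    ]
    print(run_chain(chain))  # first step stays internal, second calls a tool
```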

The study presents experiments demonstrating SmartAgent's effectiveness at reducing tool overuse while improving reasoning performance. Evaluations were performed on in-domain (MATH, FreshQA, IN3) and out-of-distribution (GSM8K, MINTQA) datasets, comparing SmartAgent against a range of baselines. SmartAgent reduces tool reliance by 24% while achieving a 37% performance boost. Notably, the 7B- and 8B-scale SmartAgent models outperform GPT-4o on some tasks. The results highlight efficient tool use, strong generalization, and well-calibrated decisions. Error analysis shows that SmartAgent minimizes redundant tool calls, improving reasoning efficiency, and a case study illustrates its metacognitive reasoning, which makes its responses more transparent and effective.
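For readers who want to see how this kind of bookkeeping works, the sketch below shows one plausible way to compute a tool-call reduction and a relative performance gain from per-query evaluation logs. The log format and numbers are assumptions for illustration, not the paper's evaluation harness:

```python
# Hypothetical per-query evaluation logs: whether each system called a tool
# and whether its final answer was correct.
baseline_log = [
    {"used_tool": True,  "correct": False},
    {"used_tool": True,  "correct": True},
    {"used_tool": False, "correct": False},
    {"used_tool": True,  "correct": True},
]
smartagent_log = [
    {"used_tool": False, "correct": True},
    {"used_tool": True,  "correct": True},
    {"used_tool": False, "correct": True},
    {"used_tool": True,  "correct": True},
]

def tool_rate(log):
    return sum(e["used_tool"] for e in log) / len(log)

def accuracy(log):
    return sum(e["correct"] for e in log) / len(log)

# Relative reduction in tool calls and relative accuracy gain.
tool_reduction = 1 - tool_rate(smartagent_log) / tool_rate(baseline_log)
perf_gain = accuracy(smartagent_log) / accuracy(baseline_log) - 1
print(f"tool-call reduction: {tool_reduction:.0%}")  # ~33% on this toy log
print(f"performance gain:    {perf_gain:.0%}")       # +100% on this toy log
```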

In short, the analysis highlights a key issue: agents often overuse external tools even when internal knowledge suffices, likely because they are uncertain about their own capabilities or because querying a tool is the more convenient path. Conversely, large models such as GPT-4o sometimes underuse tools by misjudging task complexity. Addressing these inefficiencies may involve resource constraints or adaptive mechanisms. Inspired by human decision-making, the SMART paradigm improves reasoning by teaching agents when to rely on tools versus parametric knowledge. Data-driven calibration can improve self-awareness and thus reduce unnecessary tool use. Future work could explore confidence probing, self-checking modules, and metacognitive learning to further optimize decision efficiency.
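As a final illustration, the confidence-probing direction mentioned above could be wired up roughly as follows. The probe prompt and the `llm`/`search_tool` callables are hypothetical placeholders, not a design from the paper:

```python
def probe_self_knowledge(llm, question: str) -> bool:
    """Ask the model whether it can answer from memory alone.
    `llm` is any callable mapping a prompt string to a response string."""
    probe = (
        "Can you answer the following question reliably from your own "
        "knowledge, without external tools? Reply yes or no.\n\n" + question
    )
    return llm(probe).strip().lower().startswith("yes")

def answer_with_gate(llm, search_tool, question: str) -> str:
    # Only pay for the external tool when the self-probe says "no".
    if probe_self_knowledge(llm, question):
        return llm(question)
    context = search_tool(question)
    return llm(f"Context: {context}\n\nQuestion: {question}")
```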


Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.


