Large multimodal models (LMMs) enable systems to interpret images, answer visual questions, and retrieve factual information. Their development has significantly improved the capabilities of virtual assistants and AI systems deployed in real-world environments. However, even with large amounts of training data, LMMs often miss dynamic or evolving information, especially facts that emerge after training or that sit behind proprietary or security boundaries.
One of the key limitations of current LMMs is that they cannot handle queries that require real-time or rare information. When faced with previously unseen visual input or newly emerging facts, these models often hallucinate answers instead of acknowledging the limits of their knowledge or seeking external help. This problem becomes critical in use cases that demand accuracy, such as answering questions about current events or domain-specific details. These gaps not only undermine the reliability of LMMs but also make them unsuitable for tasks that require factual verification or up-to-date knowledge.
Various approaches attempt to solve this problem by connecting models to external knowledge sources. Retrieval-Augmented Generation (RAG) fetches information from a static database before generating an answer, while prompt-engineered search agents interact with online resources through scripted reasoning steps. However, RAG typically retrieves too much data and assumes that all the necessary information is already available in the database. Prompt-engineered agents, although they can search, cannot learn optimal search behavior over time. These limitations prevent both approaches from fully adapting to real-world unpredictability or supporting efficient interaction in practice.
Researchers at ByteDance and S-Lab, Nanyang Technological University, have developed MMSearch-R1, a novel framework designed to enhance LMM performance through reinforcement learning. The study describes a method in which models are not only able to search but are also trained to decide when to search, what to search for, and how to interpret search results effectively. MMSearch-R1 is the first end-to-end reinforcement learning framework that enables LMMs to perform on-demand, multi-turn searches in real-world internet environments. The system includes tools for both image and text search, each invoked based on the model's own judgment rather than called by a fixed pipeline.
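To make the "decide when and how to search" idea concrete, here is a minimal Python sketch of an on-demand, multi-turn rollout loop. It is an illustration under stated assumptions, not MMSearch-R1's actual code: the tag names (`<text_search>`, `<image_search>`, `<answer>`, `<information>`), the `run_episode` function, and the stub policy and tools are all hypothetical stand-ins for the trained LMM and the real search backends.

```python
import re
from typing import Callable, Dict, List

# Hypothetical action tags; the real MMSearch-R1 prompt format may differ.
ACTION_RE = re.compile(r"<(answer|text_search|image_search)>(.*?)</\1>", re.DOTALL)


def run_episode(policy: Callable[[List[str]], str],
                tools: Dict[str, Callable[[str], str]],
                question: str,
                max_turns: int = 3) -> Dict[str, object]:
    """Roll out one episode in which the policy picks every action itself."""
    context = [question]
    search_calls = 0
    for _ in range(max_turns):
        output = policy(context)
        match = ACTION_RE.search(output)
        if match is None:
            break  # malformed output: no recognised action tag
        action, payload = match.group(1), match.group(2).strip()
        if action == "answer":
            return {"answer": payload, "search_calls": search_calls}
        # Any other action is a search: call the chosen tool and feed back results.
        search_calls += 1
        result = tools[action](payload)
        context.append(output)
        context.append(f"<information>{result}</information>")
    return {"answer": None, "search_calls": search_calls}


# Hypothetical stub policy: search once, then answer from the returned information.
def toy_policy(context: List[str]) -> str:
    if any("<information>" in c for c in context):
        return "<answer>Paris</answer>"
    return "<text_search>capital of France</text_search>"


# Hypothetical stub tools standing in for real text and image search backends.
toy_tools = {
    "text_search": lambda q: f"top results for '{q}': Paris is the capital of France.",
    "image_search": lambda q: "no image results",
}

print(run_episode(toy_policy, toy_tools, "What is the capital of France?"))
# prints {'answer': 'Paris', 'search_calls': 1}
```

Because the policy, not the loop, chooses each action, a model that already knows the answer can reply on the first turn with zero search calls. Counting `search_calls` per episode is what allows a reward to discourage unnecessary searches.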
At the core of the system is Group Relative Policy Optimization (GRPO), a variant of the PPO algorithm. MMSearch-R1 applies a reward scheme that favors accurate answers and discourages unnecessary searches. The model performs multi-round interactions, assessing at each step whether more information is needed and, if so, choosing between text and image search. For example, it uses SerpApi to return the top five matching images or web pages, and uses Jina Reader and Qwen3-32B to retrieve and summarize relevant web content. The model is trained to wrap its output in a predefined format that separates reasoning, search actions, retrieved content, and final answers across interaction rounds.
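The reward and GRPO ideas can be sketched together. GRPO (introduced in DeepSeekMath) drops PPO's learned value function and instead scores each sampled response relative to the other responses drawn for the same prompt: the advantage is the reward minus the group mean, divided by the group standard deviation. The `reward` function below is a hypothetical illustration of "reward accuracy, discourage unnecessary search"; the `search_penalty` and `format_weight` values, and the way they are combined, are assumptions rather than the paper's published settings. The code uses the population standard deviation, a detail on which implementations differ.

```python
import statistics
from typing import List


def reward(correct: bool, used_search: bool, well_formatted: bool,
           search_penalty: float = 0.1, format_weight: float = 0.1) -> float:
    """Hypothetical reward: accuracy, discounted if search was used, plus a format bonus."""
    acc = 1.0 if correct else 0.0
    if correct and used_search:
        acc -= search_penalty  # assumed penalty: a correct answer is worth less if it needed search
    fmt = 1.0 if well_formatted else 0.0
    return (1.0 - format_weight) * acc + format_weight * fmt


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalise each reward against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps keeps equal-reward groups at 0
    return [(r - mean) / (std + eps) for r in rewards]


# One group of four rollouts sampled for the same question.
group = [
    reward(correct=True, used_search=False, well_formatted=True),
    reward(correct=True, used_search=True, well_formatted=True),
    reward(correct=False, used_search=True, well_formatted=True),
    reward(correct=False, used_search=False, well_formatted=False),
]
advantages = group_relative_advantages(group)
for r, a in zip(group, advantages):
    print(f"reward={r:.2f} advantage={a:+.2f}")
```

In this toy group, the rollout that answers correctly without searching receives the largest advantage, so the policy update pushes toward answering directly when the model already knows the answer while still rewarding searches that lead to a correct answer over wrong answers.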
In testing, MMSearch-R1-7B outperformed other search-based baselines of the same size and nearly matched the performance of a larger RAG-based 32B model. Most importantly, it achieved this while reducing the number of search calls by 30%, showing that the model not only answers accurately but does so more efficiently. The framework was evaluated on a range of knowledge-intensive tasks, and the learned search behaviors proved both efficient and reliable. The researchers also built and released a comprehensive dataset, FactualVQA (FVQA), which includes both search-required and search-free samples. This balance is essential for teaching the model to distinguish when external information is needed.
Overall, the study addresses a real weakness of current LMMs by training them to be selective and deliberate about when to use external search. Instead of passively retrieving information, MMSearch-R1 encourages the model to act with intent, improving both the quality and the efficiency of its responses. The work marks a shift in how AI systems are designed to interact with the world: by learning to recognize what they don't know and to respond accordingly.
Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who has been researching applications in fields such as biomaterials and biomedical science. With a strong background in materials science, he is exploring new advancements and creating opportunities to contribute.