
Alibaba AI team has just released OVIS2.5 multimodal LLMs: a major leap in open-source AI with enhanced visual perception and reasoning capabilities

OVIS2.5 is the latest multimodal large language model (MLLM) from Alibaba’s AIDC-AI team, and its 9B and 2B parameter variants have drawn considerable attention in the open-source AI community. OVIS2.5 sets new benchmarks for performance and efficiency by introducing technical advances in native-resolution visual perception, deep multimodal reasoning, and robust OCR, addressing long-standing limitations most MLLMs face when handling high-detail visual information.

Native-resolution vision and deep reasoning

The defining innovation of OVIS2.5 is its integration of a native-resolution vision transformer (NaViT) that processes images at their original, variable resolutions. Unlike previous models that rely on tiling or forced resizing, which often lose important global context and fine details, NaViT preserves the full integrity of complex charts and natural images. This upgrade allows the model to perform well on visually dense tasks ranging from scientific diagrams to complex infographics and forms.
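To make the contrast between native-resolution processing, forced resizing, and tiling concrete, here is a minimal arithmetic sketch. The patch size of 14 pixels, the 448-pixel target, and the example image size are assumptions chosen for illustration; the article does not state the values OVIS2.5 actually uses.

```python
import math

def native_patch_grid(height, width, patch=14):
    """Patch grid when the image keeps its own size (rounded up to whole patches).

    patch=14 is an assumed value, not taken from the OVIS2.5 report.
    """
    return math.ceil(height / patch), math.ceil(width / patch)

def aspect_distortion(height, width, target=448):
    """Ratio of the two per-axis scale factors under a forced square resize.

    1.0 means the aspect ratio is preserved; larger values mean more stretching.
    """
    scale_h = target / height
    scale_w = target / width
    return max(scale_h, scale_w) / min(scale_h, scale_w)

def tile_count(height, width, tile=448):
    """Number of fixed-size tiles needed to cover the image."""
    return math.ceil(height / tile) * math.ceil(width / tile)

# A wide infographic of 600 x 1800 pixels (hypothetical example).
h, w = 600, 1800
rows, cols = native_patch_grid(h, w)
print("native patch grid:", rows, "x", cols, "=", rows * cols, "tokens")
print("distortion under a square resize:", round(aspect_distortion(h, w), 2))
print("tiles needed to cover the image:", tile_count(h, w))
```

In this sketch the native grid follows the image's own shape, the square resize squeezes one axis three times harder than the other, and tiling cuts the image into separate crops whose boundaries can split a chart or table.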

To address the challenge of reasoning, OVIS2.5 uses a training curriculum that goes beyond standard chain-of-thought (CoT) supervision. Its training data includes “thinking-style” samples for self-correction and reflection, culminating in an optional “thinking mode” at inference time. Users can enable this mode (as discussed enthusiastically in LocalLLaMA Reddit threads) to trade faster response times for enhanced step-by-step accuracy and model introspection. This is particularly beneficial for tasks that require deeper multimodal analysis, such as scientific question answering or mathematical problem solving.
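When a thinking mode is enabled, an application usually needs to separate the model's intermediate reasoning from its final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags; that tag format and the helper name `split_thinking` are assumptions for illustration and are not confirmed by the article.

```python
import re

# Assumed delimiter for the reasoning trace; the real output format may differ.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(raw):
    """Return (reasoning, answer) from a raw model response.

    If no reasoning block is present (thinking mode off), reasoning is ''.
    """
    match = THINK_RE.search(raw)
    if match is None:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first reasoning block is treated as the answer.
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer

# Hypothetical response text, not real model output.
raw = "<think>The bar for 2021 is twice the 2020 bar.</think>\nRevenue doubled."
reasoning, answer = split_thinking(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the two parts separate lets an interface show only the answer by default while still logging the reasoning, which is where the extra latency of thinking mode goes.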

Performance benchmarks and state-of-the-art results

OVIS2.5-9B achieved an average score of 78.3 on the OpenCompass multimodal leaderboard, placing it ahead of all open-source MLLMs under 40B parameters; OVIS2.5-2B scored 73.9, setting a new standard for lightweight models and making it well suited to on-device or resource-constrained inference. Both models deliver strong results in specialized domains, leading open-source competitors in:

  • STEM reasoning (MathVista, MMMU, WeMath)
  • OCR and chart analysis (OCRBench v2, ChartQA Pro)
  • Visual grounding (RefCOCO, RefCOCOg)
  • Video and multi-image understanding (BLINK, VideoMME)

Technical commentary on Reddit and X highlights significant advances in OCR and document processing, with users noting improved text extraction from cluttered images, robust form understanding, and flexible support for complex visual queries.

Efficient training and scalable deployment

OVIS2.5 improves end-to-end training efficiency by adopting multimodal data packing and advanced hybrid parallelism, speeding up overall throughput by up to 3 to 4 times. Its lightweight 2B variant continues the series’ “small model, big performance” philosophy, enabling high-quality multimodal understanding on mobile hardware and edge devices.
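Data packing improves throughput by placing several short samples into one fixed-length training row instead of padding each sample separately. The article does not say which packing algorithm OVIS2.5 uses, so the sketch below uses a generic first-fit-decreasing heuristic as a stand-in; the function names and the sample lengths are hypothetical.

```python
def pack_sequences(lengths, max_len):
    """Pack samples into rows of at most max_len tokens (first-fit decreasing).

    Returns a list of rows, each row being a list of sample indices.
    """
    # Visit the longest samples first so they claim rows before short ones fill gaps.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    rows = []  # sample indices per row
    free = []  # remaining token capacity per row
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sample {i} has {n} tokens, more than max_len={max_len}")
        for r, room in enumerate(free):
            if n <= room:
                rows[r].append(i)
                free[r] -= n
                break
        else:
            # No existing row has room, so open a new one.
            rows.append([i])
            free.append(max_len - n)
    return rows

def padding_fraction(lengths, num_rows, max_len):
    """Share of token slots that hold padding rather than real tokens."""
    return 1 - sum(lengths) / (num_rows * max_len)

# Hypothetical token counts (text plus visual tokens) for six samples.
lengths = [900, 300, 700, 100, 500, 500]
rows = pack_sequences(lengths, max_len=1000)
print("rows:", rows)
print("padding without packing:", padding_fraction(lengths, len(lengths), 1000))
print("padding with packing:", padding_fraction(lengths, len(rows), 1000))
```

A real training pipeline would also need attention masks or position resets so that samples sharing a row do not attend to each other; that part is omitted here.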

Alibaba’s newly released OVIS2.5 models (9B and 2B) mark a breakthrough in open-source multimodal AI, with state-of-the-art scores on the OpenCompass leaderboard among models under 40B parameters. Key innovations include a native-resolution vision transformer that handles high-detail visuals without tiling, and an optional “thinking mode” that enables deeper, self-reflective reasoning on complex tasks. OVIS2.5 performs well in STEM, OCR, chart analysis, and video understanding, outperforming previous open models and narrowing the gap with proprietary AI. Its efficiency-focused training and lightweight 2B variant make advanced multimodal capabilities accessible to both researchers and resource-constrained applications.


Check out the technical paper and the models on Hugging Face.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform known for in-depth coverage of machine learning and deep learning news that is both technically sound and easily understood by a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.
