IBM AI Releases Granite-Docling-258M: An Open-Source, Enterprise-Ready Document AI Model
IBM has released Granite-Docling-258M, an open-source (Apache-2.0) vision-language model designed specifically for end-to-end document conversion. The model targets layout-faithful extraction (tables, code, equations, lists, captions, and reading order) into structured, machine-readable representations rather than lossy Markdown. It is available on Hugging Face with a live demo, and MLX builds make it usable on Apple Silicon.
What's new compared to SmolDocling?
Granite-Docling is the successor to the SmolDocling-256M preview. IBM swapped the language backbone to a Granite 165M language model, upgraded the vision encoder to SigLIP2 (base, patch16-512), and retained the IDEFICS3-style connector (pixel-shuffle projector). The resulting model has 258 million parameters and shows consistent accuracy improvements in layout analysis, full-page OCR, code, equations, and tables (see the metrics below). IBM also addressed the unstable failure modes observed in the preview model (e.g., repeated token loops).
Architecture and training pipeline
- Backbone: IDEFICS3-derived stack with a SigLIP2 vision encoder → pixel-shuffle connector → Granite 165M LLM (a minimal inference sketch follows this list).
- Training framework: nanoVLM (a lightweight, pure-PyTorch VLM training toolkit).
- Representation: outputs DocTags, an IBM-authored markup designed specifically to express document structure unambiguously (elements + coordinates + relationships), which downstream tools convert to Markdown/HTML/JSON.
- Compute: trained on IBM's Blue Vela H100 cluster.
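Because the stack is a standard IDEFICS3-style VLM, it can be loaded with the Hugging Face Transformers vision-to-sequence classes. The sketch below is a minimal, illustrative example: the model id (ibm-granite/granite-docling-258M), the prompt text, and the generation settings are assumptions based on the released checkpoints rather than a verbatim reproduction of IBM's reference code.

```python
# Minimal sketch: run Granite-Docling as an IDEFICS3-style VLM with Transformers.
# Assumptions: the model id and the "Convert this page to docling." prompt follow
# the public model card and may differ across releases.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "ibm-granite/granite-docling-258M"  # assumed Hugging Face model id

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

page = Image.open("page.png")  # a rendered document page image

# Build the image + instruction prompt via the processor's chat template.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Convert this page to docling."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[page], return_tensors="pt")

# Generate DocTags markup for the page (keep special tokens: they carry structure).
output_ids = model.generate(**inputs, max_new_tokens=4096)
doctags = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=False
)[0]
print(doctags)
```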
Quantitative improvements (Granite-Docling-258M vs. SmolDocling-256M preview)
Evaluated with docling-eval, LMMS-Eval, and task-specific datasets:
- Layout: MAP 0.27 vs. 0.23; F1 0.86 vs. 0.85.
- Full-page OCR: F1 0.84 vs. 0.80; lower edit distance.
- Code recognition: F1 0.988 vs. 0.915; edit distance 0.013 vs. 0.114 (a normalized edit-distance sketch follows this list).
- Equation recognition: F1 0.968 vs. 0.947.
- Table recognition (FinTabNet @150dpi): TEDS structure 0.97 vs. 0.82; TEDS with content 0.96 vs. 0.76.
- Other benchmarks: MMStar 0.30 vs. 0.17; OCRBench 500 vs. 338.
- Stability: more robust at avoiding infinite generation loops (a production-oriented fix).
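For readers unfamiliar with the edit-distance figures above, the sketch below shows one common way such a metric is computed: a character-level Levenshtein distance normalized by the reference length, so a score of 0.013 roughly means about 1.3% of characters differ. This is purely illustrative and is not the docling-eval implementation.

```python
# Illustrative sketch of a normalized edit-distance metric (not docling-eval code).
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_edit_distance(prediction: str, reference: str) -> float:
    """0.0 is an exact match; lower is better."""
    return levenshtein(prediction, reference) / max(len(reference), 1)

print(normalized_edit_distance("def f(x): return x*2", "def f(x): return x * 2"))
```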
Multilingual support
Granite-Docling adds experimental support for Japanese, Arabic, and Chinese. IBM labels this support as early-stage; English remains the primary target.
How DocTags change the document AI pipeline
Conventional OCR-to-Markdown pipelines can lose structural information and complicate downstream retrieval-augmented generation (RAG). Granite-Docling emits DocTags, a compact, LLM-friendly structural syntax, which Docling then converts to Markdown/HTML/JSON. This preserves tables, inline/floating math, code blocks, captions, and reading order with explicit coordinates, improving index quality and grounding for RAG and analytics.
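In practice, the DocTags string emitted by the model is not consumed directly; it is parsed into a DoclingDocument and then exported. The sketch below follows the docling-core usage published alongside the model; the class and method names (DocTagsDocument.from_doctags_and_image_pairs, DoclingDocument.load_from_doctags) are taken from those examples and may change between library versions.

```python
# Sketch: turn the model's DocTags output into Markdown/JSON with docling-core.
# Assumes `doctags` (string) and `page` (PIL image) from the inference step above;
# class/method names follow published docling-core examples and may vary by version.
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument

doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([doctags], [page])
doc = DoclingDocument.load_from_doctags(doctags_doc, document_name="sample_page")

markdown = doc.export_to_markdown()   # readable Markdown view
as_json = doc.export_to_dict()        # full structured representation
print(markdown)
```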
Inference and integration
- Docling integration (recommended): the docling CLI/SDK automatically pulls Granite-Docling and converts PDFs, Office documents, and images into multiple output formats (see the sketch after this list). IBM positions the model as a component within the Docling pipeline rather than as a general-purpose VLM.
- Runtimes: works with Transformers, vLLM, ONNX, and MLX; a dedicated MLX build is optimized for Apple Silicon. A Hugging Face Space provides an interactive demo (ZeroGPU).
- License: Apache-2.0.
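For end-to-end conversion, the Docling SDK handles model download, page rendering, and export in a few lines. The sketch below uses the public DocumentConverter API with the default pipeline; routing conversion through the Granite-Docling VLM pipeline is configured via Docling's pipeline options, whose exact names are version-dependent, so treat the commented hint as an assumption.

```python
# Sketch: convert a document with the Docling SDK (default pipeline shown).
# Recent Docling releases expose a VLM pipeline (e.g. a `--pipeline vlm` CLI mode)
# for Granite-Docling; option and flag names may vary by version.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")        # also accepts URLs, DOCX, PPTX, images

print(result.document.export_to_markdown())     # or export_to_dict() for JSON
```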
Why Granite-Docling?
For enterprise document AI, a small VLM that preserves structure reduces inference cost and pipeline complexity. Granite-Docling replaces multiple single-purpose models (layout, OCR, tables, code, equations) with a single component that emits a richer intermediate representation, improving downstream retrieval and conversion fidelity. The measured gains (TEDS for tables, F1 for code and equations) are large enough to make it a practical upgrade from SmolDocling in production workflows.
Summary
Granite-Docling-258M marks a meaningful step forward in compact, structure-preserving document AI. By combining IBM's Granite backbone, the SigLIP2 vision encoder, and the nanoVLM training framework, it delivers enterprise-ready performance on tables, equations, code, and multilingual text while remaining lightweight and openly licensed under Apache 2.0. With measurable gains over its SmolDocling predecessor and seamless integration into the Docling pipeline, Granite-Docling offers a practical foundation for document conversion and RAG workflows where accuracy and reliability are critical.
Check out the Model on Hugging Face and the Demo. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, follow us on Twitter and don't forget to join our 100K+ ML SubReddit and subscribe to our Newsletter.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.