Allen Institute for Artificial Intelligence (AI2) launches Olmo 3: a fully open 7B and 32B LLM series built on the Dolma 3 data suite and the Dolci post-training stack
The Allen Institute for Artificial Intelligence (AI2) is releasing Olmo 3 as a fully open model series that exposes the entire “model flow,” from raw data and code to intermediate checkpoints and deployment-ready variants.
Olmo 3 is a family of dense transformer models at 7B and 32B parameters. The series includes Olmo 3-Base, Olmo 3-Think, Olmo 3-Instruct, and Olmo 3-RL Zero. Both sizes share a 65,536-token context length and the same staged training recipe.

Dolma 3 Data Suite
The core of the training process is Dolma 3, a new data collection designed for Olmo 3. Dolma 3 consists of Dolma 3 Mix, Dolma 3 Dolmino Mix, and Dolma 3 Longmino Mix. Dolma 3 Mix is a 5.9T-token pre-training dataset spanning web text, scientific PDFs, code repositories, and other naturally occurring data. The Dolmino and Longmino subsets were constructed from filtered, higher-quality slices of this pool.
Dolma 3 Mix supports the main pre-training phase of Olmo 3-Base. The AI2 research team then applied Dolma 3 Dolmino Mix, a 100B-token mid-training set that emphasizes math, coding, instruction following, reading comprehension, and thinking-oriented tasks. Finally, Dolma 3 Longmino Mix adds 50B tokens for the 7B model and 100B tokens for the 32B model, focusing on long documents and scientific PDFs processed with the olmOCR pipeline. This staged curriculum pushes the context limit to 65,536 tokens while maintaining stability and quality.
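To make the staged recipe concrete, the sketch below represents the three Dolma 3 stages as a simple Python configuration. The token budgets and mix names come from the description above; the field names and the pre-extension context length are illustrative assumptions, not AI2's actual configuration format.

```python
# Hypothetical sketch of the three Dolma 3 training stages for Olmo 3-Base.
# Token budgets are the figures reported above; field names and the
# pre-extension context length (8,192) are illustrative assumptions.
DOLMA3_STAGES = [
    {
        "name": "pretraining",
        "mix": "Dolma 3 Mix",
        "tokens": 5.9e12,          # ~5.9T tokens: web text, scientific PDFs, code
        "context_length": 8_192,   # assumed shorter context before extension
    },
    {
        "name": "mid_training",
        "mix": "Dolma 3 Dolmino Mix",
        "tokens": 100e9,           # 100B tokens: math, code, instruction following
        "context_length": 8_192,   # assumed
    },
    {
        "name": "long_context",
        "mix": "Dolma 3 Longmino Mix",
        "tokens": {"7B": 50e9, "32B": 100e9},  # per model size: long docs, PDFs
        "context_length": 65_536,  # final context window
    },
]

if __name__ == "__main__":
    for stage in DOLMA3_STAGES:
        print(f"{stage['name']:>13}: {stage['mix']} -> context {stage['context_length']}")
```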
Large-scale training on H100 cluster
Olmo 3-Base 7B was pre-trained on Dolma 3 Mix using 1,024 H100 devices, achieving approximately 7,700 tokens per second per device. The later stages used 128 H100s for Dolmino mid-training and 256 H100s for Longmino long-context extension.
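For a rough sense of scale, a back-of-the-envelope calculation based on the reported per-device throughput gives the aggregate token rate and an approximate wall-clock time for the 5.9T-token pre-training stage. These are estimates derived from the figures above, assuming the throughput is sustained for the whole run, not official numbers.

```python
# Back-of-the-envelope estimate for the Olmo 3-Base 7B pre-training stage.
# Assumes the reported per-device throughput is sustained across the run.
devices = 1024                      # H100 GPUs used for pre-training
tokens_per_sec_per_device = 7_700   # reported throughput
total_tokens = 5.9e12               # Dolma 3 Mix pre-training budget

aggregate_tps = devices * tokens_per_sec_per_device   # ~7.9M tokens/sec
seconds = total_tokens / aggregate_tps
days = seconds / 86_400

print(f"Aggregate throughput: {aggregate_tps / 1e6:.1f}M tokens/sec")
print(f"Estimated wall-clock time: {days:.1f} days")   # roughly 8-9 days
```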
Base model performance relative to open families
On standard capability benchmarks, Olmo 3-Base 32B is positioned as the leading fully open base model. The AI2 research team reports that it is competitive with well-known open-weight series such as the similarly sized Qwen 2.5 and Gemma 3. Across a wide range of tasks, Olmo 3-Base 32B ranks close to or above these models while keeping the full data and training configuration open for inspection and reuse.
Olmo 3-Think focused on reasoning
Olmo 3-Think 7B and Olmo 3-Think 32B sit above the base models as reasoning-focused variants. They use a three-stage post-training scheme of supervised fine-tuning, direct preference optimization, and reinforcement learning with verifiable rewards within the OlmoRL framework. Olmo 3-Think 32B is described as the strongest fully open reasoning model, closing the gap with the Qwen 3 32B thinking model while using approximately six times fewer training tokens.
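As a pointer to what the middle post-training stage computes, here is a minimal, framework-agnostic sketch of the standard DPO objective for a single preference pair. It is the textbook formulation of DPO, not AI2's exact Dolci/OlmoRL implementation, and the log-probability inputs are assumed to come from the trained policy and a frozen reference model.

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (textbook form, not AI2's exact recipe).

    Inputs are sequence log-probabilities of the chosen and rejected responses
    under the trained policy and under a frozen reference model.
    """
    # Implicit reward margins of each response relative to the reference model.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # Loss is the negative log-sigmoid of the scaled margin difference:
    # -log(sigmoid(x)) == log(1 + exp(-x))
    logits = beta * (chosen_margin - rejected_margin)
    return math.log1p(math.exp(-logits))

# Example: the policy prefers the chosen answer slightly more than the reference does.
print(dpo_loss(-12.0, -15.0, -13.0, -14.5))  # ~0.62
```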


Olmo 3-Instruct for chat and tool use
Olmo 3-Instruct 7B is tuned for fast instruction following, multi-turn chat, and tool use. It starts from Olmo 3-Base 7B and applies a separate Dolci Instruct data and training pipeline covering supervised fine-tuning, DPO, and RLVR for dialogue and function-calling workloads. The AI2 research team reports that Olmo 3-Instruct matches or exceeds open-weight competitors such as Qwen 2.5, Gemma 3, and Llama 3.1, and is competitive with the similarly sized Qwen 3 series on multiple instruction and reasoning benchmarks.
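A minimal usage sketch with the Hugging Face Transformers library is shown below. The repository identifier is an assumption for illustration; the exact model id and the minimum Transformers version with Olmo 3 support should be confirmed on AI2's Hugging Face organization page.

```python
# Minimal chat sketch with Hugging Face Transformers.
# The repo id below is a hypothetical identifier; confirm the exact name
# on AI2's Hugging Face organization before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Olmo-3-7B-Instruct"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize what RLVR means in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```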
Olmo 3-RL Zero for clean RL studies
Olmo 3-RL Zero 7B targets researchers studying reinforcement learning on language models who need a clean separation between pre-training data and RL data. It is a fully open RL pathway built on top of Olmo 3-Base that uses the Dolci RLZero datasets, which are decontaminated against Dolma 3.
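The "verifiable rewards" in RLVR are typically programmatic checks rather than learned reward models. The snippet below is a generic sketch of such a checker for math problems, assuming answers are reported in a \boxed{...} expression; it illustrates the idea and is not AI2's actual grader.

```python
import re

def math_reward(completion: str, gold_answer: str) -> float:
    """Binary verifiable reward for a math completion (generic RLVR-style checker).

    Extracts the last \\boxed{...} expression from the model output and compares
    it with the reference answer after light normalization. Illustrative only;
    not AI2's exact grading logic.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if not matches:
        return 0.0  # no final answer found -> no reward
    predicted = matches[-1].strip().replace(" ", "")
    return 1.0 if predicted == gold_answer.strip().replace(" ", "") else 0.0

# Example usage
print(math_reward(r"The answer is \boxed{42}.", "42"))  # 1.0
print(math_reward("I think the answer is 41.", "42"))   # 0.0
```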
Comparison table
| Model variant | Training or post-training data | Main use cases | Reported positioning vs. other open models |
|---|---|---|---|
| Olmo 3-Base 7B | Dolma 3 Mix pre-training, Dolma 3 Dolmino Mix mid-training, Dolma 3 Longmino Mix long context | General-purpose base model, long-context reasoning, code, mathematics | A strong fully open 7B base intended as the foundation for Think, Instruct, and RL Zero, evaluated against leading open bases at the 7B scale |
| Olmo 3-Base 32B | Same staged Dolma 3 pipeline as 7B, with 100B Longmino tokens for long context | High-end foundation for research, long-context workloads, and reinforcement learning setups | Described as the best fully open 32B base, comparable to Qwen 2.5 32B and Gemma 3 27B, and ahead of Marin, Apertus, and LLM360 |
| Olmo 3-Think 7B | Olmo 3-Base 7B, plus Dolci Think SFT, Dolci Think DPO, and Dolci Think RL in the OlmoRL framework | Reasoning-focused 7B model with visible thinking traces | A fully open reasoning model at an efficient scale, enabling chain-of-thought and reinforcement learning research on modest hardware |
| Olmo 3-Think 32B | Olmo 3-Base 32B, plus the same Dolci Think SFT, DPO, and RL pipeline | Flagship reasoning model with long thinking traces | Positioned as the strongest fully open thinking model, competitive with the Qwen 3 32B thinking model while training on roughly 6x fewer tokens |
| Olmo 3-Instruct 7B | Olmo 3-Base 7B, plus Dolci Instruct SFT, Dolci Instruct DPO, and Dolci Instruct RL 7B | Instruction following, chat, function calling, tool use | Reported to outperform Qwen 2.5, Gemma 3, and Llama 3.1, and to close the gap with the similarly sized Qwen 3 series |
| Olmo 3-RL Zero 7B | Olmo 3-Base 7B, plus Dolci RLZero math, code, instruction-following, and mixed datasets decontaminated against Dolma 3 | RLVR research on math, coding, instruction following, and mixed tasks | Introduced as a fully open RL pathway for benchmarking RLVR on top of a base model with fully open pre-training data |
Main points
- End-to-end transparent pipeline: Olmo 3 exposes the complete "model flow" from Dolma 3 data construction, through staged pre-training and post-training, to released checkpoints, evaluation suites, and tools, enabling fully reproducible LLM research and fine-grained debugging.
- Dense 7B and 32B models with 65K contexts: The series covers 7B and 32B dense transformers, all with 65,536-token context windows, trained through a three-stage Dolma 3 curriculum: Dolma 3 Mix for main pre-training, Dolma 3 Dolmino for mid-training, and Dolma 3 Longmino for long-context extension.
- Strong open base and reasoning models: Olmo 3-Base 32B is positioned as the top fully open base model at its scale, competing with Qwen 2.5 and Gemma 3, while Olmo 3-Think 32B is described as the strongest fully open thinking model, close to the Qwen 3 32B thinking model while using approximately six times fewer training tokens.
- Instruction-tuned and RL Zero variants: Olmo 3-Instruct 7B uses Dolci Instruct SFT, DPO, and RLVR data for instruction following, multi-turn chat, and tool use, with reported performance comparable to or better than Qwen 2.5, Gemma 3, and Llama 3.1 at similar scales. Olmo 3-RL Zero 7B provides a fully open RLVR pathway, with Dolci RLZero datasets for math, code, instruction following, and general chat decontaminated against the pre-training data.
Olmo 3 is an unusual release in that it opens the entire stack: Dolma 3 data recipes, staged pre-training, Dolci post-training, RLVR in OlmoRL, and evaluation with OLMES and OlmoBaseEval. This reduces ambiguity around data quality, long-context training, and reinforcement learning for reasoning, and creates concrete baselines for extending Olmo 3-Base, Olmo 3-Think, Olmo 3-Instruct, and Olmo 3-RL Zero in controlled experiments. Overall, Olmo 3 sets a rigorous reference point for a transparent, research-grade LLM pipeline.

Michal Sutter is a data science professional with a master’s degree in data science from the University of Padua. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex data sets into actionable insights.