Hugging Face has just released SmolLM3, the latest version of its SmolLM family of small language models, designed to deliver strong multilingual reasoning over long contexts using a compact 3B-parameter architecture. While most models with comparable capabilities typically exceed 7B parameters, SmolLM3 manages to deliver state-of-the-art (SOTA) performance with significantly fewer parameters, making it more cost-efficient and deployable on constrained hardware without compromising capabilities such as tool usage, multi-step reasoning, and linguistic diversity.
Overview of SmolLM3
SmolLM3 stands out as a compact, multilingual, dual-mode long-context language model capable of processing sequences of up to 128K tokens. It was trained on 11 trillion tokens, positioning it competitively against models such as Mistral, Llama 2, and Falcon. Despite its size, SmolLM3 achieves surprisingly strong tool usage and reasoning ability, traits more commonly associated with models two to three times larger.
SmolLM3 is released in two variants:
- SmolLM3-3B-Base: the pretrained base model.
- SmolLM3-3B-Instruct: the instruction-tuned variant for chat, reasoning, and tool use.
Both models are publicly available under the Apache 2.0 license on Hugging Face's model hub.
Key Features
1. Long context reasoning (up to 128K tokens)
SmolLM3 uses a modified attention mechanism to handle very long contexts efficiently, up to 128,000 tokens. This capability is critical for tasks involving extended documents, logs, or structured records, where context length directly affects understanding and accuracy.
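As a rough illustration, the sketch below feeds a long document to the model with the Transformers library. The checkpoint id `HuggingFaceTB/SmolLM3-3B`, the file path, and the generation settings are assumptions for the example, not values taken from the release notes.

```python
# Minimal long-context inference sketch (illustrative; checkpoint id and file path assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Load a long document (e.g. an extended log or report) and ask a question about it.
with open("long_report.txt") as f:
    document = f.read()

prompt = f"{document}\n\nQuestion: Summarize the key findings above.\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(f"Prompt length: {inputs['input_ids'].shape[1]} tokens")  # can reach tens of thousands of tokens

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```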
2. Dual mode reasoning
The instruction-tuned SmolLM3-3B supports dual-mode reasoning:
- Instruction following, used for chat-style interaction and tool-calling tasks.
- Multilingual QA and generation, used for tasks across multiple languages.
This dual capability lets the model perform well in both open-ended generation and structured reasoning, making it suitable for applications ranging from RAG pipelines to agentic workflows.
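A minimal sketch of the chat-style (instruction-following) path using the standard Transformers chat-template API is shown below; the instruct checkpoint id and the prompts are illustrative assumptions.

```python
# Chat-style (instruction-following) usage sketch; checkpoint id and prompts are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B-Instruct"  # assumed instruct checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "Résume ce texte en deux phrases : ..."},  # request in French
]

# apply_chat_template formats the conversation with the model's chat markup.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```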
3. Multilingual features
Trained on a multilingual corpus, SmolLM3 supports six languages: English, French, Spanish, German, Italian, and Portuguese. It performs well on benchmarks such as XQuAD and MGSM, demonstrating its ability to generalize across language boundaries with minimal performance drop.
4. Compact size with SOTA performance
At only 3 billion parameters, SmolLM3 achieves performance close to or on par with larger models such as Mistral-7B on multiple downstream tasks. This is made possible by the scale and quality of its training data (11T tokens) and careful architectural tuning.
5. Tool usage and structured output
The model performs impressively on tool-calling tasks, including prompt-based workflows and structured outputs. It correctly follows schema-driven input and output constraints and interfaces well with systems that require deterministic behavior, such as autonomous agents and API-driven environments.
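To make the schema-constrained idea concrete, here is a generic sketch of prompting for a JSON tool call and validating the result. The tool schema, prompt wording, and parsing logic are illustrative assumptions, not the model's official tool-calling interface.

```python
# Generic schema-constrained tool-call prompting sketch (not the official tool-calling API).
import json

TOOL_SCHEMA = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def build_prompt(user_request: str) -> str:
    # Ask the model to answer only with a JSON object matching the schema.
    return (
        "You can call the following tool by replying with a single JSON object.\n"
        f"Tool schema: {json.dumps(TOOL_SCHEMA)}\n"
        f"User request: {user_request}\n"
        "Reply with JSON only."
    )

def parse_tool_call(generation: str) -> dict:
    # Validate that the model produced well-formed JSON with the required argument.
    call = json.loads(generation)
    assert "city" in call.get("arguments", {}), "missing required argument"
    return call

# Example (the generation string would normally come from model.generate):
print(parse_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```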
Technical training details
SmolLM3 was trained on an internal data mixture curated by Hugging Face, consisting of high-quality web content, code, academic papers, and multilingual sources. The 11T-token training run was performed on GPU clusters using a multi-node distributed training strategy, with optimizations such as Flash Attention v2 for efficient long-sequence training. The tokenizer is a 128K-vocabulary SentencePiece model shared across all supported languages.
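To see what a shared tokenizer across languages looks like in practice, the short sketch below tokenizes the same sentence in several supported languages and prints the token counts; the checkpoint id is an assumption for the example.

```python
# Sketch: one shared tokenizer covering all supported languages (checkpoint id assumed).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")  # assumed repo id

sentences = {
    "en": "The model handles very long documents.",
    "fr": "Le modèle traite des documents très longs.",
    "es": "El modelo procesa documentos muy largos.",
    "de": "Das Modell verarbeitet sehr lange Dokumente.",
}

for lang, text in sentences.items():
    ids = tokenizer(text)["input_ids"]
    print(f"{lang}: {len(ids)} tokens")  # roughly comparable counts suggest balanced coverage
```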
For long-context support, Hugging Face employs linear and grouped attention mechanisms that minimize quadratic complexity while retaining performance. This allows the model to handle context lengths of up to 128K during both training and inference without the memory bottlenecks that dense transformers encounter at this scale.
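As a rough illustration of why grouping keys and values across query heads saves memory at long context lengths, here is a minimal grouped-query attention sketch in PyTorch. The head counts and dimensions are made up for the example and are not SmolLM3's actual configuration.

```python
# Minimal grouped-query attention (GQA) sketch; dimensions are illustrative, not SmolLM3's config.
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 1, 16, 64
n_q_heads, n_kv_heads = 8, 2          # each KV head is shared by 4 query heads
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)   # far smaller KV cache than full multi-head attention
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand KV heads so every query head attends over its group's shared keys/values.
k_expanded = k.repeat_interleave(group, dim=1)
v_expanded = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k_expanded, v_expanded, is_causal=True)
print(out.shape)  # (1, 8, 16, 64): full set of query heads, but only 2 KV heads stored
```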
SmolLM3-3B has been further trained with the TRLX library to align it with chat instructions, reasoning tasks, and tool-usage demonstrations.
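The alignment step is only described at a high level; purely as a generic illustration of supervised fine-tuning on chat-style demonstrations (not the team's actual recipe), a sketch using TRL's `SFTTrainer` might look like the following. The checkpoint id, placeholder data, and hyperparameters are all assumptions.

```python
# Generic supervised fine-tuning sketch with TRL's SFTTrainer; not the official SmolLM3 recipe.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Tiny in-memory placeholder dataset; real alignment data would be chat and tool demonstrations.
demo = Dataset.from_dict({"text": [
    "User: What is 2 + 2?\nAssistant: 4.",
    "User: Translate 'hello' to French.\nAssistant: Bonjour.",
]})

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B",   # assumed base checkpoint id
    train_dataset=demo,
    args=SFTConfig(output_dir="smollm3-sft-demo", per_device_train_batch_size=1, max_steps=2),
)
trainer.train()
```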
Performance Benchmarks
SmolLM3 performs well on a variety of multilingual and reasoning benchmarks:
- XQuAD (multilingual question answering): competitive scores across all six supported languages.
- MGSM (multilingual grade-school math): outperforms several larger models in zero-shot settings.
- ToolQA and MultiHopQA: strong multi-step reasoning and contextual grounding.
- ARC and MMLU: high accuracy across commonsense and expert-knowledge domains.
Although it does not outperform the latest 7B and 13B models on every benchmark, SmolLM3's performance-to-parameter ratio remains among the highest in its class.

Use cases and applications
SmolLM3 is especially well suited for:
- Low-cost multilingual AI deployments in chatbots, helpdesk systems, and document summarization.
- Lightweight RAG and retrieval-based systems that benefit from long-context understanding.
- Tool-using agents that require schema compliance and deterministic tool calls.
- Edge deployments and private environments where hardware or data-privacy constraints call for smaller models.
Conclusion
SmolLM3 exemplifies a new generation of smaller yet highly capable language models. It combines multilingual support, long-context processing, and strong reasoning, all within a 3B-parameter footprint, marking an important step in model efficiency and accessibility. The Hugging Face release shows that with the right training recipe and architectural design, smaller models can still deliver strong performance on complex tasks traditionally reserved for much larger LLMs.
Check out SmolLM3-3B-Base and SmolLM3-3B-Instruct. All credit for this research goes to the researchers on the project. Also, follow us on Twitter and YouTube, join our 100K+ ML SubReddit, and subscribe to our newsletter.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among readers.