Tencent Open-Sources Hunyuan-A13B: A 13B Active-Parameter MoE Model with Dual-Mode Reasoning and 256K Context

Tencent’s Hunyuan team has introduced Hunyuan-A13B, a new open-source large language model built on a Mixture-of-Experts (MoE) architecture. Although the model comprises 80 billion total parameters, only 13 billion are active during inference, offering an efficient balance between performance and computational cost. It supports Grouped Query Attention (GQA), a 256K context length, and a dual-mode reasoning framework that switches between fast and slow thinking.
Designed for efficient deployment and strong reasoning, Hunyuan-A13B performs well on agent and tool-calling benchmarks such as BFCL-v3, τ-Bench, C3-Bench, and ComplexFuncBench, including on novel scenarios.
Architecture: Sparse MoE with 13B Active Parameters
At its core, Hunyuan-A13B follows a fine-grained MoE design with 1 shared expert and 64 non-shared experts, of which 8 are activated per forward pass. Validated by scaling experiments, this architecture maintains performance consistency while keeping inference costs low. The model has 32 layers, uses SwiGLU activations, a 128K vocabulary, and integrates GQA to improve memory efficiency during long-context inference.
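The routing layout described above can be sketched as follows. The toy router and expert functions are illustrative assumptions, not Hunyuan internals; only the 1-shared / 64-routed / top-8 configuration comes from the model description.

```python
import math

def top_k_experts(router_logits, k=8):
    """Indices of the k highest-scoring routed experts for one token."""
    return sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:k]

def moe_layer(x, shared_expert, routed_experts, router_logits, k=8):
    """Fine-grained MoE step: the shared expert always contributes, and the
    top-k of the routed experts are mixed with softmax weights computed over
    the selected logits only."""
    idx = top_k_experts(router_logits, k)
    weights = [math.exp(router_logits[i]) for i in idx]
    z = sum(weights)
    out = shared_expert(x)  # shared expert fires on every token
    for i, w in zip(idx, weights):
        out += (w / z) * routed_experts[i](x)
    return out
```

With 64 routed experts but only 8 selected per token (plus the shared expert), most expert weights sit idle on any given forward pass, which is how 80B total parameters shrink to 13B active ones.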
The MoE setup is paired with an optimized training curriculum: a 20T-token pretraining stage, followed by fast annealing and long-context adaptation. The final stage first scales the context window to 32K and then extends it to 256K tokens using NTK-aware positional encoding, ensuring stable performance at large sequence lengths.
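NTK-aware position scaling works by enlarging the RoPE frequency base so that far-away positions still fall within the trained frequency range. A minimal sketch of the standard rule; the head dimension (128) and default base (10000) are assumptions, and only the 32K → 256K extension (a scale factor of 8) comes from the article.

```python
def ntk_scaled_base(base=10000.0, head_dim=128, scale=8.0):
    """Common NTK-aware scaling rule: base' = base * scale^(d / (d - 2)).
    head_dim=128 and base=10000 are assumed defaults, not published
    Hunyuan-A13B values."""
    return base * scale ** (head_dim / (head_dim - 2))
```

With scale = 256K / 32K = 8, the base grows by slightly more than 8x, stretching every rotary wavelength proportionally instead of clipping positions beyond the trained window.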
Dual-Mode Reasoning: Fast and Slow Thinking
A standout feature of Hunyuan-A13B is its dual-mode Chain-of-Thought (CoT) capability. It supports a low-latency fast-thinking mode for routine queries and a more elaborate slow-thinking mode for multi-step reasoning. The modes are controlled by a simple tag system: /no think for fast inference and /think for deliberate reasoning. This flexibility lets users match computational cost to task complexity.
Post-Training: Reinforcement Learning with Task-Specific Reward Models
Hunyuan-A13B’s post-training pipeline includes multi-stage supervised fine-tuning (SFT) and reinforcement learning (RL) across reasoning-specific and general tasks. The RL stages combine outcome-based rewards with tool-specific feedback, including a sandboxed execution environment for code and rule-based checks for agent behavior.
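An outcome-based code reward of this kind can be sketched with a subprocess sandbox: run the candidate program together with its unit tests in a fresh interpreter and pay reward 1.0 only if everything passes. This is a toy stand-in for the sandbox the article describes, not Tencent’s implementation.

```python
import subprocess
import sys

def outcome_reward(candidate_code, test_code, timeout=5):
    """Return 1.0 if the candidate passes its tests in a separate Python
    process, 0.0 on any failure or timeout. Outcome-based: only the final
    result is scored, not the reasoning trace."""
    program = candidate_code + "\n" + test_code
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True, timeout=timeout,
        )
        return 1.0 if proc.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # runaway code earns no reward
```

A production sandbox would add resource limits and filesystem/network isolation; the binary pass/fail signal is what makes the reward cheap to verify at RL scale.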
For the agent training phase, the team combined diverse tool-use scenarios with planner, checker, and tool roles, yielding around 20,000 format combinations. This strengthens Hunyuan-A13B’s ability to execute real-world workflows such as spreadsheet processing, information search, and structured reasoning.
Evaluation: State-of-the-Art Agent Performance
Hunyuan-A13B shows strong benchmark results across diverse NLP tasks:
- On MATH, CMATH, and GPQA, it scores on par with or above larger dense and MoE models.
- It surpasses Qwen3-A22B and DeepSeek R1 in logical reasoning (BBH: 89.1; ZebraLogic: 84.7).
- In coding, it achieves 83.9 on MBPP and 69.3 on MultiPL-E.
- For agent tasks, it leads on BFCL-v3 (78.3) and ComplexFuncBench (61.2), validating its tool-use capabilities.
Long-context comprehension is another highlight. On PenguinScrolls, it scores 87.7, just behind Gemini 2.5 Pro. On RULER, it sustains high performance even at 64K–128K context, outperforming larger models such as Qwen3-A22B and DeepSeek R1 in context resilience.

Inference optimization and deployment
Hunyuan-A13B is fully integrated with popular inference frameworks: vLLM, SGLang, and TensorRT-LLM. It supports precision formats such as W16A16, W8A8, and FP8 KV-cache, along with features like automatic prefix caching and chunked prefill. It achieves a throughput of up to 1981.99 tokens/sec on a 32-batch input (2048 input, 14336 output length), making it practical for real-time applications.
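The benefit of the FP8 KV cache is easy to quantify with a back-of-the-envelope estimate. In this sketch the KV-head count and head dimension are illustrative assumptions; only the 32 layers and 256K context come from the article.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """Per-sequence KV-cache size: K and V tensors across all layers.
    bytes_per_elem is 2 for FP16/BF16 and 1 for FP8. n_kv_heads=8 and
    head_dim=128 are assumed GQA values, not published specs."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
```

Under these assumptions each token costs 128 KiB of FP16 KV cache, so a full 256K-token sequence needs roughly 32 GiB, halved at FP8. This is why the FP8 cache, prefix caching, and chunked prefill matter for serving long contexts at high batch sizes.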
Open Source and Industry Relevance
Hunyuan-A13B is available on Hugging Face and GitHub under a permissive open-source license. It is designed for efficient research and production use, especially in latency-sensitive environments and long-context tasks.
By combining MoE scalability, agent reasoning, and open-source accessibility, Tencent’s Hunyuan-A13B offers a compelling alternative to heavyweight LLMs, enabling broader experimentation and deployment without sacrificing capability.
Check out the Paper. All credit for this research goes to the researchers of this project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
