MiniMax AI Releases MiniMax-M1: A 456B-Parameter Hybrid Model for Long-Context and Reinforcement Learning (RL) Tasks

The Challenge of Long-Context Reasoning in AI Models
Large reasoning models are designed not only to understand language but also to reason through multi-step processes that demand extended attention spans and contextual understanding. As expectations of AI grow, especially in real-world and software development settings, researchers have sought architectures that can handle longer inputs and sustain deep, coherent chains of reasoning without prohibitive computational costs.
Computational Limitations of Traditional Transformers
The main difficulty in scaling these reasoning capabilities is the enormous computational load at longer generation lengths. Traditional transformer-based models use a softmax attention mechanism whose cost scales quadratically with input length. This limits their ability to efficiently process long input sequences or extended chains of thought. The issue becomes even more pressing in domains that require real-time interaction or where reasoning over long contexts is expensive.
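To make the scaling gap concrete, here is a rough back-of-envelope comparison (a sketch with illustrative constants, not the exact FLOP accounting of any production kernel) of how softmax attention's quadratic cost grows against a linear-attention variant as the sequence gets longer:

```python
# Rough cost models: softmax attention is ~O(n^2 * d) per head (scores plus
# weighted values), while linear/kernelized attention is ~O(n * d^2) because
# it maintains a d x d running state. Constants here are illustrative.

def softmax_attention_flops(n: int, d: int) -> int:
    return 2 * n * n * d  # quadratic in sequence length n

def linear_attention_flops(n: int, d: int) -> int:
    return 2 * n * d * d  # linear in sequence length n

d = 128  # assumed per-head dimension
for n in (1_000, 100_000, 1_000_000):
    ratio = softmax_attention_flops(n, d) / linear_attention_flops(n, d)
    print(f"n={n:>9,}: softmax costs ~{ratio:,.0f}x more than linear")
```

At a million tokens the quadratic term dominates by several thousand times, which is why attention choice, not parameter count, becomes the bottleneck for long generations.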
Existing Alternatives and Their Limitations
Efforts to address this problem have produced a range of methods, including sparse attention and linear attention variants. Some teams have tried state-space models and recurrent networks as alternatives to traditional attention structures. However, these innovations have seen limited adoption in the most competitive reasoning models, either because of architectural complexity or a lack of scalability in real deployments. Even Tencent's Hunyuan-T1, which uses a novel Mamba architecture, is closed-source, which limits wider research participation and validation.
Introducing MiniMax-M1: A Scalable Open-Weight Model
Researchers at MiniMax AI have introduced MiniMax-M1, a new open-weight, large-scale reasoning model that combines a mixture-of-experts (MoE) architecture with lightning attention. MiniMax-M1 is an evolution of the earlier MiniMax-Text-01 model: it contains 456 billion parameters, of which 45.9 billion are activated per token. It supports context lengths of up to 1 million tokens, eight times that of DeepSeek R1. The model tackles the compute scalability of inference, consuming only 25% of the FLOPs required by DeepSeek R1 at a generation length of 100,000 tokens. It was trained with large-scale reinforcement learning on a broad range of tasks, from mathematics and coding to software engineering, marking a shift toward practical, long-context AI models.
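A quick back-of-envelope check, using only the figures quoted above, shows how sparse the per-token computation is and what the context comparison implies:

```python
# Figures from the model description above; the arithmetic is illustrative.
total_params = 456e9    # total parameters in the MoE
active_params = 45.9e9  # parameters activated per token

print(f"Active fraction per token: {active_params / total_params:.1%}")
# -> ~10.1%: roughly one tenth of the model runs for any given token.

m1_context = 1_000_000  # MiniMax-M1 context window (tokens)
print(f"Implied DeepSeek R1 context: {m1_context // 8:,} tokens")
# -> 125,000 tokens, consistent with the "eight times" comparison above.
```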
Hybrid Attention: Lightning Attention with Softmax Blocks
To make this architecture efficient, MiniMax-M1 adopts a hybrid attention scheme: every seventh transformer block uses traditional softmax attention, with the six blocks in between using lightning attention. This greatly reduces computational complexity while preserving performance. Lightning attention itself is an I/O-aware variant of linear attention and is particularly effective at scaling reasoning lengths to hundreds of thousands of tokens. For RL efficiency, the researchers introduced a new algorithm called CISPO. Rather than clipping token updates as traditional methods do, CISPO clips importance-sampling weights, enabling stable training and consistent token contributions even in off-policy updates.
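As a structural illustration of that 7-block pattern (a minimal sketch; the labels are placeholders, not MiniMax's actual module names, and the ordering within a group is illustrative):

```python
# Sketch of the hybrid layout described above: within each group of seven
# transformer blocks, six use lightning (linear) attention and one uses
# softmax attention.

def hybrid_attention_layout(num_blocks: int) -> list[str]:
    layout = []
    for i in range(num_blocks):
        layout.append("softmax" if i % 7 == 6 else "lightning")
    return layout

print(hybrid_attention_layout(14))
# ['lightning', 'lightning', 'lightning', 'lightning', 'lightning',
#  'lightning', 'softmax', 'lightning', ..., 'softmax']
```

The design keeps a small fraction of full softmax blocks so the model retains global, exact attention periodically, while the linear blocks carry most of the sequence length cheaply.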
CISPO Algorithm and RL Training Efficiency
The CISPO algorithm proved crucial to overcoming the training instability encountered with hybrid architectures. In a comparative study on a Qwen2.5-32B baseline, CISPO achieved a 2x speedup over DAPO. Building on this, MiniMax-M1's full reinforcement learning cycle was completed in just three weeks at a rental cost of approximately $534,700. The model was trained on a diverse dataset, including 41 logic tasks generated with the SynLogic framework as well as real-world software engineering environments. The latter used execution-based rewards to guide performance, yielding stronger results on practical coding tasks.
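To make the distinction concrete, here is a minimal PyTorch-style sketch of the clipping idea behind CISPO as described above (not MiniMax's released implementation; the tensor shapes, clipping bounds, and names are illustrative). PPO-style objectives clip the policy ratio, which zeroes the gradient for clipped tokens; this sketch instead clips and detaches the importance-sampling weight, so every token's log-probability still receives a gradient:

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Illustrative CISPO-style objective: clip the importance-sampling
    weight rather than the token update, detach it, and use it to weight
    a REINFORCE-style term so all tokens keep contributing gradients."""
    is_weight = torch.exp(logp_new - logp_old)          # pi_new / pi_old
    clipped = torch.clamp(is_weight, 1.0 - eps_low, 1.0 + eps_high).detach()
    # Gradient flows through logp_new for every token, even off-policy ones.
    return -(clipped * advantages * logp_new).mean()

# Toy usage with random token-level log-probs and advantages.
logp_old = torch.randn(8)
logp_new = (logp_old + 0.1 * torch.randn(8)).detach().requires_grad_(True)
advantages = torch.randn(8)
loss = cispo_loss(logp_new, logp_old, advantages)
loss.backward()  # every token contributes to the policy gradient
```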
Benchmark Results and Comparative Performance
MiniMax-M1 posts compelling benchmark results. Compared to DeepSeek-R1 and Qwen3-235B, it excels in software engineering, long-context processing, and agentic tool use. Although it trails the latest DeepSeek-R1-0528 in math and coding contests, it surpasses OpenAI o3 and Claude 4 Opus on long-context benchmarks. Furthermore, it outperforms Gemini 2.5 Pro on the TAU-bench agentic tool-use evaluation.
Conclusion: A Scalable and Transparent Model for Long-Context AI
MiniMax-M1 marks a significant step forward in both transparency and scalability. By addressing the dual challenges of inference efficiency and training complexity, MiniMax AI's research team sets a precedent for open-weight reasoning models. This work not only offers solutions to computational constraints but also introduces practical methods for extending language-model intelligence into real-world applications.
Check out the Paper, Models, and GitHub Page. All credit for this research goes to the researchers on this project.

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who researches applications in fields such as biomaterials and biomedical science. With a strong background in materials science, he is exploring new advancements and creating opportunities to contribute.
