Tiny Recursive Model (TRM): A Tiny 7M Model That Outperforms DeepSeek-R1, Gemini 2.5 Pro, and o3-mini in Reasoning on ARC-AGI-1 and ARC-AGI-2
Can an iterative draft-revision solver that repeatedly updates a latent scratchpad outperform far larger autoregressive LLMs on ARC-AGI? Samsung SAIT (Montreal) has released the Tiny Recursive Model (TRM), a two-layer, ~7M-parameter recursive reasoner that reports 44.6–45% test accuracy on ARC-AGI-1 and 7.8–8% on ARC-AGI-2, surpassing results from much larger language models such as DeepSeek-R1, o3-mini-high, and Gemini 2.5 Pro on the same public evaluations. TRM also lifts the puzzle benchmarks Sudoku-Extreme (87.4%) and Maze-Hard (85.3%) well above the Hierarchical Reasoning Model (HRM, 27M parameters), while using fewer parameters and a simpler training recipe.
What exactly is new?
TRM removes HRM's two-module hierarchy and fixed-point gradient approximation in favor of a single tiny network that recurses over a latent "scratchpad" (z) and a current solution embedding (y):
- A single tiny recurrent core. HRM's two-module hierarchy is replaced by a 2-layer network that jointly maintains a latent scratchpad z and the current solution embedding y. The model alternates: "think" updates z ← f(x, y, z) for n inner steps; "act" updates y ← g(y, z).
- Deeply supervised recursion. The think→act block is unrolled up to 16 times with deep supervision and a learned halting head during training (fully unrolled at test time). Carry signals are passed across steps via (y, z).
- Full backpropagation through the loop. Unlike HRM's one-step implicit (fixed-point) gradient approximation, TRM backpropagates through all recursive steps; the team found this crucial for generalization. A minimal sketch of the loop follows below.

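For intuition, here is a minimal PyTorch-style sketch of that think/act recursion, assuming a single shared core applied to (x, y, z); the names TinyCore, trm_step, and trm_forward, as well as the MLP core itself, are illustrative stand-ins rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class TinyCore(nn.Module):
    """Illustrative small core reused for both 'think' and 'act' updates."""
    def __init__(self, dim: int):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x, y, z):
        # Condition jointly on the input x, current solution y, and latent scratchpad z.
        return self.block(torch.cat([x, y, z], dim=-1))

def trm_step(core, x, y, z, n: int = 6):
    # "Think": refine the latent scratchpad, z <- f(x, y, z), for n inner steps.
    for _ in range(n):
        z = core(x, y, z)
    # "Act": refine the solution embedding, y <- g(y, z) (x slot zeroed purely for illustration).
    y = core(torch.zeros_like(x), y, z)
    return y, z

def trm_forward(core, x, y, z, T: int = 3):
    # Unroll T think->act cycles; gradients flow through every recursive step,
    # with no one-step fixed-point approximation.
    per_cycle_outputs = []
    for _ in range(T):
        y, z = trm_step(core, x, y, z)
        per_cycle_outputs.append(y)  # deep supervision can attach a loss to each cycle
    return per_cycle_outputs, (y, z)
```

The point of the sketch is the control flow: one small network, two alternating update rules, and depth that comes from the loop rather than from the parameter count.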
Architecturally, the best-performing configuration for ARC and Maze keeps self-attention; for Sudoku's small fixed grid, the research team swaps self-attention for an MLP-Mixer-style token mixer. A small EMA (exponential moving average) over the weights stabilizes training on limited data. Effective depth is created by recursion (for example, T=3, n=6) rather than by stacking layers; in ablations, two layers generalize better than deeper variants at the same effective compute.
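On the EMA point, the sketch below shows the generic recipe of keeping an exponential moving average of the weights and evaluating with the averaged copy; the decay value is an arbitrary illustration, not the setting used by the research team.

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay: float = 0.999):
    """Blend live weights into a shadow copy: w_ema <- decay * w_ema + (1 - decay) * w."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

# Usage (illustrative): keep a deep copy of the model, call ema_update after each
# optimizer step, and run evaluation with ema_model instead of model.
# ema_model = copy.deepcopy(model)
# ema_update(ema_model, model)
```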
Understanding the results
- ARC-AGI-1 / ARC-AGI-2 (two attempts): TRM-Attn (7M) scores 44.6% / 7.8%, versus HRM (27M) at 40.3% / 5.0%. LLM baselines reported by the research team: DeepSeek-R1 (671B) 15.8% / 1.3%, o3-mini-high 34.5% / 3.0%, Gemini 2.5 Pro 37.0% / 4.9%; larger, specialized Grok-4 entries score higher (66.7–79.6% / 16–29.4%).
- Sudoku-Extreme (9×9, 1K training / 423K test puzzles): 87.4% with the attention-free mixer, versus 55.0% for HRM.
- Maze-Hard (30×30): 85.3%, versus 74.5% for HRM.

These are direct-prediction models trained from scratch on small, heavily augmented datasets, not few-shot prompted LLMs. ARC remains the canonical target; the ARC Prize Foundation tracks the broader leaderboard context and rules (e.g., the ARC-AGI-2 grand-prize threshold is 85% on the private evaluation set).
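Heavy augmentation is doing real work here. As a rough illustration of the kind of grid augmentation typically used for ARC-style data, the snippet below applies random rotations, flips, and color permutations; the exact augmentation set used by the TRM authors is not assumed here, and for real ARC tasks the same color permutation would have to be applied to the paired input and output grids.

```python
import numpy as np

def augment_grid(grid: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return one randomly transformed copy of an ARC-style integer grid (colors 0-9)."""
    g = np.rot90(grid, k=int(rng.integers(4)))  # random 90-degree rotation
    if rng.random() < 0.5:
        g = np.fliplr(g)                        # random horizontal flip
    perm = rng.permutation(10)                  # random remapping of the 10 colors
    return perm[g]

rng = np.random.default_rng(0)
print(augment_grid(np.array([[0, 1], [2, 3]]), rng))
```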
Why can a 7M model beat much larger LLMs on these tasks?
- Draft first, revise later, rather than decoding token by token: TRM drafts a complete candidate solution and then refines it through latent iterations that are repeatedly checked against the input, reducing the exposure bias of autoregressive decoding on structured outputs.
- Compute is spent on test-time recursion rather than on parameter count: effective depth is created by recursion (emulated depth ≈ T·(n+1)·layers), which the researchers show generalizes better at constant compute than adding layers (see the worked example after this list).
- A tighter inductive bias for grid reasoning: on small fixed grids (such as Sudoku), attention-free token mixing trims excess capacity and improves the bias/variance trade-off; on the larger 30×30 grids, self-attention is retained.
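To make the depth claim concrete, here is the back-of-the-envelope arithmetic under the example settings quoted above (T=3, n=6, 2 physical layers):

```python
# Emulated depth ≈ T * (n + 1) * layers, using the example settings from the article.
T, n, layers = 3, 6, 2                 # think->act cycles, inner latent updates, physical layers
emulated_depth = T * (n + 1) * layers
print(emulated_depth)                  # -> 42 effective layers from a 2-layer network
```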
Main points
- Architecture: a ~7M-parameter, 2-layer recursive solver that alternates latent "think" updates z ← f(x, y, z) with "act" refinements y ← g(y, z), unrolled up to 16 steps with deep supervision (a rough training-step sketch follows this list); gradients are propagated through the full recursion (no fixed-point/IFT approximation).
- Results: reports ~44.6–45% on ARC-AGI-1 and ~7.8–8% on ARC-AGI-2 (two attempts), surpassing several larger LLMs cited in the paper's comparisons (e.g., Gemini 2.5 Pro, o3-mini-high, DeepSeek-R1) under the stated evaluation protocols.
- Efficiency/paradigm: shows that allocating test-time compute to recursive refinement (via unrolled depth) can beat parameter scaling on these symbolic grid tasks, with a compact from-scratch recipe and publicly released code.
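For a concrete picture of the deep-supervision loop, the sketch below attaches a loss to the prediction after every unrolled cycle and trains a small halting head on whether the current prediction is already correct; decode_head, halt_head, the halting target, and the loss weighting are illustrative assumptions rather than the authors' exact recipe, and trm_step refers to the earlier recursion sketch.

```python
import torch
import torch.nn.functional as F

def deep_supervision_step(core, decode_head, halt_head, x, y, z, target, T: int = 16):
    """One training pass that supervises every unrolled think->act cycle (illustrative)."""
    total_loss = 0.0
    for _ in range(T):
        y, z = trm_step(core, x, y, z)               # one think->act cycle (earlier sketch)
        logits = decode_head(y)                      # (batch, vocab) prediction of the answer
        step_loss = F.cross_entropy(logits, target)  # supervise this cycle, not just the last
        halt_logit = halt_head(y).squeeze(-1)        # learned stop signal
        is_correct = (logits.argmax(dim=-1) == target).float()
        halt_loss = F.binary_cross_entropy_with_logits(halt_logit, is_correct)
        total_loss = total_loss + step_loss + halt_loss
    return total_loss / T
```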
This study demonstrates a 2-layer, ~7M-parameter recursive solver that unrolls up to 16 draft-revision cycles with about 6 latent updates per cycle and reports ~45% on ARC-AGI-1 and ~8% on ARC-AGI-2 (two attempts). The research team has released the code on GitHub. ARC-AGI is still far from solved (the ARC-AGI-2 target is 85%), so the contribution is best read as a win for architectural efficiency rather than a general reasoning breakthrough.
Check out the technical paper and the GitHub page for the code.
