
ByteDance Introduces Seed Prover: An Advanced Formal Reasoning System for Automated Mathematical Theorem Proving

LLMs have shown significant improvements in mathematical reasoning by extending their chains of thought in natural language, achieving strong performance on benchmarks such as MATH and AIME. However, the reinforcement learning (RL) used to train these models faces a challenge: verifying the correctness of natural-language proofs is very difficult and requires careful manual inspection of each reasoning step. This limits the application of RL to training theorem-proving models. While formal languages like Lean provide automatic correctness verification, current LLM-based formal provers have their own limitations: step-level provers generate proofs tactic by tactic but require special scaffolding and lack long chain-of-thought reasoning capabilities.
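To make the contrast concrete, here is a toy Lean 4 snippet (illustrative, not from the paper) showing what automatic verification buys: once the file compiles, the kernel has checked every step, so no human needs to review the reasoning.

```lean
-- Toy illustration of automatic verification in Lean 4:
-- if this file compiles, the proof is machine-checked;
-- a flawed proof is rejected by the kernel, not by a human reviewer.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  omega  -- decision procedure for linear arithmetic closes the goal
```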

The ByteDance Seed team introduced Seed Prover, a lemma-style whole-proof reasoning model. It refines its proofs iteratively using Lean feedback, previously proved lemmas, and self-summarization. Seed Prover adopts three specialized test-time inference strategies, enabling both deep and broad reasoning to solve IMO-level competition problems. Its main innovation is lemma-style proving as its core approach, placing lemmas at the center of the reasoning process rather than relying on traditional step-wise or whole-proof generation. The paper also introduces Seed Geometry, a complementary geometry reasoning engine that overcomes Lean's lack of geometry support.
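A minimal Lean 4 sketch of the lemma-style idea (hypothetical, not the paper's code): an intermediate lemma is stated and verified first, then reused to close the main goal, mirroring how Seed Prover keeps verified lemmas at the center of its reasoning.

```lean
-- Hypothetical lemma-style decomposition: prove a reusable
-- intermediate fact first, then invoke it in the main theorem.
theorem double_eq_two_mul (n : Nat) : n + n = 2 * n := by
  omega

theorem sum_double (a b : Nat) : (a + b) + (a + b) = 2 * (a + b) :=
  double_eq_two_mul (a + b)  -- the verified lemma closes the goal directly
```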

For the interaction between Seed Prover and Lean, multi-stage, multi-task RL based on VAPO is used. The training data combine open-source datasets with internally formalized problems, creating easier variants of difficult problems to densify the training signal; problems with a proof rate above 25% are excluded as too easy. Seed Geometry's backend supports large-scale problem generation, producing more than 230 million unique problems in 7 days with an eight-fold improvement in search efficiency. Separate policy and value models were trained, although extensive testing showed that value models could degrade performance due to estimation errors. As a result, step-wise generation with beam search is adopted in a distributed setup.
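Below is a minimal sketch of step-wise generation with beam search under the design choice described above: candidate tactics are scored by cumulative policy log-probability rather than a learned value model. The function names (`propose_tactics`, `apply_tactic`) are illustrative stubs, not the Seed Prover API.

```python
import heapq

def beam_search_prove(goal, propose_tactics, apply_tactic,
                      beam_width=8, max_depth=64):
    """Step-wise proof search: expand every state in the beam with
    candidate tactics and keep the highest-scoring survivors.

    Scores are cumulative log-probabilities from the policy model;
    no value model is used, matching the choice described above.
    """
    # Each beam entry: (cumulative log-prob, proof state, tactic script).
    beam = [(0.0, goal, [])]
    for _ in range(max_depth):
        candidates = []
        for score, state, script in beam:
            # propose_tactics yields (tactic, log-prob) pairs from the policy.
            for tactic, logp in propose_tactics(state):
                new_state = apply_tactic(state, tactic)  # Lean feedback
                if new_state is None:        # tactic failed to type-check
                    continue
                if new_state == "solved":    # no goals remain: proof found
                    return script + [tactic]
                candidates.append((score + logp, new_state, script + [tactic]))
        if not candidates:                   # search space exhausted
            return None
        # Keep the beam_width best partial proofs for the next step.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return None
```

In this setup, ranking by the policy's own log-probabilities sidesteps the value-estimation errors the team observed, at the cost of a purely myopic score for each partial proof.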

Seed Prover achieves state-of-the-art results on multiple mathematical benchmarks. On IMO 2025, the system fully solved 5 of the 6 problems: Seed Geometry solved Problem 2 instantly, and the remaining proofs were obtained under various inference settings. On past IMO problems, it proved 121 of 155, a 78.1% success rate across all difficulty levels. The performance breakdown shows solid results across problem categories: 47 of 55 easy problems, 47 of 56 medium problems, and 27 of 44 hard problems, with subject-specific success rates including 72 of 85 algebra problems and 42 number theory problems.

On MiniF2F, the researchers achieved a 99.6% proof rate on the validation and test sets in the medium inference setting, solving difficult problems such as IMO 1990 P3. On PutnamBench, upgrading from the light to the medium inference setting raised the number of solved problems from 201 to 331, a significant performance jump over previous systems on undergraduate-level mathematics. On CombiBench, Seed Prover solves 30 of the 100 combinatorics problems, outperforming existing methods but revealing the ongoing challenge of combinatorial reasoning. On miniCTX-v2, the researchers achieved an 81.8% success rate, showing strong generalization beyond competition problems and outperforming the o4-mini baseline of 44.3% at pass@8.

In summary, the ByteDance Seed team proposes Seed Prover and Seed Geometry, two formal reasoning systems that integrate the capabilities of LLMs. Seed Geometry provides accelerated verification and an enhanced search mechanism, while Seed Prover relies on iterative refinement and sophisticated test-time inference strategies. Solving 5 of the 6 problems at IMO 2025 demonstrates the practical efficacy of these methods on elite mathematics competitions. Adopting formal languages like Lean provides fast proof verification that is more cost-effective than human experts and more reliable than LLM-based judges. Future research will focus on combining formal systems with LLMs to tackle open conjectures.


Check out the Paper and GitHub page.


Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a technology enthusiast, he delves into the practical applications of AI, focusing on understanding AI technologies and their real-world impact. He aims to explain complex AI concepts in a clear and accessible way.