
Together AI Releases DeepSWE: A Fully Open-Source, RL-Trained Coding Agent Based on Qwen3-32B That Scores 59% on SWE-Bench Verified

Together AI has released DeepSWE, a state-of-the-art, fully open-source software engineering agent trained entirely with reinforcement learning (RL). Built on the Qwen3-32B language model, DeepSWE achieves 59% accuracy with test-time scaling and 42.2% Pass@1 on the SWE-Bench Verified benchmark, ranking first among open-weight models. The release reflects a significant shift in AI, from static pretraining pipelines toward autonomous language agents that continually learn and improve through real-world feedback.

Reinforcement Learning Meets Code Generation

DeepSWE is the result of fine-tuning the Qwen3-32B base model with rLLM, Agentica's modular reinforcement learning framework. Unlike conventional supervised fine-tuning, rLLM lets agents adapt to real-world workflows through experience. DeepSWE is trained specifically to solve complex software engineering tasks using feedback-driven loops rather than static datasets.

The training pipeline incorporates Agentica's R2E-Gym dataset, a software engineering environment designed for RL-style agent development. The framework trains language models with action-oriented objectives, such as fixing bugs, completing functions, and editing code, rather than merely predicting the next token. This aligns much more closely with how human engineers iterate and learn from results.
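The distinction between next-token prediction and action-oriented training can be illustrated with a toy loop in which the reward signal comes from running tests against a proposed patch. This is a minimal sketch for intuition only; the candidate patches, `run_tests` function, and weight-update rule are illustrative stand-ins, not rLLM's actual API or algorithm.

```python
import random

def run_tests(patch: str) -> bool:
    """Stand-in test suite: only a correct `add` implementation passes."""
    env = {}
    exec(patch, env)
    return env["add"](2, 3) == 5

# Two candidate patches the "policy" can propose.
candidates = [
    "def add(a, b): return a - b",   # buggy patch
    "def add(a, b): return a + b",   # correct patch
]
weights = [1.0, 1.0]  # toy policy: preference weight per candidate
random.seed(0)

for step in range(200):
    i = random.choices(range(len(candidates)), weights=weights)[0]
    reward = 1.0 if run_tests(candidates[i]) else 0.0
    # Reinforce patches that pass the tests; discourage those that fail.
    weights[i] = max(weights[i] + 0.1 * (reward - 0.5), 0.05)

best = candidates[weights.index(max(weights))]
print(best)  # the policy converges to the patch that passes the tests
```

The key point is that the learning signal is the outcome of an action (do the tests pass?), not the likelihood of the next token, which mirrors how an engineer judges a change by its results.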

Performance Benchmarks and Capabilities

On SWE-Bench Verified, one of the strictest benchmarks for software engineering agents, DeepSWE scores 59% with test-time scaling, substantially outperforming previous open models. On Pass@1, which measures the probability that the agent solves a problem correctly on the first attempt, DeepSWE reaches an impressive 42.2%.
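Pass@1 is the k=1 case of the standard unbiased pass@k estimator: given n sampled solutions per problem, of which c pass, it estimates the probability that at least one of k draws would succeed. A minimal implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn from n generations with c correct, solves the task."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 this reduces to the fraction of correct samples:
print(pass_at_k(10, 4, 1))  # → 0.4
```

Reporting Pass@1 alongside the test-time-scaled score separates the model's single-shot reliability from what it can reach with extra sampling.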

These results underscore the power of RL-based training for improving agentic behavior, especially in domains that demand iterative reasoning and precise outputs, such as code synthesis. Because the model's architecture is inherited from Qwen3-32B, it scales efficiently while remaining practical to deploy.
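The test-time scaling mentioned above can be sketched as best-of-n sampling: generate several candidate solutions and keep one that a verifier (for example, the project's test suite) accepts. The `generate` and `verify` callables below are illustrative stand-ins, not DeepSWE's actual interface.

```python
from typing import Callable, Optional

def best_of_n(generate: Callable[[], str],
              verify: Callable[[str], bool],
              n: int) -> Optional[str]:
    """Sample up to n candidates; return the first that verifies, else None."""
    for _ in range(n):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None

# Usage with toy stand-ins for the generator and verifier:
samples = iter(["return a - b", "return a * b", "return a + b"])
found = best_of_n(lambda: next(samples),
                  lambda c: c == "return a + b",
                  n=3)
print(found)  # → "return a + b"
```

Spending more compute at inference (larger n) raises the chance that some candidate verifies, which is why the scaled score (59%) exceeds the single-attempt Pass@1 (42.2%).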

Open Source and Reproducibility at the Core

One of the standout features of this release is its complete transparency. Together AI and Agentica have open-sourced not only the DeepSWE model but the entire training recipe, including the rLLM framework, the R2E-Gym dataset, and the training configuration scripts. This promotes reproducibility and invites the broader research and developer community to extend and build on the work without restrictions.

Developers can access DeepSWE and rLLM via the following:

From Language Reasoning to Language Agents

DeepSWE marks a shift in both philosophy and practice: from building models that reason about language to building agents that learn through interaction. Traditional LLMs show strong reasoning skills but often cannot adapt to feedback or improve with use. Reinforcement learning lets these models not only perform well at launch but also get better over time, adapting to new problem distributions and domains.

This approach also opens the door to local deployment. Because DeepSWE is fully open source and modular, it can be scaled and retrained for organization-specific use cases. Developers and researchers can use rLLM to build specialized agents on top of DeepSWE for domains such as web navigation, robotics, or automated research assistance.

Conclusion

DeepSWE is a milestone in the evolution of generative AI for software engineering. By applying reinforcement learning to large language models such as Qwen3-32B and releasing the entire training infrastructure, Together AI points toward a future in which agents are not just deployed but continually trained and improved. This leap from language understanding to action-oriented agency has significant implications for programming, automation, and intelligent system design.


All credit for this research goes to the researchers on the project.


Asif Razzaq is the CEO of Marktechpost Media Inc. A visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is Marktechpost, an AI media platform known for in-depth coverage of machine learning and deep learning news that is both technically sound and accessible to a broad audience. The platform attracts over 2 million views per month, reflecting its popularity among readers.
