
Allen AI's Tülu 3 Has Just Become DeepSeek's Unexpected Competitor

The headlines keep coming. DeepSeek's model has been dominating the benchmarks, setting new standards and making a lot of noise. But something interesting just happened in the AI research scene that also deserves your attention.

Allen AI quietly released their new Tülu 3 model family. Their 405B-parameter version doesn't just compete with DeepSeek: it matches or beats it on key benchmarks.

Let's put that in perspective.

The 405B Tülu 3 model goes head-to-head with the best in class, such as DeepSeek V3, across a variety of tasks. In areas like mathematical problem solving, coding challenges, and precise instruction following, it shows comparable or superior performance. And it was all done in a completely open way.

They have released the complete training pipeline, the code, and even their novel reinforcement learning method, called Reinforcement Learning with Verifiable Rewards (RLVR), that made this possible.

Developments like this over the past few weeks are genuinely changing how top-tier AI is built. When a fully open-source model can match the best closed models, it opens up possibilities that were previously locked behind private company walls.

The Technical Breakdown

What makes Tülu 3 stand out? It comes down to a unique four-stage training process that goes beyond traditional methods.

Let's look at how Allen AI built this model:

Phase 1: Strategic Data Selection

The team knew that model quality begins with data quality. They combined established datasets such as WildChat and OpenAssistant with custom-built content. But here is the key insight: they didn't just aggregate data, they created targeted datasets for specific skills such as mathematical reasoning and coding proficiency.

Phase 2: Building Better Responses

In the second stage, Allen AI focused on teaching the model specific skills. They created distinct training datasets: some for mathematics, others for coding, and more for general tasks. By repeatedly testing these combinations, they could see exactly where the model excelled and where it needed more work. This iterative process revealed the real potential Tülu 3 could reach in each area.
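As a rough illustration, that mix-and-test loop can be sketched in a few lines of Python. This is a minimal sketch with made-up dataset names and weights, not Allen AI's actual data tooling:

```python
import random

# Hypothetical skill-specific prompt pools; the names are illustrative,
# not the actual Tülu 3 dataset identifiers.
pools = {
    "math":    [f"math-{i}" for i in range(1000)],
    "coding":  [f"code-{i}" for i in range(1000)],
    "general": [f"chat-{i}" for i in range(1000)],
}

def build_mix(weights, total, seed=0):
    """Sample a training mix according to per-skill weights."""
    rng = random.Random(seed)
    mix = []
    for skill, w in weights.items():
        n = round(total * w)
        mix.extend(rng.sample(pools[skill], n))
    rng.shuffle(mix)
    return mix

# Try one candidate mixture (e.g. weighted toward math), fine-tune on it,
# evaluate per skill, then adjust the weights and repeat.
mix = build_mix({"math": 0.5, "coding": 0.3, "general": 0.2}, total=200)
print(len(mix))  # 200
```

The point of the loop is that the weights themselves become experimental knobs: each candidate mixture is trained and evaluated per skill before the next adjustment.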

Phase 3: Learning from Comparisons

This is where Allen AI got creative. They built a system that directly compares Tülu 3's responses against those of other top models. But they also tackled a persistent problem in AI: models tend to write long responses purely for the sake of length. Their answer was a length-normalized Direct Preference Optimization (DPO) method, which means the model learns to value quality over quantity. The result? Responses that are both accurate and purposeful.

When AI models learn from preferences ("which response is better, A or B?"), they tend to develop a frustrating bias: they start to assume that longer responses are always better. It's as if they try to win by saying more rather than by saying it well.

Length normalization solves this problem by adjusting how the model learns from preferences. It considers not only which response was preferred, but also the length of each response. Think of it as judging answers by the quality of every word, not just their overall impression.

Why does this matter? Because it helps the model learn to be both precise and efficient. Instead of padding responses with extra words to seem more thorough, it learns to deliver value at whatever length the task actually requires.

This may seem like a small detail, but it is essential for building AI that communicates naturally. The best human experts know when to be brief and when to explain in detail; that is exactly what length-normalized DPO teaches the model.
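Here is a minimal sketch of the idea in plain Python. The real objective also involves a reference policy and per-token log-probabilities from the model; those terms are omitted here for brevity, and the numbers are purely illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_chosen, logp_rejected, beta=0.1):
    """Plain DPO margin on raw sequence log-probabilities.
    (Reference-model terms omitted for brevity.)"""
    return -math.log(sigmoid(beta * (logp_chosen - logp_rejected)))

def length_normalized_dpo_loss(logp_chosen, len_chosen,
                               logp_rejected, len_rejected, beta=0.1):
    """Length-normalized DPO: average the per-token log-probabilities first,
    so a response can't win the comparison just by being longer."""
    margin = logp_chosen / len_chosen - logp_rejected / len_rejected
    return -math.log(sigmoid(beta * margin))

# A short, dense chosen answer (20 tokens) vs. a long rejected one (50 tokens).
# On raw sums the long answer looks better; per token, the short one wins.
raw_loss  = dpo_loss(-40.0, -30.0)
norm_loss = length_normalized_dpo_loss(-40.0, 50, -30.0, 20)
print(raw_loss > norm_loss)  # True: normalization favors the denser answer
```

Dividing each sequence log-probability by its token count is what removes the incentive to accumulate probability mass simply by emitting more tokens.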

Phase 4: RLVR Innovation

This is a technical breakthrough worth paying attention to. RLVR replaces subjective reward models with concrete verification.

Most AI models learn through a complex reward-model system, essentially an educated guesser that scores how good a response is. But with RLVR, Allen AI took a different path.

Consider how we currently train AI models. We typically rely on another AI model (called a reward model) to decide whether a response is good. That judgment is subjective, complex, and often inconsistent. Some responses look good on the surface yet contain subtle errors that slip through.

RLVR flips this approach on its head. Instead of relying on subjective judgment, it uses concrete, verifiable outcomes. When the model attempts a math problem, there is no gray area: the answer is either right or wrong. When it writes code, the code either runs correctly or it doesn't.

Here is what makes it interesting:

  • The model gets immediate binary feedback: 1 for a correct answer, 0 for an incorrect one
  • There is no room for partial credit or fuzzy assessment
  • Learning becomes focused and precise
  • The model learns to prioritize correctness over merely plausible-sounding responses
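The properties above can be made concrete with a minimal sketch of verifiable reward functions. The `Answer:` convention and the test format here are illustrative assumptions, not Allen AI's actual verifiers:

```python
def verify_math(response: str, gold_answer: str) -> float:
    """Binary reward: 1.0 if the final answer matches the gold answer, else 0.0."""
    # Hypothetical convention: the final answer follows "Answer:" in the response.
    answer = response.rsplit("Answer:", 1)[-1].strip()
    return 1.0 if answer == gold_answer.strip() else 0.0

def verify_code(program: str, tests: str) -> float:
    """Binary reward: 1.0 if the generated code passes its tests, else 0.0."""
    scope = {}
    try:
        exec(program, scope)   # run the candidate solution
        exec(tests, scope)     # run the assertions against it
        return 1.0
    except Exception:
        return 0.0

print(verify_math("We add 2 and 2. Answer: 4", "4"))   # 1.0
print(verify_code("def add(a, b):\n    return a + b",
                  "assert add(2, 3) == 5"))            # 1.0
print(verify_code("def add(a, b):\n    return a - b",
                  "assert add(2, 3) == 5"))            # 0.0
```

Because the reward is a deterministic check rather than a learned model's opinion, there is nothing for the policy to flatter or game except actual correctness.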

RLVR training (source: Allen AI)

The result? Tülu 3 shows marked improvement on the tasks where correctness matters most. Its performance on mathematical reasoning (the GSM8K benchmark) and coding challenges rose significantly. Even instruction following became more precise, because the model learns to attend to the specific requirements of each request.

What makes this particularly exciting is how it changes the open-source AI game. Previous approaches often struggled to match the accuracy of closed models on technical tasks. RLVR shows that, with the right training method, open-source models can achieve the same reliability.

A Look at the Numbers

The 405B-parameter version of Tülu 3 competes directly with the top models in the field. Let's examine where it stands and what that means for open-source AI.

Math

Tülu 3 excels at complex mathematical reasoning. On benchmarks such as GSM8K and MATH, it matches DeepSeek's performance. The model handles multi-step problems and shows strong mathematical reasoning ability.

Code

The coding results are equally impressive. Thanks to RLVR training, Tülu 3 writes code that actually solves the problem at hand. Its strength lies in understanding coding requirements and producing working solutions.

Instruction Following

Instruction following is a core strength. While many models approximate or generalize what was asked, Tülu 3 shows extraordinary precision in executing the exact requirements.

Opening the Black Box of AI Development

Allen AI released not just a powerful model but its complete development process.

Every aspect of the training process is documented and accessible. From the four-stage approach to the data-preparation methods and the RLVR implementation, the whole pipeline is open for study and replication. This transparency sets a new standard for high-performance AI development.

Developers gain access to comprehensive resources:

  • The complete training pipeline
  • Data processing tools
  • Evaluation frameworks
  • Implementation specifications

This enables teams to:

  • Modify the training process
  • Adapt the methods to specific needs
  • Build on verified approaches
  • Create specialized implementations

This openness accelerates innovation across the entire field. Researchers can focus on improving verified methods rather than starting from scratch.

The Rise of Open-Source Excellence

Tülu 3's success marks an important moment for open AI development. When open-source models match or exceed private alternatives, it fundamentally changes the industry. Research teams worldwide gain access to verified methods, speeding up their work and spawning new innovations. Private AI labs will need to adapt, whether by increasing transparency or by pushing the technical boundaries even further.

Looking ahead, Tülu 3's breakthroughs in verifiable rewards and multi-stage training hint at what is coming. Teams can build on these foundations to push performance further. The code exists, the methods are documented, and a new wave of AI development has begun. For developers and researchers, the opportunity to experiment with and improve on these methods marks the start of an exciting chapter in AI development.

Frequently Asked Questions (FAQ) about Tülu 3

What is Tülu 3 and what are its main features?

Tülu 3 is an open-source LLM family developed by Allen AI, built on Llama 3.1. It comes in several sizes (8B, 70B, and 405B parameters). Tülu 3 aims to improve performance across a range of tasks, including knowledge, reasoning, mathematics, coding, instruction following, and safety.

How is Tülu 3 trained and what data does it use?

Tülu 3's training involves several key stages. First, the team curated a collection of prompts from public datasets and synthetic data targeting specific skills, ensuring the data was decontaminated against the benchmarks. Second, supervised fine-tuning (SFT) was performed on a mix of instruction-following, mathematics, and coding data. Next, Direct Preference Optimization (DPO) was applied using preference data generated from human and LLM feedback. Finally, Reinforcement Learning with Verifiable Rewards (RLVR) was used for tasks with measurable correctness. At each stage, Tülu 3 uses curated datasets, including persona-driven instructions along with math and code data.

How does Tülu 3 handle safety, and which metrics are used to evaluate it?

Safety is a core component of Tülu 3's development and runs through the entire training process. During SFT, a dedicated safety-specific dataset was used; this dataset was found to be largely orthogonal to the other task mixes.

What is RLVR?

RLVR is a technique that trains models by optimizing against verifiable rewards, such as the correctness of an answer. This differs from traditional RLHF, which relies on a learned reward model.
