Meta Superintelligence Labs Introduces REFRAG: Scaling RAG with 16× Longer Context and 31× Faster Decoding
A team of researchers from Meta Superintelligence Labs, the National University of Singapore, and Rice University has unveiled REFRAG (REpresentation For RAG), a decoding framework that rethinks the efficiency of retrieval-augmented generation (RAG). REFRAG achieves a 16× context extension and up to 30.85× acceleration in time-to-first-token (TTFT) without loss of accuracy.
Why are long contexts a bottleneck for LLMs?
The attention mechanism in large language models scales quadratically with input length: if a document is twice as long, compute and memory costs roughly quadruple. This not only slows inference but also inflates the key-value (KV) cache, making long-context applications impractical in production systems. In a RAG setup, most retrieved passages contribute little to the final answer, yet the model still pays the full quadratic price to process them.
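The quadratic scaling above can be sketched with a back-of-envelope FLOP count (a rough model of the QKᵀ and attention-weighted-V matmuls only; the constant and `hidden_dim` are illustrative, not from the paper):

```python
def attention_cost(seq_len: int, hidden_dim: int = 4096) -> int:
    """Approximate FLOPs for one self-attention pass: the QK^T product
    and the attention-weighted sum over V each cost ~seq_len^2 * dim."""
    return 2 * seq_len * seq_len * hidden_dim

base = attention_cost(4096)     # e.g., a 4K-token prompt
doubled = attention_cost(8192)  # doubling the prompt length
print(doubled / base)           # → 4.0: twice the input, four times the cost
```

The KV cache grows only linearly with sequence length, but at long contexts it still dominates GPU memory, which is why shortening the decoder's input helps on both fronts.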
How does REFRAG compress and shorten the context?
REFRAG introduces a lightweight encoder that splits retrieved passages into fixed-size chunks (e.g., 16 tokens) and compresses each chunk into a dense chunk embedding. Instead of consuming thousands of raw tokens, the decoder operates on this much shorter sequence of embeddings. The result is a 16× reduction in sequence length, with no change to the LLM architecture.

How is the acceleration achieved?
By shortening the decoder's input sequence, REFRAG shrinks the quadratic attention computation and the KV cache. Empirical results show a 16.53× TTFT speedup at k = 16 and a 30.85× speedup at k = 32, far exceeding the prior state of the art, CEPE (which achieves only 2–8×). Throughput also improves by up to 6.78× compared with a LLaMA baseline.
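The arithmetic behind these gains: with attention cost quadratic in sequence length, a k× shorter input cuts attention FLOPs by up to k². This is an upper bound on the attention term only; measured end-to-end TTFT gains are smaller because the encoder, MLP layers, and other overheads still run:

```python
# Upper-bound reduction in attention FLOPs from a k-fold shorter input.
# Compare with the measured end-to-end TTFT speedup reported in the paper.
k = 16
attention_flops_reduction = k ** 2
measured_ttft_speedup = 16.53  # reported figure at k = 16

print(attention_flops_reduction)  # → 256 (theoretical ceiling for attention alone)
print(measured_ttft_speedup)      # end-to-end, including non-attention costs
```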
How does REFRAG maintain accuracy?
A reinforcement learning (RL) policy supervises compression. It identifies the most information-dense chunks and allows them to bypass compression, feeding their raw tokens directly to the decoder. This selective strategy ensures that critical details, such as exact numbers or rare entities, are not lost. Across multiple benchmarks, REFRAG maintains or improves perplexity relative to CEPE while running at lower latency.
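The selective mechanism can be sketched as follows. Here `score_chunk` stands in for the learned RL policy (which the paper trains); the function names and the top-k selection heuristic are purely illustrative:

```python
def build_decoder_input(chunks, chunk_embs, score_chunk, keep_top=2):
    """Let the highest-scoring chunks bypass compression: their raw tokens
    go to the decoder, while every other chunk contributes one embedding."""
    scores = [score_chunk(chunk) for chunk in chunks]
    keep = set(sorted(range(len(chunks)), key=lambda i: -scores[i])[:keep_top])
    decoder_input = []
    for i, (chunk, emb) in enumerate(zip(chunks, chunk_embs)):
        if i in keep:
            decoder_input.extend(chunk)  # critical chunk: raw tokens survive
        else:
            decoder_input.append(emb)    # ordinary chunk: one embedding
    return decoder_input

chunks = [["tok"] * 16 for _ in range(8)]          # 8 chunks of 16 tokens
embs = [f"<emb{i}>" for i in range(8)]             # one embedding per chunk
out = build_decoder_input(chunks, embs, score_chunk=len, keep_top=2)
print(len(out))  # → 38: two raw chunks (32 tokens) + 6 embeddings
```

Without selection, the decoder would see 8 positions; with it, 38 — still far shorter than the 128 raw tokens, but with the most important spans preserved verbatim.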
What did the experiments reveal?
REFRAG was pretrained on 20B tokens from the SlimPajama corpus (Books + arXiv) and evaluated on long-context datasets including Books, arXiv, PG19, and ProofPile. It consistently outperformed strong baselines on RAG benchmarks, multi-turn conversation tasks, and long-document summarization:
- 16× context extension beyond standard LLaMA-2 (4K tokens).
- ~9.3% perplexity improvement over CEPE across four datasets.
- The ability to process more retrieved passages under the same latency budget, improving accuracy in weak-retriever settings.


Summary
REFRAG shows that long-context LLMs do not have to be slow or memory-hungry. By compressing retrieved passages into compact embeddings, selectively expanding only the important ones, and rethinking how RAG decoding works, Meta Superintelligence Labs makes it possible to process far larger inputs while running faster. This makes long-context applications, such as analyzing entire reports, handling multi-turn conversations, or scaling enterprise RAG systems, not just feasible but efficient, without compromising accuracy.


FAQ
Q1. What is REFRAG?
REFRAG (REpresentation For RAG) is a decoding framework from Meta Superintelligence Labs that compresses retrieved passages into embeddings, enabling faster, longer-context inference in LLMs.
Q2. How much faster is REFRAG than existing methods?
REFRAG delivers up to 30.85× faster time-to-first-token (TTFT) and a 6.78× throughput improvement compared with LLaMA baselines, outperforming CEPE.
Q3. Does compression reduce accuracy?
No. A reinforcement learning policy ensures that critical chunks remain uncompressed, preserving key details. Across benchmarks, REFRAG maintains or improves accuracy relative to prior methods.
Q4. Where is the code available?
Meta Superintelligence Labs will release REFRAG on GitHub at facebookresearch/refrag.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform known for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform draws over 2 million views per month, attesting to its popularity among readers.