Working with very long documents remains an ongoing challenge for large language models (LLMs). Even with techniques such as length extrapolation and sparse attention, models often suffer performance degradation and high computational costs. To address this, researchers from ByteDance Seed and Tsinghua University introduced MemAgent, a reinforcement-learning-based memory agent designed to enable long-context processing with linear complexity and minimal performance loss.
Limitations of existing methods
Current solutions for long-context modeling fall into three main categories:
- Length extrapolation methods (e.g., NTK, PI, YaRN, DCA): extend the context window through positional-embedding manipulations, but often suffer performance degradation and scaling problems.
- Sparse and linear attention mechanisms: reduce attention complexity toward O(n), but usually require retraining from scratch and rely on fixed patterns or human-defined rules.
- Context compression: uses token-level or external memory modules to condense long inputs, but often disrupts standard generation and struggles to extrapolate.
None of these approaches delivers all three key properties at once: support for arbitrary input lengths, consistent accuracy, and efficient linear complexity.
MemAgent: a human-like memory strategy
Inspired by how humans summarize key information while ignoring noise, MemAgent processes the input as a stream of evidence. At each step, it reads a chunk of the document alongside an internal memory and overwrites that memory with an updated, compressed context (see the sketch after the list below).
Key innovations:
- Fixed-length token memory: compresses essential information while remaining compatible with the base model.
- Segment-wise overwrite mechanism: supports arbitrarily long text without growing the memory.
- Linear complexity: memory-update and decoding cost per chunk stays constant, so total cost scales linearly with input length.
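The following is a minimal sketch of how such an overwrite loop could be wired up. The helpers `llm_update_memory` and `llm_answer`, as well as the chunk and memory sizes, are illustrative assumptions rather than the paper's actual implementation:

```python
# Sketch of MemAgent-style chunked reading with a fixed-size, overwritable memory.
# `llm_update_memory` / `llm_answer` are hypothetical stand-ins for LLM calls.

CHUNK_TOKENS = 5_000    # document tokens read per step (illustrative)
MEMORY_TOKENS = 1_024   # fixed memory budget; never grows with input length

def llm_update_memory(memory: str, chunk: str, question: str, max_tokens: int) -> str:
    """Prompt the model with (memory, chunk, question) and return a rewritten
    memory of at most `max_tokens` tokens. Placeholder for an actual LLM call."""
    raise NotImplementedError

def llm_answer(memory: str, question: str) -> str:
    """Generate the final answer from the compressed memory alone. Placeholder."""
    raise NotImplementedError

def answer_long_document(chunks: list[str], question: str) -> str:
    memory = ""  # starts empty, capped at MEMORY_TOKENS throughout
    for chunk in chunks:
        # The model keeps answer-relevant evidence and drops distractors,
        # overwriting the old memory rather than appending to it.
        memory = llm_update_memory(memory, chunk, question, MEMORY_TOKENS)
    return llm_answer(memory, question)
```

Because the memory is overwritten in place, the context the model sees at each step is always one chunk plus the fixed memory, which is what keeps the per-step cost constant.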

Multi-conversation RL training with GRPO
MemAgent treats each document as an independent multi-turn dialogue. It applies Group Relative Policy Optimization (GRPO) within a multi-conversation DAPO pipeline to enable reward-driven memory updates.

Key elements include:
- Rule-based verifier: computes the outcome reward by comparing the model's answer against multiple ground truths.
- Token-level RL signal: applied uniformly across the tokens of each sampled trajectory.
This setup pushes the memory to compress answer-relevant information and discard distractors; a rough sketch of the reward and advantage computation follows.
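Below is a rough sketch of the outcome reward and group-relative advantage computation, assuming a simple exact-match rule after normalization and standard GRPO group statistics (the paper's verifier rules and hyperparameters may differ):

```python
import re
import statistics

def rule_based_reward(model_answer: str, reference_answers: list[str]) -> float:
    """Outcome reward: 1.0 if the normalized answer matches any reference, else 0.0.
    The normalization here (lowercase, collapse whitespace) is illustrative."""
    norm = lambda s: re.sub(r"\s+", " ", s.strip().lower())
    return 1.0 if norm(model_answer) in {norm(a) for a in reference_answers} else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: subtract the group mean, divide by the group std.
    Every token of a sampled trajectory then shares its trajectory's advantage."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: one question, a group of four sampled rollouts.
answers = ["Greenwich Village, New York City", "Manhattan",
           "Greenwich Village, New York City", "Brooklyn"]
rewards = [rule_based_reward(a, ["Greenwich Village, New York City"]) for a in answers]
print(grpo_advantages(rewards))  # correct rollouts get positive advantage
```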
Performance evaluation
Evaluated on the RULER benchmark and synthetic datasets derived from HotpotQA and SQuAD, MemAgent was trained with an 8K context window and extrapolated to inputs of up to 3.5 million tokens.
| Model | 224K | 896K | 3.5M |
|---|---|---|---|
| Qwen2.5-Instruct-14B-1M | 37.5% | 0.0% | N/A |
| QwenLong-L1-32B | 17.2% | 11.7% | N/A |
| RL-MemAgent-14B | 81.3% | 77.3% | 78.1% |
MemAgent maintains over 95% accuracy on RULER benchmarks (8K to 512K tokens) and consistently outperforms long-context and distillation-based baselines.


Case study: multi-hop QA
Given the question of which part of New York City the director of the romantic comedy "Big Stone Gap" is from, MemAgent progressively tracks the relevant content across three chunks:
- It recognizes unrelated content while retaining location clues.
- It keeps the memory unchanged when a chunk contains nothing relevant.
- It correctly updates the memory when it encounters Adriana Trigiani's biography.
Final answer: Greenwich Village, New York City.
Theoretical basis and complexity
MemAgent reformulates the autoregressive model using latent memory variables (m₁…mₖ):
p(x₁:ₙ) = ∑ₘ₁:ₖ ∏ₖ p(cₖ | mₖ₋₁) · p(mₖ | cₖ, mₖ₋₁)
where cₖ is the k-th chunk of the input and mₖ the memory after reading it. This enables O(N) computational cost and human-readable intermediate memory, unlike attention-based feature compression. RL is essential because the memory updates are discrete and cannot be learned through backpropagation.
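As a back-of-the-envelope illustration of why the cost stays linear, the snippet below compares quadratic full-attention cost with a chunked-memory scheme in which each step attends over at most a fixed window; the window and chunk sizes are illustrative, not the paper's exact settings:

```python
CONTEXT_WINDOW = 8_000   # tokens attended over per step (memory + chunk + question)
CHUNK = 5_000            # new document tokens consumed per step

def full_attention_cost(n_tokens: int) -> float:
    """Self-attention over the whole input scales quadratically."""
    return float(n_tokens) ** 2

def chunked_memory_cost(n_tokens: int) -> float:
    """Per-step cost is bounded by a constant; the number of steps grows linearly."""
    steps = -(-n_tokens // CHUNK)  # ceiling division
    return steps * float(CONTEXT_WINDOW) ** 2

for n in (224_000, 896_000, 3_500_000):
    print(f"{n:>9} tokens  full: {full_attention_cost(n):.2e}   "
          f"chunked: {chunked_memory_cost(n):.2e}")
```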
Conclusion
MemAgent offers a scalable and effective solution to the long-context trilemma: unlimited input length, near-lossless accuracy, and linear complexity. Its RL-driven overwrite memory mechanism allows LLMs to read, abstract, and generate over inputs of millions of tokens without architectural modifications.
FAQ
Q1: What is MemAgent?
MemAgent is a reinforcement-learning-based framework that equips LLMs with memory tokens to handle very long contexts efficiently.
Q2: How is it different from attention or extrapolation methods?
Unlike attention-based scaling or length-extrapolation techniques, MemAgent updates a token-based memory through reinforcement learning.
Q3: Which models can it be applied to?
Any transformer-based LLM. No changes to the model architecture are required.
Q4: How does it scale with input size?
It maintains linear computational complexity by keeping the memory size fixed regardless of input length.
Q5: What are the applications of MemAgent?
Long-document QA, agent memory systems, legal document review, scientific literature analysis, and real-time decision-making over large evidence bases.
Check out the Paper. All credit for this research goes to the researchers of this project.

Sajjad Ansari is a final year undergraduate student at IIT Kharagpur. As a technology enthusiast, he delves into the practical applications of AI, focusing on understanding AI technology and its real-world impact. He aims to articulate complex AI concepts in a clear and accessible way.