MIT and NUS researchers introduce MEM1: a memory-efficient framework for long-horizon language agents

Modern language agents need to handle multi-turn conversations, retrieving and updating information as tasks evolve. However, most current systems simply append the entire past interaction to each prompt, regardless of relevance. This leads to ballooning memory usage, slower inference, and degraded reasoning on inputs longer than those seen during training. Real-world examples, such as research or shopping assistants, show how later queries depend on earlier context, yet unbounded context growth strains system resources and the model's attention. Although some solutions bolt on external memory modules, these are difficult to integrate with the agent's reasoning. This raises an important question: can language models learn to manage their memory intelligently as part of their reasoning?
Limitations of context-growth prompting and memory integration challenges
LLM agents have progressed from answering simple queries to navigating complex multi-step tasks such as web browsing and research. Frameworks like ReAct (which interleaves reasoning and acting) help enable these capabilities, and training methods often rely on behavioral cloning or reinforcement learning to shape agent behavior. However, managing memory across multi-turn interactions remains a challenge. The common approach appends all past context to each prompt, resulting in bloated prompts and inefficient memory usage. External tools such as retrievers or summarizers can help, but they typically operate outside the agent's reasoning process, complicating integration.
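To make the inefficiency concrete, the sketch below (ours, not from the paper) shows the append-everything loop in miniature; `call_llm` and `run_tool` are hypothetical placeholders for a model call and a tool call. Note that the prompt is rebuilt from the full history every turn, so context grows linearly with the number of interactions.

```python
# Minimal sketch of the naive "append all past context" agent loop.
# `call_llm` and `run_tool` are hypothetical placeholders, not a real API.
def naive_agent(task: str, call_llm, run_tool, max_turns: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_turns):
        prompt = "\n".join(history)       # full history resent every turn
        action = call_llm(prompt)         # e.g., a ReAct-style thought + action
        if action.startswith("ANSWER:"):
            return action
        observation = run_tool(action)    # tool/environment feedback
        history += [action, observation]  # the prompt keeps growing
    return "no answer found"
```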
Introducing MEM1: a reinforcement learning framework for constant-memory language agents
Researchers at MIT, NUS, SMART, and Yonsei University have developed MEM1, a reinforcement learning framework that enables language agents to handle complex multi-turn tasks while keeping memory usage constant. Instead of storing the full interaction history, MEM1 updates a compact internal state at each step, merging new information with memory and discarding unnecessary details. This unified reasoning-and-memory approach improves efficiency and performance without requiring add-on modules. MEM1 was tested on a variety of tasks, including web QA and online shopping, showing up to 3.5× better performance with 3.7× less memory than substantially larger models, while also generalizing to longer, unseen task horizons.
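For contrast, a MEM1-style loop might look like the sketch below. This is our reading of the paper's description, not the authors' code; `call_llm`, `run_tool`, and the `parse` step are hypothetical. The agent's output each turn includes a rewritten internal state that replaces everything that came before, so the prompt stays roughly constant in size.

```python
# Minimal sketch of a constant-memory, MEM1-style loop: the agent rewrites
# a compact internal state each turn, and the old context is discarded.
# `call_llm`, `run_tool`, and `parse` are hypothetical placeholders.
def mem1_style_agent(task: str, call_llm, run_tool, parse,
                     max_turns: int = 10) -> str:
    state = f"Task: {task}"                # compact, consolidated memory
    for _ in range(max_turns):
        output = call_llm(state)           # prompt = state only, not history
        new_state, action = parse(output)  # model emits its updated memory
        if action.startswith("ANSWER:"):
            return action
        observation = run_tool(action)
        # Merge the latest observation into the state; drop everything else.
        state = f"{new_state}\nLatest observation: {observation}"
    return "no answer found"
```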
Combining memory pruning and iterative reasoning for human-like problem solving
MEM1 tackles complex reasoning tasks by combining memory management with iterative thinking. At each step, the agent processes new information and integrates it with prior knowledge to form a consolidated internal state, then prunes the previous context to keep memory usage bounded. This structured memory update mirrors how humans solve problems by holding on to key information and discarding the rest. The team used reinforcement learning to train the agent to retain only relevant data, applying a masking strategy during optimization to ensure accurate policy updates. To better test long-horizon reasoning, they also constructed multi-objective QA tasks by composing questions from existing datasets.
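One plausible way to implement such masking is to zero out, in the policy-gradient loss, every token that was injected by the environment or pruned from context, so that gradients flow only through tokens the agent actually generated. The PyTorch sketch below illustrates this under that assumption; the function and tensor names are ours, not the paper's.

```python
import torch

def masked_pg_loss(logprobs: torch.Tensor,
                   advantages: torch.Tensor,
                   generated_mask: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style loss restricted to agent-generated tokens.

    logprobs:       (batch, seq) token log-probs under the current policy
    advantages:     (batch,) scalar advantage per trajectory
    generated_mask: (batch, seq) 1 for tokens the agent generated,
                    0 for environment observations and pruned context
    """
    per_token = logprobs * advantages.unsqueeze(-1) * generated_mask
    # Normalize by the number of trainable tokens so the loss scale is
    # independent of how much context was pruned.
    return -(per_token.sum() / generated_mask.sum().clamp(min=1))
```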
Benchmarking MEM1 on long-horizon QA and web navigation tasks
The study evaluates MEM1's ability to handle complex multi-turn tasks while keeping memory usage nearly constant. MEM1 is trained with reinforcement learning on the Qwen2.5-7B base model and evaluated on retrieval-augmented question answering and web navigation environments. Using accuracy and efficiency metrics, the authors compare it against several baselines. The results show that MEM1 outperforms the alternatives on long-horizon tasks and maintains strong performance even as task complexity increases. It uses fewer tokens, responds faster, and scales more efficiently. Despite its smaller size, MEM1 even surpasses larger models such as Qwen2.5-14B-Instruct and GPT-4o.
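The efficiency comparison can be checked in spirit with simple bookkeeping: log the prompt size at every turn and take the peak. The toy sketch below does this with whitespace tokens as a crude stand-in for a real tokenizer; the numbers are illustrative only, not results from the paper.

```python
# Toy comparison of peak context size for the two memory strategies.
def count_tokens(text: str) -> int:
    return len(text.split())  # crude proxy for a real tokenizer

def peak_prompt_tokens(prompts: list[str]) -> int:
    """Largest prompt, in tokens, across all turns of one episode."""
    return max(count_tokens(p) for p in prompts)

# A full-history agent's prompts grow each turn ...
full_history = ["task", "task act1 obs1", "task act1 obs1 act2 obs2"]
# ... while a MEM1-style agent's prompts stay near a constant size.
consolidated = ["task", "state1 obs1", "state2 obs2"]

print(peak_prompt_tokens(full_history))  # 5, and rising with more turns
print(peak_prompt_tokens(consolidated))  # 2, roughly flat
```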
Conclusion and future directions for reinforcement learning and memory consolidation in LLMs
In summary, MEM1 is a reinforcement learning framework designed to help language agents handle long-horizon, multi-step tasks more efficiently. Unlike the traditional approach of storing all past information, which leads to memory bloat and slower performance, MEM1 maintains a compact internal state by merging new inputs with memory and discarding unnecessary data. It performs well on tasks such as question answering and web navigation while using less memory and compute. However, MEM1 assumes clear, reliable reward signals, which many real-world tasks lack. Future work aims to adapt MEM1 to open-ended tasks with uncertain or delayed rewards, extending its applicability to broader, more practical settings.
Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter, and don't forget to join our 100k+ ML SubReddit and subscribe to our Newsletter.

Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
