MemOS: A Memory-Centric Operating System for Evolving and Adaptive Large Language Models

LLMs are increasingly viewed as key to achieving artificial general intelligence (AGI), but they face major limitations in how they handle memory. Most LLMs rely on fixed knowledge baked into their weights and on transient context during use, making it difficult to retain or update information over time. Techniques such as RAG (retrieval-augmented generation) attempt to incorporate external knowledge but lack structured memory management. This leads to problems such as forgetting past conversations, poor adaptability, and memory that stays siloed across platforms. Fundamentally, today's LLMs do not treat memory as a manageable, persistent, or shareable resource, which limits their real-world usefulness.
To address these memory limitations in current LLMs, researchers from MemTensor (Shanghai) Technology Co., Ltd., Shanghai Jiao Tong University, Renmin University of China, and the Research Institute of China Telecom have developed MemOS, a memory operating system that makes memory a first-class resource in language models. At its core is MemCube, a unified memory abstraction that manages parametric, activation, and plaintext memory. MemOS enables structured, traceable, and cross-task memory handling, allowing models to adapt continuously, internalize user preferences, and maintain behavioral consistency. This shift turns LLMs from passive generators into evolving systems capable of long-term learning and cross-platform coordination.
As AI systems grow more complex (handling multiple tasks, roles, and data types), language models must go beyond understanding text: they must also retain memory and learn continuously. Current LLMs lack structured memory management, which limits their ability to adapt and grow over time. MemOS is a new system that treats memory as a core, schedulable resource. It enables long-term learning through structured storage, version control, and unified memory access. Unlike traditional training, MemOS supports a continuous "memory training" paradigm that blurs the line between learning and inference. It also emphasizes governance, ensuring traceability, access control, and safe use as AI systems evolve.
MemOS is a memory-centric operating system for language models that treats memory not merely as stored data but as an active, evolving component of the model's cognition. It divides memory into three distinct types: parametric memory (knowledge baked into model weights through pretraining or fine-tuning), activation memory (transient internal states such as KV caches and attention patterns used during inference), and plaintext memory (editable, retrievable external data such as documents or prompts). These memory types interact within a unified framework called the Memory Cube (MemCube), which encapsulates both content and metadata, allowing dynamic scheduling, versioning, access control, and transformation across types. This structured system lets LLMs adapt, recall relevant information, and evolve their capabilities efficiently, rather than remaining static generators.
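To make the abstraction concrete, here is a minimal Python sketch of what a MemCube-style unit could look like. This is an illustration of the idea only: the class, its fields, and the `evolve` method are our assumptions, not the paper's actual API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any

class MemoryType(Enum):
    PARAMETRIC = "parametric"   # knowledge baked into model weights
    ACTIVATION = "activation"   # transient KV-cache / attention state
    PLAINTEXT = "plaintext"     # editable external text, documents, prompts

@dataclass
class MemCube:
    """A unified memory unit: payload plus governance metadata."""
    content: Any                # weight delta, KV tensor, or plain text
    mem_type: MemoryType
    owner: str                  # access-control principal
    tags: list[str] = field(default_factory=list)
    version: int = 1
    created_at: datetime = field(default_factory=datetime.utcnow)
    provenance: list[str] = field(default_factory=list)  # lineage trail

    def evolve(self, new_content: Any, note: str) -> "MemCube":
        # Produce a new version instead of mutating in place,
        # keeping the edit history traceable.
        return MemCube(
            content=new_content,
            mem_type=self.mem_type,
            owner=self.owner,
            tags=list(self.tags),
            version=self.version + 1,
            provenance=self.provenance + [note],
        )
```

The design point mirrored here is that memory carries governance metadata (owner, version, provenance) alongside its payload, so every read, write, and transformation can be traced and policed.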
At the heart of MemOS is a three-layer architecture: the interface layer receives user input and parses it into memory-related tasks; the operation layer manages the scheduling, organization, and evolution of the different memory types; and the infrastructure layer ensures secure storage, access governance, and cross-agent collaboration. All interactions within the system are mediated by MemCubes, making memory operations traceable and policy-driven. Through modules such as MemScheduler, MemLifecycle, and MemGovernance, MemOS maintains a continuous, adaptive memory loop: from the moment the user sends a prompt, to memory injection during inference, to the storage of useful data for future use. This design not only improves the model's responsiveness and personalization but also keeps memory structured, secure, and reusable.
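That loop can be sketched in toy form, building on the MemCube class above. These are simplifying assumptions throughout: tag-overlap ranking stands in for whatever selection policy MemScheduler actually uses, and `llm` is a generic callable rather than a real model interface.

```python
import re

class MemScheduler:
    """Illustrative only: rank stored MemCubes by tag overlap with the task."""

    def __init__(self, store: list):
        self.store = store  # shared pool of MemCube objects

    def select(self, task_tags: set, k: int = 3) -> list:
        # Score each cube by how many of its tags appear in the task.
        scored = sorted(
            self.store,
            key=lambda cube: len(task_tags & set(cube.tags)),
            reverse=True,
        )
        return scored[:k]

def answer(prompt: str, scheduler: MemScheduler, llm) -> str:
    # Interface layer: parse the prompt into a memory query (naive word tags).
    task_tags = set(re.findall(r"\w+", prompt.lower()))
    # Operation layer: schedule the most relevant memories for this task.
    recalled = scheduler.select(task_tags)
    context = "\n".join(
        str(cube.content)
        for cube in recalled
        if cube.mem_type is MemoryType.PLAINTEXT
    )
    # Inference with injected memory; useful parts of the exchange could
    # later be written back to the store as new MemCubes.
    return llm(f"Known context:\n{context}\n\nUser: {prompt}")
```

The point is the placement of the stages: parse the prompt (interface layer), select and inject memory (operation layer), then generate, with the stored cubes persisting underneath (infrastructure layer).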
In short, MemOS is a memory operating system designed to make memory a central, manageable component in LLMs. Unlike traditional models that depend mainly on static weights and short-lived runtime state, MemOS introduces a unified framework for handling parametric, activation, and plaintext memory. At its core is MemCube, a standardized memory unit that supports structured storage, lifecycle management, and task-aware memory augmentation. The system enables more coherent reasoning, better adaptability, and cross-agent collaboration. Future goals include memory sharing across models, self-evolving memory blocks, and a decentralized memory marketplace to support continuous learning and intelligent evolution.
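Putting the sketches together, a hypothetical session might look like the following; the store contents, `fake_llm`, and the write-back step are all invented for illustration:

```python
# Hypothetical end-to-end session using the sketches above.
store = [
    MemCube("User prefers concise answers with code samples.",
            MemoryType.PLAINTEXT, owner="user-42", tags=["preference", "style"]),
    MemCube("Project Foo targets Python 3.11 and uses asyncio.",
            MemoryType.PLAINTEXT, owner="user-42", tags=["project", "python"]),
]
scheduler = MemScheduler(store)

def fake_llm(full_prompt: str) -> str:
    # Stand-in for a real model call.
    return "Here is a short asyncio example for Python 3.11 ..."

reply = answer("How do I structure the Python project?", scheduler, fake_llm)
print(reply)

# Lifecycle step: persist a distilled takeaway from the exchange as a new
# cube so future sessions (or other agents) can recall it.
store.append(
    MemCube("User is structuring an asyncio-based Python project.",
            MemoryType.PLAINTEXT, owner="user-42", tags=["project", "python"])
)
```

The write-back at the end is where lifecycle management would come in: new cubes are versioned, governed, and made available to future sessions or other agents.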
Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter, and don't forget to join our 100k+ ML SubReddit and subscribe to our Newsletter.

Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
