MEMP: A Task-Agnostic Framework That Makes Procedural Memory a Core Optimization Target for LLM-Based Agents
LLM agents have become capable enough to handle complex tasks, from web research and report generation to data analysis and multi-step software workflows. However, they struggle with procedural memory, which is usually rigid, manually designed, or locked inside the model weights. This makes them brittle: unexpected events, such as network failures or UI changes, can force a complete restart. Unlike humans, who improve by drawing on past experience, current LLM agents lack a systematic way to build, refine, and reuse procedural skills. Existing frameworks provide abstractions but have largely left the optimization of the memory life cycle unaddressed.
Memory plays a crucial role in language agents, enabling them to recall past interactions across short-term, episodic, and long-term contexts. Although current systems use vector embeddings, semantic search, and hierarchical storage to store and retrieve information, managing memory effectively, especially procedural memory, remains a challenge. Procedural memory helps agents internalize and automate repetitive tasks, yet strategies for building, updating, and reusing it are often neglected. Agents can also learn from experience through reinforcement learning, imitation, or replay, but these approaches face problems such as inefficiency, poor generalization, and forgetting.
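As a rough illustration of the semantic-search retrieval described above, the sketch below stores procedural memories keyed by embeddings and returns the closest match by cosine similarity. The bag-of-words `embed` function and the `MemoryStore` class are illustrative stand-ins, not MEMP's actual implementation; a real system would use a neural text encoder.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for a neural encoder.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class MemoryStore:
    def __init__(self):
        self.items = []  # (key vector, stored procedural knowledge)

    def add(self, key_text: str, memory: str):
        self.items.append((embed(key_text), memory))

    def retrieve(self, query: str, k: int = 1):
        # Rank stored memories by similarity to the query task.
        qv = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(qv, it[0]),
                        reverse=True)
        return [m for _, m in ranked[:k]]


store = MemoryStore()
store.add("book a flight and hotel for a trip",
          "script: search flights -> compare hotels -> reserve")
store.add("clean the kitchen table",
          "script: find cloth -> wipe table -> dispose trash")
print(store.retrieve("plan a trip with flights")[0])
```

The same interface extends naturally to top-k retrieval (`k > 1`), which matters later: pulling in too many memories can crowd the context window.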
Researchers from Zhejiang University and Alibaba Group introduced MEMP, a framework designed to give agents lifelong, adaptive procedural memory. MEMP distills past trajectories into both fine-grained step-by-step instructions and higher-level scripts, and provides strategies for memory construction, retrieval, and updating. Unlike static approaches, it continuously refines its knowledge through addition, validation, reflection, and discarding to keep it relevant and efficient. MEMP was evaluated on ALFWorld and TravelPlanner, consistently improving accuracy, reducing unnecessary exploration, and optimizing token usage. Notably, memory built by a stronger model transfers effectively to a weaker model, improving its performance. This shows that MEMP enables agents to learn, adapt, and generalize across tasks.
The setting is a Markov decision process in which an agent acts in its environment, uses tools, and refines its behavior over multiple steps. Each step produces a state, an action, and feedback, forming trajectories that also yield rewards based on success. However, solving new tasks in unfamiliar environments often wastes steps and tokens, as the agent repeats exploratory actions it has already performed in earlier tasks. Inspired by human procedural memory, the framework equips agents with a memory module that stores, retrieves, and updates procedural knowledge. This lets agents reuse past experience, reduce redundant trials, and improve efficiency on complex tasks.
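The decision process above can be sketched as a rollout loop that records (state, action, feedback) steps into a trajectory and commits it to procedural memory only when the episode earns a reward. The `Step`, `Trajectory`, and toy-environment names here are illustrative assumptions, not structures from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    state: str
    action: str
    feedback: str


@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)
    reward: float = 0.0


procedural_memory = []  # successful trajectories kept for later reuse


def run_episode(task, policy, env):
    """Roll out one episode and store the trajectory on success."""
    traj = Trajectory(task)
    state = env["start"]
    for _ in range(env["max_steps"]):
        action = policy(task, state, traj.steps)
        state, feedback, done, reward = env["step"](state, action)
        traj.steps.append(Step(state, action, feedback))
        if done:
            traj.reward = reward
            break
    if traj.reward > 0:  # only successful experience becomes memory
        procedural_memory.append(traj)
    return traj


# Toy environment: the episode succeeds once the agent emits "finish".
def toy_step(state, action):
    if action == "finish":
        return "done", "task complete", True, 1.0
    return state, "keep exploring", False, 0.0


env = {"start": "init", "max_steps": 5, "step": toy_step}
policy = lambda task, state, history: "finish" if history else "explore"
traj = run_episode("demo task", policy, env)
```

In this toy run the agent wastes one exploratory step before finishing; the point of reusing stored trajectories is to skip that exploration on the next similar task.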
Experiments on TravelPlanner and ALFWorld show that storing trajectories as detailed steps or abstract scripts improves accuracy and reduces exploration time. Retrieval strategies based on semantic similarity further improve memory usage. Meanwhile, dynamic update mechanisms such as validation, revision, and reflection let agents correct errors, discard outdated knowledge, and continuously improve their skills. The results show that procedural memory not only raises task completion rates and efficiency but can also be transferred effectively from stronger models to weaker ones, significantly improving the latter's performance. Additionally, scaling up retrieval improves results only up to a point, after which too much memory overwhelms the context window and reduces effectiveness. This highlights procedural memory as a powerful way to make agents more adaptable, efficient, and human-like in how they learn.
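The validate-revise-discard loop described above can be sketched as a simple update policy: a memory that keeps succeeding is retained, a failing memory is rewritten from a reflection, and repeated failures cause it to be discarded. The function name, bank layout, and failure threshold are assumptions for illustration, not MEMP's actual update rule.

```python
def update_memory(memory_bank, mem_id, succeeded, reflection=None,
                  max_failures=2):
    """Validate, revise, or discard one procedural memory entry."""
    entry = memory_bank[mem_id]
    if succeeded:
        entry["failures"] = 0            # validated: keep as-is
    else:
        entry["failures"] = entry.get("failures", 0) + 1
        if reflection:                   # reflect: revise the script
            entry["script"] = reflection
        if entry["failures"] >= max_failures:
            del memory_bank[mem_id]      # discard outdated knowledge
    return memory_bank


bank = {"trip": {"script": "search -> book", "failures": 0}}
# First failure: the script is revised from a reflection.
update_memory(bank, "trip", succeeded=False,
              reflection="search -> compare -> book")
# Second failure: the entry is discarded entirely.
update_memory(bank, "trip", succeeded=False)
print(bank)
```

A complementary safeguard for the context-overflow finding is simply capping how many entries retrieval returns, rather than injecting the whole bank into the prompt.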
In short, MEMP is a task-agnostic framework that treats procedural memory as a central target for optimizing LLM-based agents. By systematically designing strategies for memory construction, retrieval, and updating, MEMP lets agents build, refine, and reuse past experience, improving efficiency and accuracy on long-horizon tasks such as TravelPlanner and ALFWorld. Unlike static or hand-engineered memory, MEMP evolves dynamically, continuously updating its knowledge and discarding what is outdated. The results show steady performance gains, efficient learning, and even transferable benefits when memory is moved from a stronger model to a weaker one. Looking ahead, richer retrieval methods and self-assessment mechanisms could further strengthen agents' adaptability in real-world settings.

Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.