Developers are racing to bring AI agents to market, but a major obstacle is lack of memory. Unable to recall past interactions, agents treat each conversation as the first, which leads to repetitive questions, forgotten user preferences, and a general lack of personalization. This frustrates users and developers alike.
Historically, developers have tried to mitigate this by inserting the entire session conversation directly into the LLM's context window. However, this approach is expensive and computationally inefficient, resulting in higher inference costs and slower response times. In addition, providing too much information, especially irrelevant details, can degrade the model's output quality, causing problems such as “lost in the middle” and “context rot”.
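To make the cost argument concrete, here is a minimal sketch of how prompt size accumulates when every turn resends the full history. The function name and the token counts are illustrative assumptions, not measurements from any real model or service.

```python
def cumulative_prompt_tokens(turn_tokens: list[int]) -> int:
    """Total tokens sent to the model when every turn resends the full history."""
    total = 0
    history = 0
    for tokens in turn_tokens:
        history += tokens  # the history grows by this turn's tokens
        total += history   # and the whole history is sent again
    return total


# Ten turns of 200 tokens each (illustrative numbers only).
turns = [200] * 10
full_history = cumulative_prompt_tokens(turns)
new_text_only = sum(turns)
print(full_history, new_text_only)  # 11000 2000
```

Under these assumptions the model processes 11,000 tokens to handle 2,000 tokens of actual conversation, and the gap widens roughly quadratically as the session gets longer.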
Introducing Vertex AI Memory Bank
To overcome these limitations, Google Cloud has announced the public preview of Memory Bank, a new managed service in Vertex AI Agent Engine. Memory Bank is designed to help you build highly personalized conversational agents that support more natural, contextual, and continuous engagement.
For example, a personalized healthcare agent can recall key information about a user's allergies and previously mentioned symptoms from past sessions to give a smarter response in the current session.
Memory Bank addresses the fundamental memory problem in several key ways:
- Personalize interactions: Go beyond generic scripts by tailoring each response to remembered user preferences, key events, and past choices.
- Maintain continuity: Conversations can pick up seamlessly where they left off, even across multiple sessions that may span days or weeks.
- Provide better context: Agents are armed with the necessary background on the user, resulting in more relevant, insightful, and helpful responses.
- Improve user experience: Users no longer have to repeat information, which makes conversations more natural, efficient, and engaging.
How Memory Bank works
Memory Bank operates through an intelligent multi-stage process that leverages Google’s Gemini models and novel research:
- Understand and extract memories: Memory Bank analyzes the user’s conversation history (stored in Agent Engine Sessions) to extract key facts, preferences, and context. This happens asynchronously in the background, generating new memories without requiring developers to build complex extraction pipelines.
- Store and update memories intelligently: Key information, such as “I prefer sunny days”, is stored and organized by a defined scope, such as a user ID. When new information arrives, Memory Bank uses Gemini to merge it with existing memories, resolve contradictions, and keep the memories up to date.
- Recall relevant information: When a new conversation session begins, the agent can retrieve these stored memories. Retrieval can be a simple fetch of all facts or a more advanced similarity search that uses embeddings to find the memories most relevant to the current topic. This helps ensure that the agent has the right context.
This entire process is based on a novel research method from Google Research, accepted at ACL 2025, which provides an intelligent, topic-based approach to how agents learn and recall information, setting a new standard for agent memory performance. For example, a personal beauty companion agent can remember a user’s evolving skin type to make personalized product recommendations.
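The store, update, and recall stages can be pictured with a small in-memory toy. This is a sketch under loud assumptions, not the Memory Bank API: `ToyMemoryStore` and its methods are hypothetical, the caller supplies the topic and fact that the real service would extract with Gemini, "newest fact wins per topic" stands in for model-driven consolidation, and a bag-of-words cosine score stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemoryStore:
    """Hypothetical in-memory store for illustration; not the Memory Bank API."""

    def __init__(self) -> None:
        # scope (here a user ID) -> topic -> latest fact
        self._memories: dict[str, dict[str, str]] = {}

    def upsert(self, user_id: str, topic: str, fact: str) -> None:
        """Consolidation stand-in: a newer fact on the same topic replaces the old one."""
        self._memories.setdefault(user_id, {})[topic] = fact

    def all_facts(self, user_id: str) -> list[str]:
        """Simple retrieval: every stored fact for this scope."""
        return list(self._memories.get(user_id, {}).values())

    def search(self, user_id: str, query: str, top_k: int = 1) -> list[str]:
        """Similarity retrieval: the top_k facts closest to the query."""
        q = _vector(query)
        ranked = sorted(
            self.all_facts(user_id),
            key=lambda fact: _cosine(q, _vector(fact)),
            reverse=True,
        )
        return ranked[:top_k]


store = ToyMemoryStore()
store.upsert("user-1", "weather", "I prefer sunny days")
store.upsert("user-1", "skin_type", "My skin type is oily")
store.upsert("user-1", "skin_type", "My skin type is now dry")  # supersedes the older fact
print(store.search("user-1", "what skin type does the user have"))
```

Scoping by user ID keeps one user's memories from leaking into another's retrieval, and the overwrite in `upsert` is the simplest possible version of keeping an evolving fact, such as a changing skin type, up to date.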
Get started with Memory Bank
Memory Bank is integrated with the Agent Development Kit (ADK) and Agent Engine Sessions. Developers can use ADK to define an agent and enable Agent Engine Sessions to manage conversation history within individual sessions. Memory Bank can then be enabled to provide long-term memory across multiple sessions.
There are two main ways to integrate Memory Bank into your agent:
- With Google's Agent Development Kit (ADK), for an out-of-the-box experience.
- With any other framework, including popular ones such as LangGraph and CrewAI.
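Whichever framework is used, the integration pattern looks broadly similar: fetch the memories for the current user, then place them in front of the model along with the new message. The sketch below is framework-agnostic and entirely hypothetical; `MemoryBackend`, `StaticMemories`, `build_prompt`, and `answer` are illustrative names rather than ADK, LangGraph, CrewAI, or Memory Bank APIs, and a real backend would call a memory service instead of returning canned facts.

```python
from typing import Callable, Protocol


class MemoryBackend(Protocol):
    """Hypothetical interface; a real backend would wrap a memory service."""

    def retrieve(self, user_id: str, query: str) -> list[str]: ...


class StaticMemories:
    """Stand-in backend that returns canned facts, for illustration only."""

    def __init__(self, facts: dict[str, list[str]]) -> None:
        self._facts = facts

    def retrieve(self, user_id: str, query: str) -> list[str]:
        return self._facts.get(user_id, [])


def build_prompt(memory: MemoryBackend, user_id: str, message: str) -> str:
    """Prepend any retrieved memories to the user's message."""
    facts = memory.retrieve(user_id, message)
    if not facts:
        return message
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Known about this user:\n{context}\n\nUser: {message}"


def answer(
    model: Callable[[str], str],
    memory: MemoryBackend,
    user_id: str,
    message: str,
) -> str:
    """Call any model function with a memory-enriched prompt."""
    return model(build_prompt(memory, user_id, message))


memory = StaticMemories({"u1": ["Allergic to penicillin"]})
print(build_prompt(memory, "u1", "What can I take for a sore throat?"))
```

Keeping the memory lookup behind a small interface means the agent code does not change when the canned backend is swapped for a real one.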
Developers who are new to Google Cloud but use ADK can sign up for express mode for Agent Engine Sessions and Memory Bank: register with a Gmail account to receive an API key, build within the free tier's usage quotas, then seamlessly upgrade to a full Google Cloud project for production.

Max is an AI analyst at Marktechpost, based in Silicon Valley, who actively shapes the future of technology. He teaches robotics at Brainvyne, uses comma to combat spam, and uses AI every day to transform complex technological advancements into clear, understandable insights