
Developers are actively working to bring AI agents to market, but a significant hurdle has been the lack of memory. Without the ability to recall past interactions, agents treat each conversation as if it’s the first, leading to repetitive questions, an inability to remember user preferences, and a general lack of personalization. This results in frustration for both users and developers.
Historically, developers have attempted to mitigate this by inserting entire session dialogues directly into an LLM’s context window. However, this approach is computationally inefficient: longer prompts mean higher inference costs and slower response times. Furthermore, feeding the model too much information, especially irrelevant details, can degrade its output quality, causing issues like “lost in the middle” and “context rot”.
Introducing Vertex AI Memory Bank
To overcome these limitations, Google Cloud has announced the public preview of Memory Bank, a new managed service within the Vertex AI Agent Engine. Memory Bank is designed to help you build highly personalized conversational agents that facilitate more natural, contextual, and continuous engagements.
For instance, a personalized healthcare agent needs key information about a user’s allergies and previous symptoms, mentioned in past sessions, to provide a more informed response in the current session.
Memory Bank addresses the fundamental memory problem in several key ways:
- Personalize interactions: It goes beyond generic scripts by remembering user preferences, key events, and past choices to tailor every response.
- Maintain continuity: Conversations can pick up seamlessly where they left off, even across multiple sessions that might span days or weeks.
- Provide better context: Agents are armed with the necessary background on a user, leading to more relevant, insightful, and helpful responses.
- Improve user experience: It eliminates the frustration of users repeating information, creating more natural, efficient, and engaging conversations.
How Memory Bank Works
Memory Bank operates through an intelligent, multi-stage process, leveraging Google’s Gemini models and novel research:
- Understands and Extracts Memories: Memory Bank analyzes a user’s conversation history (stored in Agent Engine Sessions) to extract key facts, preferences, and context. This process happens asynchronously in the background, generating new memories without requiring developers to build complex extraction pipelines.
- Stores and Updates Memories Intelligently: Key information, such as “I prefer sunny days”, is stored and organized by a defined scope, like a user ID. When new information emerges, Memory Bank uses Gemini to consolidate it with existing memories, resolving contradictions and keeping the memories up to date.
- Recalls Relevant Information: When a new conversation session begins, the agent can retrieve these stored memories. This retrieval can be a simple recall of all facts or a more advanced similarity search using embeddings to find memories most relevant to the current topic. This ensures the agent is always equipped with the right context.
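The store, consolidate, and recall stages above can be illustrated with a small toy model. This is a minimal sketch and not the Memory Bank API: the `ToyMemoryStore` class, its method names, and the `topic` key are all hypothetical. Consolidation is reduced to "the newest fact on a topic replaces the older one" (the real service uses Gemini for this), and a bag-of-words cosine similarity stands in for embedding-based search.

```python
import math
import re
from collections import Counter, defaultdict


def _vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a crude stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemoryStore:
    """Hypothetical, illustrative store: memories keyed by scope, then by topic."""

    def __init__(self) -> None:
        self._memories: dict[str, dict[str, str]] = defaultdict(dict)

    def upsert(self, scope: str, topic: str, fact: str) -> None:
        # Stand-in for consolidation: a newer fact on the same topic
        # replaces the older, contradictory one.
        self._memories[scope][topic] = fact

    def retrieve_all(self, scope: str) -> list[str]:
        # Simple recall: every fact stored under this scope.
        return list(self._memories.get(scope, {}).values())

    def retrieve_similar(self, scope: str, query: str, top_k: int = 2) -> list[str]:
        # Similarity search: rank this scope's facts against the query.
        q = _vectorize(query)
        facts = self.retrieve_all(scope)
        ranked = sorted(facts, key=lambda f: _cosine(q, _vectorize(f)), reverse=True)
        return ranked[:top_k]


store = ToyMemoryStore()
store.upsert("user-123", "weather", "I prefer rainy days")
store.upsert("user-123", "weather", "I prefer sunny days")  # replaces the earlier fact
store.upsert("user-123", "allergy", "I am allergic to peanuts")
store.upsert("user-456", "weather", "I prefer snowy days")  # a different scope

print(store.retrieve_all("user-123"))
print(store.retrieve_similar("user-123", "what am I allergic to", top_k=1))
```

Scoping by user ID keeps one user's memories from leaking into another user's session, and the similarity path returns only the facts closest to the current topic instead of the full set.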
This entire process is grounded in a method from Google Research, accepted at ACL 2025, which provides an intelligent, topic-based approach to how agents learn and recall information, setting a new standard for agent memory performance. For example, a personal beauty companion agent can remember a user’s evolving skin type to make personalized product recommendations.
Getting Started with Memory Bank
Memory Bank is integrated with the Agent Development Kit (ADK) and Agent Engine Sessions. Developers can define an agent using ADK and enable Agent Engine Sessions to manage conversation history within individual sessions. Memory Bank can then be enabled to provide long-term memory across multiple sessions.
You can integrate Memory Bank into your agent in two primary ways:
- Develop an agent with Google Agent Development Kit (ADK) for an out-of-the-box experience.
- Develop an agent that orchestrates API calls to Memory Bank if you are building your agent with any other framework, including popular ones like LangGraph and CrewAI.
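For the framework-agnostic path, the orchestration usually comes down to two hooks: retrieve memories before calling the model, and submit the finished conversation for memory generation afterwards. The sketch below shows that shape only. The `MemoryClient` interface, its `retrieve` and `generate` methods, and the `InMemoryClient` stub are hypothetical placeholders, not the Memory Bank API; in a real agent they would wrap the actual service calls, and the stub's "extraction" rule is deliberately naive.

```python
from typing import Protocol


class MemoryClient(Protocol):
    """Hypothetical interface; the real Memory Bank API will differ."""

    def retrieve(self, scope: dict[str, str], query: str) -> list[str]: ...

    def generate(self, scope: dict[str, str], transcript: list[str]) -> None: ...


class InMemoryClient:
    """Stub so the sketch runs locally without any cloud service."""

    def __init__(self) -> None:
        self._facts: dict[tuple, list[str]] = {}

    @staticmethod
    def _key(scope: dict[str, str]) -> tuple:
        return tuple(sorted(scope.items()))

    def retrieve(self, scope: dict[str, str], query: str) -> list[str]:
        # The stub ignores the query and returns every fact for the scope.
        return list(self._facts.get(self._key(scope), []))

    def generate(self, scope: dict[str, str], transcript: list[str]) -> None:
        # A real service would extract facts with a model. The stub simply
        # keeps user lines that start with "I ".
        facts = self._facts.setdefault(self._key(scope), [])
        for line in transcript:
            speaker, _, text = line.partition(": ")
            if speaker == "user" and text.startswith("I ") and text not in facts:
                facts.append(text)


def build_prompt(memory: MemoryClient, user_id: str, user_message: str) -> str:
    """Prepend retrieved memories to the user's message before the model call."""
    memories = memory.retrieve({"user_id": user_id}, user_message)
    lines = ["Known facts about this user:"]
    if memories:
        lines.extend(f"- {m}" for m in memories)
    else:
        lines.append("- (none yet)")
    lines.extend(["", f"User: {user_message}"])
    return "\n".join(lines)


client = InMemoryClient()
scope = {"user_id": "user-123"}

# Session 1: nothing is known yet.
first_prompt = build_prompt(client, "user-123", "Any lunch ideas?")

# End of session 1: hand the transcript over for memory generation.
client.generate(scope, [
    "user: I am allergic to peanuts",
    "agent: Noted.",
    "user: Any lunch ideas?",
])

# Session 2: the stored fact is now part of the prompt.
second_prompt = build_prompt(client, "user-123", "What about dinner?")
print(second_prompt)
```

Keeping the memory calls behind a small interface like this means the same agent logic can be reused from a LangGraph node, a CrewAI task, or a plain script, with only the client implementation changing.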
For those new to Google Cloud but using ADK, an express mode registration for Agent Engine Sessions and Memory Bank lets you sign up with a Gmail account and receive an API key. You can then build within free tier usage quotas before upgrading to a full Google Cloud project for production.