At its annual Ignite conference, Microsoft announced the public preview of memory in the Foundry Agent Service, a fully managed, long-term memory store natively integrated with the agent service.
With memory, developers can store, retrieve, and manage chat summaries, user preferences, and critical context across sessions, devices, and workflows. The authors of a Foundry blog post write:
Our memory systematically extracts both user profile information and chat summaries from conversation histories.
In the Foundry portal, developers can enable the memory feature, and the memory store will be automatically created and configured for their agent. In addition, developers can use the SDK or APIs to work with the feature.
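The article does not show the preview's API surface, so the short sketch below only illustrates what storing and querying memory items over a REST API could look like; the endpoint paths, payload fields, and API version are assumptions for illustration, not the documented contract.

```python
import requests

# Hypothetical sketch only: the endpoint paths, payload fields, and API version
# below are assumptions for illustration; the Foundry documentation defines the
# actual memory API surface.
ENDPOINT = "https://<your-foundry-resource>.services.ai.azure.com"  # placeholder
HEADERS = {
    "Authorization": "Bearer <entra-id-access-token>",  # placeholder token
    "Content-Type": "application/json",
}

# Write a memory item into a user-specific scope (assumed request shape).
requests.post(
    f"{ENDPOINT}/memoryStores/my-agent-memory/items?api-version=preview",
    headers=HEADERS,
    json={"scope": "user-1234", "content": "Allergic to dairy"},
)

# Search for memories relevant to the current turn (assumed request shape).
resp = requests.post(
    f"{ENDPOINT}/memoryStores/my-agent-memory/items:search?api-version=preview",
    headers=HEADERS,
    json={"scope": "user-1234", "query": "Suggest a lunch option"},
)
print(resp.json())
```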
In the documentation, the company explains that memories are stored as items in a managed memory store and are processed in three phases, sketched in code after the list:
- The extraction phase, in which the system extracts key information from user interactions, such as preferences (e.g., “allergic to dairy”) and recent activities.
- The consolidation phase, in which extracted memories are merged to avoid redundancy. Conflicting information, such as a new allergy, is resolved to ensure accuracy.
- The retrieval phase, in which the agent uses hybrid search techniques to quickly find relevant memories, ensuring natural and informed conversations, with core user information retrieved at the start.
(Source: Microsoft documentation)
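The documentation describes this behavior rather than the implementation; the toy below, in plain Python, only illustrates the described lifecycle (naive keyword rules and word overlap stand in for the service's extraction models and hybrid search) and makes no claim about how Foundry implements it.

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    topic: str    # e.g. "allergy" or "recent_activity"
    content: str

# Extraction: pull key facts (preferences, activities) out of a user turn.
# Toy keyword rules stand in for the service's extraction step.
def extract(user_message: str) -> list[MemoryItem]:
    text = user_message.lower()
    items: list[MemoryItem] = []
    if "allergic to" in text:
        allergen = text.split("allergic to", 1)[1].strip(" .!")
        items.append(MemoryItem("allergy", f"allergic to {allergen}"))
    return items

# Consolidation: merge new items into the store without duplicates; a newer
# fact on the same topic supersedes the older one (conflict resolution).
def consolidate(store: list[MemoryItem], new_items: list[MemoryItem]) -> None:
    for item in new_items:
        store[:] = [m for m in store if m.topic != item.topic]
        store.append(item)

# Retrieval: find memories relevant to the current turn. Plain word overlap
# stands in for the hybrid search the service uses.
def retrieve(store: list[MemoryItem], query: str) -> list[MemoryItem]:
    words = set(query.lower().split())
    return [m for m in store if words & set(m.content.lower().split())]

store: list[MemoryItem] = []
consolidate(store, extract("I'm allergic to dairy."))
consolidate(store, extract("Actually, I'm allergic to peanuts."))  # supersedes dairy

# A later session retrieves the consolidated fact before answering.
print(retrieve(store, "Suggest a snack given my peanuts allergy."))
# [MemoryItem(topic='allergy', content='allergic to peanuts')]
```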
A key parameter of memory is the scope, which controls how the memory store is partitioned. Each scope in the memory store keeps an isolated collection of memory items, so developers can partition the store using unique identifiers, such as a user’s Entra ID or a custom UUID, for both storage and retrieval.
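As a rough illustration of that isolation (not the Foundry implementation), a scope can be thought of as a partition key: items written under one scope are never visible to retrieval under another.

```python
# Conceptual stand-in for the managed store: each scope key maps to its own
# isolated collection of memory items.
memory_store: dict[str, list[str]] = {}

def remember(scope: str, item: str) -> None:
    memory_store.setdefault(scope, []).append(item)

def recall(scope: str) -> list[str]:
    # Retrieval only ever sees the caller's own partition.
    return memory_store.get(scope, [])

remember("entra-oid-1f2e3d", "Allergic to dairy")          # scoped to a user's Entra ID
remember("0b7c9a44-custom-uuid", "Prefers short answers")  # scoped to a custom UUID

print(recall("entra-oid-1f2e3d"))      # ['Allergic to dairy']
print(recall("0b7c9a44-custom-uuid"))  # ['Prefers short answers']
```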
During the public preview, the service operates under specific constraints: each scope can currently store up to 10,000 discrete memory items, and throughput is limited to 1,000 requests per minute.
By moving memory management from the application logic to the service runtime, Foundry handles the complex “plumbing” of extraction and retrieval automatically. This represents a shift from traditional Retrieval-Augmented Generation (RAG), which often operates like a search engine, to a persistent-state layer.
Vivan Amim, Director, AI Research at Microsoft, noted in a LinkedIn post:
Memory is quickly becoming the ‘state layer’ for agentic systems. Foundry is turning that from a demo feature into an enterprise primitive.
This shift suggests that long-term context is moving from a custom implementation to a core infrastructure requirement. During the public preview, memory features are free; users are billed only for the underlying chat and embedding model usage.
