docs/decisions/0072-agents-with-memory.md
By memory we mean the capability to remember information and skills that are learned during a conversation and to re-use them later in the same conversation or in a subsequent one.
Today we support multiple agent types with different characteristics:
We need to support advanced memory capabilities across this range of agent types.
Another aspect of memory that is important to consider is the scope of different memory types. Most agent implementations have instructions and skills but the agent is not tied to a single conversation. On each invocation of the agent, the agent is told which conversation to participate in, during that invocation.
Memories about a user or about a conversation with a user are therefore extracted from one of these conversations and recalled during the same or another conversation with the same user. These memories will typically contain information that the user would not like to share with other users of the system.
Other types of memories also exist which are not tied to a specific user or conversation. E.g. an Agent may learn how to do something and be able to do that in many conversations with different users. With these types of memories there is of course a risk of leaking personal information between different users, which is important to guard against.
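As a rough illustration of guarding against cross-user leakage, the sketch below files every memory under a scope string and only recalls from scopes the caller explicitly lists. `ScopedMemoryStore`, its methods, and the `"global"` scope name are hypothetical and not part of this proposal; the `"user/..."` form simply mirrors the namespace style used in the samples later in this document.

```csharp
using System.Collections.Generic;

// Hypothetical store: each memory is filed under a scope such as "user/12345" or "global".
public sealed class ScopedMemoryStore
{
    private readonly Dictionary<string, List<string>> _memoriesByScope = new();

    // Stores a memory under exactly one scope.
    public void Save(string scope, string memory)
    {
        if (!_memoriesByScope.TryGetValue(scope, out var memories))
        {
            memories = new List<string>();
            _memoriesByScope[scope] = memories;
        }

        memories.Add(memory);
    }

    // Returns only memories filed under the scopes the caller is allowed to read.
    public IReadOnlyList<string> Recall(params string[] allowedScopes)
    {
        var result = new List<string>();
        foreach (var scope in allowedScopes)
        {
            if (_memoriesByScope.TryGetValue(scope, out var memories))
            {
                result.AddRange(memories);
            }
        }

        return result;
    }
}
```

With this shape, a conversation with user 2 would recall from `"user/2"` and `"global"` only, so facts saved under `"user/1"` never reach it. Deciding what is safe to save under a shared scope in the first place remains a separate problem.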
All of the above memory types can be supported for any agent by attaching software components to conversation threads. This is achieved via a simple mechanism of:
With our current AgentThread implementation, when an agent is invoked, all input and output messages are already passed to the AgentThread
and can be made available to any components attached to the AgentThread.
Where agents are remote/external and manage conversation state in the service, passing the messages to the AgentThread may not have any
effect on the thread in the service. This is OK, since the service will have already updated the thread during the remote invocation.
It does, however, still allow us to subscribe to messages in any attached components.
For the second requirement of getting additional context per invocation, the agent may ask the thread passed to it to ask, in turn, each of its attached components for context to pass to the Agent. This enables each component to provide the memories it contains to the Agent as needed.
Different memory capabilities can be built using separate components. Each component would have the following characteristics:
Building a service to host an agent comes with challenges. It's hard to build a stateful service, but service consumers expect an experience that looks stateful from the outside. E.g. on each invocation, the user expects that the service can continue a conversation they are having.
This means that where the service is exposing a local agent with local conversation state management (e.g. via ChatHistory),
that conversation state needs to be loaded and persisted for each invocation of the service.
It also means that any memory components that may have some in-memory state will need to be loaded and persisted too.
For cases like this, the OnSuspendAsync and OnResumeAsync methods notify the components that they need to save or reload their state.
It is up to each of these components to decide how and where to save state to or load state from.
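A minimal sketch of what saving and reloading state could look like for one component, assuming it keeps a list of facts in memory and chooses a JSON file per thread as its storage. `FileBackedFactsComponent`, `Remember`, and the file layout are hypothetical; only the two method names and their parameters follow the proposed base class. It also assumes the thread id is safe to use as a file name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical component: holds facts in memory and persists them per thread.
public sealed class FileBackedFactsComponent
{
    private readonly string _directory;
    private List<string> _facts = new();

    public FileBackedFactsComponent(string directory) => _directory = directory;

    public IReadOnlyList<string> Facts => _facts;

    public void Remember(string fact) => _facts.Add(fact);

    // Called before the hosting service lets go of the thread: write state out.
    public async Task OnSuspendAsync(string? threadId, CancellationToken cancellationToken = default)
    {
        Directory.CreateDirectory(_directory);
        var json = JsonSerializer.Serialize(_facts);
        await File.WriteAllTextAsync(PathFor(threadId), json, cancellationToken).ConfigureAwait(false);
    }

    // Called when a later request picks the thread up again: read state back.
    public async Task OnResumeAsync(string? threadId, CancellationToken cancellationToken = default)
    {
        var path = PathFor(threadId);
        if (!File.Exists(path))
        {
            _facts = new();
            return;
        }

        var json = await File.ReadAllTextAsync(path, cancellationToken).ConfigureAwait(false);
        _facts = JsonSerializer.Deserialize<List<string>>(json) ?? new();
    }

    // Assumption: threadId contains only characters that are valid in a file name.
    private string PathFor(string? threadId) => Path.Combine(_directory, $"{threadId ?? "default"}.json");
}
```

A stateless host could create a fresh component per request, call `OnResumeAsync` before invoking the agent and `OnSuspendAsync` afterwards. A component backed directly by a database or vector store might leave both methods as no-ops.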
The types of events that Memory Components require are not unique to memory, and can be used to package up other capabilities too. The suggestion is therefore to create a more generally named type that can serve other scenarios as well, including non-agent scenarios.
This type should live in the Microsoft.SemanticKernel.Abstractions nuget, since these components can be used by systems other than just agents.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

namespace Microsoft.SemanticKernel;

public abstract class AIContextBehavior
{
    // Notification hooks default to no-ops so that subclasses override only what they need.
    public virtual IReadOnlyCollection<AIFunction> AIFunctions => Array.Empty<AIFunction>();
    public virtual Task OnThreadCreatedAsync(string? threadId, CancellationToken cancellationToken = default) => Task.CompletedTask;
    public virtual Task OnThreadDeleteAsync(string? threadId, CancellationToken cancellationToken = default) => Task.CompletedTask;
    // OnThreadCheckpointAsync not included in initial release, maybe in future.
    public virtual Task OnThreadCheckpointAsync(string? threadId, CancellationToken cancellationToken = default) => Task.CompletedTask;
    public virtual Task OnNewMessageAsync(string? threadId, ChatMessage newMessage, CancellationToken cancellationToken = default) => Task.CompletedTask;
    // Returns the additional context this component wants passed to the model for this invocation.
    public abstract Task<string> OnModelInvokeAsync(ICollection<ChatMessage> newMessages, CancellationToken cancellationToken = default);
    public virtual Task OnSuspendAsync(string? threadId, CancellationToken cancellationToken = default) => Task.CompletedTask;
    public virtual Task OnResumeAsync(string? threadId, CancellationToken cancellationToken = default) => Task.CompletedTask;
}
To manage multiple components I propose that we have an AIContextBehaviorManager.
This class allows registering components and delegating new message notifications, AI invocation calls, etc. to the contained components.
I propose to add an AIContextBehaviorManager to the AgentThread class, allowing us to attach components to any AgentThread.
When an Agent is invoked, we will call OnModelInvokeAsync on each component via the AIContextBehaviorManager to get
a combined set of context to pass to the agent for this invocation. This will be internal to the Agent class and transparent to the user.
var additionalInstructions = await currentAgentThread.OnModelInvokeAsync(messages, cancellationToken).ConfigureAwait(false);
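The delegation described above could be sketched roughly as follows. This is not the proposed implementation: `ContextBehaviorSketch`, `ContextBehaviorManagerSketch`, and `StaticContextBehavior` are hypothetical names, plain strings stand in for `ChatMessage` so the example has no package dependencies, and joining the per-component context with newlines is an assumption about how the combined context is formed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-in for the proposed base class; string messages replace ChatMessage.
public abstract class ContextBehaviorSketch
{
    public virtual Task OnNewMessageAsync(string? threadId, string newMessage, CancellationToken cancellationToken = default)
        => Task.CompletedTask;

    public abstract Task<string> OnModelInvokeAsync(ICollection<string> newMessages, CancellationToken cancellationToken = default);
}

// Hypothetical manager: holds components and fans each call out to all of them.
public sealed class ContextBehaviorManagerSketch
{
    private readonly List<ContextBehaviorSketch> _behaviors = new();

    public void Add(ContextBehaviorSketch behavior) => _behaviors.Add(behavior);

    // Forwards a new message notification to every registered component.
    public async Task OnNewMessageAsync(string? threadId, string newMessage, CancellationToken cancellationToken = default)
    {
        foreach (var behavior in _behaviors)
        {
            await behavior.OnNewMessageAsync(threadId, newMessage, cancellationToken).ConfigureAwait(false);
        }
    }

    // Asks every component for context and joins the non-empty parts with newlines.
    public async Task<string> OnModelInvokeAsync(ICollection<string> newMessages, CancellationToken cancellationToken = default)
    {
        var parts = new List<string>();
        foreach (var behavior in _behaviors)
        {
            var context = await behavior.OnModelInvokeAsync(newMessages, cancellationToken).ConfigureAwait(false);
            if (!string.IsNullOrWhiteSpace(context))
            {
                parts.Add(context);
            }
        }

        return string.Join(Environment.NewLine, parts);
    }
}

// Example component that always contributes the same piece of context.
public sealed class StaticContextBehavior : ContextBehaviorSketch
{
    private readonly string _context;

    public StaticContextBehavior(string context) => _context = context;

    public override Task<string> OnModelInvokeAsync(ICollection<string> newMessages, CancellationToken cancellationToken = default)
        => Task.FromResult(_context);
}
```

Components are called sequentially here, which keeps ordering of the combined context deterministic. Whether the real manager should call components in parallel is a separate design choice that this sketch does not address.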
// Create a vector store for storing memories.
var vectorStore = new InMemoryVectorStore();
// Create a memory store that is tied to a "Memories" collection in the vector store and stores memories under the "user/12345" namespace.
using var textMemoryStore = new VectorDataTextMemoryStore<string>(vectorStore, textEmbeddingService, "Memories", "user/12345", 1536);
// Create a memory component that will pull user facts from the conversation, store them in the vector store
// and pass them to the agent as additional instructions.
var userFacts = new UserFactsMemoryComponent(this.Fixture.Agent.Kernel, textMemoryStore);
// Create a thread and attach a Memory Component.
var agentThread1 = new ChatHistoryAgentThread();
agentThread1.ThreadExtensionsManager.Add(userFacts);
var asyncResults1 = agent.InvokeAsync("Hello, my name is Caoimhe.", agentThread1);
// Create a second thread and attach a Memory Component.
var agentThread2 = new ChatHistoryAgentThread();
agentThread2.ThreadExtensionsManager.Add(userFacts);
var asyncResults2 = agent.InvokeAsync("What is my name?", agentThread2);
// Expected response contains Caoimhe.
// Create Vector Store and Rag Store/Component
var vectorStore = new InMemoryVectorStore();
using var ragStore = new TextRagStore<string>(vectorStore, textEmbeddingService, "Memories", 1536, "group/g2");
var ragComponent = new TextRagComponent(ragStore, new TextRagComponentOptions());
// Upsert docs into vector store.
await ragStore.UpsertDocumentsAsync(
[
new TextRagDocument("The financial results of Contoso Corp for 2023 is as follows:\nIncome EUR 174 000 000\nExpenses EUR 152 000 000")
{
SourceName = "Contoso 2023 Financial Report",
SourceReference = "https://www.consoso.com/reports/2023.pdf",
Namespaces = ["group/g2"]
}
]);
// Create a new agent thread and register the Rag component
var agentThread = new ChatHistoryAgentThread();
agentThread.ThreadExtensionsManager.RegisterThreadExtension(ragComponent);
// Invoke the agent.
var asyncResults1 = agent.InvokeAsync("What was the income of Contoso for 2023", agentThread);
// Expected response contains the 174M income from the document.
1. ConversationStateExtension
    1.1. Long
2. MemoryComponent
    2.1. Too specific
3. AIContextBehavior

Decided: 3. AIContextBehavior.
Decided: 1. Microsoft.SemanticKernel.<baseclass>.
Decided: 2. Microsoft.SemanticKernel.Core nuget