docs/decisions/0032-agents.md
Support for the OpenAI Assistant API was published in an experimental *.Assistants package that was later renamed to *.Agents with the aspiration of pivoting to a more general agent framework.
The initial Assistants work was never intended to evolve into a general Agent Framework.
This ADR defines that general Agent Framework.
An agent is expected to be able to support two interaction patterns:
Direct Invocation ("No Chat"):
The caller is able to directly invoke any single agent without any intervening machinery or infrastructure. For different agents to take turns in a conversation using direct invocation, the caller is expected to invoke each agent per turn. Coordinating interaction between different agent types must also be explicitly managed by the caller.
Agent Chat:
The caller is able to assemble multiple agents to participate in an extended conversation for the purpose of accomplishing a specific goal (generally in response to initial or iterative input). Once engaged, agents may participate in the chat over multiple interactions by taking turns.
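The direct-invocation pattern described above can be sketched in a few lines. This is a minimal, synchronous illustration rather than the framework's API: `Message`, `EchoAgent`, and `run_direct_invocation` are hypothetical names invented for the example, and the "agents" only transform text instead of calling a model. What it shows is that the caller owns the history and picks the agent for every turn.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    """Hypothetical stand-in for ChatMessageContent."""
    author: str
    content: str


@dataclass
class EchoAgent:
    """Hypothetical toy agent: replies by transforming the latest message."""
    name: str
    transform: Callable[[str], str]

    def invoke(self, history: List[Message]) -> Message:
        # A real agent would call a model here; this one only transforms text.
        latest = history[-1].content if history else ""
        return Message(author=self.name, content=self.transform(latest))


def run_direct_invocation(agents: List[EchoAgent], user_input: str, turns: int) -> List[Message]:
    """The caller owns the history and decides which agent speaks on each turn."""
    history = [Message(author="user", content=user_input)]
    for turn in range(turns):
        agent = agents[turn % len(agents)]  # turn order is managed by the caller
        history.append(agent.invoke(history))
    return history


history = run_direct_invocation(
    [EchoAgent("upper", str.upper), EchoAgent("reverse", lambda s: s[::-1])],
    user_input="hello",
    turns=2,
)
for message in history:
    print(f"{message.author}: {message.content}")
```

In the Agent Chat pattern, the loop inside `run_direct_invocation` is the part that moves out of the caller's code and into the chat itself.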
Fundamentally an agent possesses the following characteristics:
Various agent specializations might include:
An Agent can be of various modalities. Modalities are asymmetrical with regard to abilities and constraints.
Agents participate in a conversation, often in response to user or environmental input.
In addition to Agent, two fundamental concepts are identified from this pattern:
Agents of different modalities must be free to satisfy the requirements presented by their modality. Formalizing the Channel concept provides a natural vehicle for this to occur. For an agent based on chat-completion, this means owning and managing a specific set of chat messages (chat-history) and communicating with a chat-completion API / endpoint. For an agent based on the OpenAI Assistant API, this means defining a specific thread and communicating with the Assistant API as a remote service.
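To make the role of a channel concrete, the sketch below contrasts the two modalities just described. It is an illustration under stated assumptions, not the framework's implementation: `Channel`, `LocalHistoryChannel`, and `RemoteThreadChannel` are hypothetical names, and the remote service is faked with a plain dictionary keyed by thread id.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Channel(ABC):
    """Hypothetical base: each modality keeps conversation state its own way."""

    @abstractmethod
    def receive(self, messages: List[str]) -> None:
        """Accept messages produced elsewhere in the conversation."""

    @abstractmethod
    def describe(self) -> str:
        """Summarize where this channel's state lives."""


class LocalHistoryChannel(Channel):
    """Chat-completion style: the channel owns the chat history directly."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def receive(self, messages: List[str]) -> None:
        self.history.extend(messages)

    def describe(self) -> str:
        return f"local history holding {len(self.history)} message(s)"


class RemoteThreadChannel(Channel):
    """Assistant style: the channel holds only a thread id; a dict fakes the remote service."""

    def __init__(self, thread_id: str, remote_service: Dict[str, List[str]]) -> None:
        self.thread_id = thread_id
        self.remote_service = remote_service

    def receive(self, messages: List[str]) -> None:
        # A real channel would issue a request to the remote API here.
        self.remote_service.setdefault(self.thread_id, []).extend(messages)

    def describe(self) -> str:
        count = len(self.remote_service.get(self.thread_id, []))
        return f"remote thread {self.thread_id} holding {count} message(s)"


fake_service: Dict[str, List[str]] = {}
channels: List[Channel] = [LocalHistoryChannel(), RemoteThreadChannel("thread-1", fake_service)]
for channel in channels:
    channel.receive(["hello", "world"])
    print(channel.describe())
```

Both channels accept the same messages through the same interface, while each stores them where its modality requires.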
These concepts come together to suggest the following generalization:
After iterating with the team over these concepts, this generalization translates into the following high-level definitions:
| Class Name | Parent Class | Role | Modality | Note |
|---|---|---|---|---|
| Agent | - | Agent | Abstraction | Root agent abstraction |
| KernelAgent | Agent | Agent | Abstraction | Includes Kernel services and plug-ins |
| AgentChannel | - | Channel | Abstraction | Conduit for an agent's participation in a chat. |
| AgentChat | - | Chat | Abstraction | Provides core capabilities for agent interactions. |
| AgentGroupChat | AgentChat | Chat | Utility | Strategy based chat |
The detailed class definitions for the high-level pattern in the previous section are enumerated here.
Also shown are entities defined as part of the ChatHistory optimization: IChatHistoryHandler, ChatHistoryKernelAgent, and ChatHistoryChannel.
These ChatHistory entities eliminate the requirement for Agents that act on a locally managed ChatHistory instance (as opposed to agents managed via remotely hosted frameworks) to implement their own AgentChannel.
| Class Name | Parent Class | Role | Modality | Note |
|---|---|---|---|---|
| Agent | - | Agent | Abstraction | Root agent abstraction |
| AgentChannel | - | Channel | Abstraction | Conduit for an agent's participation in an AgentChat. |
| KernelAgent | Agent | Agent | Abstraction | Defines Kernel services and plug-ins |
| ChatHistoryChannel | AgentChannel | Channel | Abstraction | Conduit for agent participation in a chat based on local chat-history. |
| IChatHistoryHandler | - | Agent | Abstraction | Defines a common part for agents that utilize ChatHistoryChannel. |
| ChatHistoryKernelAgent | KernelAgent | Agent | Abstraction | Common definition for any KernelAgent that utilizes a ChatHistoryChannel. |
| AgentChat | - | Chat | Abstraction | Provides core capabilities for a multi-turn agent conversation. |
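The ChatHistory optimization can be illustrated with a small sketch. The class names mirror the table above, but the method names (`receive`, `invoke`, `invoke_history`, `create_channel`) and the `CountingAgent` example are assumptions made for illustration; they are not taken from the framework. The point is that an agent acting on local chat-history implements only a handler method and reuses a shared channel.

```python
from abc import ABC, abstractmethod
from typing import List, Protocol


class AgentChannel(ABC):
    """Conduit between a chat and one kind of agent (method names are assumed)."""

    @abstractmethod
    def receive(self, history: List[str]) -> None: ...

    @abstractmethod
    def invoke(self, agent) -> List[str]: ...


class ChatHistoryHandler(Protocol):
    """Counterpart of IChatHistoryHandler: respond to a locally held history."""

    def invoke_history(self, history: List[str]) -> List[str]: ...


class ChatHistoryChannel(AgentChannel):
    """Shared channel for any agent that acts on local chat-history."""

    def __init__(self) -> None:
        self._history: List[str] = []

    def receive(self, history: List[str]) -> None:
        self._history.extend(history)

    def invoke(self, agent: ChatHistoryHandler) -> List[str]:
        # Hand the agent a copy so it cannot mutate the channel's state directly.
        responses = agent.invoke_history(list(self._history))
        self._history.extend(responses)
        return responses


class ChatHistoryKernelAgent:
    """Base class: supplies the shared channel so subclasses need not write one."""

    def create_channel(self) -> AgentChannel:
        return ChatHistoryChannel()


class CountingAgent(ChatHistoryKernelAgent):
    """Hypothetical concrete agent: implements only the handler method."""

    def invoke_history(self, history: List[str]) -> List[str]:
        return [f"seen {len(history)} message(s)"]


agent = CountingAgent()
channel = agent.create_channel()
channel.receive(["hi"])
first = channel.invoke(agent)
second = channel.invoke(agent)
print(first, second)
```

An agent whose history lives in a remote service would instead subclass `AgentChannel` itself, which is the extra work this optimization avoids for the local case.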
The first concrete agent is ChatCompletionAgent.
The ChatCompletionAgent implementation is able to integrate with any IChatCompletionService implementation.
Since IChatCompletionService acts upon ChatHistory, this demonstrates how ChatHistoryKernelAgent may be simply implemented.
Agent behavior is (naturally) constrained according to the specific behavior of any IChatCompletionService.
For example, a connector that does not support function-calling will likewise not execute any KernelFunction as an Agent.
| Class Name | Parent Class | Role | Modality | Note |
|---|---|---|---|---|
| ChatCompletionAgent | ChatHistoryKernelAgent | Agent | SemanticKernel | Concrete Agent based on a local chat-history. |
AgentGroupChat is a concrete AgentChat whose behavior is defined by various Strategies.
| Class Name | Parent Class | Role | Modality | Note |
|---|---|---|---|---|
| AgentGroupChat | AgentChat | Chat | Utility | Strategy based chat |
| AgentGroupChatSettings | - | Config | Utility | Defines strategies that affect behavior of AgentGroupChat. |
| SelectionStrategy | - | Config | Utility | Determines the order for Agent instances to participate in AgentGroupChat. |
| TerminationStrategy | - | Config | Utility | Determines when the AgentGroupChat conversation is allowed to terminate (no need to select another Agent). |
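How the two strategies drive a chat can be sketched as a simple loop. This is a synchronous toy under stated assumptions: agents are modeled as `(name, respond)` tuples, and `round_robin_selection`, `approval_termination`, `run_group_chat`, and the `max_turns` guard are hypothetical names chosen for the example rather than framework members.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Message = Tuple[str, str]                           # (author, content)
Agent = Tuple[str, Callable[[List[Message]], str]]  # (name, respond)


def round_robin_selection(agents: Sequence[Agent], history: List[Message]) -> Agent:
    """Selection strategy: pick the next agent based on how many agent turns have occurred."""
    turns_taken = sum(1 for author, _ in history if author != "user")
    return agents[turns_taken % len(agents)]


def approval_termination(history: List[Message]) -> bool:
    """Termination strategy: stop once the latest message contains 'approved'."""
    return bool(history) and "approved" in history[-1][1].lower()


def run_group_chat(
    agents: Sequence[Agent],
    history: List[Message],
    select: Callable[[Sequence[Agent], List[Message]], Agent],
    should_terminate: Callable[[List[Message]], bool],
    max_turns: int = 10,
) -> Iterator[Message]:
    """Select an agent, let it respond, then ask the termination strategy whether to stop."""
    for _ in range(max_turns):  # guard against strategies that never terminate
        name, respond = select(agents, history)
        message = (name, respond(history))
        history.append(message)
        yield message
        if should_terminate(history):
            break


def writer_respond(history: List[Message]) -> str:
    drafts = sum(1 for author, _ in history if author == "writer")
    return f"draft {drafts + 1}"


def reviewer_respond(history: List[Message]) -> str:
    return "approved" if history[-1][1] == "draft 2" else "please revise"


chat_history: List[Message] = [("user", "write a slogan")]
transcript = list(
    run_group_chat(
        [("writer", writer_respond), ("reviewer", reviewer_respond)],
        chat_history,
        round_robin_selection,
        approval_termination,
    )
)
for author, content in transcript:
    print(f"{author}: {content}")
```

Keeping selection and termination as separate callables means either one can be swapped without touching the loop, which is the motivation for treating them as configuration.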
The next concrete agent is OpenAIAssistantAgent.
This agent is based on the OpenAI Assistant API and implements its own channel because chat history is managed remotely as an assistant thread.
| Class Name | Parent Class | Role | Modality | Note |
|---|---|---|---|---|
| OpenAIAssistantAgent | KernelAgent | Agent | OpenAI Assistant | A functional agent based on OpenAI Assistant API |
| OpenAIAssistantChannel | AgentChannel | Channel | OpenAI Assistant | Channel associated with OpenAIAssistantAgent |
| OpenAIAssistantDefinition | - | Config | OpenAI Assistant | Definition of an OpenAI Assistant provided when enumerating over hosted agent definitions. |
In order to support complex calling patterns, AggregatorAgent enables one or more agents participating in an AgentChat to present as a single logical Agent.
| Class Name | Parent Class | Role | Modality | Note |
|---|---|---|---|---|
| AggregatorAgent | Agent | Agent | Utility | Adapts an AgentChat as an Agent |
| AggregatorChannel | AgentChannel | Channel | Utility | AgentChannel used by AggregatorAgent. |
| AggregatorMode | - | Config | Utility | Defines the aggregation mode for AggregatorAgent. |
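The adapter idea behind AggregatorAgent can be sketched as follows. The two mode values shown (`FLAT` surfaces every inner message, `NESTED` surfaces only the final one) are an assumption made for this example, since the text above does not enumerate the modes. `InnerChat` is likewise a hypothetical stand-in for an AgentChat that simply runs a fixed pipeline of responders.

```python
from enum import Enum
from typing import Callable, List


class AggregatorMode(Enum):
    FLAT = "flat"      # assumption: surface every message from the inner chat
    NESTED = "nested"  # assumption: surface only the final inner message


class InnerChat:
    """Hypothetical stand-in for an AgentChat: each responder builds on the previous output."""

    def __init__(self, responders: List[Callable[[str], str]]) -> None:
        self.responders = responders

    def invoke(self, user_input: str) -> List[str]:
        messages: List[str] = []
        current = user_input
        for respond in self.responders:
            current = respond(current)
            messages.append(current)
        return messages


class AggregatorAgent:
    """Presents an entire inner chat to the outside world as one logical agent."""

    def __init__(self, chat_factory: Callable[[], InnerChat], mode: AggregatorMode) -> None:
        self.chat_factory = chat_factory
        self.mode = mode

    def invoke(self, user_input: str) -> List[str]:
        chat = self.chat_factory()  # a fresh inner chat per invocation
        messages = chat.invoke(user_input)
        if self.mode is AggregatorMode.NESTED:
            return messages[-1:]    # hide the intermediate turns
        return messages


def make_chat() -> InnerChat:
    return InnerChat([str.upper, lambda text: text + "!"])


flat_result = AggregatorAgent(make_chat, AggregatorMode.FLAT).invoke("hi")
nested_result = AggregatorAgent(make_chat, AggregatorMode.NESTED).invoke("hi")
print(flat_result, nested_result)
```

Because the aggregator exposes the same `invoke` shape as any other agent in this sketch, it could itself participate in an outer chat, which is what enables nested calling patterns.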
1. Agent Instantiation: ChatCompletion
Creating a ChatCompletionAgent aligns directly with how a Kernel object would be defined with an IChatCompletionService outside of the Agent Framework,
with the addition of agent-specific instructions and identity.
(dotnet)
// Start with the Kernel
IKernelBuilder builder = Kernel.CreateBuilder();
// Add any IChatCompletionService
builder.AddOpenAIChatCompletion(...);
// Include desired plugins / functions
builder.Plugins.Add(...);
// Include desired filters
builder.Filters.Add(...);
// Create the agent
ChatCompletionAgent agent =
    new()
    {
        Instructions = "instructions",
        Name = "name",
        Kernel = builder.Build()
    };
(python)
# Start with the Kernel
kernel = Kernel()
# Add any ChatCompletionClientBase
kernel.add_service(AzureChatCompletion(service_id="agent", ...))
# Include desired plugins / functions
kernel.add_plugin(...)
# Include desired filters (via @kernel.filter decorator)
# Create the agent
agent = ChatCompletionAgent(service_id="agent", kernel=kernel, name="name", instructions="instructions")
2. Agent Instantiation: OpenAI Assistant
Since every Assistant action is a call to a REST endpoint, top-level OpenAIAssistantAgent operations are realized via static asynchronous factory methods:
Create:
(dotnet)
// Start with the Kernel
IKernelBuilder builder = Kernel.CreateBuilder();
// Include desired plugins / functions
builder.Plugins.Add(...);
// Create config and definition
OpenAIServiceConfiguration config = new("apikey", "endpoint");
OpenAIAssistantDefinition definition = new()
{
    Instructions = "instructions",
    Name = "name",
    Model = "gpt-4",
};
// Create the agent (the factory method is asynchronous, so await it)
OpenAIAssistantAgent agent =
    await OpenAIAssistantAgent.CreateAsync(
        builder.Build(),
        config,
        definition);
(python)
# Start with the Kernel
kernel = Kernel()
# Include desired plugins / functions
kernel.add_plugin(...)
# Create config and definition
config = OpenAIServiceConfiguration("apikey", "endpoint")
definition = OpenAIAssistantDefinition(instructions="instructions", name="name", model="gpt-4")
# Create the agent (the factory method is asynchronous, so await it)
agent = await OpenAIAssistantAgent.create(kernel=kernel, config=config, definition=definition)
Retrieval:
(dotnet)
// Start with the Kernel
Kernel kernel = ...;
// Create config
OpenAIServiceConfiguration config = new("apikey", "endpoint");
// Create the agent based on an existing definition
OpenAIAssistantAgent agent = await OpenAIAssistantAgent.RetrieveAsync(kernel, config, "agent-id");
(python)
# Start with the Kernel
kernel = Kernel()
# Create config
config = OpenAIServiceConfiguration("apikey", "endpoint")
# Create the agent based on an existing definition
agent = await OpenAIAssistantAgent.retrieve(kernel=kernel, config=config, agentid="agent-id")
Inspection:
(dotnet)
// Create config
OpenAIServiceConfiguration config = new("apikey", "endpoint");
// Enumerate defined agents
IAsyncEnumerable<OpenAIAssistantDefinition> definitions = OpenAIAssistantAgent.ListDefinitionsAsync(config);
(python)
# Create config
config = OpenAIServiceConfiguration("apikey", "endpoint")
# Enumerate defined agents
definitions = await OpenAIAssistantAgent.list_definitions(config=config)
3. Agent Chat: Explicit
An Agent may be explicitly targeted to respond in an AgentGroupChat.
(dotnet)
// Define agents
ChatCompletionAgent agent1 = ...;
OpenAIAssistantAgent agent2 = ...;
// Create chat
AgentGroupChat chat = new();
// Provide input for chat
ChatMessageContent input = new(AuthorRole.User, "input");
await WriteMessageAsync(input);
chat.AddChatMessage(input);
// First invoke one agent, then the other, display each response.
await WriteMessagesAsync(chat.InvokeAsync(agent1));
await WriteMessagesAsync(chat.InvokeAsync(agent2));
// The entire history may be accessed.
// Agent specific history is an adaptation of the primary history.
await WriteMessagesAsync(chat.GetHistoryAsync());
await WriteMessagesAsync(chat.GetHistoryAsync(agent1));
await WriteMessagesAsync(chat.GetHistoryAsync(agent2));
(python)
# Define agents
agent1 = ChatCompletionAgent(...)
agent2 = await OpenAIAssistantAgent.create(...)
# Create chat
chat = AgentGroupChat()
# Provide input for chat
input = ChatMessageContent(AuthorRole.User, "input")
await write_message(input)
chat.add_chat_message(input)
# First invoke one agent, then the other, display each response.
await write_message(chat.invoke(agent1))
await write_message(chat.invoke(agent2))
# The entire history may be accessed.
# Agent specific history is an adaptation of the primary history.
await write_message(chat.get_history())
await write_message(chat.get_history(agent1))
await write_message(chat.get_history(agent2))
4. Agent Chat: Multi-Turn
Agents may also take multiple turns working towards an objective:
(dotnet)
// Define agents
ChatCompletionAgent agent1 = ...;
OpenAIAssistantAgent agent2 = ...;
ChatCompletionAgent agent3 = ...;
// Create chat with two agents.
AgentGroupChat chat =
    new(agent1, agent2)
    {
        ExecutionSettings =
        {
            // Chat will continue until it meets the termination criteria.
            TerminationStrategy = new MyTerminationStrategy(),
        }
    };
// Provide input for chat
ChatMessageContent input = new(AuthorRole.User, "input");
await WriteMessageAsync(input);
chat.AddChatMessage(input);
// Agent may be added to an existing chat
chat.AddAgent(agent3);
// Execute the chat until termination
await WriteMessagesAsync(chat.InvokeAsync());
(python)
# Define agents
agent1 = ChatCompletionAgent(...)
agent2 = await OpenAIAssistantAgent.create(...)
agent3 = ChatCompletionAgent(...)
# Create chat with two agents.
chat = AgentGroupChat(
    agent1,
    agent2,
    execution_settings=AgentGroupChatSettings(
        # Chat will continue until it meets the termination criteria.
        termination_strategy=MyTerminationStrategy(),
    ),
)
# Provide input for chat
input = ChatMessageContent(AuthorRole.User, "input")
await write_message(input)
chat.add_chat_message(input)
# Agent may be added to an existing chat
chat.add_agent(agent3)
# Execute the chat until termination
await write_message(chat.invoke())