docs/decisions/0025-planner-telemetry-enhancement.md
It would be extremely beneficial for applications using Semantic Kernel's planning features to be able to continuously monitor the performance of planners and plans, as well as to debug them.
Contoso is a company that is developing an AI application using SK.
Contoso would like to track the token usage of each planner run and each prompt function invocation (i.e. functions created via `KernelFunctionFromPrompt`).

| ComponentType | ComponentName | Model ID | Token Usage |
|---|---|---|---|
| Function | WritePoem | GPT-3.5-Turbo | 40 |
| Function | TellJoke | GPT-4 | 50 |
| Function | WriteAndTellJoke | GPT-3.5-Turbo | 30 |
| Planner | CreateHandlebarsPlan | GPT-3.5-Turbo | 100 |
Contoso would also like to track the execution duration of each function and planner invoke call.

| ComponentType | ComponentName | Duration |
|---|---|---|
| Function | WritePoem | 1m |
| Function | TellJoke | 1m |
| Function | WriteAndTellJoke | 1.5m |
| Planner | CreateHandlebarsPlan | 2m |
Contoso would also like to track how often each planner succeeds or fails at creating plans:

| ComponentType | ComponentName | Fail | Success |
|---|---|---|---|
| Planner | CreateHandlebarsPlan | 5 | 95 |
| Planner | CreateSequentialPlan | 20 | 80 |
Similarly, how often each plan type succeeds or fails during execution:

| ComponentType | ComponentName | Fail | Success |
|---|---|---|---|
| Plan | HandlebarsPlan | 5 | 95 |
| Plan | SequentialPlan | 20 | 80 |
SK currently tracks token usage metrics in connectors; however, these metrics are not categorized. Consequently, developers cannot determine token usage for different operations. To address this issue, we propose the following two approaches:
We have decided to implement the bottom-up approach for the following reasons:
Token usage information is already propagated via the metadata of `ContentBase`. We simply need to extend the list of items that are propagated, such as model information. With the bottom-up approach, we need to retrieve the token usage information from the metadata:
```csharp
// Note that not all services support usage details.
/// <summary>
/// Captures usage details, including token information.
/// </summary>
private void CaptureUsageDetails(string? modelId, IDictionary<string, object?>? metadata, ILogger logger)
{
    if (string.IsNullOrWhiteSpace(modelId))
    {
        logger.LogWarning("No model ID provided to capture usage details.");
        return;
    }

    if (metadata is null)
    {
        logger.LogWarning("No metadata provided to capture usage details.");
        return;
    }

    if (!metadata.TryGetValue("Usage", out object? usageObject) || usageObject is null)
    {
        logger.LogWarning("No usage details provided to capture usage details.");
        return;
    }

    var promptTokens = 0;
    var completionTokens = 0;
    try
    {
        // Round-trip through the serializer so the usage object can be inspected regardless of its concrete type.
        var jsonObject = JsonSerializer.SerializeToElement(usageObject);
        promptTokens = jsonObject.GetProperty("PromptTokens").GetInt32();
        completionTokens = jsonObject.GetProperty("CompletionTokens").GetInt32();
    }
    catch (KeyNotFoundException)
    {
        logger.LogInformation("Usage details not found in model result.");
    }
    catch (Exception ex)
    {
        logger.LogError(ex, "Error while parsing usage details from model result.");
        throw;
    }

    logger.LogInformation(
        "Prompt tokens: {PromptTokens}. Completion tokens: {CompletionTokens}.",
        promptTokens, completionTokens);

    TagList tags = new() {
        { "semantic_kernel.function.name", this.Name },
        { "semantic_kernel.function.model_id", modelId }
    };

    s_invocationTokenUsagePrompt.Record(promptTokens, in tags);
    s_invocationTokenUsageCompletion.Record(completionTokens, in tags);
}
```
Note that we do not consider services that do not return token usage. Currently, only the OpenAI and Azure OpenAI services return token usage information.
| Meter | Metrics |
|---|---|
| Microsoft.SemanticKernel.Planning | <ul><li>semantic_kernel.planning.invoke_plan.duration</li></ul> |
| Microsoft.SemanticKernel | <ul><li>semantic_kernel.function.invocation.token_usage.prompt</li><li>semantic_kernel.function.invocation.token_usage.completion</li></ul> |
Note: we are also replacing the "sk" prefixes with "semantic_kernel" for all existing metrics to avoid ambiguity.
Tests can be added to make sure that all the expected telemetry items are in place and of the correct format.
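As a sketch of such a test (the meter and metric names are taken from the table above; the rest is illustrative), a `MeterListener` can subscribe to the `Microsoft.SemanticKernel` meter and assert that a measurement arrives on the expected instrument:

```csharp
using System.Diagnostics.Metrics;

using var meter = new Meter("Microsoft.SemanticKernel");
var histogram = meter.CreateHistogram<double>(
    "semantic_kernel.planning.invoke_plan.duration", unit: "s");

string? instrumentName = null;
double captured = 0;

using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    // Only subscribe to instruments from the SK meter.
    if (instrument.Meter.Name == "Microsoft.SemanticKernel")
    {
        l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<double>((instrument, measurement, tags, state) =>
{
    instrumentName = instrument.Name;
    captured = measurement;
});
listener.Start();

// Simulate the measurement the production code would record.
histogram.Record(1.5);
```

A real test would then assert on `instrumentName`, the recorded value, and the tags.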
Function hooks allow developers to inject logic into the kernel that will be executed before or after a function is invoked. Example use cases include logging the function input before a function is invoked and logging results after the function returns. For more information, please refer to the following ADRs:
During function registration, we can inject default callbacks that log critical information for all functions.
Pros:
Cons:
Note: with distributed tracing already implemented in SK, developers can create custom telemetry within the hooks, which will be sent to the telemetry service once configured, as long as the information is available in the hooks. However, telemetry items created inside the hooks will not be correlated with the functions in a parent-child relationship, since they are created outside the activity scopes of the functions.
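As a sketch of this (the hook name and `ActivitySource` name are hypothetical; only the `System.Diagnostics` API usage is standard), a hook callback could emit its own activity:

```csharp
using System.Diagnostics;

var source = new ActivitySource("Contoso.Hooks");

// Without a registered listener, StartActivity returns null; register one for this demo.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Contoso.Hooks",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData
};
ActivitySource.AddActivityListener(listener);

string? recorded = null;

// Hypothetical pre-invocation hook that records custom telemetry.
// Because it runs outside the function's own activity scope, this
// activity is not a child of the function's activity.
void OnFunctionInvoking(string functionName)
{
    using var activity = source.StartActivity($"Hook.{functionName}");
    activity?.SetTag("semantic_kernel.function.name", functionName);
    recorded = activity?.DisplayName;
}

OnFunctionInvoking("WritePoem");
```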
Distributed tracing is a diagnostic technique that can localize failures and performance bottlenecks within distributed applications. .NET has native support for adding distributed tracing to libraries, and many .NET libraries are already instrumented to produce distributed tracing information automatically.
For more information, please refer to this document: .NET distributed tracing
Overall pros:
Overall cons:
Logs will be used to record interesting events while the code is running.
```csharp
// Use the LoggerMessage attribute for optimal performance
this._logger.LogPlanCreationStarted();
this._logger.LogPlanCreated();
```
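The logging extension methods used above are not defined in this ADR. As a sketch (event IDs and message text are illustrative; requires the `Microsoft.Extensions.Logging` source generator), they could be declared with the `LoggerMessage` attribute so the message templates are compiled once:

```csharp
using Microsoft.Extensions.Logging;

internal static partial class PlannerLogging
{
    [LoggerMessage(EventId = 0, Level = LogLevel.Information, Message = "Plan creation started.")]
    public static partial void LogPlanCreationStarted(this ILogger logger);

    [LoggerMessage(EventId = 1, Level = LogLevel.Information, Message = "Plan created.")]
    public static partial void LogPlanCreated(this ILogger logger);
}
```

The source generator emits the method bodies at compile time, avoiding the boxing and template parsing of ad hoc `LogInformation` calls.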
Metrics will be used to record measurements over time.
```csharp
/// <summary><see cref="Meter"/> for function-related metrics.</summary>
private static readonly Meter s_meter = new("Microsoft.SemanticKernel");

/// <summary><see cref="Histogram{T}"/> to record plan execution duration.</summary>
private static readonly Histogram<double> s_planExecutionDuration =
    s_meter.CreateHistogram<double>(
        name: "semantic_kernel.planning.invoke_plan.duration",
        unit: "s",
        description: "Duration time of plan execution.");
```

```csharp
TagList tags = new() { { "semantic_kernel.plan.name", planName } };
var stopwatch = Stopwatch.StartNew();

try
{
    ...
}
catch (Exception ex)
{
    // If a measurement is tagged with "error.type", then it's a failure.
    tags.Add("error.type", ex.GetType().FullName);
}

s_planExecutionDuration.Record(stopwatch.Elapsed.TotalSeconds, in tags);
```
Activities are used to track dependencies through an application, correlate work done by other components, and form a tree of activities known as a trace.
```csharp
private static readonly ActivitySource s_activitySource = new("Microsoft.SemanticKernel");

// Create and start an activity
using var activity = s_activitySource.StartActivity(this.Name);
```

```csharp
// Use the LoggerMessage attribute for optimal performance
logger.LogGoal(goal);
logger.LogPlan(plan);
```
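The parent-child correlation that forms a trace can be sketched with nested activities (the `ActivitySource` and activity names below are illustrative):

```csharp
using System.Diagnostics;

var source = new ActivitySource("Microsoft.SemanticKernel.Demo");

// Register a listener so StartActivity returns live activities.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData
};
ActivitySource.AddActivityListener(listener);

string? planId = null;
string? parentId = null;

using (var planActivity = source.StartActivity("CreateHandlebarsPlan"))
{
    planId = planActivity?.Id;

    // Started while the plan activity is current, so it becomes its child.
    using var functionActivity = source.StartActivity("WritePoem");
    parentId = functionActivity?.ParentId;
}
```

Because the function activity starts while the plan activity is `Activity.Current`, its `ParentId` points at the plan activity, which is exactly the correlation a tracing backend uses to draw the trace tree.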
Note: Trace-level logs will contain sensitive data and should be turned off in production: https://learn.microsoft.com/en-us/dotnet/core/extensions/logging?tabs=command-line#log-level
```csharp
using var traceProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("Microsoft.SemanticKernel*")
    .AddAzureMonitorTraceExporter(options => options.ConnectionString = connectionString)
    .Build();

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("Microsoft.SemanticKernel*")
    .AddAzureMonitorMetricExporter(options => options.ConnectionString = connectionString)
    .Build();

using var loggerFactory = LoggerFactory.Create(builder =>
{
    // Add OpenTelemetry as a logging provider
    builder.AddOpenTelemetry(options =>
    {
        options.AddAzureMonitorLogExporter(exporterOptions => exporterOptions.ConnectionString = connectionString);
        // Format log messages. This defaults to false.
        options.IncludeFormattedMessage = true;
    });
    builder.SetMinimumLevel(MinLogLevel);
});
```
Additional work that needs to be done: