docs/decisions/0008-support-generic-llm-request-settings.md
## Context and Problem Statement

The Semantic Kernel abstractions package includes a number of classes (`CompleteRequestSettings`, `ChatRequestSettings`, `PromptTemplateConfig.CompletionConfig`) which are used to support:

1. Passing LLM request settings when invoking an AI service
2. Deserializing request settings from the `config.json` associated with a Semantic Function

The problem with these classes is that they include OpenAI specific properties only. A developer can only pass OpenAI specific request settings, which means:

1. Settings may be meaningless for other AI services e.g., sending `MaxTokens` to Huggingface
2. Settings supported by other AI services cannot be sent e.g., Oobabooga settings such as `do_sample`, `typical_p`, ...

Link to the issue raised by the implementer of the Oobabooga AI service: https://github.com/microsoft/semantic-kernel/issues/2735
## Considered Options

- Use `dynamic` to pass request settings
- Use `object` to pass request settings
- Define a base class for AI request settings which all implementations must extend

Note: Using generics was discounted during an earlier investigation which Dmytro conducted.

## Decision Outcome

Proposed: Define a base class for AI request settings which all implementations must extend.

### Use `dynamic` to pass request settings

The `IChatCompletion` interface would look like this:
```csharp
public interface IChatCompletion : IAIService
{
    ChatHistory CreateNewChat(string? instructions = null);

    Task<IReadOnlyList<IChatResult>> GetChatCompletionsAsync(
        ChatHistory chat,
        dynamic? requestSettings = null,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<IChatStreamingResult> GetStreamingChatCompletionsAsync(
        ChatHistory chat,
        dynamic? requestSettings = null,
        CancellationToken cancellationToken = default);
}
```
Developers would have the following options to specify the request settings for a semantic function:
```csharp
// Option 1: Use an anonymous type
await kernel.InvokeSemanticFunctionAsync(
    "Hello AI, what can you do for me?",
    requestSettings: new { MaxTokens = 256, Temperature = 0.7 });

// Option 2: Use an OpenAI specific class
await kernel.InvokeSemanticFunctionAsync(
    prompt,
    requestSettings: new OpenAIRequestSettings() { MaxTokens = 256, Temperature = 0.7 });

// Option 3: Load prompt template configuration from a JSON payload
string configPayload = @"{
  ""schema"": 1,
  ""description"": ""Say hello to an AI"",
  ""type"": ""completion"",
  ""completion"": {
    ""max_tokens"": 60,
    ""temperature"": 0.5,
    ""top_p"": 0.0,
    ""presence_penalty"": 0.0,
    ""frequency_penalty"": 0.0
  }
}";
var templateConfig = JsonSerializer.Deserialize<PromptTemplateConfig>(configPayload);
var func = kernel.CreateSemanticFunction(prompt, config: templateConfig!, "HelloAI");

await kernel.RunAsync(func);
```
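A minimal sketch of the main drawback of this option (not Semantic Kernel source; `DynamicSettingsDemo` and `ReadMaxTokens` are illustrative names): property access on a `dynamic` settings bag is only resolved at runtime, so a missing or misspelled property surfaces as a `RuntimeBinderException` rather than a compile error.

```csharp
using System;

public static class DynamicSettingsDemo
{
    // Hypothetical service-side helper: reads MaxTokens from a dynamic settings bag.
    public static int ReadMaxTokens(dynamic requestSettings)
    {
        // Resolved by the C# runtime binder when this line executes, not at compile time.
        return (int)requestSettings.MaxTokens;
    }

    public static void Main()
    {
        // Works: the anonymous type has a MaxTokens property.
        Console.WriteLine(ReadMaxTokens(new { MaxTokens = 256, Temperature = 0.7 })); // 256

        try
        {
            // Compiles fine, but fails at runtime: no MaxTokens property here.
            ReadMaxTokens(new { Temperature = 0.7 });
        }
        catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException e)
        {
            Console.WriteLine($"Runtime failure: {e.Message}");
        }
    }
}
```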
PR: https://github.com/microsoft/semantic-kernel/pull/2807
- Good, because a developer can pass in properties that may be supported by multiple AI services e.g., temperature, or combine properties for different AI services e.g., max_tokens (OpenAI) and max_new_tokens (Oobabooga).
- Bad, because errors only surface at runtime as `RuntimeBinderException`'s and may be difficult to troubleshoot.
- Bad, because special care needs to be taken with return types e.g., it may be necessary to specify an explicit type rather than just `var` to avoid errors such as `Microsoft.CSharp.RuntimeBinder.RuntimeBinderException : Cannot apply indexing with [] to an expression of type 'object'`.

### Use `object` to pass request settings

The `IChatCompletion` interface would look like this:
```csharp
public interface IChatCompletion : IAIService
{
    ChatHistory CreateNewChat(string? instructions = null);

    Task<IReadOnlyList<IChatResult>> GetChatCompletionsAsync(
        ChatHistory chat,
        object? requestSettings = null,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<IChatStreamingResult> GetStreamingChatCompletionsAsync(
        ChatHistory chat,
        object? requestSettings = null,
        CancellationToken cancellationToken = default);
}
```
The calling pattern is the same as for the `dynamic` case i.e., use either an anonymous type, an AI service specific class e.g., `OpenAIRequestSettings`, or load from JSON.
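One way a connector might consume the `object` parameter (a sketch under assumed names; `SettingsNormalizer` and this pared-down `OpenAIRequestSettings` shape are illustrative, not the Semantic Kernel implementation): accept a typed instance directly, and fall back to a JSON round trip for anonymous types or JSON-loaded settings.

```csharp
using System.Text.Json;

public class OpenAIRequestSettings
{
    public int MaxTokens { get; set; } = 256;
    public double Temperature { get; set; } = 0;
}

public static class SettingsNormalizer
{
    // Convert whatever the caller passed into the service-specific settings type.
    public static OpenAIRequestSettings From(object? requestSettings)
    {
        if (requestSettings is null)
        {
            return new OpenAIRequestSettings(); // fall back to defaults
        }

        if (requestSettings is OpenAIRequestSettings typed)
        {
            return typed; // already the right type, no conversion needed
        }

        // Anonymous type or other POCO: round-trip through JSON to map
        // matching property names onto the typed settings class.
        string json = JsonSerializer.Serialize(requestSettings);
        return JsonSerializer.Deserialize<OpenAIRequestSettings>(json)
               ?? new OpenAIRequestSettings();
    }
}
```

With this shape, `SettingsNormalizer.From(new { MaxTokens = 128 })` yields a typed instance with `MaxTokens = 128` and the default `Temperature`; note that a misspelled property name is silently ignored rather than failing, which is a different failure mode from the `dynamic` option.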
PR: https://github.com/microsoft/semantic-kernel/pull/2819
- Good, because a developer can pass in properties that may be supported by multiple AI services e.g., temperature, or combine properties for different AI services e.g., max_tokens (OpenAI) and max_new_tokens (Oobabooga).
- Bad, because errors only surface at runtime, as in the `dynamic` case.

### Define a base class for AI request settings which all implementations must extend

The `IChatCompletion` interface would look like this:
```csharp
public interface IChatCompletion : IAIService
{
    ChatHistory CreateNewChat(string? instructions = null);

    Task<IReadOnlyList<IChatResult>> GetChatCompletionsAsync(
        ChatHistory chat,
        AIRequestSettings? requestSettings = null,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<IChatStreamingResult> GetStreamingChatCompletionsAsync(
        ChatHistory chat,
        AIRequestSettings? requestSettings = null,
        CancellationToken cancellationToken = default);
}
```
`AIRequestSettings` is defined as follows:
```csharp
public class AIRequestSettings
{
    /// <summary>
    /// Service identifier.
    /// </summary>
    [JsonPropertyName("service_id")]
    [JsonPropertyOrder(1)]
    public string? ServiceId { get; set; } = null;

    /// <summary>
    /// Extra properties
    /// </summary>
    [JsonExtensionData]
    public Dictionary<string, object>? ExtensionData { get; set; }
}
```
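A sketch of what `[JsonExtensionData]` buys here (self-contained copy of the class; the payload values are illustrative): any JSON property that does not map to a declared member is collected into `ExtensionData`, so AI-service specific settings survive deserialization even though the base class does not declare them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public class AIRequestSettings
{
    [JsonPropertyName("service_id")]
    [JsonPropertyOrder(1)]
    public string? ServiceId { get; set; } = null;

    [JsonExtensionData]
    public Dictionary<string, object>? ExtensionData { get; set; }
}

public static class ExtensionDataDemo
{
    public static void Main()
    {
        var settings = JsonSerializer.Deserialize<AIRequestSettings>(
            @"{ ""service_id"": ""gpt-3.5-turbo"", ""max_tokens"": 60, ""temperature"": 0.5 }")!;

        // "service_id" maps to the declared property...
        Console.WriteLine(settings.ServiceId); // gpt-3.5-turbo

        // ...while unmapped properties land in ExtensionData as JsonElement values.
        Console.WriteLine(((JsonElement)settings.ExtensionData!["max_tokens"]).GetInt32()); // 60
    }
}
```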
Developers would have the following options to specify the request settings for a semantic function:
```csharp
// Option 1: Invoke the semantic function and pass an OpenAI specific instance
var result = await kernel.InvokeSemanticFunctionAsync(
    prompt,
    requestSettings: new OpenAIRequestSettings() { MaxTokens = 256, Temperature = 0.7 });
Console.WriteLine(result.Result);

// Option 2: Load prompt template configuration from a JSON payload
string configPayload = @"{
  ""schema"": 1,
  ""description"": ""Say hello to an AI"",
  ""type"": ""completion"",
  ""completion"": {
    ""max_tokens"": 60,
    ""temperature"": 0.5,
    ""top_p"": 0.0,
    ""presence_penalty"": 0.0,
    ""frequency_penalty"": 0.0
  }
}";
var templateConfig = JsonSerializer.Deserialize<PromptTemplateConfig>(configPayload);
var func = kernel.CreateSemanticFunction(prompt, config: templateConfig!, "HelloAI");

await kernel.RunAsync(func);
```
It would also be possible to use the following pattern:
```csharp
this._summarizeConversationFunction = kernel.CreateSemanticFunction(
    SemanticFunctionConstants.SummarizeConversationDefinition,
    skillName: nameof(ConversationSummarySkill),
    description: "Given a section of a conversation, summarize conversation.",
    requestSettings: new AIRequestSettings()
    {
        ExtensionData = new Dictionary<string, object>()
        {
            { "Temperature", 0.1 },
            { "TopP", 0.5 },
            { "MaxTokens", MaxTokens }
        }
    });
```
The caveat with this pattern is, assuming a more specific implementation of `AIRequestSettings` uses JSON serialization/deserialization to hydrate an instance from the base `AIRequestSettings`, this will only work if all properties are supported by the default `JsonConverter` e.g.,

- Suppose you have a `MyAIRequestSettings` class which includes a `Uri` property. The implementation of `MyAIRequestSettings` would make sure to load a URI converter so that it can serialize/deserialize the settings correctly.
- If settings for `MyAIRequestSettings` are sent to an AI service which relies on the default `JsonConverter`, then a `NotSupportedException` exception will be thrown.

PR: https://github.com/microsoft/semantic-kernel/pull/2829
- Good, because `ExtensionData` can be used, which allows a developer to pass in properties that may be supported by multiple AI services e.g., temperature, or combine properties for different AI services e.g., max_tokens (OpenAI) and max_new_tokens (Oobabooga).
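The hydration pattern described above can be sketched as follows (`MyAIRequestSettings` and `FromRequestSettings` are hypothetical names, not the Semantic Kernel implementation): a derived class recovers typed values from a base `AIRequestSettings` whose `ExtensionData` carries the extra properties, by serializing the base instance and deserializing into the derived type.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public class AIRequestSettings
{
    [JsonPropertyName("service_id")]
    public string? ServiceId { get; set; }

    [JsonExtensionData]
    public Dictionary<string, object>? ExtensionData { get; set; }
}

// Hypothetical AI-service specific settings extending the base class.
public class MyAIRequestSettings : AIRequestSettings
{
    public double Temperature { get; set; } = 0;

    // Hydrate typed settings from whatever instance the caller supplied.
    public static MyAIRequestSettings FromRequestSettings(AIRequestSettings settings)
    {
        if (settings is MyAIRequestSettings typed)
        {
            return typed;
        }

        // Round trip: ExtensionData entries are written out as top-level JSON
        // properties, then matched against the derived class's typed properties.
        // As noted above, this only works if the default JsonConverter can
        // handle every value in ExtensionData.
        string json = JsonSerializer.Serialize(settings);
        return JsonSerializer.Deserialize<MyAIRequestSettings>(json)!;
    }
}
```

So a base instance with `ExtensionData = { { "Temperature", 0.1 } }` hydrates into a `MyAIRequestSettings` whose `Temperature` property is `0.1`.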