docs/gen-ai/Usage.md
This document shows how to use the causal language model API for text generation.
`CausalLMPipeline` provides the most basic way to generate text from a language model: the prompt is fed into the model as-is, without applying any chat template.
```csharp
var pathToPhi3 = "path/to/phi3";
var tokenizer = LLama2Tokenizer.FromPretrained(pathToPhi3);
var phi3CausalModel = Phi3ForCausalLM.FromPretrained(pathToPhi3);
var pipeline = new CausalLMPipeline<LLama2Tokenizer, Phi3ForCausalLM>(tokenizer, phi3CausalModel);
var prompt = "<|user|>Once upon a time<|end|><|assistant|>";
var output = pipeline.Generate(
    prompt: prompt,
    maxLen: 100);
```
In most cases, developers would like to consume the model in a uniform way. For this, we can provide an extension method for Semantic Kernel that registers the `CausalLMPipeline` as a `ChatCompletionService`.
```csharp
var pathToPhi3 = "path/to/phi3";
var tokenizer = LLama2Tokenizer.FromPretrained(pathToPhi3);
var phi3CausalModel = Phi3ForCausalLM.FromPretrained(pathToPhi3);
var pipeline = new CausalLMPipeline<LLama2Tokenizer, Phi3ForCausalLM>(tokenizer, phi3CausalModel);
var kernel = Kernel.CreateBuilder()
    // The tokenizer and model types are specified explicitly here for
    // clarity, but the compiler can infer them.
    // The typed pipeline prevents developers from passing an arbitrary
    // CausalLMPipeline, because
    // - the model and the tokenizer must be compatible
    // - the chat template must be compatible with the model, e.g. in
    //   `AddPhi3AsChatCompletionService` the chat template is fixed to
    //   "<|user|>{prompt}<|end|><|assistant|>"
    .AddPhi3AsChatCompletionService<LLama2Tokenizer, Phi3ForCausalLM>(pipeline)
    .Build();
```
Similarly, developers may want to consume the language model as an agent.
```csharp
var pathToPhi3 = "path/to/phi3";
var tokenizer = LLama2Tokenizer.FromPretrained(pathToPhi3);
var phi3CausalModel = Phi3ForCausalLM.FromPretrained(pathToPhi3);
var pipeline = new CausalLMPipeline(tokenizer, phi3CausalModel);
var agent = new Phi3MiniAgent(pipeline, name: "assistant");
var reply = await agent.SendAsync("Tell me a joke");
```
> [!NOTE]
> This feature is very useful for evaluation and benchmarking. Most benchmarking frameworks are implemented in Python but can consume an OpenAI-like API, so this feature lets us evaluate the model with the same benchmarking frameworks used for other models and obtain comparable results.
If the model is deployed as a service, developers can consume it in the same way as the OpenAI chat completion service.
```csharp
// server.cs
var pathToPhi3 = "path/to/phi3";
var tokenizer = LLama2Tokenizer.FromPretrained(pathToPhi3);
var phi3CausalModel = Phi3ForCausalLM.FromPretrained(pathToPhi3);
var pipeline = new CausalLMPipeline(tokenizer, phi3CausalModel);
var agent = new Phi3MiniAgent(pipeline, name: "assistant");

// AutoGen.Net allows you to run the agent as an OpenAI chat completion endpoint
var host = Host.CreateDefaultBuilder()
    .ConfigureWebHostDefaults(app =>
    {
        app.UseAgentAsOpenAIChatCompletionEndpoint(agent);
    })
    .Build();

await host.RunAsync();
```
On the client side, the consumption code is no different from consuming an OpenAI chat completion service.
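As a sketch of what such a client might look like, the snippet below builds an OpenAI-style chat completion request body in Python. The endpoint URL and the model name are assumptions for illustration only; the request shape (`model`, `messages` with `role`/`content`) follows the standard OpenAI chat completions format, so any off-the-shelf OpenAI client can talk to the endpoint by pointing its base URL at the local server instead of api.openai.com.

```python
import json

# Hypothetical local endpoint exposed by UseAgentAsOpenAIChatCompletionEndpoint;
# the host, port, and model name below are assumptions for illustration.
ENDPOINT = "http://localhost:5000/v1/chat/completions"

def build_chat_request(user_message: str, model: str = "phi-3-mini") -> str:
    """Build an OpenAI-style chat completion request body as a JSON string."""
    payload = {
        "model": model,
        "messages": [
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 100,
    }
    return json.dumps(payload)

body = build_chat_request("Tell me a joke")
print(body)
# POST this body to ENDPOINT with Content-Type: application/json,
# exactly as you would against the real OpenAI API.
```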