docs/en/setup/service-agent/virtual-genai.md
Virtual GenAI represents the Generative AI service nodes detected by server agents' plugins. The performance metrics of the GenAI operations are from the GenAI client-side perspective.
For example, a Spring AI plugin in the Java agent could detect the latency of a chat completion request. As a result, SkyWalking would show traffic, latency, success rate, token usage (input/output), and estimated cost in the GenAI dashboard.
Virtual GenAI metrics are derived from distributed tracing data. SkyWalking OAP can ingest and analyze trace data adhering to GenAI semantic conventions from both SkyWalking native agents and OTLP/Zipkin traces.
The GenAI operation span should have the following properties:
- `gen_ai.provider.name`, value = The Generative AI provider, e.g. `openai`, `anthropic`, `ollama`
- `gen_ai.response.model`, value = The name of the GenAI model, e.g. `gpt-4o`, `claude-3-5-sonnet`
- `gen_ai.usage.input_tokens`, value = The number of tokens used in the GenAI input (prompt)
- `gen_ai.usage.output_tokens`, value = The number of tokens used in the GenAI response (completion)
- `gen_ai.server.time_to_first_token`, value = The duration in milliseconds until the first token is received (streaming requests only)

SkyWalking uses `gen-ai-config.yml` to map model names to providers and configure cost estimation.
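As an illustration of these tags, a single streaming chat-completion span might carry a tag set like the following (the values are hypothetical, not taken from the original document):

```python
# Hypothetical tag set for one streaming chat-completion span.
span_tags = {
    "gen_ai.provider.name": "openai",
    "gen_ai.response.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 932,
    "gen_ai.usage.output_tokens": 214,
    "gen_ai.server.time_to_first_token": 87,  # milliseconds, streaming only
}

# OAP aggregates the per-span token tags into the *_tokens_sum / avg metrics.
total_tokens = (span_tags["gen_ai.usage.input_tokens"]
                + span_tags["gen_ai.usage.output_tokens"])
print(total_tokens)  # -> 1146
```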
When the `gen_ai.provider.name` tag is present in the span, it is used directly. Otherwise, SkyWalking matches the model name
against prefix-match rules to identify the provider. For example, a model name starting with `gpt` is mapped to `openai`.
To configure cost estimation, add models with pricing under the provider:
```yaml
providers:
  - provider: openai
    prefix-match:
      - gpt
    models:
      - name: gpt-4o
        input-estimated-cost-per-m: 2.5  # estimated cost per 1,000,000 input tokens
        output-estimated-cost-per-m: 10  # estimated cost per 1,000,000 output tokens
```
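The `*-per-m` fields imply a straightforward formula: estimated cost = tokens / 1,000,000 × per-million rate, summed over input and output. A minimal sketch using the sample `gpt-4o` pricing above (OAP's exact rounding behavior is not specified here):

```python
INPUT_COST_PER_M = 2.5    # from input-estimated-cost-per-m for gpt-4o
OUTPUT_COST_PER_M = 10.0  # from output-estimated-cost-per-m for gpt-4o

def estimated_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in the pricing unit used in gen-ai-config.yml."""
    return ((input_tokens / 1_000_000) * INPUT_COST_PER_M
            + (output_tokens / 1_000_000) * OUTPUT_COST_PER_M)

# e.g. 200,000 input tokens and 50,000 output tokens:
print(estimated_cost(200_000, 50_000))  # -> 1.0
```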
The following metrics are available at the provider (service) level:
- `gen_ai_provider_cpm` - Calls per minute
- `gen_ai_provider_sla` - Success rate
- `gen_ai_provider_resp_time` - Average response time
- `gen_ai_provider_latency_percentile` - Latency percentiles
- `gen_ai_provider_input_tokens_sum` / `avg` - Input token usage
- `gen_ai_provider_output_tokens_sum` / `avg` - Output token usage
- `gen_ai_provider_total_estimated_cost` / `avg_estimated_cost` - Estimated cost

The following metrics are available at the model (service instance) level:
- `gen_ai_model_call_cpm` - Calls per minute
- `gen_ai_model_sla` - Success rate
- `gen_ai_model_latency_avg` / `percentile` - Latency
- `gen_ai_model_ttft_avg` / `percentile` - Time to first token (streaming only)
- `gen_ai_model_input_tokens_sum` / `avg` - Input token usage
- `gen_ai_model_output_tokens_sum` / `avg` - Output token usage
- `gen_ai_model_total_estimated_cost` / `avg_estimated_cost` - Estimated cost

Supported agent: SkyWalking Java Agent version >= 9.7.
The tag keys used in Virtual GenAI follow the OpenTelemetry GenAI Semantic Conventions. SkyWalking OAP identifies GenAI-related spans based on the following criteria depending on the data source:
- SkyWalking native traces: spans with `SpanLayer == GENAI` and relevant `gen_ai.*` tags.
- OTLP / Zipkin traces: spans carrying the `gen_ai.response.model` tag will be identified as a GenAI operation.

Note on OTLP / Zipkin Provider Identification: To ensure broad compatibility with different OpenTelemetry instrumentation versions, SkyWalking OAP identifies the GenAI provider using the following prioritized logic:

1. `gen_ai.provider.name`: SkyWalking first looks for this tag (the latest OTel semantic convention).
2. `gen_ai.system`: If the above is missing, it falls back to this legacy tag for backward compatibility with older instrumentation (e.g., current OTel Python auto-instrumentation).
3. `prefix-match` rules defined in `gen-ai-config.yml`: used when neither tag is present.