import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; import useBaseUrl from "@docusaurus/useBaseUrl"; import TilesGrid from "@site/src/components/TilesGrid"; import TileCard from "@site/src/components/TileCard"; import { Coins, BarChart3, Calculator } from "lucide-react"; import TabsWrapper from "@site/src/components/TabsWrapper";
MLflow automatically tracks token usage and cost for LLM calls within your traces. This enables you to monitor resource consumption and optimize costs across your LLM applications and AI agents.
<video src={useBaseUrl("/images/llms/tracing/token-usage-and-cost-charts.mp4")} controls loop autoPlay muted aria-label="Token Usage and Cost charts in MLflow UI" />
When tracing is enabled, MLflow captures:

- **Token usage**: the number of input, output, and total tokens consumed by each LLM call
- **Cost**: the estimated cost of each LLM call in USD, computed from the token counts and the model's pricing

This information is available at both the span level (individual LLM calls) and aggregated at the trace level.
| Feature | MLflow Version |
|---|---|
| Token Usage Tracking | >= 3.2.0 |
| Cost Tracking | >= 3.10.0 |
To enable cost tracking, start the MLflow Tracking Server with the `[genai]` extra installed.
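For example (the host and port below are illustrative defaults, not requirements):

```shell
# Install MLflow with the [genai] extra, then start the tracking server.
pip install 'mlflow[genai]'
mlflow server --host 127.0.0.1 --port 5000
```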
:::tip What is the GenAI extra?
The `[genai]` extra enables other powerful features such as <ins>AI Gateway</ins>, <ins>Automatic Evaluation</ins>, and <ins>Prompt Optimization</ins>. We recommend starting your MLflow server with the extra installed even if you don't need cost tracking.
:::
:::info
When using Databricks managed MLflow, the cost computation requires the client application to install <ins>LiteLLM</ins> or <ins>manually set the cost attributes on spans</ins>. This is not required for self-hosted MLflow.
:::
Token usage and cost information is displayed in the MLflow Trace UI.
<video src={useBaseUrl("/images/llms/tracing/token-and-cost-ui.mp4")} controls loop autoPlay muted aria-label="Token Usage and Cost in MLflow UI" />
The aggregated cost and trend charts are available in the Overview tab of the experiment page.
```python
import mlflow

# Get the most recent trace
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

# Access token usage
total_usage = trace.info.token_usage
if total_usage:
    print("== Total token usage: ==")
    print(f"  Input tokens: {total_usage['input_tokens']}")
    print(f"  Output tokens: {total_usage['output_tokens']}")
    print(f"  Total tokens: {total_usage['total_tokens']}")

# Access cost (requires MLflow >= 3.10.0 and LiteLLM)
total_cost = trace.info.cost
if total_cost:
    print("\n== Total cost (USD): ==")
    print(f"  Input cost: ${total_cost['input_cost']:.6f}")
    print(f"  Output cost: ${total_cost['output_cost']:.6f}")
    print(f"  Total cost: ${total_cost['total_cost']:.6f}")
```
```python
import mlflow

# Get the most recent trace
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

# Access token usage and cost for each LLM call
print("== Token usage and cost for each LLM call: ==")
for span in trace.data.spans:
    usage = span.get_attribute("mlflow.chat.tokenUsage")
    cost = span.llm_cost
    print(f"{span.name}:")
    if usage:
        print(f"  Input tokens: {usage['input_tokens']}")
        print(f"  Output tokens: {usage['output_tokens']}")
        print(f"  Total tokens: {usage['total_tokens']}")
    if cost:
        print(f"  Input cost: ${cost['input_cost']:.6f}")
        print(f"  Output cost: ${cost['output_cost']:.6f}")
        print(f"  Total cost: ${cost['total_cost']:.6f}")
```
:::note
Cost tracking support for the TypeScript SDK is coming soon. Token usage is currently available.
:::
```typescript
import * as mlflow from "mlflow-tracing";

// Flush any pending spans, then fetch the most recent trace
await mlflow.flushTraces();
const lastTraceId = mlflow.getLastActiveTraceId();
if (lastTraceId) {
  const client = new mlflow.MlflowClient({ trackingUri: "http://localhost:5000" });
  const trace = await client.getTrace(lastTraceId);

  // Access token usage
  console.log("== Total token usage: ==");
  console.log(trace.info.tokenUsage); // { input_tokens, output_tokens, total_tokens }

  // Per-span usage
  console.log("\n== Usage for each LLM call: ==");
  for (const span of trace.data.spans) {
    const usage = span.attributes?.["mlflow.chat.tokenUsage"];
    if (usage) {
      console.log(`${span.name}:`, usage);
    }
  }
}
```
```
== Total token usage: ==
  Input tokens: 84
  Output tokens: 22
  Total tokens: 106

== Total cost (USD): ==
  Input cost: $0.000013
  Output cost: $0.000013
  Total cost: $0.000026

== Token usage for each LLM call: ==
Completions_1:
  Input tokens: 45
  Output tokens: 14
  Total tokens: 59
Completions_2:
  Input tokens: 39
  Output tokens: 8
  Total tokens: 47
```
The `token_usage` field in the `TraceInfo` object returns a dictionary with the following keys:

| Key | Type | Description |
|---|---|---|
| `input_tokens` | `int` | Number of tokens in the input/prompt |
| `output_tokens` | `int` | Number of tokens in the output/completion |
| `total_tokens` | `int` | Sum of input and output tokens |
The `cost` field returns a dictionary with the following keys:

| Key | Type | Description |
|---|---|---|
| `input_cost` | `float` | Cost of input tokens in USD |
| `output_cost` | `float` | Cost of output tokens in USD |
| `total_cost` | `float` | Sum of input and output costs in USD |
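To make the relationship between these fields concrete, here is a small sketch that derives such a dictionary from token counts and per-token prices. The prices below are illustrative assumptions, not official rates for any model; MLflow's automatic cost tracking looks up real model pricing instead.

```python
# Hypothetical per-token prices (USD) -- real prices depend on the model/provider.
PRICE_PER_INPUT_TOKEN = 0.15 / 1_000_000   # e.g. $0.15 per 1M input tokens
PRICE_PER_OUTPUT_TOKEN = 0.60 / 1_000_000  # e.g. $0.60 per 1M output tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> dict:
    """Build a cost dictionary with the same shape as trace.info.cost."""
    input_cost = input_tokens * PRICE_PER_INPUT_TOKEN
    output_cost = output_tokens * PRICE_PER_OUTPUT_TOKEN
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }


cost = estimate_cost(84, 22)
print(f"${cost['total_cost']:.6f}")  # → $0.000026
```

Note that `total_cost` is always the sum of `input_cost` and `output_cost`; there is no separate charge beyond the two token streams.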
Token usage and cost tracking is supported for most MLflow tracing integrations. See the individual integration pages for specific support details.
:::tip
Some providers or models may not return token usage information. In these cases, the `token_usage` and `cost` fields will be `None`.
:::
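Because these fields may be `None`, downstream code should guard for the missing case before indexing into them. A minimal sketch (the `summarize_usage` helper is hypothetical, not part of the MLflow API):

```python
def summarize_usage(usage):
    # usage mirrors trace.info.token_usage, which may be None when the
    # provider did not report token counts.
    if usage is None:
        return "token usage unavailable for this trace"
    return (
        f"{usage['total_tokens']} tokens "
        f"({usage['input_tokens']} in / {usage['output_tokens']} out)"
    )


print(summarize_usage(None))
print(summarize_usage({"input_tokens": 84, "output_tokens": 22, "total_tokens": 106}))
```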
If automatic cost tracking is not available for your model or you want to override the calculated cost, you can manually set token and cost information on a span:
```python
import mlflow


@mlflow.trace
def my_llm_call():
    # Get the current span
    span = mlflow.get_current_active_span()

    # Your LLM call logic here...

    # Manually set token usage
    span.set_attribute(
        "mlflow.chat.tokenUsage",
        {
            "input_tokens": 100,
            "output_tokens": 50,
            "total_tokens": 150,
        },
    )

    # Manually set cost (in USD)
    span.set_attribute(
        "mlflow.llm.cost",
        {
            "input_cost": 0.0001,
            "output_cost": 0.0002,
            "total_cost": 0.0003,
        },
    )
    return "response"
```