Track LLM Costs | Confident AI Docs

Overview

Confident AI tracks the token usage and cost of your LLM calls, helping you identify high-cost models and heavy usage patterns across your application.

Cost tracking only applies to LLM spans. If you haven’t already, learn how to configure span types first.

LLM Cost Tracking

How It Works

Confident AI resolves token usage and cost for each LLM span in the following order of precedence, and separately for both input and output tokens:

1. Values set in code. Per-token costs and token counts set through the Evals API or DeepEval (via observe or update_llm_span / updateLlmSpan) take the highest priority and always override any other source.
   - Integrations may supply token counts for you; the cost is then calculated the same way as for counts you set yourself.
2. Custom model costs. If you provide token counts but not per-token costs, Confident AI uses the pricing configured in your Model Costs settings to calculate the cost.
3. Automatic inference. If neither per-token costs nor project-level costs are available, Confident AI tokenizes the span’s input/output text using a provider-specific tokenizer and looks up pricing internally based on the model.

Once token counts and per-token costs are resolved for each side, the total cost is computed as:

input_cost  = input_token_count  × cost_per_input_token   (if both are non-null)
output_cost = output_token_count × cost_per_output_token   (if both are non-null)
total_cost  = input_cost + output_cost

Input and output costs are computed independently — cost will only be logged for a side (input or output) if its values are non-null (they do not default to 0).
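The null-handling rule can be sketched in a few lines of Python. This is only an illustration, not Confident AI's implementation: the names side_cost, compute_span_cost, and SpanCost are hypothetical, the prices are made-up numbers, and the sketch assumes the total sums whichever sides resolved (and is null when neither did).

```python
from typing import NamedTuple, Optional


class SpanCost(NamedTuple):
    input_cost: Optional[float]
    output_cost: Optional[float]
    total_cost: Optional[float]


def side_cost(token_count: Optional[int], cost_per_token: Optional[float]) -> Optional[float]:
    """Cost for one side, or None when either value is missing (never defaults to 0)."""
    if token_count is None or cost_per_token is None:
        return None
    return token_count * cost_per_token


def compute_span_cost(
    input_token_count: Optional[int],
    cost_per_input_token: Optional[float],
    output_token_count: Optional[int],
    cost_per_output_token: Optional[float],
) -> SpanCost:
    input_cost = side_cost(input_token_count, cost_per_input_token)
    output_cost = side_cost(output_token_count, cost_per_output_token)
    # Assumption: the total sums the sides that resolved and is None if neither did.
    known = [c for c in (input_cost, output_cost) if c is not None]
    total_cost = sum(known) if known else None
    return SpanCost(input_cost, output_cost, total_cost)


# Both sides resolved (illustrative prices, not real pricing).
full = compute_span_cost(1000, 2.5e-06, 200, 1e-05)
# Output price missing: only the input side gets a cost.
partial = compute_span_cost(1000, 2.5e-06, 200, None)
```

In the second call the output cost stays None rather than becoming 0, which is the behavior described above.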

Automatic inference is only available for OpenAI, Anthropic, and Gemini models. For all other providers, supply token counts and costs manually or configure Model Costs in your project settings.

Track Token Usage Count

You can manually set the input and output token counts on an LLM span using update_llm_span / updateLlmSpan. This is useful when your provider returns token usage in the response and you want to log it precisely.

main.py

from deepeval.tracing import observe, update_llm_span

@observe(type="llm", model="gpt-4o")
def generate_response(prompt: str) -> str:
    # call_llm is a placeholder for your own LLM client call
    response = call_llm(prompt)
    # Log the token usage reported by the provider
    update_llm_span(
        input_token_count=response.usage.prompt_tokens,
        output_token_count=response.usage.completion_tokens,
    )
    return response.text

If you don’t provide token counts and aren’t using an integration, Confident AI will attempt to infer them by tokenizing the span’s input and output text using the appropriate provider tokenizer. The table below summarizes each supported provider and its tokenization method.

Provider	Tokenizer	Example Models	Token Counting Method
OpenAI	tiktoken	`gpt-4o`, `gpt-4.1`, `o1`, `o3`	Client-side tokenization using model-specific encodings
Anthropic	@anthropic-ai/tokenizer	`claude-3.5-sonnet`, `claude-3.7-sonnet`, `claude-4`	Claude-specific tokenization algorithm
Google	Gemini API	`gemini-2.0-flash`, `gemini-2.5-pro`	Server-side token counting via API call

See the OpenAI documentation, Anthropic documentation, or Google documentation for the most up-to-date pricing.

Note that input and output are calculated separately: you don’t have to provide both for the cost of either side to be set.

Track Token Usage Cost

Once token counts are available (either set manually, captured by an integration, or inferred automatically), Confident AI resolves the per-token cost using the following precedence:

1. Per-token costs set in DeepEval/Evals API. If you provide the cost per input/output token directly in code, these values always take priority.
2. Custom model costs. If per-token costs aren’t set in code, Confident AI uses the pricing configured in your Model Costs settings.
3. Automatic price lookup. If no project-level costs are configured, Confident AI looks up the per-token pricing internally based on the model. This is only available for OpenAI, Anthropic, and Gemini models.

If none of the above resolve a per-token cost, the cost for that side (input or output) is not logged.
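The precedence above amounts to a simple fallback chain. The following is a minimal sketch under stated assumptions, not the platform's actual code: resolve_cost_per_token, code_costs, project_costs, and BUILTIN_PRICES are hypothetical names, the prices are made up, and project costs are matched by exact model name here for brevity.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the platform's internal price table (made-up values).
BUILTIN_PRICES: Dict[str, Dict[str, float]] = {
    "gpt-4o": {"input": 2.5e-06, "output": 1e-05},
}


def resolve_cost_per_token(
    side: str,  # "input" or "output"
    model: str,
    code_costs: Dict[str, Optional[float]],
    project_costs: Dict[str, Dict[str, float]],
) -> Optional[float]:
    """Return the per-token cost for one side, or None if nothing resolves it."""
    # 1. A per-token cost set in code always wins.
    if code_costs.get(side) is not None:
        return code_costs[side]
    # 2. Otherwise fall back to project-level Model Costs (exact-name match here).
    project_price = project_costs.get(model, {}).get(side)
    if project_price is not None:
        return project_price
    # 3. Otherwise use the built-in lookup; None means the side is not logged.
    return BUILTIN_PRICES.get(model, {}).get(side)
```

Because each side is resolved on its own, the input side could come from code while the output side falls through to the built-in lookup.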

Explicit cost setting

Set the per-token costs explicitly in the observe decorator/wrapper alongside your token counts. This is also the way to cover models from providers that automatic price lookup does not support.

Explicit cost setting is best for teams that want programmatic control over cost. For teams that want to set model costs on the platform directly, see custom price lookup.

main.py

from deepeval.tracing import observe, update_llm_span

# Per-token costs set here take priority over project settings and automatic lookup
@observe(
    type="llm",
    model="gpt-4o",
    cost_per_input_token=0.001,
    cost_per_output_token=0.001,
)
def generate_response(prompt: str) -> str:
    # call_llm is a placeholder for your own LLM client call
    output = call_llm(prompt)
    update_llm_span(
        input_token_count=10,
        output_token_count=20,
    )
    return output

Custom price lookup

If you provide token counts but don’t set per-token costs in code, Confident AI will use the pricing you’ve configured in your project’s Model Costs settings. This is useful when you want to manage pricing centrally without changing any code.

Model costs are matched against the model name on your LLM span using wildcard patterns. For example:

gpt-4o — matches only gpt-4o
gpt-4* — matches gpt-4o, gpt-4o-mini, gpt-4-turbo, etc.
claude-* — matches all Claude model variants

You can optionally restrict a cost rule to a specific provider, and set input and output costs independently per million tokens. See the full Model Costs settings page for setup instructions.
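To get a feel for how such patterns behave, here is a rough sketch using shell-style matching from Python's standard fnmatch module. Treat it as an approximation: the RULES list, the match_rule helper, the prices, and the first-match-wins ordering are all assumptions for illustration, and the platform's real matching and tie-breaking may differ.

```python
from fnmatch import fnmatchcase
from typing import Optional, Tuple

# Hypothetical rules: (pattern, input $ per 1M tokens, output $ per 1M tokens).
RULES = [
    ("gpt-4o", 2.50, 10.00),
    ("gpt-4*", 5.00, 15.00),
    ("claude-*", 3.00, 15.00),
]


def match_rule(model: str) -> Optional[Tuple[float, float]]:
    """Return (input, output) per-token costs for the first matching rule, else None."""
    for pattern, input_per_million, output_per_million in RULES:
        if fnmatchcase(model, pattern):
            # Rules are priced per million tokens; convert to per-token costs.
            return input_per_million / 1_000_000, output_per_million / 1_000_000
    return None


exact = match_rule("gpt-4o")          # hits the exact "gpt-4o" rule first
wildcard = match_rule("gpt-4o-mini")  # falls through to "gpt-4*"
unmatched = match_rule("mistral-large")
```

Listing the exact pattern before the broader wildcard is what lets gpt-4o keep its own price in this sketch.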

Automatic price lookup

If you provide a supported model on your LLM span and neither SDK-level nor project-level costs are configured, Confident AI will automatically look up the per-token pricing and calculate the cost — no additional code needed.

main.py

from deepeval.tracing import observe, update_current_span

@observe(type="llm", model="gpt-4o")
def generate_response(prompt: str) -> str:
    # call_llm is a placeholder for your own LLM client call
    output = call_llm(prompt)
    # No token counts or costs are set; both are inferred from input/output and the model
    update_current_span(input=prompt, output=output)
    return output

Automatic price lookup is only available for OpenAI, Anthropic, and Gemini models. For all other providers, set per-token costs manually or configure Model Costs in your project settings.

Cost on Traces

The cost of a trace is set automatically by summing the cost of all LLM spans in that trace. As with LLM spans, the trace cost is null if no LLM span in the trace has a non-null cost.
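As a rough illustration of this aggregation, the sketch below sums span costs while skipping null values. The trace_cost helper is hypothetical and the numbers are invented; it is meant only to show the null behavior, not the platform's internals.

```python
from typing import Iterable, Optional


def trace_cost(span_costs: Iterable[Optional[float]]) -> Optional[float]:
    """Sum the non-null LLM span costs; return None when there are none."""
    known = [cost for cost in span_costs if cost is not None]
    return sum(known) if known else None


mixed = trace_cost([0.002, None, 0.003])  # null spans are skipped
empty = trace_cost([None, None])          # stays None, not 0
```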

Next Steps

With cost tracking configured, continue setting up the rest of your instrumentation.

Set Input/Output

Override the default input and output on traces and spans for better visualization and evaluation.

Thread Traces

Group traces into threads to track multi-turn conversations and evaluate entire workflows.