Latency, Cost, and Error Tracking
Overview
Confident AI lets you track the latency and cost of your LLM calls. This helps you identify inefficiencies in your LLM systems, such as expensive models or users with unusually heavy usage. There are two types of cost tracking:
- Manual cost tracking: you define the token counts and per-token costs in code
- Automatic cost tracking: Confident AI infers the token counts and per-token costs from the model
The @observe decorator tracks span latency automatically, so this guide focuses mainly on setting up cost tracking.
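The idea behind automatic latency tracking can be illustrated with a plain Python decorator that times each call. This is a minimal sketch, assuming a hypothetical `track_latency` decorator and an in-memory `SPANS` list; it is not the Confident AI `@observe` implementation.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory store for recorded spans (illustration only).
SPANS: List[Dict[str, Any]] = []


def track_latency(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: records how long each call to `func` takes."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            SPANS.append(
                {"name": func.__name__, "latency_s": time.perf_counter() - start}
            )

    return wrapper


@track_latency
def fake_llm_call(prompt: str) -> str:
    time.sleep(0.01)  # stand-in for a network call to a model provider
    return prompt.upper()


print(fake_llm_call("hello"))
print(SPANS[0]["name"])
```

Because the timing happens in the decorator, the wrapped function needs no timing code of its own, which is the same reason latency tracking requires no extra setup.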
Set Up Cost Tracking
You can either configure cost tracking manually or let Confident AI calculate your costs automatically from the inputs and outputs of your LLM span, based on the model you provide.
Automatic cost tracking is only available for OpenAI, Anthropic, and Gemini models.
If token-usage and cost data are provided in code, Confident AI computes the cost directly from those values. If not, it attempts to infer the cost from the model, input, and output by following these steps:
- Verify that the model, input, and output are all available and valid.
- Select the appropriate tokenizer for the model provider.
- Count the input and output tokens.
- Retrieve the per-token pricing from the provider.
- Compute the total cost.
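The precedence rule and the five inference steps above can be sketched as a single function. This is a minimal illustration under stated assumptions: `LlmSpan`, `count_tokens`, and `HYPOTHETICAL_PRICING` are hypothetical names, the whitespace tokenizer stands in for real provider tokenizers, and the prices are placeholders rather than real provider rates. It also assumes manual data is used only when all four values are present.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Placeholder USD prices per token as (input, output). NOT real provider
# prices; actual pricing must come from the provider's published rates.
HYPOTHETICAL_PRICING: Dict[str, Tuple[float, float]] = {
    "example-model": (0.000002, 0.000008),
}


@dataclass
class LlmSpan:
    """Hypothetical container for the data attached to an LLM span."""
    model: Optional[str] = None
    input: Optional[str] = None
    output: Optional[str] = None
    input_token_count: Optional[int] = None
    output_token_count: Optional[int] = None
    cost_per_input_token: Optional[float] = None
    cost_per_output_token: Optional[float] = None


def count_tokens(text: str) -> int:
    # Stand-in tokenizer that splits on whitespace. Real providers use
    # model-specific tokenizers, so real counts will differ.
    return len(text.split())


def resolve_cost(span: LlmSpan) -> Optional[float]:
    # Manually provided token counts and prices take precedence.
    manual = (span.input_token_count, span.output_token_count,
              span.cost_per_input_token, span.cost_per_output_token)
    if None not in manual:
        n_in, n_out, p_in, p_out = manual
        return n_in * p_in + n_out * p_out

    # Step 1: model, input, and output must all be present and known.
    if not span.model or span.input is None or span.output is None:
        return None
    if span.model not in HYPOTHETICAL_PRICING:
        return None
    # Steps 2-3: one stand-in tokenizer replaces per-provider selection.
    n_in, n_out = count_tokens(span.input), count_tokens(span.output)
    # Step 4: look up per-token pricing for the model.
    price_in, price_out = HYPOTHETICAL_PRICING[span.model]
    # Step 5: compute the total cost.
    return n_in * price_in + n_out * price_out


print(resolve_cost(LlmSpan(model="example-model", input="What is 2 + 2?", output="4")))
```

Returning `None` when the data is incomplete or the model is unknown mirrors the idea that automatic tracking applies only to supported models with a valid model, input, and output.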
Automatic Cost Tracking
To set up automatic cost tracking, provide the model in the @observe decorator of your LLM span, and provide the input and output in your LlmAttributes.
The table below summarizes each available model provider and its corresponding tokenization method.
See the OpenAI documentation, Anthropic documentation, or Google documentation for the most up-to-date pricing and token counting information.
Manual Cost Tracking
To manually set up cost tracking, provide the cost_per_input_token and cost_per_output_token in the @observe decorator of your LLM span, and pass the input and output token counts via LlmAttributes.
Manual cost tracking is the recommended approach when you need exact cost figures, or when your model provider is not supported by automatic cost tracking.
The total cost of this call is computed as:
total cost = (input token count × cost_per_input_token) + (output token count × cost_per_output_token)
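The manual calculation multiplies each token count by its per-token price and sums the two products. The sketch below wires that calculation into a decorator. It assumes a hypothetical `observe_llm` decorator, a hypothetical `LlmResult` return type, and illustrative prices; it is not the Confident AI SDK, where token counts are passed via LlmAttributes instead.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical store for recorded span costs (illustration only).
RECORDED: List[Dict[str, Any]] = []


@dataclass
class LlmResult:
    """Hypothetical return type carrying the output and token usage."""
    output: str
    input_token_count: int
    output_token_count: int


def observe_llm(
    cost_per_input_token: float, cost_per_output_token: float
) -> Callable[[Callable[..., LlmResult]], Callable[..., LlmResult]]:
    """Hypothetical decorator: per-token prices are fixed at decoration time."""

    def decorator(func: Callable[..., LlmResult]) -> Callable[..., LlmResult]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> LlmResult:
            result = func(*args, **kwargs)
            # Input tokens times input price, plus output tokens times output price.
            cost = (result.input_token_count * cost_per_input_token
                    + result.output_token_count * cost_per_output_token)
            RECORDED.append({"name": func.__name__, "cost": cost})
            return result

        return wrapper

    return decorator


# Illustrative prices only, not a quote from any provider.
@observe_llm(cost_per_input_token=0.000001, cost_per_output_token=0.000002)
def call_model(prompt: str) -> LlmResult:
    # In a real app, these counts would come from the provider's usage data.
    return LlmResult(output="Paris", input_token_count=1200, output_token_count=300)


call_model("What is the capital of France?")
print(RECORDED[-1])
```

With these illustrative numbers, 1,200 input tokens cost 0.0012 and 300 output tokens cost 0.0006, for a total of 0.0018.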