Pydantic AI
Overview
Pydantic AI is a Python-native LLM agent framework built on the foundations of Pydantic validation. Confident AI allows you to trace and evaluate Pydantic AI agents in just a few lines of code.
Tracing Quickstart
Configure Pydantic AI
Use DeepEval’s ConfidentInstrumentationSettings to trace your agent’s LLM operations, as sketched below.
The same settings work for synchronous, asynchronous, and streaming runs.
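Below is a minimal sketch of the synchronous setup. It assumes ConfidentInstrumentationSettings is importable from deepeval.integrations.pydantic_ai and is passed to the agent’s instrument parameter, and that your Confident AI API key is available in the CONFIDENT_API_KEY environment variable; check the integration reference for the exact import path.

```python
import os

from pydantic_ai import Agent

# Assumed import path; check the integration reference for the exact module.
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

# Confident AI picks up your API key from this environment variable.
os.environ["CONFIDENT_API_KEY"] = "your-confident-api-key"

agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Be concise.",
    # Route Pydantic AI's instrumentation events to Confident AI.
    instrument=ConfidentInstrumentationSettings(),
)

# Synchronous run; the trace is sent to Confident AI in the background.
result = agent.run_sync("What are the three laws of motion?")
print(result.output)
```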
Run Pydantic AI
Invoke your agent by executing the script, for example with python main.py.
You can view the traces directly on Confident AI by clicking the link printed in the console output.
Advanced Usage
Logging prompts
If you are managing prompts on Confident AI and wish to log them, pass your Prompt object to the ConfidentInstrumentationSettings.
Logging prompts lets you attribute specific prompts to your Pydantic AI agent’s LLM spans. Be sure to pull the prompt before logging it, otherwise the prompt will not be visible on Confident AI.
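A minimal sketch, assuming the prompt is passed to the settings via a confident_prompt keyword (the exact parameter name may differ) and the same assumed deepeval.integrations.pydantic_ai import path; check the integration reference before relying on either.

```python
from pydantic_ai import Agent
from deepeval.prompt import Prompt

# Assumed import path; check the integration reference for the exact module.
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

# Pull the prompt from Confident AI first, otherwise it won't appear on your traces.
prompt = Prompt(alias="your-prompt-alias")
prompt.pull()

agent = Agent(
    "openai:gpt-4o-mini",
    instrument=ConfidentInstrumentationSettings(
        confident_prompt=prompt,  # assumed keyword; pass your pulled Prompt object here
    ),
)
```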
Logging threads
Threads group related traces together and are useful for chat apps, agents, or any multi-turn interaction. You can learn more about threads here. Pass the thread_id to the ConfidentInstrumentationSettings, as sketched below.
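A minimal sketch, again assuming the deepeval.integrations.pydantic_ai import path; reuse the same thread_id across runs that belong to the same conversation.

```python
from pydantic_ai import Agent

# Assumed import path; check the integration reference for the exact module.
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    instrument=ConfidentInstrumentationSettings(
        thread_id="your-thread-id",  # runs sharing this ID are grouped into one thread
    ),
)

agent.run_sync("First turn of the conversation")
agent.run_sync("Second turn, grouped under the same thread")
```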
Trace attributes
Other trace attributes can also be passed to the ConfidentInstrumentationSettings; a sketch combining them follows the list below.
name: The name of the trace. Learn more.
tags: Tags are string labels that help you group related traces. Learn more.
metadata: Attach any metadata to the trace. Learn more.
thread_id: Supply the thread or conversation ID to view and evaluate conversations. Learn more.
user_id: Supply the user ID to enable user analytics. Learn more.
Each attribute is optional and works the same way as the native tracing features on Confident AI.
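A minimal sketch combining the attributes listed above, assuming the keyword names mirror Confident AI’s native trace attributes and the same assumed import path as before.

```python
from pydantic_ai import Agent

# Assumed import path; check the integration reference for the exact module.
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    instrument=ConfidentInstrumentationSettings(
        name="weather-agent",              # name of the trace
        tags=["production", "v2"],         # string labels for grouping related traces
        metadata={"region": "us-east-1"},  # arbitrary key-value metadata
        thread_id="your-thread-id",        # conversation/thread grouping
        user_id="your-user-id",            # enables user analytics
    ),
)
```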
Sending annotations
Send human annotations on both traces and threads on Confident AI. Learn more about sending annotations.
Evals Usage
Online evals
You can run online evals on your Pydantic AI agent, which evaluates all incoming traces on Confident AI’s servers. This approach is recommended if your agent is in production.
Create metric collection
Create a metric collection on Confident AI with the metrics you wish to use to evaluate your Pydantic AI agent.
Your metric collection should only contain metrics that evaluate the input and actual output of your Pydantic AI agent.
Run evals
You can run evals at both the trace and span level. We recommend creating separate metric collections for each component, since each requires its own evaluation criteria and metrics. After instrumenting your Pydantic AI agent, pass the metric collection name to the respective components:
Trace
Agent Span
LLM Span
Tool Span
For trace-level evals, pass the metric_collection parameter to the ConfidentInstrumentationSettings, as sketched below.
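A minimal sketch for trace-level online evals, assuming the same import path as above; metric_collection should match the name of the collection you created on Confident AI.

```python
from pydantic_ai import Agent

# Assumed import path; check the integration reference for the exact module.
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    instrument=ConfidentInstrumentationSettings(
        metric_collection="your-metric-collection",  # evaluated on every incoming trace
    ),
)

agent.run_sync("What is the capital of France?")
```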
All incoming traces will now be evaluated using metrics from your metric collection.
End-to-end evals
Running end-to-end evals evaluates your Pydantic AI agent locally, and is the recommended approach if your agent is in a development or testing environment.
As with online evals, you can only run end-to-end evals with metrics that evaluate the input and actual output of your Pydantic AI agent.
Run evals
As shown in online evals, you can provide a metric collection to the different components of the agent via the metric_collection parameter. Then, use the dataset’s evals_iterator to invoke your Pydantic AI agent for each golden, as sketched below; asynchronous invocation is also supported.
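A minimal synchronous sketch, assuming a dataset of goldens already exists on Confident AI under the given alias and the same assumed import path for the settings; for the asynchronous pattern, refer to DeepEval’s end-to-end evals documentation.

```python
from pydantic_ai import Agent
from deepeval.dataset import EvaluationDataset

# Assumed import path; check the integration reference for the exact module.
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    instrument=ConfidentInstrumentationSettings(
        metric_collection="your-metric-collection",  # trace-level metrics
    ),
)

# Pull the goldens you want to evaluate against from Confident AI.
dataset = EvaluationDataset()
dataset.pull(alias="your-dataset-alias")

# evals_iterator yields one golden at a time and collects the resulting
# traces into a single test run.
for golden in dataset.evals_iterator():
    agent.run_sync(golden.input)
```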
This will automatically generate a test run with evaluated traces using inputs from your dataset.
You can view the evals on Confident AI by clicking the link printed in the console output.