Pydantic AI

Use Confident AI for LLM observability and evals with Pydantic AI

Overview

Pydantic AI is a Python-native LLM agent framework built on the foundations of Pydantic validation. Confident AI allows you to trace and evaluate Pydantic AI agents using an OpenTelemetry-based integration.

Tracing Quickstart

For users in the EU region, please set the OTEL endpoint to the EU version:

export CONFIDENT_OTEL_URL="https://eu.otel.confident-ai.com"
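
If you can't export variables in your shell (in a notebook, for example), you can set the same variable from Python instead. This minimal sketch assumes the integration reads CONFIDENT_OTEL_URL when DeepEvalInstrumentationSettings is constructed, so it sets the variable before any agent is created.

```python
import os

# EU endpoint for Confident AI's OTel collector, as given above.
EU_OTEL_URL = "https://eu.otel.confident-ai.com"

# Assumption: the variable is read when the instrumentation is set up,
# so set it before constructing DeepEvalInstrumentationSettings / the Agent.
os.environ["CONFIDENT_OTEL_URL"] = EU_OTEL_URL

print(os.environ["CONFIDENT_OTEL_URL"])
```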

1. Install Dependencies

Run the following command to install the required packages:

pip install -U deepeval pydantic-ai

2. Set Up Your Confident AI Key

Make your Confident AI API key available to deepeval, either by running deepeval login or by exporting it as an environment variable:

export CONFIDENT_API_KEY="<your-confident-api-key>"

3. Configure Pydantic AI

Pass DeepEvalInstrumentationSettings to your agent’s instrument parameter. This sets up the full OpenTelemetry pipeline — including span classification, trace context wiring, and export to Confident AI — in a single step.

main.py
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import DeepEvalInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    name="my_agent",
    instrument=DeepEvalInstrumentationSettings(),
)

result = agent.run_sync("What are LLMs?")
print(result.output)

DeepEvalInstrumentationSettings constructs a TracerProvider, registers the span processor pipeline, sets the global OTel tracer provider, and forwards itself to pydantic-ai’s Agent(instrument=...). The Confident AI API key is read automatically from the CONFIDENT_API_KEY environment variable or from deepeval login — you only need to pass api_key= explicitly if you manage keys programmatically.
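
To make the documented fallback order concrete, here is a small sketch of how a key could be resolved. resolve_confident_api_key and stored_login_key are hypothetical names, not part of deepeval. The sketch also assumes the environment variable is checked before any key saved by deepeval login; the docs list both fallbacks without stating their relative order.

```python
import os
from typing import Optional


def resolve_confident_api_key(
    explicit_key: Optional[str] = None,
    stored_login_key: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper mirroring the documented fallback order.

    1. an explicit api_key= argument
    2. the CONFIDENT_API_KEY environment variable
    3. a key saved by `deepeval login` (modeled here as stored_login_key;
       the env-before-login order is an assumption)
    """
    if explicit_key:
        return explicit_key
    env_key = os.environ.get("CONFIDENT_API_KEY")
    if env_key:
        return env_key
    return stored_login_key


# An explicit key always wins, regardless of the environment.
print(resolve_confident_api_key(explicit_key="key-from-argument"))
```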

4. Run Pydantic AI

Invoke your agent by executing the script:

python main.py

You can view the traces on Confident AI by clicking on the link printed in the console.

Advanced Usage

Logging threads

Threads group related traces together and are useful for chat apps, agents, or any multi-turn interaction. See the threads documentation on Confident AI to learn more. Pass thread_id to DeepEvalInstrumentationSettings to associate every trace from that agent with a thread.

from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import DeepEvalInstrumentationSettings

agent = Agent(
    model="openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=DeepEvalInstrumentationSettings(
        thread_id="thread_id_1",
    ),
)

result = agent.run_sync("What are LLMs?")
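
Thread IDs are strings you choose. One common pattern, sketched below under the assumption that your app has some stable session key (the session keys and the namespace string here are hypothetical), is to derive the ID deterministically so every turn of the same conversation lands in the same thread.

```python
import uuid

# Hypothetical namespace for this app; any fixed UUID works.
THREAD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "my-chat-app.example")


def thread_id_for(session_key: str) -> str:
    """Derive a stable thread ID from a session key.

    uuid5 is deterministic: the same session key always yields the same ID,
    so every turn of a conversation is grouped into one thread.
    """
    return str(uuid.uuid5(THREAD_NAMESPACE, session_key))


first_turn = thread_id_for("session-abc")
second_turn = thread_id_for("session-abc")
other_chat = thread_id_for("session-xyz")
print(first_turn == second_turn, first_turn == other_chat)  # True False
```

Because thread_id on DeepEvalInstrumentationSettings applies to every trace from that agent, per-request thread IDs like these fit the with trace(...) pattern described under Per-call trace context.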

Trace attributes

You can attach trace-level attributes such as name, tags, metadata, and user ID to every trace produced by the agent. These are baked into DeepEvalInstrumentationSettings as static defaults. They can be overridden at runtime using update_current_trace(...) from inside a tool body.

from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import DeepEvalInstrumentationSettings

agent = Agent(
    model="openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=DeepEvalInstrumentationSettings(
        name="My Agent Trace",
        tags=["production", "v2"],
        metadata={"env": "production"},
        user_id="user_123",
        thread_id="thread_id_1",
    ),
)

result = agent.run_sync("What are LLMs?")

DeepEvalInstrumentationSettings accepts the following parameters:

api_key (str): Your Confident AI API key. Falls back to the CONFIDENT_API_KEY environment variable or deepeval login.

name (str): The default name for traces produced by this agent.

tags (List[str]): String labels that help you group related traces.

metadata (Dict): Arbitrary metadata attached to each trace. At runtime, update_current_trace(metadata=...) is merged on top of this base.

thread_id (str): Conversation or session ID for grouping multi-turn traces.

user_id (str): User identifier for user-level analytics.

metric_collection (str): Name of the metric collection used to run online evals against each trace.

test_case_id (str): Associates a trace with a specific test case.

turn_id (str): Identifies a specific turn within a multi-turn conversation.

All attributes are optional. They work the same way as the native tracing features on Confident AI. Any field set here is overridable at runtime via update_current_trace(...) from inside a tool body.
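
The interaction between these static defaults and runtime updates can be modeled as a plain dictionary merge. This is a hypothetical sketch, not deepeval's implementation: it follows the documented rule that runtime metadata is merged on top of the base, and assumes every other field (including tags) is simply replaced by the runtime value.

```python
from typing import Any, Dict


def effective_trace_attributes(
    defaults: Dict[str, Any], runtime: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical model of static defaults combined with runtime updates.

    - metadata: merged key by key, runtime keys win on conflict (documented).
    - every other field: the runtime value replaces the default (assumed).
    """
    result = dict(defaults)
    for key, value in runtime.items():
        if key == "metadata":
            result["metadata"] = {**defaults.get("metadata", {}), **value}
        else:
            result[key] = value
    return result


defaults = {
    "name": "My Agent Trace",
    "user_id": "user_123",
    "metadata": {"env": "production"},
}
# What a tool body might pass to update_current_trace(...) mid-run
runtime = {"user_id": "user_42", "metadata": {"plan": "pro"}}

merged = effective_trace_attributes(defaults, runtime)
print(merged)
```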

Update trace attributes

You can enrich a trace mid-flight from inside a tool body using update_current_trace. This is useful when trace metadata depends on information only available during execution, such as a user ID resolved by a lookup tool.

main.py
from pydantic_ai import Agent
from deepeval.tracing import update_current_trace
from deepeval.integrations.pydantic_ai import DeepEvalInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    instrument=DeepEvalInstrumentationSettings(),
)

@agent.tool_plain
def lookup_user(user_id: str) -> str:
    # Enrich the trace with data resolved at runtime
    update_current_trace(
        user_id=user_id,
        metadata={"plan": "pro", "region": "us-east"},
    )
    return f"User {user_id} profile loaded."

result = agent.run_sync("Load my profile for user_42.")

update_current_trace(...) is safe to call from any tool body, including async tools and tools running in worker threads. The implicit trace context created by DeepEvalInstrumentationSettings is propagated automatically via Python contextvars.

Update span attributes

You can attach span-level attributes such as metadata or a metric collection from inside a tool body using update_current_span. This is the primary way to configure per-tool evaluation behavior.

main.py
from pydantic_ai import Agent
from deepeval.tracing import update_current_span
from deepeval.integrations.pydantic_ai import DeepEvalInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    instrument=DeepEvalInstrumentationSettings(),
)

@agent.tool_plain
def get_weather(city: str) -> str:
    update_current_span(
        metadata={"city": city, "source": "mock"},
        metric_collection="weather-tool-evals",
    )
    return f"{city}: sunny, 22°C"

result = agent.run_sync("What is the weather in Tokyo?")

Logging prompts

If you are managing prompts on Confident AI and wish to log them, use next_llm_span to associate a Prompt with the next LLM span before calling your agent.

main.py
from pydantic_ai import Agent
from deepeval.prompt import Prompt
from deepeval.tracing import next_llm_span
from deepeval.integrations.pydantic_ai import DeepEvalInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    instrument=DeepEvalInstrumentationSettings(),
)

prompt = Prompt(alias="<prompt-alias>")
prompt.pull(version="00.00.01")

with next_llm_span(prompt=prompt):
    result = agent.run_sync(prompt.interpolate())

Be sure to pull the prompt before logging it, otherwise the prompt will not be visible on Confident AI. next_llm_span is one-shot — it is consumed by the next LLM span produced inside the with block.

Per-call trace context

To set per-call trace attributes (such as a different user_id per request), wrap each agent invocation in with trace(...). This also changes how traces are sent: they are routed through Confident AI's REST transport instead of the OpenTelemetry exporter.

main.py
from pydantic_ai import Agent
from deepeval.tracing import trace
from deepeval.integrations.pydantic_ai import DeepEvalInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    instrument=DeepEvalInstrumentationSettings(),
)

with trace(user_id="user_42", thread_id="thread_1", name="my-trace"):
    result = agent.run_sync("What are LLMs?")
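
In a web server, the usual shape is one with trace(...) block per incoming request. The stdlib sketch below models that scoping with hypothetical helpers (trace_sketch, handle_request): attributes set for a call apply only inside its with block and override the agent-level defaults. That precedence is an assumption here; the section above states only that trace(...) sets per-call attributes.

```python
import contextlib
import contextvars
from typing import Any, Dict, Iterator

# Static defaults, standing in for DeepEvalInstrumentationSettings(...) arguments.
STATIC_DEFAULTS: Dict[str, Any] = {"name": "My Agent Trace", "user_id": None}

# Hypothetical per-call attribute scope (not deepeval's implementation).
_call_attrs = contextvars.ContextVar("call_attrs", default=None)


@contextlib.contextmanager
def trace_sketch(**attrs: Any) -> Iterator[None]:
    """Attributes apply only inside the with block, then are restored."""
    token = _call_attrs.set(attrs)
    try:
        yield
    finally:
        _call_attrs.reset(token)


def attributes_for_current_call() -> Dict[str, Any]:
    # Assumed precedence: per-call values override the static defaults.
    return {**STATIC_DEFAULTS, **(_call_attrs.get() or {})}


def handle_request(user_id: str) -> Dict[str, Any]:
    with trace_sketch(user_id=user_id, thread_id=f"thread-{user_id}"):
        # In the real integration, agent.run_sync(...) would run here.
        return attributes_for_current_call()


print(handle_request("user_42"))
print(handle_request("user_7"))
print(attributes_for_current_call())  # outside any call: defaults only
```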

Sending annotations

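You can send human annotations for traces or threads to Confident AI. The example below captures a trace's UUID inside a with trace() block and then sends a rating for that trace. See the annotations documentation on Confident AI to learn more.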

from pydantic_ai import Agent
from deepeval.tracing import trace
from deepeval.annotation import send_annotation
from deepeval.integrations.pydantic_ai import DeepEvalInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    instrument=DeepEvalInstrumentationSettings(),
)

# Capture the trace's UUID while the trace is active
TRACE_UUID = None
with trace() as current_trace:
    result = agent.run_sync("What are LLMs?")
    TRACE_UUID = current_trace.uuid

# Send a human rating for that trace
send_annotation(
    trace_uuid=TRACE_UUID,
    rating=1,
)

Evals Usage

Online evals

You can run online evals on your Pydantic AI agent. Online evals run evaluations on all incoming traces on Confident AI’s servers and are the recommended approach for production agents.

1. Create metric collection

Create a metric collection on Confident AI with the metrics you want to use to evaluate your agent.

Your metric collection should only contain metrics that evaluate the input and output of the span or trace you are targeting.

2. Run evals

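You can run online evals at the trace level or at the span level by passing the metric_collection parameter to the appropriate target. For span-level evals, call update_current_span(metric_collection=...) from inside a tool body, as shown in Update span attributes. For trace-level evals, pass metric_collection to DeepEvalInstrumentationSettings to evaluate every trace produced by the agent: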

main.py
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import DeepEvalInstrumentationSettings

agent = Agent(
    model="openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=DeepEvalInstrumentationSettings(
        metric_collection="my_trace_collection",
    ),
)

result = agent.run_sync("What are LLMs?")

Every incoming trace, and every span you attached a metric collection to, will now be evaluated using the metrics in the corresponding collection.

You can view eval results on Confident AI by clicking on the link printed in the console.