Pydantic AI

Use Confident AI for LLM observability and evals for Pydantic AI agents

Overview

Pydantic AI is a Python-native LLM agent framework built on the foundations of Pydantic validation. Confident AI lets you trace and evaluate Pydantic AI agents in just a few lines of code.

Tracing Quickstart

1

Install Dependencies

Run the following command to install the required packages:

$pip install -U pydantic-ai deepeval
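Before running your agent, make sure DeepEval is connected to Confident AI so traces can be uploaded: either log in with the CLI below or set the CONFIDENT_API_KEY environment variable. Your API key is available in your Confident AI project settings.

$deepeval login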
2

Configure Pydantic AI

Use DeepEval’s ConfidentInstrumentationSettings to trace your agent’s LLM operations.

main.py
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=ConfidentInstrumentationSettings(),
    name="test_agent",
)

agent.run_sync("What are the LLMs?")
3

Run Pydantic AI

Invoke your agent by executing the script:

$python main.py

You can view the traces on Confident AI directly by clicking the link printed in the console output.

Advanced Usage

Logging prompts

If you are managing prompts on Confident AI and wish to log them, pass your Prompt object to the ConfidentInstrumentationSettings.

main.py
from pydantic_ai import Agent
from deepeval.prompt import Prompt
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

prompt = Prompt(alias="my-prompt")
prompt.pull(version="00.00.01")

system_prompt = prompt.interpolate()

agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt=system_prompt,
    instrument=ConfidentInstrumentationSettings(
        confident_prompt=prompt,
    ),
)

result = agent.run_sync("What are the LLMs?")

Logging prompts lets you attribute specific prompts to your Pydantic AI agent's LLM spans. Be sure to pull the prompt before logging it; otherwise the prompt will not be visible on Confident AI.

Logging threads

Threads are used to group related traces together, and are useful for chat apps, agents, or any multi-turn interaction. You can learn more about threads here. Pass the thread_id to the ConfidentInstrumentationSettings, as shown below and in the sketch that follows.

from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    model="openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=ConfidentInstrumentationSettings(
        thread_id="test_thread_id_1",
    ),
)

result = agent.run_sync("What are the LLMs?")
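In a chat app, you would typically generate one thread_id per conversation and reuse it for every turn so that all of that conversation's traces are grouped under the same thread. Below is a minimal sketch using Python's uuid module; it creates one agent (and therefore one thread) per conversation, and the generated ID is purely illustrative.

import uuid
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

# One thread ID per conversation; reuse it for every turn in that conversation.
# In this sketch, each conversation gets its own agent instance.
conversation_thread_id = str(uuid.uuid4())

agent = Agent(
    model="openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=ConfidentInstrumentationSettings(thread_id=conversation_thread_id),
)

# Both runs below are grouped under the same thread on Confident AI
agent.run_sync("What are the LLMs?")
agent.run_sync("Name one example of an LLM.")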

Trace attributes

Other trace attributes can also be passed to the ConfidentInstrumentationSettings.

from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    model="openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=ConfidentInstrumentationSettings(
        thread_id="test_thread_id_1",
        name="Name of Trace",
        tags=["Tag 1", "Tag 2"],
        metadata={"Key": "Value"},
        user_id="user_1",
    ),
)
name (str): The name of the trace. Learn more.

tags (List[str]): String labels that help you group related traces. Learn more.

metadata (Dict): Attach any metadata to the trace. Learn more.

thread_id (str): Supply the thread or conversation ID to view and evaluate conversations. Learn more.

user_id (str): Supply the user ID to enable user analytics. Learn more.

Each attribute is optional, and works the same way as the native tracing features on Confident AI.

Sending annotations

You can send human annotations on threads or traces on Confident AI. Learn more about sending annotations.

from deepeval.tracing import trace
from deepeval.annotation import send_annotation
...

TRACE_UUID = None
with trace() as current_trace:
    result = agent.run_sync("What are the LLMs?")
    TRACE_UUID = current_trace.uuid  # you can save this to use it later

send_annotation(
    trace_uuid=TRACE_UUID,
    rating=1,
)

Evals Usage

Online evals

You can run online evals on your Pydantic AI agent, which evaluates all incoming traces on Confident AI’s servers. This approach is recommended if your agent is in production.

1

Create metric collection

Create a metric collection on Confident AI with the metrics you wish to use to evaluate your Pydantic agent.


Your metric collection must only contain metrics that evaluate the input and actual output of your Pydantic AI agent.

2

Run evals

You can run evals at both the trace and span level. We recommend creating separate metric collections for each component, since each requires its own evaluation criteria and metrics. After instrumenting your Pydantic AI agent, pass the metric collection name to the respective components:

Pass the trace_metric_collection parameter to the ConfidentInstrumentationSettings to run evals at the trace level.

main.py
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    model="openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=ConfidentInstrumentationSettings(
        trace_metric_collection="test_collection_1",
    ),
)

result = agent.run_sync("What are the LLMs?")

All incoming traces will now be evaluated using metrics from your metric collection.

End-to-end evals

Running end-to-end evals on your Pydantic agent evaluates your agent locally, and is the recommended approach if your agent is in a development or testing environment.

1

Create metric

from deepeval.metrics import AnswerRelevancyMetric

answer_relevancy = AnswerRelevancyMetric(
    threshold=0.7,
    model="gpt-4o-mini",
    include_reason=True
)

Similar to online evals, you can only run end-to-end evals with metrics that evaluate the input and actual output of your Pydantic AI agent.

2

Run evals

As with online evals, you can provide metrics to different components of the agent, similar to passing a metric collection. Then use the dataset’s evals_iterator to invoke your Pydantic AI agent for each golden.

main.py
import asyncio
from deepeval.integrations.pydantic_ai import Agent
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.dataset import EvaluationDataset, Golden

agent = Agent("openai:gpt-4o-mini", system_prompt="Be concise, reply with one sentence.")

answer_relevancy_metric = AnswerRelevancyMetric()
dataset = EvaluationDataset(goldens=[Golden(input="What's 7 * 8?"), Golden(input="What's 7 * 6?")])

for golden in dataset.evals_iterator():
    task = asyncio.create_task(agent.run(golden.input, metrics=[answer_relevancy_metric]))
    dataset.evaluate(task)

This will automatically generate a test run with evaluated traces using inputs from your dataset.
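If your goldens are stored on Confident AI rather than defined in code, you can pull them into the dataset before the loop. Below is a minimal sketch, assuming a dataset with the alias "My Evals Dataset" already exists in your Confident AI project; the alias is a placeholder.

from deepeval.dataset import EvaluationDataset

# Pull goldens from an existing dataset on Confident AI
# ("My Evals Dataset" is a placeholder alias)
dataset = EvaluationDataset()
dataset.pull(alias="My Evals Dataset")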

You can view the evaluation results on Confident AI by clicking the link printed in the console output.