OpenAI Agents

Use Confident AI for LLM observability and evals for OpenAI Agents

Overview

OpenAI Agents is a lightweight framework for building agentic workflows using agent swarms, handoffs, and tool use. Confident AI also allows you to trace OpenAI Agents workflows with a single line of code.

Tracing Quickstart

1

Install Dependencies

Run the following command to install the required packages:

$pip install -U deepeval openai-agents
2

Setup Confident AI Key

Log in to Confident AI using your Confident API key.

$deepeval login
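If you prefer a non-interactive setup (for example in CI), you can also supply the key through an environment variable instead of the login prompt. A minimal sketch, assuming DeepEval reads the CONFIDENT_API_KEY environment variable in your environment:

$export CONFIDENT_API_KEY=<your-confident-api-key>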
3

Configure OpenAI Agents

Add DeepEval’s trace processor to OpenAI Agents.

main.py
from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner

add_trace_processor(DeepEvalTracingProcessor())

agent = Agent(name="Assistant", instructions="You are a helpful assistant")
result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")

Now, whenever you use OpenAI Agents, DeepEval will collect the traces and publish them to Confident AI.

4

Run OpenAI Agents

Invoke your agent by executing the script:

$python main.py

You can view the traces directly on Confident AI by clicking the link printed in the console output.

Advanced Usage

Logging threads

Threads group related traces together and are useful for chat apps, agents, and other multi-turn interactions. You can learn more about threads here. To set the thread_id, pass it as group_id in the trace context.

main.py
from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

with trace(workflow_name="test_workflow_1", group_id="test_group_id_1"):
    result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")
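Reusing the same group_id across turns places each turn's trace in the same thread. A minimal multi-turn sketch, assuming the setup above (the workflow names and group ID are placeholders):

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

# Both runs share the same group_id, so their traces are grouped into one thread on Confident AI.
with trace(workflow_name="chat_turn_1", group_id="conversation_123"):
    first = Runner.run_sync(agent, "Write a haiku about recursion in programming.")

with trace(workflow_name="chat_turn_2", group_id="conversation_123"):
    second = Runner.run_sync(agent, "Now explain the haiku in one sentence.")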

Logging metadata

You can also set the metadata in the trace context.

main.py
from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

with trace(workflow_name="test_workflow_1", metadata={"test_metadata_1": "test_metadata_1"}):
    result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")

Streaming responses

Confident AI handles both asynchronous workflows and streamed responses. The following example shows how to trace streamed responses with OpenAI Agents.

main.py
from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner
import asyncio

add_trace_processor(DeepEvalTracingProcessor())

weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
)

async def main():
    result = Runner.run_streamed(weather_agent, "What's the weather in UK?")
    async for chunk in result.stream_events():
        print(chunk, end="", flush=True)

asyncio.run(main())
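The events yielded by stream_events() are structured objects rather than raw text. If you only want to print the model's output as it streams, you can filter for raw response events; a minimal sketch based on the OpenAI Agents SDK event types, reusing weather_agent and the setup from the example above:

import asyncio
from agents import Runner
from openai.types.responses import ResponseTextDeltaEvent

async def main():
    result = Runner.run_streamed(weather_agent, "What's the weather in UK?")
    async for event in result.stream_events():
        # Raw response events carry the token-level text deltas from the model.
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)

asyncio.run(main())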

Overwrite trace attributes

If you are using the trace context to set trace attributes, attributes such as the input will default to the input of the first agent run, while the output will default to the output of the last agent run.

If you want to override the input, output, or any other attribute of the current trace, use the update_current_trace function:

main.py
from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace
from deepeval.tracing.context import update_current_trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

with trace(workflow_name="test_workflow_1", metadata={"test_metadata_1": "test_metadata_1"}):
    response_1 = Runner.run_sync(agent, "Hola, ¿cómo estás?")
    update_current_trace(
        name="New name",
        input="New input",
        output="New output",
        metadata={"New key": "New value"}
    )

Any thread_id and metadata you pass will also override the trace context's group_id and metadata, respectively. update_current_trace accepts the following attributes:

name (str): The name of the trace. Learn more.

tags (List[str]): Tags are string labels that help you group related traces. Learn more.

metadata (Dict): Attach any metadata to the trace. Learn more.

thread_id (str): Supply the thread or conversation ID to view and evaluate conversations. Learn more.

user_id (str): Supply the user ID to enable user analytics. Learn more.

Each attribute is optional, and works the same way as the native tracing features on Confident AI.
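For example, several of these attributes can be set in a single call; a minimal sketch, assuming update_current_trace accepts each attribute listed above as a keyword argument (the workflow name, IDs, and tag values are placeholders):

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace
from deepeval.tracing.context import update_current_trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

with trace(workflow_name="support_workflow"):
    result = Runner.run_sync(agent, "How do I reset my password?")
    # Attach thread, user, and tag information to the current trace for this run.
    update_current_trace(
        name="Password reset",
        tags=["support", "account"],
        thread_id="thread_123",
        user_id="user_456",
        metadata={"plan": "pro"},
    )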

Logging prompts

If you are managing prompts on Confident AI and wish to log them, pass your Prompt object to DeepEval's Agent wrapper.

main.py
from agents import Runner, add_trace_processor
from deepeval.prompt import Prompt
from deepeval.openai_agents import DeepEvalTracingProcessor, Agent

add_trace_processor(DeepEvalTracingProcessor())

prompt = Prompt(alias="<prompt-alias>")
prompt.pull(version="00.00.01")

spanish_agent = Agent(
    name="Spanish agent",
    instructions=prompt.interpolate(),
    confident_prompt=prompt,
)
Runner.run_sync(spanish_agent, "¿Cómo estás?")

Logging prompts lets you attribute specific prompts to OpenAI Agents LLM spans. Be sure to pull the prompt before logging it; otherwise, it will not be visible on Confident AI.
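If your prompt template on Confident AI defines variables, you would typically interpolate them before passing the result as instructions. A minimal sketch, reusing the imports from the example above and assuming interpolate() accepts your template's variables as keyword arguments ("language" is a hypothetical variable name):

prompt = Prompt(alias="<prompt-alias>")
prompt.pull(version="00.00.01")

# "language" is a hypothetical template variable; use whatever variables your prompt actually defines.
localized_agent = Agent(
    name="Localized agent",
    instructions=prompt.interpolate(language="Spanish"),
    confident_prompt=prompt,
)
Runner.run_sync(localized_agent, "¿Cómo estás?")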

Evals Usage

Online evals

You can run online evals on your OpenAI Agent, which will run evaluations on all incoming traces on Confident AI’s servers. This is the recommended approach, especially if your agent is in production.

1

Create metric collection

Create a metric collection on Confident AI with the metrics you wish to use to evaluate your OpenAI Agent.

Confident AI supports evaluating the input-output pairs of OpenAI Agents spans and traces, which means your metric collection must only contain metrics that require just the input and output for evaluation.

If you’re looking to use other metrics, set up Confident AI’s native tracing instead.

2

Run evals

You can run evals at both the trace and span level. We recommend creating separate metric collections for each component, since each requires its own evaluation criteria and metrics.

Agent span metrics are currently only supported for Runner.run and Runner.run_sync.

Replace your Agent with DeepEval’s Agent wrapper and supply your metric collection name via agent_metric_collection to run evals at the agent span level.

main.py
import asyncio
from agents import Runner, add_trace_processor
from deepeval.openai_agents import Agent, DeepEvalTracingProcessor

add_trace_processor(DeepEvalTracingProcessor())

weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
    agent_metric_collection="test_collection_1",
)

async def main():
    result = await Runner.run(weather_agent, "What's the weather in UK?")
    print(result.final_output)

asyncio.run(main())

All incoming traces and spans will now be evaluated using metrics from your metric collection.