OpenAI Agents

Use Confident AI for LLM observability and evals with OpenAI Agents

Overview

OpenAI Agents is a lightweight framework for creating agentic workflows using agent swarms, handoffs, and tool use. Confident AI lets you trace OpenAI Agent workflows with one line of code.

Tracing Quickstart

1. Install Dependencies

Run the following command to install the required packages:

pip install -U deepeval openai-agents

2. Set Up Confident AI Key

Log in to Confident AI by exporting your Confident API key as an environment variable:

export CONFIDENT_API_KEY="<your-confident-api-key>"
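If the key is missing, traces may not show up on Confident AI, so it can help to fail early with a clear message. The sketch below is one way to check the variable before running an agent. `require_confident_api_key` is a hypothetical helper written for this illustration, not part of DeepEval.

```python
import os


def require_confident_api_key(env=None):
    """Return the Confident API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration only; DeepEval does not ship it.
    """
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("CONFIDENT_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CONFIDENT_API_KEY is not set; export it before running the agent."
        )
    return key
```

Calling `require_confident_api_key()` at the top of `main.py` would stop the script before any agent run if the variable was never exported.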

3. Configure OpenAI Agents

Add DeepEval’s trace processor to OpenAI Agents.

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner

# Register DeepEval's processor so every agent run is traced
add_trace_processor(DeepEvalTracingProcessor())

agent = Agent(name="Assistant", instructions="You are a helpful assistant")
result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")

Now, whenever you run OpenAI Agents, DeepEval collects the resulting traces and publishes them to Confident AI.

4. Run OpenAI Agents

Invoke your agent by executing the script:

python main.py

You can view the traces on Confident AI by clicking the link printed in the console output.

What Gets Traced

The DeepEvalTracingProcessor automatically captures the following span types from OpenAI Agents:

| Span type | Captured data |
| --- | --- |
| Agent | Agent name, available tools, handoffs, input, output, output type |
| LLM Response | Model, provider, input messages, output, token counts (input/output/cached/reasoning), invocation params |
| LLM Generation | Model, provider, input, output, token counts, model config |
| Function tool | Tool name, input parameters (parsed from JSON), output |
| MCP tool | MCP server name, result |
| Handoff | Source agent, destination agent |
| Guardrail | Guardrail name, triggered status, guardrail type |
| Custom | Custom span name and attached data |

The LLM span provider is inferred automatically from the model name and normalized to the Confident AI platform format. For response span types, the following invocation parameters are also captured when present: temperature, top_p, max_output_tokens, tool_choice, tools, parallel_tool_calls, reasoning, text, and truncation.
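To make "captured when present" concrete, here is a minimal sketch of that kind of filtering. It is an illustration only, not DeepEval's actual implementation: `pick_invocation_params` is a hypothetical function, and treating a value of `None` as "not present" is an assumption.

```python
# Parameter names copied from the list in the paragraph above.
CAPTURED_PARAMS = (
    "temperature",
    "top_p",
    "max_output_tokens",
    "tool_choice",
    "tools",
    "parallel_tool_calls",
    "reasoning",
    "text",
    "truncation",
)


def pick_invocation_params(request):
    """Keep only the listed parameters that are set (assumed: not None).

    Hypothetical helper; the real processor may decide presence differently.
    """
    return {
        name: request[name]
        for name in CAPTURED_PARAMS
        if request.get(name) is not None
    }


example = pick_invocation_params(
    {"model": "gpt-4o", "temperature": 0.2, "top_p": None}
)
```

In this sketch, `model` is dropped because it is not in the list, and `top_p` is dropped because it is unset, leaving only `temperature`.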

Advanced Usage

Logging threads

Threads are used to group related traces together, and are useful for chat apps, agents, or any multi-turn interactions. You can learn more about threads here.

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

# Traces that share the same group_id are grouped into one thread
with trace(workflow_name="test_workflow_1", group_id="test_group_id_1"):
    result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")
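In a multi-turn app, each turn needs to reuse the same `group_id` so that its trace lands in the same thread. The sketch below shows one way to generate an ID per conversation and build the keyword arguments for `trace(...)`. Both `new_thread_id` and `thread_trace_kwargs` are hypothetical helpers, and the `"chat"` workflow name is a placeholder.

```python
import uuid


def new_thread_id():
    """Create a unique ID for one conversation (hypothetical helper)."""
    return f"thread-{uuid.uuid4().hex}"


def thread_trace_kwargs(thread_id, workflow_name="chat"):
    """Build keyword arguments for trace(...) so every turn shares a group_id."""
    return {"workflow_name": workflow_name, "group_id": thread_id}


# One ID per conversation, reused for every turn in that conversation.
thread_id = new_thread_id()
first_turn = thread_trace_kwargs(thread_id)
second_turn = thread_trace_kwargs(thread_id)
```

Each turn would then wrap its run in `with trace(**thread_trace_kwargs(thread_id)):`, in the same way as the example above wraps `Runner.run_sync`.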

Logging metadata

You can attach arbitrary metadata to a trace by passing a metadata dictionary to trace (Python) or by using setTracingContext (TypeScript).

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

with trace(workflow_name="test_workflow_1", metadata={"test_metadata_1": "test_metadata_1"}):
    result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")

Streaming responses

Confident AI handles both asynchronous workflows and streamed responses. The following example shows how to trace streamed responses with OpenAI Agents.

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner
import asyncio

add_trace_processor(DeepEvalTracingProcessor())

weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
)

async def main():
    result = Runner.run_streamed(weather_agent, "What's the weather in UK?")
    # Each chunk is a stream event object emitted as the run progresses
    async for chunk in result.stream_events():
        print(chunk, end="", flush=True)

asyncio.run(main())
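The loop above prints every stream event object. If you only want the generated text, you can filter for text-delta events. The sketch below uses stand-in objects so it runs without a model call. It assumes that raw model events have the type `"raw_response_event"` and carry a `data` payload whose type is `"response.output_text.delta"`, so verify these strings against your installed OpenAI Agents and OpenAI SDK versions. `text_delta` is a hypothetical helper.

```python
from types import SimpleNamespace


def text_delta(event):
    """Return the text fragment carried by a stream event, or None.

    Hypothetical helper; the event type strings are assumptions to verify.
    """
    if getattr(event, "type", None) != "raw_response_event":
        return None
    data = getattr(event, "data", None)
    if getattr(data, "type", None) != "response.output_text.delta":
        return None
    return getattr(data, "delta", None)


# Stand-in events that mimic the assumed shape of real stream events.
events = [
    SimpleNamespace(type="agent_updated_stream_event"),
    SimpleNamespace(
        type="raw_response_event",
        data=SimpleNamespace(type="response.output_text.delta", delta="Rainy"),
    ),
    SimpleNamespace(
        type="raw_response_event",
        data=SimpleNamespace(type="response.output_text.delta", delta=" today"),
    ),
]

# Join only the fragments that carry text.
text = "".join(d for d in map(text_delta, events) if d is not None)
```

Inside the streaming loop, the same check could replace the bare `print(chunk, ...)` call so that only non-`None` fragments are printed.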

Overwrite trace attributes

By default, the trace input is taken from the first agent span’s input and the output from the last agent span’s output. If you want to override the input, output, or any other trace attribute, use update_current_trace (Python) or updateCurrentTrace (TypeScript):

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace
from deepeval.tracing.context import update_current_trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

with trace(workflow_name="test_workflow_1", metadata={"test_metadata_1": "test_metadata_1"}):
    response_1 = Runner.run_sync(agent, "Hola, ¿cómo estás?")
    # Override the defaults taken from the agent spans
    update_current_trace(
        name="New name",
        input="New input",
        output="New output",
        metadata={"New key": "New value"}
    )

name (str): The name of the trace.

tags (List[str]): Tags are string labels that help you group related traces.

metadata (Dict): Attach any metadata to the trace.

thread_id (str): Supply the thread or conversation ID to view and evaluate conversations.

user_id (str): Supply the user ID to enable user analytics.

Each attribute is optional, and works the same way as the native tracing features on Confident AI.

Logging prompts

If you are managing prompts on Confident AI and wish to log them, pass your Prompt object via the llmSpanContext in the setTracingContext call (TypeScript) or via the confident_prompt parameter on DeepEval’s Agent wrapper (Python).

from agents import Runner, add_trace_processor
from deepeval.prompt import Prompt
from deepeval.openai_agents import DeepEvalTracingProcessor, Agent

add_trace_processor(DeepEvalTracingProcessor())

# Pull the prompt first so it can be attributed to the LLM spans
prompt = Prompt(alias="<prompt-alias>")
prompt.pull(version="00.00.01")

spanish_agent = Agent(
    name="Spanish agent",
    instructions=prompt.interpolate(),
    confident_prompt=prompt,
)
Runner.run_sync(spanish_agent, "¿Cómo estás?")

Logging prompts lets you attribute specific prompts to OpenAI Agent LLM spans. Be sure to pull the prompt before logging it, otherwise the prompt will not be visible on Confident AI.

Evals Usage

Online evals

You can run online evals on your OpenAI Agent, which will run evaluations on all incoming traces on Confident AI’s servers. This is the recommended approach, especially if your agent is in production.

1. Create metric collection

Create a metric collection on Confident AI with the metrics you wish to use to evaluate your OpenAI Agent.

Confident AI supports evaluating the input-output pairs of OpenAI Agent spans and traces, so your metric collections must contain only metrics that require just the input and output for evaluation.

If you’re looking to use other metrics, set up Confident AI’s native tracing instead.

2. Run evals

You can run evals at both the trace and span level. We recommend creating separate metric collections for each component, since each requires its own evaluation criteria and metrics.

Agent span metrics are currently only supported for Runner.run and Runner.run_sync.

Replace your Agent with DeepEval’s and supply the metric collection name to run evals on the agent span level.

import asyncio
from agents import Runner, add_trace_processor
from deepeval.openai_agents import Agent, DeepEvalTracingProcessor

add_trace_processor(DeepEvalTracingProcessor())

weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
    # Name of the metric collection created on Confident AI
    agent_metric_collection="test_collection_1",
)

async def main():
    result = await Runner.run(weather_agent, "What's the weather in UK?")
    print(result.final_output)

asyncio.run(main())

All incoming traces and spans will now be evaluated using metrics from your metric collection.