OpenAI Agents | Confident AI Docs

Overview

OpenAI Agents is a lightweight framework for creating agentic workflows using agent swarms, handoffs, and tool use. Confident AI also allows you to trace OpenAI Agent workflows with one line of code.

Tracing Quickstart

Install Dependencies

Run the following command to install the required packages:

$ pip install -U deepeval openai-agents

Setup Confident AI Key

$ deepeval login

Configure OpenAI Agents

Add DeepEval’s trace processor to OpenAI Agents.

main.py

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner

add_trace_processor(DeepEvalTracingProcessor())

agent = Agent(name="Assistant", instructions="You are a helpful assistant")
result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")

Now whenever you use OpenAI Agents, DeepEval will collect OpenAI Agents traces and publish them to Confident AI.

Run OpenAI Agents

Invoke your agent by executing the script:

$ python main.py

You can directly view the traces on Confident AI by clicking on the link in the output printed in the console.

Advanced Usage

Logging threads

Threads group related traces together and are useful for chat apps, agents, or any multi-turn interaction. You can learn more about threads here. To set the thread_id, pass it as group_id in the trace context.

main.py

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

with trace(workflow_name="test_workflow_1", group_id="test_group_id_1"):
    result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")
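For multi-turn apps, every turn of a conversation needs to reuse the same group_id so its traces land in one thread. The following is a minimal sketch of one way to keep that ID stable, using only the standard library. ThreadRegistry and thread_id_for are hypothetical names chosen for illustration and are not part of DeepEval or OpenAI Agents.

```python
import uuid


class ThreadRegistry:
    """Hypothetical helper: maps a conversation key to a stable thread ID."""

    def __init__(self) -> None:
        self._ids: dict[str, str] = {}

    def thread_id_for(self, conversation_key: str) -> str:
        # Create the ID on first use, then return the same value on later
        # turns so every trace from this conversation shares one group_id.
        if conversation_key not in self._ids:
            self._ids[conversation_key] = f"thread-{uuid.uuid4().hex}"
        return self._ids[conversation_key]


registry = ThreadRegistry()
first_turn = registry.thread_id_for("user-42")
second_turn = registry.thread_id_for("user-42")
other_user = registry.thread_id_for("user-99")

# The value would then be passed as group_id in the trace context, e.g.
#     with trace(workflow_name="chat", group_id=first_turn): ...
print(first_turn == second_turn)  # same conversation, same thread ID
print(first_turn == other_user)   # different conversation, different ID
```

An in-memory dictionary only survives for the lifetime of the process. A production app would more likely store the ID alongside its own session or conversation record.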

Logging metadata

You can also set the metadata in the trace context.

main.py

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

with trace(workflow_name="test_workflow_1", metadata={"test_metadata_1": "test_metadata_1"}):
    result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")

Streaming responses

Confident AI handles both asynchronous workflows and streamed responses. The following example shows how to trace streamed responses with OpenAI Agents.

main.py

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner
import asyncio

add_trace_processor(DeepEvalTracingProcessor())

weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
)

async def main():
    result = Runner.run_streamed(weather_agent, "What's the weather in UK?")
    async for chunk in result.stream_events():
        print(chunk, end="", flush=True)

asyncio.run(main())
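If you also need the complete response once streaming finishes, a common pattern is to accumulate the pieces while printing them. The sketch below shows that pattern with a stand-in async generator so it runs without an API key. TextDelta and fake_stream_events are hypothetical and do not reflect the actual event types yielded by stream_events().

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class TextDelta:
    """Hypothetical event carrying one piece of streamed text."""
    text: str


async def fake_stream_events() -> AsyncIterator[TextDelta]:
    # Stand-in for a real event stream: yields a few chunks,
    # handing control back to the event loop between them.
    for piece in ["Cloudy ", "with ", "light rain."]:
        await asyncio.sleep(0)
        yield TextDelta(text=piece)


async def collect_text() -> str:
    parts = []
    async for event in fake_stream_events():
        # Show each piece as it arrives and keep it for later use.
        print(event.text, end="", flush=True)
        parts.append(event.text)
    print()
    return "".join(parts)


full_text = asyncio.run(collect_text())
```

With the real SDK, the loop body would need to inspect each event and extract its text before appending, since the stream carries more than plain text chunks.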

Overwrite trace attributes

If you set trace attributes through the trace context, the trace input defaults to the input of the first agent run, and the trace output defaults to the output of the last agent run.

If you want to override the input, output, or any other attribute of the current trace, use the update_current_trace function:

main.py

from deepeval.openai_agents import DeepEvalTracingProcessor
from agents import add_trace_processor, Agent, Runner, trace
from deepeval.tracing.context import update_current_trace

add_trace_processor(DeepEvalTracingProcessor())
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

with trace(workflow_name="test_workflow_1", metadata={"test_metadata_1": "test_metadata_1"}):
    response_1 = Runner.run_sync(agent, "Hola, ¿cómo estás?")
    update_current_trace(
        name="New name",
        input="New input",
        output="New output",
        metadata={"New key": "New value"}
    )

The thread_id and metadata passed to update_current_trace will also override the trace context’s group_id and metadata, respectively.
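The default and override rules for a trace's input and output can be modelled in a few lines. This is only an illustrative sketch of the behaviour described in this section, not DeepEval's implementation. AgentRun and resolve_trace_io are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRun:
    """Stand-in for one agent run inside a trace (hypothetical type)."""
    input: str
    output: str


def resolve_trace_io(
    runs: list[AgentRun],
    input_override: Optional[str] = None,
    output_override: Optional[str] = None,
) -> tuple[str, str]:
    # Defaults: input of the first run, output of the last run.
    trace_input = runs[0].input
    trace_output = runs[-1].output
    # Explicit overrides, when given, replace the defaults.
    if input_override is not None:
        trace_input = input_override
    if output_override is not None:
        trace_output = output_override
    return trace_input, trace_output


runs = [
    AgentRun(input="Hello, how are you?", output="I am fine."),
    AgentRun(input="Summarize that.", output="The assistant is fine."),
]

default_io = resolve_trace_io(runs)
overridden_io = resolve_trace_io(
    runs, input_override="New input", output_override="New output"
)
print(default_io)     # ('Hello, how are you?', 'The assistant is fine.')
print(overridden_io)  # ('New input', 'New output')
```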

Trace attributes

name (str): The name of the trace.

Logging prompts

If you are managing prompts (/docs/llm-evaluation/prompt-optimization/prompt-versioning) on Confident AI and wish to log them, pass your Prompt object to DeepEval’s Agent wrapper.

main.py

from agents import Runner, add_trace_processor
from deepeval.prompt import Prompt
from deepeval.openai_agents import DeepEvalTracingProcessor, Agent

add_trace_processor(DeepEvalTracingProcessor())

prompt = Prompt(alias="<prompt-alias>")
prompt.pull(version="00.00.01")

spanish_agent = Agent(
    name="Spanish agent",
    instructions=prompt.interpolate(),
    confident_prompt=prompt,
)
Runner.run_sync(spanish_agent, "¿Cómo estás?")

Logging prompts lets you attribute specific prompts to OpenAI Agent LLM spans. Be sure to pull the prompt before logging it; otherwise, the prompt will not be visible on Confident AI.

Evals Usage

Online evals

You can run online evals on your OpenAI Agent, which will run evaluations on all incoming traces on Confident AI’s servers. This is the recommended approach, especially if your agent is in production.

Create metric collection

Create a metric collection on Confident AI with the metrics you wish to use to evaluate your OpenAI Agent.

Supported metrics for OpenAI Agents

Confident AI supports evaluating the input-output pairs of OpenAI Agent spans and traces, so your metric collections must contain only metrics that require just the input and output for evaluation.

If you’re looking to use other metrics, set up Confident AI’s native tracing instead.

Run evals

You can run evals at both the trace and span level. We recommend creating separate metric collections for each component, since each requires its own evaluation criteria and metrics.

Agent span metrics are currently only supported for Runner.run and Runner.run_sync.

Agent Span

Replace your Agent with DeepEval’s Agent wrapper and supply the metric collection name to run evals at the agent span level.

main.py

import asyncio
from agents import Runner, add_trace_processor
from deepeval.openai_agents import Agent, DeepEvalTracingProcessor

add_trace_processor(DeepEvalTracingProcessor())

weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
    agent_metric_collection="test_collection_1",
)

async def main():
    result = await Runner.run(weather_agent, "What's the weather in UK?")
    print(result.final_output)

asyncio.run(main())

All incoming traces and spans will now be evaluated using metrics from your metric collection.