OpenAI Agents
Overview
OpenAI Agents is a lightweight framework for creating agentic workflows using agent swarms, handoffs, and tool use. Confident AI lets you trace OpenAI Agents workflows with a single line of code.
Tracing Quickstart
Configure OpenAI Agents
Add DeepEval’s trace processor to OpenAI Agents.
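For example, a minimal sketch, assuming the processor is exported as DeepEvalTracingProcessor from deepeval.openai_agents and registered with the SDK's add_trace_processor:

```python
from agents import add_trace_processor
from deepeval.openai_agents import DeepEvalTracingProcessor  # assumed export name

# Register DeepEval's processor once at startup; every subsequent OpenAI Agents run
# is then collected and published to Confident AI (assumes you are already logged in,
# e.g. via `deepeval login` or the CONFIDENT_API_KEY environment variable).
add_trace_processor(DeepEvalTracingProcessor())
```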
Now, whenever you use OpenAI Agents, DeepEval will collect the traces and publish them to Confident AI.
Run OpenAI Agents
Invoke your agent by executing the script:
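For example, a minimal script (the agent name, instructions, and input below are illustrative):

```python
from agents import Agent, Runner

agent = Agent(
    name="Assistant",                             # illustrative
    instructions="You are a helpful assistant.",  # illustrative
)

# The run is traced by the processor registered above and published to Confident AI
result = Runner.run_sync(agent, "What's the weather like in Paris?")
print(result.final_output)
```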
You can view the traces directly on Confident AI by clicking the link printed in the console output.
Advanced Usage
Logging threads
Threads are used to group related traces together, and are useful for chat apps, agents, or any multi-turn interactions. You can learn more about threads here. You can set the thread ID as the group_id in the trace context.
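A minimal sketch using the SDK's trace() context manager (the workflow name and thread ID are illustrative):

```python
from agents import Agent, Runner, trace

agent = Agent(name="Assistant", instructions="You are a helpful assistant.")

thread_id = "your-thread-id"  # illustrative: reuse the same ID for every turn of a conversation

# group_id is picked up as the thread ID on Confident AI
with trace("Customer support workflow", group_id=thread_id):
    result = Runner.run_sync(agent, "Where is my order?")
```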
Logging metadata
You can also set the metadata in the trace context.
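For example (the metadata keys and values are illustrative):

```python
from agents import Agent, Runner, trace

agent = Agent(name="Assistant", instructions="You are a helpful assistant.")

# Arbitrary key-value pairs attached to the trace
with trace("Customer support workflow", metadata={"user_tier": "pro", "region": "eu"}):
    result = Runner.run_sync(agent, "Where is my order?")
```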
Streaming responses
Confident AI handles both asynchronous workflows and streamed responses. The following example shows how to trace streamed responses with OpenAI Agents.
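A minimal sketch using the SDK's Runner.run_streamed; the registered trace processor picks the streamed run up automatically:

```python
import asyncio
from agents import Agent, Runner

agent = Agent(name="Assistant", instructions="You are a helpful assistant.")

async def main():
    # run_streamed returns immediately; events arrive as the agent produces them
    result = Runner.run_streamed(agent, "Tell me a short story.")
    async for event in result.stream_events():
        pass  # inspect or forward events here; the full run is still traced
    print(result.final_output)

asyncio.run(main())
```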
Overwrite trace attributes
If you are using the trace context to set trace attributes, the trace's input will be the input of the first agent run and its output will be the output of the last agent run.
If you want to override the input, output, or any other attribute for the current trace, you can use the update_current_trace function:
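A minimal sketch, assuming update_current_trace is imported from deepeval.tracing and called while the trace is still active (here, from inside a function tool):

```python
from agents import Agent, Runner, function_tool
from deepeval.tracing import update_current_trace

@function_tool
def get_order_status(order_id: str) -> str:
    """Look up the status of an order."""
    status = "Order shipped"
    # Runs while the trace is active, so the overrides apply to the current trace
    update_current_trace(
        output=status,                    # overrides the trace output
        thread_id="your-thread-id",       # illustrative
        metadata={"order_id": order_id},  # illustrative
    )
    return status

agent = Agent(
    name="Assistant",
    instructions="Use the tool to look up order status.",
    tools=[get_order_status],
)
result = Runner.run_sync(agent, "Where is order 123?")
```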
The thread_id and metadata will also override the trace context's group_id and metadata respectively.
View Trace Attributes
name: The name of the trace. Learn more.
tags: Tags are string labels that help you group related traces. Learn more.
metadata: Attach any metadata to the trace. Learn more.
thread_id: Supply the thread or conversation ID to view and evaluate conversations. Learn more.
user_id: Supply the user ID to enable user analytics. Learn more.
Each attribute is optional, and works the same way as the native tracing features on Confident AI.
Logging prompts
If you are managing [prompts](/docs/llm-evaluation/prompt-optimization/prompt-versioning) on Confident AI and wish to log them, pass your Prompt object to DeepEval's Agent wrapper.
Logging prompts lets you attribute specific prompts to OpenAI Agent LLM spans. Be sure to pull the prompt before logging it, otherwise the prompt will not be visible on Confident AI.
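A minimal sketch, assuming the wrapper is imported from deepeval.openai_agents and accepts the pulled Prompt via a confident_prompt keyword; the alias and keyword name are assumptions, so check the wrapper's signature:

```python
from agents import Runner
from deepeval.prompt import Prompt
from deepeval.openai_agents import Agent  # DeepEval's Agent wrapper (assumed import path)

# Pull the prompt version from Confident AI before logging it
prompt = Prompt(alias="my-agent-prompt")  # illustrative alias
prompt.pull()

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    confident_prompt=prompt,  # assumed keyword for attaching the Prompt object
)
result = Runner.run_sync(agent, "Hello!")
```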
Evals Usage
Online evals
You can run online evals on your OpenAI Agent, which will run evaluations on all incoming traces on Confident AI’s servers. This is the recommended approach, especially if your agent is in production.
Create metric collection
Create a metric collection on Confident AI with the metrics you wish to use to evaluate your OpenAI Agent.
Supported metrics for OpenAI Agents
Confident AI supports evaluating the input-output pairs of OpenAI Agents spans and traces, which means your metric collection must only contain metrics that require just the input and output for evaluation.
If you're looking to use other metrics, set up Confident AI's native tracing instead.
Run evals
You can run evals at both the trace and span level. We recommend creating separate metric collections for each component, since each requires its own evaluation criteria and metrics.
Agent span metrics are currently only supported for Runner.run and Runner.run_sync.
Evals can be run on agent spans, LLM spans, and tool spans. To run evals on the agent span level, replace your Agent with DeepEval's Agent wrapper and supply the metric collection name, as shown in the sketch below.
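A minimal sketch, assuming the wrapper is a drop-in replacement for agents.Agent and accepts the collection name via a metric_collection keyword (the keyword and collection name are assumptions):

```python
from agents import Runner
from deepeval.openai_agents import Agent  # DeepEval's Agent wrapper (assumed import path)

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    metric_collection="My Metric Collection",  # assumed keyword; collection name on Confident AI
)

# Runner.run / Runner.run_sync invocations are now evaluated on Confident AI's servers
result = Runner.run_sync(agent, "What's the weather like in Paris?")
```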
All incoming traces and spans will now be evaluated using metrics from your metric collection.