OpenAI Agents
Overview
OpenAI Agents is a lightweight framework for creating agentic workflows using agent swarms, handoffs, and tool use. Confident AI lets you trace OpenAI Agent workflows with one line of code.
Tracing Quickstart
Configure OpenAI Agents
Add DeepEval’s trace processor to OpenAI Agents.
Now whenever you use OpenAI Agents, DeepEval will collect OpenAI Agents traces and publish them to Confident AI.
Run OpenAI Agents
Invoke your agent by executing the script:
You can directly view the traces on Confident AI by clicking on the link in the output printed in the console.
What Gets Traced
The DeepEvalTracingProcessor automatically captures the following span types from OpenAI Agents:
The LLM span provider is inferred automatically from the model name and normalized to the Confident AI platform format. For response span types, the following invocation parameters are also captured when present: temperature, top_p, max_output_tokens, tool_choice, tools, parallel_tool_calls, reasoning, text, and truncation.
Advanced Usage
Logging threads
Threads are used to group related traces together, and are useful for chat apps, agents, or any multi-turn interactions. You can learn more about threads here.
Logging metadata
You can attach arbitrary metadata to a trace using setTracingContext.
Streaming responses
Confident AI handles both asynchronous workflows and streamed responses. The following example shows how to trace streamed responses with OpenAI Agents.
Overwrite trace attributes
By default, the trace input is taken from the first agent span’s input and the output from the last agent span’s output. If you want to override the input, output, or any other trace attribute, use update_current_trace (Python) or updateCurrentTrace (TypeScript):
View Trace Attributes
The name of the trace. Learn more.
Tags are string labels that help you group related traces. Learn more.
Attach any metadata to the trace. Learn more.
Supply the thread or conversation ID to view and evaluate conversations. Learn more.
Supply the user ID to enable user analytics. Learn more.
Each attribute is optional, and works the same way as the native tracing features on Confident AI.
Logging prompts
If you are managing prompts on Confident AI and wish to log them, pass your Prompt object via the llmSpanContext in the setTracingContext call (TypeScript) or via the confident_prompt parameter on DeepEval’s Agent wrapper (Python).
Logging prompts lets you attribute specific prompts to OpenAI Agent LLM spans. Be sure to pull the prompt before logging it, otherwise the prompt will not be visible on Confident AI.
Evals Usage
Online evals
You can run online evals on your OpenAI Agent, which will run evaluations on all incoming traces on Confident AI’s servers. This is the recommended approach, especially if your agent is in production.
Create metric collection
Create a metric collection on Confident AI with the metrics you wish to use to evaluate your OpenAI Agent.
Click to see supported metrics for OpenAI Agents
Confident AI supports evaluating the input-output pairs of OpenAI Agent spans and traces, which means your metric collections must only contain metrics that only require the input and output for evaluation. These metrics include:
If you’re looking to use other metrics, setup Confident AI’s native tracing instead.
Run evals
You can run evals at both the trace and span level. We recommend creating separate metric collections for each component, since each requires its own evaluation criteria and metrics.
Agent span metrics are currently only supported for Runner.run and
Runner.run_sync.
Agent Span
LLM Span
Tool Span
Replace your Agent with DeepEval’s and supply the metric collection name to run evals on the agent span level.
All incoming traces and spans will now be evaluated using metrics from your metric collection.