CrewAI
Overview
CrewAI is a lean, lightning-fast Python framework for creating autonomous AI agents tailored to any scenario. Confident AI allows you to trace and evaluate CrewAI workflows with just a single line of code.
Tracing Quickstart
Configure CrewAI
Instrument CrewAI with instrument_crewai before running any crew. You only need to call this once at startup.
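A minimal startup sketch follows. The import path deepeval.integrations.crewai is an assumption not stated on this page, and setup_tracing is a hypothetical helper name; check both against your installed DeepEval version.

```python
def setup_tracing():
    """Enable CrewAI tracing. Call once at startup, before any crew runs."""
    # Assumption: instrument_crewai is importable from this path.
    # The import sits inside the function so that this sketch only
    # needs deepeval when it is actually called.
    from deepeval.integrations.crewai import instrument_crewai

    instrument_crewai()
```

Call setup_tracing() at the top of your entry point, then build and run crews as usual.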
instrument_crewai() monkey-patches Crew.kickoff, Crew.kickoff_for_each,
their async variants (kickoff_async, kickoff_for_each_async, akickoff,
akickoff_for_each), and Agent.execute_task / Agent.aexecute_task. It
also registers an event listener that captures LLM calls, tool usage, and
knowledge retrieval automatically.
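To illustrate what monkey-patching a method means in general, here is a standard-library-only sketch. It is not DeepEval's implementation: the Crew class is a stand-in, and instrument, recorded_spans, and the double-patch guard are hypothetical names and choices made for this example.

```python
import functools


class Crew:
    """Stand-in for a crew class; not CrewAI's real implementation."""

    def kickoff(self, inputs=None):
        return f"result for {inputs}"


recorded_spans = []  # hypothetical in-memory sink for captured calls


def instrument(cls, method_name):
    """Replace cls.method_name with a wrapper that records each call."""
    original = getattr(cls, method_name)
    if getattr(original, "_instrumented", False):
        return  # already patched, so a second call changes nothing

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        output = original(self, *args, **kwargs)
        recorded_spans.append({"name": method_name, "output": output})
        return output

    wrapper._instrumented = True
    setattr(cls, method_name, wrapper)


instrument(Crew, "kickoff")
instrument(Crew, "kickoff")  # no-op the second time
result = Crew().kickoff({"topic": "tracing"})
```

Because the patch is applied to the class, every Crew instance created before or after the call goes through the wrapper, which is why instrumentation has to happen before any crew runs.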
You can directly view the traces on Confident AI by clicking on the link in the output printed in the console.
What gets traced
After calling instrument_crewai(), every crew run produces a nested trace whose spans cover the crew, its agents, and the LLM calls, tool usage, and knowledge retrieval they perform.
Advanced Usage
Logging threads
Threads group related traces together and are useful for chat apps, agents, and other multi-turn interactions. To log a thread, set thread_id in the trace context and call crew.kickoff inside that context.
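The sketch below runs several turns of one conversation under the same thread_id. It assumes trace is a context manager importable from deepeval.tracing; run_conversation and the "message" input key are hypothetical and should be adapted to your crew's inputs.

```python
def run_conversation(crew, thread_id, messages):
    """Run one crew kickoff per message, all grouped under thread_id."""
    # Assumption: `trace` lives in deepeval.tracing and accepts thread_id.
    from deepeval.tracing import trace

    outputs = []
    for message in messages:
        # Each turn gets its own trace; the shared thread_id links them.
        with trace(thread_id=thread_id):
            outputs.append(crew.kickoff(inputs={"message": message}))
    return outputs
```

Reusing the same thread_id across turns is what groups the resulting traces into one conversation.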
Logging metadata
You can also set the metadata in the trace context.
Other trace attributes
Additionally, you can set the name, tags and user_id in the trace context.
Trace attributes
name: The name of the trace.
tags: String labels that help you group related traces.
metadata: Any metadata you want to attach to the trace.
thread_id: The thread or conversation ID, used to view and evaluate conversations.
user_id: The user ID, used to enable user analytics.
Each attribute is optional, and works the same way as the native tracing features on Confident AI.
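As a sketch of setting several attributes together, the helper below passes name, tags, metadata, thread_id, and user_id to one trace context. It assumes trace is importable from deepeval.tracing; run_with_attributes and all literal values are hypothetical.

```python
def run_with_attributes(crew, inputs, thread_id, user_id):
    """Kick off a crew inside a trace context carrying optional attributes."""
    # Assumption: `trace` lives in deepeval.tracing and accepts these keywords.
    from deepeval.tracing import trace

    with trace(
        name="support-crew-run",              # hypothetical trace name
        tags=["crewai", "support"],           # hypothetical tags
        metadata={"environment": "staging"},  # hypothetical metadata
        thread_id=thread_id,
        user_id=user_id,
    ):
        return crew.kickoff(inputs=inputs)
```

Since every attribute is optional, any of these keyword arguments can be dropped without affecting the others.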
Evals Usage
Online evals
You can run online evals on your CrewAI application, which will run evaluations on all incoming traces on Confident AI’s servers. This is the recommended approach, especially if your agent is in production.
Create metric collection
Create a metric collection on Confident AI with the metrics you wish to use to evaluate your CrewAI application.
Supported metrics for CrewAI
Confident AI evaluates the input-output pairs of CrewAI spans and traces, so your metric collection must contain only metrics that need nothing beyond the input and output for evaluation.
If you want to use other metrics, set up Confident AI’s native tracing instead.
Run evals
Run evaluations on the various components of your CrewAI application by setting the metric_collection on DeepEval’s wrappers for Crew, Agent, LLM, or tool.
The current CrewAI integration supports metrics whose parameters are limited to the input and actual output, as well as the Task Completion metric.
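For span-level evals, a sketch of building an agent through DeepEval's wrapper might look like the following. It assumes the Agent wrapper is importable from deepeval.integrations.crewai and accepts CrewAI's usual Agent arguments plus metric_collection; build_evaluated_agent and the role, goal, and backstory values are hypothetical.

```python
def build_evaluated_agent(metric_collection):
    """Create an agent whose spans are evaluated with metric_collection."""
    # Assumption: DeepEval's Agent wrapper lives at this import path and
    # takes the same arguments as CrewAI's Agent, plus metric_collection.
    from deepeval.integrations.crewai import Agent

    return Agent(
        role="Researcher",                         # hypothetical role
        goal="Summarize recent findings",          # hypothetical goal
        backstory="An analyst who reads widely.",  # hypothetical backstory
        metric_collection=metric_collection,
    )
```

The metric_collection value is the name of the collection you created on Confident AI. Under the same assumption, the Crew, LLM, and tool wrappers would take it in the same way.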
Evaluations can be set at five levels: trace, crew span, agent span, LLM span, and tool span.
To evaluate at the trace level, pass metric_collection to the trace context.
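A trace-level sketch, assuming trace is importable from deepeval.tracing and accepts metric_collection as a keyword; run_with_trace_evals is a hypothetical helper name.

```python
def run_with_trace_evals(crew, inputs, metric_collection):
    """Kick off a crew so the whole trace is evaluated online."""
    # Assumption: `trace` lives in deepeval.tracing and accepts
    # metric_collection, as described in the text above.
    from deepeval.tracing import trace

    with trace(metric_collection=metric_collection):
        return crew.kickoff(inputs=inputs)
```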
All incoming traces and spans will now be evaluated using metrics from your metric collection.