CrewAI

Use Confident AI for LLM observability and evals for CrewAI

Overview

CrewAI is a lean, lightning-fast Python framework for creating autonomous AI agents tailored to any scenario. Confident AI allows you to trace and evaluate CrewAI workflows with just a single line of code.

Tracing Quickstart

1. Install Dependencies

Run the following command to install the required packages:

$ pip install -U deepeval crewai
2. Configure CrewAI

Instrument CrewAI with your Confident AI API key using instrument_crewai.
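instrument_crewai needs your Confident AI API key to be available before the script runs. One common way (assuming you use the DeepEval CLI; you can also export it as the CONFIDENT_API_KEY environment variable) is to log in once:

$ deepeval login --confident-api-key YOUR_CONFIDENT_API_KEY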

main.py
from crewai import Task, Crew, Agent

from deepeval.integrations.crewai import instrument_crewai
instrument_crewai()

agent = Agent(
    role="Consultant",
    goal="Write clear, concise explanation.",
    backstory="An expert consultant with a keen eye for software trends.",
)

task = Task(
    description="Explain the given topic",
    expected_output="A clear and concise explanation.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])

result = crew.kickoff({"input": "What are LLMs?"})
3. Run CrewAI

Kick off your crew by executing the script:

$ python main.py

You can view the traces directly on Confident AI by clicking the link printed in the console output.

Advanced Usage

Logging threads

Threads are used to group related traces together, and are useful for chat apps, agents, or any multi-turn interaction. You can learn more about threads here. Set the thread_id in the trace context (trace is imported from deepeval.tracing) and call crew.kickoff within the context.

main.py
...
with trace(thread_id="crewai_run_1"):
    crew.kickoff({"city": "London"})

Logging metadata

You can also set the metadata in the trace context.

main.py
...
with trace(metadata={"test_metadata_1": "test_metadata_1"}):
    crew.kickoff({"city": "London"})

Other trace attributes

Additionally, you can set the name, tags and user_id in the trace context.

main.py
...
with trace(name="crewai_run_1", tags=["crewai"], user_id="crewai_user_1"):
    crew.kickoff({"city": "London"})

name (str): The name of the trace. Learn more.

tags (List[str]): Tags are string labels that help you group related traces. Learn more.

metadata (Dict): Attach any metadata to the trace. Learn more.

thread_id (str): Supply the thread or conversation ID to view and evaluate conversations. Learn more.

user_id (str): Supply the user ID to enable user analytics. Learn more.

Each attribute is optional, and works the same way as the native tracing features on Confident AI.
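For example, all of the attributes above can be combined in a single trace context around the crew from the quickstart (the metadata and thread values below are purely illustrative):

main.py
...
with trace(
    name="crewai_run_1",
    tags=["crewai"],
    thread_id="crewai_thread_1",
    user_id="crewai_user_1",
    metadata={"environment": "staging"},
):
    crew.kickoff({"input": "What are LLMs?"})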

Evals Usage

Online evals

You can run online evals on your CrewAI application, which evaluates all incoming traces on Confident AI’s servers. This is the recommended approach, especially if your application is in production.

1. Create metric collection

Create a metric collection on Confident AI with the metrics you wish to use to evaluate your CrewAI application.

Confident AI supports evaluating the input-output pairs of CrewAI spans and traces, which means your metric collection must only contain metrics that require just the input and output for evaluation.

If you’re looking to use other metrics, set up Confident AI’s native tracing instead.
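As a rough illustration of native tracing (a minimal sketch, assuming the @observe decorator from deepeval.tracing; see the native tracing docs for the full set of options), you decorate your own functions rather than relying on the CrewAI instrumentation:

example.py
from deepeval.tracing import observe

@observe()
def explain_topic(topic: str) -> str:
    # Your own LLM call or business logic goes here;
    # the decorated span records this function's input and output.
    return f"A short explanation of {topic}"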

2. Run evals

Run evaluations on the various components of your CrewAI application by setting metric_collection on DeepEval’s wrapper for CrewAI.

The current CrewAI integration supports metrics whose required parameters are only the input and actual output, in addition to the Task Completion metric.
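For span-level evals, a minimal sketch is shown below; it assumes DeepEval’s wrapper is the Agent class exported from deepeval.integrations.crewai and that it accepts a metric_collection argument (check the integration reference if the name or parameter differs in your DeepEval version):

main.py
...
# Assumption: use DeepEval's wrapped Agent in place of crewai.Agent
from deepeval.integrations.crewai import Agent

agent = Agent(
    role="Consultant",
    goal="Write clear, concise explanation.",
    backstory="An expert consultant with a keen eye for software trends.",
    metric_collection="test_collection_1",
)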

To evaluate at the trace level, pass trace_metric_collection to DeepEval’s trace context.

main.py
from crewai import Task, Crew, Agent

from deepeval.tracing import trace
from deepeval.integrations.crewai import instrument_crewai
instrument_crewai()

agent = Agent(
    role="Consultant",
    goal="Write clear, concise explanation.",
    backstory="An expert consultant with a keen eye for software trends.",
)

task = Task(
    description="Explain the given topic",
    expected_output="A clear and concise explanation.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])

with trace(trace_metric_collection="test_collection_1"):
    result = crew.kickoff({"input": "What are LLMs?"})

All incoming traces and spans will now be evaluated using metrics from your metric collection.