Threads

Group your traces as threads to evaluate an entire conversation workflow

Overview

A “thread” on Confident AI is a group of one or more traces. Threads are useful if you're building AI chatrooms, conversational agents, or other multi-turn applications, and want to view entire conversations on Confident AI.

It is traces that are grouped together, not spans.
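For instance, in the minimal sketch below (retrieve and chatbot are hypothetical names), the retrieve span lives inside the chatbot trace, and it is that whole trace, not the individual span, that joins the thread:

example.py
from deepeval.tracing import observe, update_current_trace

@observe()
def retrieve(query: str) -> list[str]:
    # A nested span; spans belong to their parent trace
    return ["Some retrieved context."]

@observe()
def chatbot(query: str) -> str:
    context = retrieve(query)
    res = f"Answer based on: {context[0]}"  # placeholder for your LLM call
    # This attaches the whole trace (including the retrieve span) to the thread
    update_current_trace(thread_id="your-thread-id", input=query, output=res)
    return res

chatbot("What's the weather?")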

Set Threads At Runtime

You can use the update_current_trace function to set the thread_id within traces, which Confident AI will use to group traces together:

main.py
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str):
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    ).choices[0].message.content

    update_current_trace(thread_id="your-thread-id", input=query, output=res)
    return res

llm_app("Write me a poem.")

The thread_id can be any string, and the input and output are optional; they simply give you more control over what is displayed on the UI.

If the I/O is not provided, it defaults to the trace's own I/O values.
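In practice you'll reuse the same thread_id for every trace in a conversation. Below is a minimal sketch; the per-conversation UUID and the thread_id parameter are illustrative choices, not requirements:

example.py
import uuid
from deepeval.tracing import observe, update_current_trace

# One ID per conversation; every trace that sets it joins the same thread
thread_id = str(uuid.uuid4())

@observe()
def llm_app(query: str, thread_id: str) -> str:
    res = f"Response to: {query}"  # placeholder for your LLM call
    update_current_trace(thread_id=thread_id, input=query, output=res)
    return res

llm_app("Write me a poem.", thread_id)
llm_app("Now make it rhyme.", thread_id)  # same thread -> same conversation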

Inputs/Outputs

Note that although this is not strictly enforced, you should aim to make the input the raw user text coming into your multi-turn LLM app, and the output the generated text returned to the user. Essentially, your trace should represent the observable system inputs and outputs of your application.

main.py
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str):
    messages = [{"role": "user", "content": query}]
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    ).choices[0].message.content

    # ✅ Do this, query is the raw user input
    update_current_trace(thread_id="your-thread-id", input=query, output=res)

    # ❌ Don't do this, messages is not the raw user input
    # update_current_trace(thread_id="your-thread-id", input=messages, output=res)
    return res

Also note that you don’t have to set inputs/outputs for a trace that does not contain a user input or LLM output. You can simply leave them blank, and Confident AI will format the turns accordingly on the UI and for evals.

example.py
# ✅ You can set inputs and not set outputs
update_current_trace(thread_id="your-thread-id", input=query)

# ✅ You can set outputs and not set inputs
update_current_trace(thread_id="your-thread-id", output=res)

# ✅ You can omit both, as long as at least one trace in the thread has an input/output set
update_current_trace(thread_id="your-thread-id")

Tools Called and Retrieval Context

You can also specify any tools that were called, or the retrieval context involved (for RAG systems), for any LLM-generated text in a conversation (which in this case is the output on a trace).

main.py
from deepeval.test_case import ToolCall
...

update_current_trace(
    thread_id="your-thread-id",
    output=res,
    retrieval_context=["RAG context goes here."],
    tools_called=[ToolCall(name="Websearch")]
)

The turn context is complementary to the output, and allows you to log any additional context involved in the generation of this turn.
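Putting this together, here's a minimal sketch of a RAG-style trace that logs the turn context alongside the output (the retrieval step and its return value are placeholders):

example.py
from deepeval.tracing import observe, update_current_trace
from deepeval.test_case import ToolCall

@observe()
def rag_app(query: str) -> str:
    retrieved_docs = ["RAG context goes here."]  # placeholder for your vector store lookup
    res = f"Answer grounded in: {retrieved_docs[0]}"  # placeholder for your LLM call

    update_current_trace(
        thread_id="your-thread-id",
        input=query,
        output=res,
        retrieval_context=retrieved_docs,  # what grounded this turn
        tools_called=[ToolCall(name="Websearch")],  # whatever tools ran for this turn
    )
    return res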

Run Offline Evals on Threads

Use the evaluate_thread method to run offline evals on conversations once they’ve finished running:

main.py
1from deepeval.tracing import evaluate_thread
2
3evaluate_thread(thread_id="your-thread-id", metric_collection="Metric Collection")

If you haven’t already, you’ll need to create a multi-turn metric collection on Confident AI to specify which metrics to run for a particular thread.

Under the hood, Confident AI takes all the inputs, outputs, and any turn context you’ve supplied to build a list of turns for a ConversationalTestCase. Confident AI will then use the multi-turn metrics found in your metric collection to run evals on the specified thread.
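For intuition, the examples above would be assembled into something roughly like this. It's a sketch using deepeval's ConversationalTestCase and Turn types; it assumes a recent deepeval version where Turn accepts retrieval_context and tools_called, and is not Confident AI's exact internal representation:

example.py
from deepeval.test_case import ConversationalTestCase, Turn, ToolCall

# Each trace's input becomes a user turn, and its output an assistant turn
test_case = ConversationalTestCase(
    turns=[
        Turn(role="user", content="Write me a poem."),
        Turn(
            role="assistant",
            content="Roses are red...",
            retrieval_context=["RAG context goes here."],
            tools_called=[ToolCall(name="Websearch")],
        ),
    ]
)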

Conversational Test Case Architecture