Thread Traces

Group your traces as threads to evaluate an entire conversation workflow

Overview

A “thread” on Confident AI is a group of one or more traces linked by a shared thread ID. This is useful for building conversational AI apps — chatbots, multi-turn agents, etc. — where you want to view and evaluate an entire conversation as a single unit.

Each call to your app creates a trace, and traces with the same thread ID are grouped together chronologically as turns in a conversation.

Threads group traces together, not spans. Each trace represents one turn in the conversation.
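For example, if your handler calls other @observe-decorated functions, those calls become spans inside the same trace, and only the top-level call counts as a turn. A minimal sketch, with retrieve as a hypothetical helper:

example.py
from deepeval.tracing import observe, update_current_trace

@observe()
def retrieve(query: str) -> list[str]:
    # Nested @observe call: recorded as a span inside the current
    # trace, not as a separate turn in the thread
    return ["some relevant chunk"]

@observe()
def llm_app(query: str) -> str:
    chunks = retrieve(query)
    res = f"Answer grounded in: {chunks[0]}"  # stand-in for your LLM call
    # The entire llm_app call is one trace, i.e. one turn in the thread
    update_current_trace(thread_id="your-thread-id", input=query, output=res)
    return res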

Create a Thread

To create a thread, set a thread_id on your traces using update_current_trace / updateCurrentTrace. Any traces that share the same thread ID will be grouped into a single thread.

main.py
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str):
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    ).choices[0].message.content

    update_current_trace(thread_id="your-thread-id", input=query, output=res)
    return res

llm_app("What's the weather in SF?")
llm_app("What about tomorrow?")

The thread_id / threadId can be any string — typically a session ID or conversation ID from your app.
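For example, you might mint one ID per conversation and reuse it for every call in that session. A minimal sketch, where generate is a hypothetical stand-in for your LLM call:

example.py
import uuid

from deepeval.tracing import observe, update_current_trace

def generate(query: str) -> str:
    # Stand-in for your actual LLM call
    return f"(response to: {query})"

@observe()
def llm_app(query: str, thread_id: str) -> str:
    res = generate(query)
    update_current_trace(thread_id=thread_id, input=query, output=res)
    return res

# One ID per conversation, reused for every turn in that conversation
conversation_id = str(uuid.uuid4())
llm_app("What's the weather in SF?", conversation_id)
llm_app("What about tomorrow?", conversation_id)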

Set Thread I/O

Although not strictly enforced, you should set the input to the raw user text and the output to the generated LLM text for each trace. These values become the conversation's turns, both for display on Confident AI and for thread evaluations.

main.py
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str):
    messages = [{"role": "user", "content": query}]
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    ).choices[0].message.content

    # ✅ Do this — query is the raw user input
    update_current_trace(thread_id="your-thread-id", input=query, output=res)

    # ❌ Don't do this — messages is not the raw user input
    # update_current_trace(thread_id="your-thread-id", input=messages, output=res)
    return res

You don’t have to set both input and output on every trace. If a turn only has a user input or only an LLM output, you can set just one. Confident AI will format the turns accordingly on the UI and for evals.

example.py
# ✅ Set only input (e.g. user message with no immediate LLM response)
update_current_trace(thread_id="your-thread-id", input=query)

# ✅ Set only output (e.g. proactive LLM message with no user input)
update_current_trace(thread_id="your-thread-id", output=res)

# ✅ Omit both (e.g. background processing step in the conversation)
update_current_trace(thread_id="your-thread-id")

If you don't set input or output explicitly, the trace falls back to its default I/O values. At least one trace in the thread must have an input or output set.
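For instance, a trace can rely entirely on those defaults. A sketch, assuming the defaults are captured from the decorated function's argument and return value:

example.py
from deepeval.tracing import observe, update_current_trace

@observe()
def llm_app(query: str) -> str:
    res = f"(response to: {query})"  # stand-in for your LLM call
    # No input/output passed: the trace falls back to its default I/O
    # (assumed here to be the function's argument and return value)
    update_current_trace(thread_id="your-thread-id")
    return res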

Set Tools Called

If your LLM app uses tool/function calling, you can log which tools were invoked for a given turn. These tool calls are attached to the trace alongside the output they helped generate.

main.py
from deepeval.tracing import observe, update_current_trace
from deepeval.test_case import ToolCall

@observe()
def llm_app(query: str):
    # call_agent is a placeholder for your agent implementation
    res, tools = call_agent(query)
    update_current_trace(
        thread_id="your-thread-id",
        input=query,
        output=res,
        tools_called=[ToolCall(name="WebSearch"), ToolCall(name="Calculator")]
    )
    return res

Set Retrieval Context

For RAG-based conversational apps, you can log the retrieval context used to generate a response. This enables Confident AI to evaluate retrieval quality across conversation turns.

main.py
from deepeval.tracing import observe, update_current_trace

@observe()
def llm_app(query: str):
    # retrieve and generate are placeholders for your retriever and LLM call
    chunks = retrieve(query)
    res = generate(query, chunks)
    update_current_trace(
        thread_id="your-thread-id",
        input=query,
        output=res,
        retrieval_context=[chunk.text for chunk in chunks]
    )
    return res

You can combine tools_called and retrieval_context on the same trace — they provide complementary context about how the output was generated for that turn.
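For example, an agentic RAG turn can log both at once. A sketch reusing the placeholder retrieve and call_agent helpers from the snippets above, with call_agent assumed here to also return the names of the tools it used:

example.py
from deepeval.tracing import observe, update_current_trace
from deepeval.test_case import ToolCall

@observe()
def llm_app(query: str):
    chunks = retrieve(query)                # placeholder retriever
    res, tools = call_agent(query, chunks)  # placeholder agent
    update_current_trace(
        thread_id="your-thread-id",
        input=query,
        output=res,
        tools_called=[ToolCall(name=name) for name in tools],
        retrieval_context=[chunk.text for chunk in chunks]
    )
    return res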

Next Steps

With threads set up, evaluate conversation quality or add more context to your traces.