Thread Traces

Group your traces as threads to evaluate an entire conversation workflow

Overview

A “thread” on Confident AI is a group of one or more traces linked by a shared thread ID. This is useful for building conversational AI apps — chatbots, multi-turn agents, etc. — where you want to view and evaluate an entire conversation as a single unit.

Each call to your app creates a trace, and traces with the same thread ID are grouped together chronologically as turns in a conversation.

Threads group traces together, not spans. Each trace represents one turn in the conversation.
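For example, if your handler calls other @observe-decorated functions, those calls become spans inside the same trace, and only the top-level call counts as a turn. A minimal sketch, with retrieve as a hypothetical helper:

example.py
from deepeval.tracing import observe, update_current_trace

@observe()
def retrieve(query: str) -> list[str]:
    # Nested @observe call: recorded as a span inside the current
    # trace, not as a separate turn in the thread
    return ["some relevant chunk"]

@observe()
def llm_app(query: str) -> str:
    chunks = retrieve(query)
    res = f"Answer grounded in: {chunks[0]}"  # stand-in for your LLM call
    # The entire llm_app call is one trace, i.e. one turn in the thread
    update_current_trace(thread_id="your-thread-id", input=query, output=res)
    return res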

Create a Thread

To create a thread, set a thread_id on your traces using update_current_trace / updateCurrentTrace. Any traces that share the same thread ID will be grouped into a single thread.

main.py
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str):
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    ).choices[0].message.content

    update_current_trace(thread_id="your-thread-id", input=query, output=res)
    return res

llm_app("What's the weather in SF?")
llm_app("What about tomorrow?")

The thread_id / threadId can be any string — typically a session ID or conversation ID from your app.
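For example, you might mint one ID per conversation and reuse it for every call in that session. A minimal sketch, where generate is a hypothetical stand-in for your LLM call:

example.py
import uuid

from deepeval.tracing import observe, update_current_trace

def generate(query: str) -> str:
    # Stand-in for your actual LLM call
    return f"(response to: {query})"

@observe()
def llm_app(query: str, thread_id: str) -> str:
    res = generate(query)
    update_current_trace(thread_id=thread_id, input=query, output=res)
    return res

# One ID per conversation, reused for every turn in that conversation
conversation_id = str(uuid.uuid4())
llm_app("What's the weather in SF?", conversation_id)
llm_app("What about tomorrow?", conversation_id)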

Set Thread I/O

Although not strictly enforced, you should set the input to the raw user text and the output to the generated LLM text for each trace. These values become the conversation's turns, both for display on Confident AI and for thread evaluations.

main.py
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str):
    messages = [{"role": "user", "content": query}]
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    ).choices[0].message.content

    # ✅ Do this — query is the raw user input
    update_current_trace(thread_id="your-thread-id", input=query, output=res)

    # ❌ Don't do this — messages is not the raw user input
    # update_current_trace(thread_id="your-thread-id", input=messages, output=res)
    return res

You don’t have to set both input and output on every trace. If a turn only has a user input or only an LLM output, you can set just one. Confident AI will format the turns accordingly on the UI and for evals.

example.py
# ✅ Set only input (e.g. user message with no immediate LLM response)
update_current_trace(thread_id="your-thread-id", input=query)

# ✅ Set only output (e.g. proactive LLM message with no user input)
update_current_trace(thread_id="your-thread-id", output=res)

# ✅ Omit both (e.g. background processing step in the conversation)
update_current_trace(thread_id="your-thread-id")

If you don't set input or output explicitly, the trace falls back to its default I/O values. At least one trace in the thread must have an input or output set.
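For instance, a trace can rely entirely on those defaults. A sketch, assuming the defaults are captured from the decorated function's argument and return value:

example.py
from deepeval.tracing import observe, update_current_trace

@observe()
def llm_app(query: str) -> str:
    res = f"(response to: {query})"  # stand-in for your LLM call
    # No input/output passed: the trace falls back to its default I/O
    # (assumed here to be the function's argument and return value)
    update_current_trace(thread_id="your-thread-id")
    return res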

Set Tools Called

If your LLM app uses tool/function calling, you can log which tools were invoked for a given turn. These tool calls are attached to the trace alongside the output they helped generate.

main.py
from deepeval.tracing import observe, update_current_trace
from deepeval.test_case import ToolCall

@observe()
def llm_app(query: str):
    # call_agent is a placeholder for your agent implementation
    res, tools = call_agent(query)
    update_current_trace(
        thread_id="your-thread-id",
        input=query,
        output=res,
        tools_called=[ToolCall(name="WebSearch"), ToolCall(name="Calculator")]
    )
    return res

Set Retrieval Context

For RAG-based conversational apps, you can log the retrieval context used to generate a response. This enables Confident AI to evaluate retrieval quality across conversation turns.

main.py
from deepeval.tracing import observe, update_current_trace

@observe()
def llm_app(query: str):
    # retrieve and generate are placeholders for your retriever and LLM call
    chunks = retrieve(query)
    res = generate(query, chunks)
    update_current_trace(
        thread_id="your-thread-id",
        input=query,
        output=res,
        retrieval_context=[chunk.text for chunk in chunks]
    )
    return res

You can combine tools_called and retrieval_context on the same trace — they provide complementary context about how the output was generated for that turn.
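For example, an agentic RAG turn can log both at once. A sketch reusing the placeholder retrieve and call_agent helpers from the snippets above, with call_agent assumed here to also return the names of the tools it used:

example.py
from deepeval.tracing import observe, update_current_trace
from deepeval.test_case import ToolCall

@observe()
def llm_app(query: str):
    chunks = retrieve(query)                # placeholder retriever
    res, tools = call_agent(query, chunks)  # placeholder agent
    update_current_trace(
        thread_id="your-thread-id",
        input=query,
        output=res,
        tools_called=[ToolCall(name=name) for name in tools],
        retrieval_context=[chunk.text for chunk in chunks]
    )
    return res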

Next Steps

With threads set up, evaluate conversation quality or add more context to your traces.