Thread Traces

Group your traces as threads to evaluate an entire conversation workflow

Overview

A “thread” on Confident AI is a group of one or more traces linked by a shared thread ID. Threads are useful for conversational AI apps, such as chatbots and multi-turn agents, where you want to view and evaluate an entire conversation as a single unit.

Each call to your app creates a trace, and traces with the same thread ID are grouped together chronologically as turns in a conversation.

Threads group traces together, not spans. Each trace represents one turn in the conversation.
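A minimal sketch of the grouping rule, assuming a simplified trace record: traces that share a thread ID are collected together and ordered by start time, so each trace becomes one turn. The Trace dataclass and group_into_threads function are hypothetical illustrations of the behavior described above, not part of the deepeval or Confident AI API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Trace:
    # Hypothetical, simplified trace record; real traces carry much more data.
    uuid: str
    thread_id: Optional[str]
    start_time: datetime
    input: Optional[str] = None
    output: Optional[str] = None


def group_into_threads(traces: list[Trace]) -> dict[str, list[Trace]]:
    """Group traces by thread ID and order each thread's turns by start time."""
    threads: dict[str, list[Trace]] = defaultdict(list)
    for trace in traces:
        if trace.thread_id is not None:  # traces without a thread ID stay standalone
            threads[trace.thread_id].append(trace)
    return {
        thread_id: sorted(turns, key=lambda t: t.start_time)
        for thread_id, turns in threads.items()
    }


traces = [
    Trace("t2", "session-1", datetime(2025, 1, 15, 10, 31), "What about tomorrow?"),
    Trace("t1", "session-1", datetime(2025, 1, 15, 10, 30), "What's the weather in SF?"),
    Trace("t3", None, datetime(2025, 1, 15, 10, 32), "Unrelated one-off call"),
]
threads = group_into_threads(traces)
print([t.uuid for t in threads["session-1"]])  # ['t1', 't2']
```

Ingestion order does not matter here: the turns are sorted by start time, and a trace without a thread ID is never pulled into a thread.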

Create a Thread

To create a thread, set a thread_id on your traces using update_current_trace / updateCurrentTrace. Any traces that share the same thread ID will be grouped into a single thread.

```python
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str):
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    ).choices[0].message.content

    update_current_trace(thread_id="your-thread-id", input=query, output=res)
    return res

llm_app("What's the weather in SF?")
llm_app("What about tomorrow?")
```

The thread_id / threadId can be any string — typically a session ID or conversation ID from your app.
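In practice, the thread ID usually comes from whatever identifies a conversation in your app. The sketch below assumes a hypothetical ChatSession class that generates a UUID once per conversation and reuses it on every turn; in a real app you would pass that value as thread_id to update_current_trace inside your @observe()-decorated function.

```python
import uuid


class ChatSession:
    """Hypothetical per-conversation session whose ID doubles as the thread ID."""

    def __init__(self) -> None:
        # Generated once per conversation and reused for every turn.
        self.thread_id = str(uuid.uuid4())
        self.turns: list[dict] = []

    def record_turn(self, user_input: str, llm_output: str) -> dict:
        # In a real app, pass self.thread_id to
        # update_current_trace(thread_id=...) inside your traced function.
        turn = {"thread_id": self.thread_id, "input": user_input, "output": llm_output}
        self.turns.append(turn)
        return turn


session_a = ChatSession()
session_b = ChatSession()
first = session_a.record_turn("What's the weather in SF?", "<llm response>")
second = session_a.record_turn("What about tomorrow?", "<llm response>")
other = session_b.record_turn("Hi!", "<llm response>")
```

Both turns of session_a share one thread ID, while session_b gets its own, so the two conversations end up as separate threads.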

Set Thread I/O

Although not strictly enforced, you should set the input to the raw user text and the output to the generated LLM text for each trace. Confident AI uses these values as the conversation turns, both for display and for thread evaluations.

```python
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str):
    # The Chat Completions API expects a list of message objects
    messages = [{"role": "user", "content": query}]
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    ).choices[0].message.content

    # ✅ Do this: query is the raw user input
    update_current_trace(thread_id="your-thread-id", input=query, output=res)

    # ❌ Don't do this: messages is not the raw user input
    # update_current_trace(thread_id="your-thread-id", input=messages, output=res)
    return res
```

You don’t have to set both input and output on every trace. If a turn only has a user input or only an LLM output, you can set just one. Confident AI will format the turns accordingly on the UI and for evals.

```python
# Each call below belongs to a different trace (turn) and runs inside an
# @observe()-decorated function where query and res are defined.

# ✅ Set only input (e.g. user message with no immediate LLM response)
update_current_trace(thread_id="your-thread-id", input=query)

# ✅ Set only output (e.g. proactive LLM message with no user input)
update_current_trace(thread_id="your-thread-id", output=res)

# ✅ Omit both (e.g. background processing step in the conversation)
update_current_trace(thread_id="your-thread-id")
```

If I/O is not provided, it defaults to the trace’s default I/O values. There must be at least one trace in the thread with an input or output set.
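If you want to catch empty threads before they reach Confident AI, a client-side pre-flight check could look like the sketch below. It assumes each turn is represented as a plain dict with optional input and output keys; thread_has_io is a hypothetical helper and does not model the server-side fallback to a trace's default I/O.

```python
def thread_has_io(turns: list[dict]) -> bool:
    """Return True if at least one trace in the thread sets an input or output."""
    return any(
        turn.get("input") is not None or turn.get("output") is not None
        for turn in turns
    )


turns = [
    {"input": "What's the weather in SF?"},  # user message only
    {},                                      # background step: no I/O set
    {"output": "It's 65°F and sunny."},      # LLM message only
]
print(thread_has_io(turns))     # True
print(thread_has_io([{}, {}]))  # False
```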

Set Thread Fields

You can attach custom metadata and tags to a thread to label production conversations with attributes like DVA version, client, agent ID, or status flags. Both are filterable and groupable across the observatory, which makes it easy to slice production traffic.

Thread fields are set by including a thread object on any trace you ingest into the thread. thread.id is an alternate, idiomatic way to specify the thread — it’s equivalent to top-level threadId and either one is sufficient. Metadata values can be any JSON-serializable type (stringified server-side), and tags are an array of strings.

POST /v1/traces
```json
{
  "uuid": "<TRACE-UUID>",
  "input": "What's the weather in SF?",
  "output": "It's 65°F and sunny.",
  "startTime": "2025-01-15T10:30:00Z",
  "endTime": "2025-01-15T10:30:05Z",
  "thread": {
    "id": "your-thread-id",
    "metadata": {
      "dvaVersion": "1.4.2",
      "client": "acme-corp",
      "agentId": "support-agent"
    },
    "tags": ["vip", "billing"]
  }
}
```
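The request body above can also be assembled in Python. This sketch only builds and serializes the payload using the standard library; build_trace_payload is a hypothetical helper, and it stops short of sending the request because the base URL and authentication header are not covered on this page.

```python
import json
import uuid
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def _iso_z(ts: datetime) -> str:
    # Render a UTC datetime in the "2025-01-15T10:30:00Z" form used above.
    return ts.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def build_trace_payload(
    thread_id: str,
    input_text: Optional[str],
    output_text: Optional[str],
    start: datetime,
    duration_s: float,
    metadata: Optional[dict[str, Any]] = None,
    tags: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build a POST /v1/traces body that attaches thread fields (hypothetical helper)."""
    thread: dict[str, Any] = {"id": thread_id}
    if metadata:
        thread["metadata"] = metadata
    if tags is not None:
        thread["tags"] = tags  # tags replace the stored set, so send the full list
    payload: dict[str, Any] = {
        "uuid": str(uuid.uuid4()),
        "startTime": _iso_z(start),
        "endTime": _iso_z(start + timedelta(seconds=duration_s)),
        "thread": thread,
    }
    if input_text is not None:
        payload["input"] = input_text
    if output_text is not None:
        payload["output"] = output_text
    return payload


payload = build_trace_payload(
    "your-thread-id",
    "What's the weather in SF?",
    "It's 65°F and sunny.",
    start=datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc),
    duration_s=5,
    metadata={"dvaVersion": "1.4.2", "client": "acme-corp", "agentId": "support-agent"},
    tags=["vip", "billing"],
)
body = json.dumps(payload)  # ready to send as the JSON request body
```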

thread.metadata and thread.tags only take effect when a thread id is resolvable — either via thread.id or top-level threadId. Requests without one are rejected with a 400. If both thread.id and threadId are sent they must match.
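One reading of these rules, expressed as a client-side check: a hypothetical resolve_thread_id helper that returns the resolved thread ID and raises where the API would reject the request. It assumes the 400 applies to requests that carry thread.metadata or thread.tags without any thread ID (a trace with no thread fields at all is simply a standalone trace), and that mismatched IDs are rejected as well.

```python
from typing import Any, Optional


class ThreadIdError(ValueError):
    """Raised where the API is expected to respond with HTTP 400."""


def resolve_thread_id(payload: dict[str, Any]) -> Optional[str]:
    """Resolve the thread ID from thread.id and/or top-level threadId."""
    top_level = payload.get("threadId")
    thread = payload.get("thread") or {}
    nested = thread.get("id")
    if top_level is not None and nested is not None and top_level != nested:
        raise ThreadIdError("thread.id and threadId must match")
    resolved = nested if nested is not None else top_level
    has_thread_fields = "metadata" in thread or "tags" in thread
    if has_thread_fields and resolved is None:
        raise ThreadIdError("thread.metadata and thread.tags require a thread id")
    return resolved


# Either form is sufficient on its own:
print(resolve_thread_id({"threadId": "your-thread-id", "thread": {"tags": ["vip"]}}))
print(resolve_thread_id({"thread": {"id": "your-thread-id", "tags": ["vip"]}}))
```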

Successive ingestions for the same thread merge metadata keys, so you can build up a thread’s metadata incrementally across turns. Sending the same key again overwrites the previous value. Tags replace any previously stored value, so always send the full set you want on the thread:

Subsequent trace (merges into existing thread metadata):
```json
{
  "uuid": "<NEXT-TRACE-UUID>",
  "threadId": "your-thread-id",
  "startTime": "2025-01-15T10:31:00Z",
  "endTime": "2025-01-15T10:31:05Z",
  "thread": {
    "metadata": {
      "agentId": "support-agent-v2",
      "escalated": true
    }
  }
}
```

After the two requests above, the thread’s metadata is:

```json
{
  "dvaVersion": "1.4.2",
  "client": "acme-corp",
  "agentId": "support-agent-v2",
  "escalated": true
}
```
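The merge behavior can be modeled as a small function that applies each ingestion's thread object to the stored thread fields. merge_thread_fields is a hypothetical sketch, not Confident AI's server code; it also assumes that omitting tags leaves the stored tags untouched, which this page does not state. Replaying the two requests above reproduces the metadata shown.

```python
from typing import Any


def merge_thread_fields(stored: dict[str, Any], incoming: dict[str, Any]) -> dict[str, Any]:
    """Apply one ingestion's thread object to the stored thread fields."""
    merged = {
        "metadata": dict(stored.get("metadata", {})),
        "tags": list(stored.get("tags", [])),
    }
    # Metadata keys merge; sending an existing key again overwrites its value.
    merged["metadata"].update(incoming.get("metadata", {}))
    # Tags replace the stored value wholesale. Assumption: omitting tags
    # leaves the stored tags untouched.
    if "tags" in incoming:
        merged["tags"] = list(incoming["tags"])
    return merged


first_thread = {
    "id": "your-thread-id",
    "metadata": {"dvaVersion": "1.4.2", "client": "acme-corp", "agentId": "support-agent"},
    "tags": ["vip", "billing"],
}
second_thread = {"metadata": {"agentId": "support-agent-v2", "escalated": True}}

state = merge_thread_fields({}, first_thread)
state = merge_thread_fields(state, second_thread)
print(state["metadata"])
# {'dvaVersion': '1.4.2', 'client': 'acme-corp', 'agentId': 'support-agent-v2', 'escalated': True}
```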

You can view a thread’s metadata and tags directly from the platform on the thread details page.

Set Tools Called

If your LLM app uses tool/function calling, you can log which tools were invoked for a given turn. This is attached to the trace alongside the output it helped generate.

```python
from deepeval.tracing import observe, update_current_trace
from deepeval.test_case import ToolCall

@observe()
def llm_app(query: str):
    # call_agent is a placeholder for your own agent logic; it returns the
    # final response and the tools that were invoked for this turn
    res, tools = call_agent(query)
    update_current_trace(
        thread_id="your-thread-id",
        input=query,
        output=res,
        # Hardcoded for illustration; in practice, build this list from tools
        tools_called=[ToolCall(name="WebSearch"), ToolCall(name="Calculator")]
    )
    return res
```

Set Retrieval Context

For RAG-based conversational apps, you can log the retrieval context used to generate a response. This enables Confident AI to evaluate retrieval quality across conversation turns.

```python
from deepeval.tracing import observe, update_current_trace

@observe()
def llm_app(query: str):
    # retrieve and generate are placeholders for your own RAG pipeline
    chunks = retrieve(query)
    res = generate(query, chunks)
    update_current_trace(
        thread_id="your-thread-id",
        input=query,
        output=res,
        retrieval_context=[chunk.text for chunk in chunks]
    )
    return res
```

You can combine tools_called and retrieval_context on the same trace — they provide complementary context about how the output was generated for that turn.

Next Steps

With threads set up, evaluate conversation quality or add more context to your traces.

Evaluate Threads

Run online evaluations on entire conversation threads to monitor multi-turn quality.

Customize Traces

Add tags, metadata, and user info to your traces for filtering and analysis.