LLM Tracing Quickstart

Instrument your LLM application for observability in less than 5 minutes

Overview

This guide shows you how to instrument your LLM app using the @observe decorator for Python or the observe wrapper for TypeScript.

Prefer one-line integrations or OpenTelemetry? You can also instrument your app via integrations for OpenAI, LangChain, and more or OpenTelemetry (OTEL) for any language — no decorator changes needed.

How it works

Tracing works through instrumentation, which can either be manual or through one of Confident AI’s integrations:

  1. Decorate or wrap your functions with @observe (Python) or observe (TypeScript)
  2. Each observed function becomes a span
  3. The outermost observed function becomes the trace — all nested spans roll up into it (see troubleshooting if spans are creating separate traces instead of nesting)
  4. Traces are sent to Confident AI asynchronously with zero latency impact
  5. Once ingested, traces can be evaluated automatically using your configured metrics

You should also understand the terminology for tracing:

Trace

A single end-to-end execution of your LLM app — the top-level unit of observability.

Span

An individual component within a trace, such as an LLM call, retrieval, or tool execution.

Thread

A group of traces representing a multi-turn conversation, linked by a shared thread ID.

Instrument Your AI App

You’ll need to get your API key as shown in the setup and installation section before continuing.

Step 1: Install DeepEval

Instrumentation must be done via code, so first install DeepEval, Confident AI’s official open-source SDK:

$ pip install -U deepeval

Step 2: Set Your API Key

Get your Confident AI Project API key and login:

$ export CONFIDENT_API_KEY=YOUR-API-KEY

Step 3: Instrument Your App

Decorate or wrap your functions to automatically capture inputs, outputs, and execution flow. Note that each observe decorator/wrapper creates a span on the UI.

main.py
from openai import OpenAI
from deepeval.tracing import observe

client = OpenAI()

@observe()
def llm_app(query: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

# Call app to send trace to Confident AI
llm_app("Write me a poem.")

Done ✅. You just created a trace with a span inside it. Go to the Observatory to see your traces there.

If you don’t see the trace, it is almost certainly because your program exited before the traces had a chance to be posted. If so, try setting CONFIDENT_TRACE_FLUSH=1 to flush traces before exit:

$ export CONFIDENT_TRACE_FLUSH=1

See the troubleshooting page for more details.

In a later section, you’ll learn how to create LLM-specific spans, which let you automatically log things like token cost and model name.

Update traces & spans

Once inside an observed function, you can enrich the current trace or span with additional data using update_current_trace / update_current_span (Python) or updateCurrentTrace / updateCurrentSpan (TypeScript).

from deepeval.tracing import observe, update_current_trace, update_current_span

@observe(type="retriever")
def retriever(query: str):
    chunks = retrieve(query)
    update_current_span(input=query, output=chunks)
    return chunks

@observe()
def llm_app(query: str):
    context = retriever(query)
    res = generate(query, context)
    update_current_trace(
        input=query,
        output=res,
        tags=["production"],
        metadata={"app_version": "1.2.3"}
    )
    return res

Both can be called multiple times from anywhere inside an observed function — values are merged, with later calls overriding earlier ones. Make sure to use the right one — see update_current_trace vs update_current_span in the troubleshooting page.

Using context manager

For Python users who prefer not to use the @observe decorator, DeepEval also supports the Observer context manager with the same arguments:

from deepeval.tracing import Observer, update_current_span

def generate(prompt: str) -> str:
    with Observer(type="llm", model="gpt-4"):
        res = call_llm(prompt)
        update_current_span(input=prompt, output=res)
    return res

This is useful when you can’t modify a function’s definition or need to instrument a specific code block rather than an entire function. As you learn more about the @observe decorator later on, you can rest assured that everything applies to the context manager as well.

Instrument Multi-Turn Apps

If your app handles conversations or multi-turn interactions, you can group traces into a thread by providing a thread ID. Each call to your app creates a trace, and traces with the same thread ID are grouped together as a conversation.

main.py
from openai import OpenAI
from deepeval.tracing import observe, update_current_trace

client = OpenAI()

@observe()
def llm_app(query: str):
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    ).choices[0].message.content

    update_current_trace(thread_id="your-thread-id", input=query, output=res)
    return res

llm_app("What's the weather in SF?")
llm_app("What about tomorrow?")

The thread ID can be any string (e.g., a session ID from your app). The input and output should be the raw user text and the raw LLM response, respectively; Confident AI uses these as the conversation turns for display and thread evaluations.

For more details on thread I/O conventions, tools called, retrieval context, and running offline evals on threads, see the full Threads page.

Next steps

Now that you’ve learned the basics of instrumenting your AI app, dive deeper into: