LLM Tracing Quickstart

Instrument your LLM application for observability in less than 5 minutes

Overview

This guide shows you how to instrument your LLM app using the @observe decorator for Python or the observe wrapper for TypeScript.

Prefer one-line integrations or OpenTelemetry? You can also instrument your app via integrations for OpenAI, LangChain, and more or OpenTelemetry (OTEL) for any language — no decorator changes needed.

How it works

Tracing works through instrumentation, which can either be manual or through one of Confident AI’s integrations:

  1. Decorate or wrap your functions with @observe (Python) or observe (TypeScript)
  2. Each observed function becomes a span
  3. The outermost observed function becomes the trace — all nested spans roll up into it (see troubleshooting if spans are creating separate traces instead of nesting)
  4. Traces are sent to Confident AI asynchronously with zero latency impact
  5. Once ingested, traces can be evaluated automatically using your configured metrics

You should also understand the terminology for tracing:

Trace

A single end-to-end execution of your LLM app — the top-level unit of observability.

Span

An individual component within a trace, such as an LLM call, retrieval, or tool execution.

Thread

A group of traces representing a multi-turn conversation, linked by a shared thread ID.

Instrument Your AI App

You’ll need to get your API key as shown in the setup and installation section before continuing.

Step 1: Install DeepEval

Instrumentation must be done via code, so first install DeepEval, Confident AI’s official open-source SDK:

$ pip install -U deepeval

Step 2: Set Your API Key

Get your Confident AI Project API key and login:

$ export CONFIDENT_API_KEY=YOUR-API-KEY

Step 3: Instrument Your App

Decorate or wrap your functions to automatically capture inputs, outputs, and execution flow. Note that each observe decorator/wrapper creates a span on the UI.

main.py
from openai import OpenAI
from deepeval.tracing import observe

client = OpenAI()

@observe()
def llm_app(query: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

# Call app to send trace to Confident AI
llm_app("Write me a poem.")

Done ✅. You just created a trace with a span inside it. Go to the Observatory to see your traces there.

If you don’t see the trace, it is almost certainly because your program exited before the traces had a chance to be posted. If so, try setting CONFIDENT_TRACE_FLUSH=1 to flush traces before exit:

$ export CONFIDENT_TRACE_FLUSH=1

See the troubleshooting page for more details.

In a later section, you’ll learn how to create LLM-specific spans, which let you automatically log things like token cost and model name.

Update traces & spans

Once inside an observed function, you can enrich the current trace or span with additional data using update_current_trace / update_current_span (Python) or updateCurrentTrace / updateCurrentSpan (TypeScript).

from deepeval.tracing import observe, update_current_trace, update_current_span

@observe(type="retriever")
def retriever(query: str):
    chunks = retrieve(query)
    update_current_span(input=query, output=chunks)
    return chunks

@observe()
def llm_app(query: str):
    context = retriever(query)
    res = generate(query, context)
    update_current_trace(
        input=query,
        output=res,
        tags=["production"],
        metadata={"app_version": "1.2.3"}
    )
    return res

Both can be called multiple times from anywhere inside an observed function — values are merged, with later calls overriding earlier ones. Make sure to use the right one — see update_current_trace vs update_current_span in the troubleshooting page.

Using context manager

For Python users who prefer not to use the @observe decorator, DeepEval also supports the Observer context manager with the same arguments:

from deepeval.tracing import Observer, update_current_span

def generate(prompt: str) -> str:
    with Observer(type="llm", model="gpt-4"):
        res = call_llm(prompt)
        update_current_span(input=prompt, output=res)
    return res

This is useful when you can’t modify a function’s definition or need to instrument a specific code block rather than an entire function. As you learn more about the @observe decorator later on, you can rest assured that everything applies to the context manager as well.

Instrument Multi-Turn Apps

If your app handles conversations or multi-turn interactions, you can group traces into a thread by providing a thread ID. Each call to your app creates a trace, and traces with the same thread ID are grouped together as a conversation.

main.py
from openai import OpenAI
from deepeval.tracing import observe, update_current_trace

client = OpenAI()

@observe()
def llm_app(query: str):
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    ).choices[0].message.content

    update_current_trace(thread_id="your-thread-id", input=query, output=res)
    return res

llm_app("What's the weather in SF?")
llm_app("What about tomorrow?")

The thread ID can be any string (e.g., a session ID from your app). The input and output should be the raw user text and the raw LLM response, respectively; Confident AI uses these as the conversation turns for display and thread evaluations.

For more details on thread I/O conventions, tools called, retrieval context, and running offline evals on threads, see the full Threads page.

Next steps

Now that you’ve learned the basics of instrumenting your AI app, dive deeper into: