Signals
Overview
Signals are the runtime side of classifiers. Once a classifier is enabled, every matching trace or thread gets a label (or none), and those labels surface here as cards, breakdowns, occurrence rows, and trend findings — your at-a-glance view of “what’s happening in production right now?”
The Signals page has two main areas:
- What’s happening: automatically detected spikes and trends across your classifier labels (e.g. “RateLimit up 3.2x compared to the previous period”).
- Classifications: one card per classifier showing the label distribution and counts in the selected time range, with drill-down into individual labels and occurrences.
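This page doesn't say how “What’s happening” decides that a label has spiked. As a rough mental model only, the sketch below assumes a simple period-over-period comparison: count each label in the current window, count it again in the equally long window before that, and flag labels whose ratio crosses a threshold. The function names, the `(timestamp, label)` event shape, and the 2x threshold are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def label_ratios(events: list, now: datetime, window: timedelta) -> dict:
    """Ratio of each label's count in the current window to its count in the previous one.

    events: (timestamp, label) pairs for a single classifier (hypothetical shape).
    """
    current_start = now - window
    previous_start = current_start - window
    current = Counter(label for ts, label in events if current_start <= ts < now)
    previous = Counter(label for ts, label in events if previous_start <= ts < current_start)
    # Labels with no baseline in the previous window are skipped to avoid dividing by zero.
    return {label: current[label] / previous[label] for label in current if previous[label] > 0}


def spike_findings(ratios: dict, threshold: float = 2.0) -> list:
    """Render labels whose count grew by at least `threshold` times (hypothetical threshold)."""
    return [
        f"{label} up {ratio:.1f}x compared to the previous period"
        for label, ratio in sorted(ratios.items())
        if ratio >= threshold
    ]
```

With 16 `RateLimit` labels in the current window and 5 in the previous one, the ratio is 3.2, which yields a finding shaped like the example above.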
Use the date picker and environment dropdown at the top to scope everything on the page.
No classifiers configured yet? Head to Project Settings → Classifiers to create one. Trace classifiers light up immediately as new traces arrive; thread classifiers populate after the configured idle window.
Drill Into a Label
Click any classifier card to drop into the per-classifier detail view: the label-level breakdown, a sparkline of occurrences over time, and the individual occurrence rows that contributed to each label.
Each occurrence row links straight to the underlying trace or thread so you can investigate why the LLM picked that label.
Filter the Observatory by Label
Anywhere classifiers run, you can filter the Observatory by classifier label to scope a search to just the data that matched. Pick the classifier and label from the filter dropdown; combine with other filters (environment, tag, metadata, latency, score, etc.) to narrow further.
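Because combining filters narrows the result set, each filter acts as one more condition a trace must satisfy. A minimal sketch of that AND behavior over in-memory records follows; the `Trace` shape and every helper name are hypothetical and are not the Observatory’s API.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    # Hypothetical record shape, not the platform's schema.
    trace_id: str
    environment: str
    latency_ms: float
    labels: dict = field(default_factory=dict)  # classifier name -> assigned label


def by_label(classifier, label):
    """Match traces whose given classifier assigned exactly this label."""
    return lambda t: t.labels.get(classifier) == label


def by_environment(env):
    return lambda t: t.environment == env


def slower_than(ms):
    return lambda t: t.latency_ms > ms


def apply_filters(traces, *predicates):
    """Keep only traces that satisfy every filter, so each added filter narrows the result."""
    return [t for t in traces if all(p(t) for p in predicates)]
```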
A Worked Example: Failure Mode + User Sentiment
Suppose you want to know “how often does an upstream model failure correlate with a user being unhappy in the same conversation?” — a question that touches both a trace-level dimension (the failure) and a thread-level dimension (the sentiment). The cleanest way to answer it is two cooperating classifiers: one trace-scoped, one thread-scoped.
Step 1 — A trace classifier for failure mode
Create a trace classifier called Failure Mode with this description:
Inspect each trace for technical errors. Use the error field, status codes in metadata, and any error text in the output. Pick the label that best fits.
Add four labels:
A trace that didn’t fail simply gets no label — that’s the “no match” outcome and is the expected default for happy-path traffic.
Step 2 — A thread classifier for sentiment
Create a thread classifier called User Sentiment with this description:
Read the user-side turns of the conversation. Decide whether the user expresses positive, negative, or neutral sentiment about the assistant’s responses across the thread.
Add three labels: Positive, Negative, Neutral (each with one or two sentences describing what triggers it). Set a Time Limit of around 600 seconds so the classifier waits for the conversation to settle before grading sentiment.
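It can help to write the classifier down as data before entering it in Project Settings → Classifiers. The sketch below is a hypothetical representation, not the platform’s schema: the field names, the one-sentence label triggers, and the `validate` helper are illustrative. `validate` encodes two rules from What Classifiers Can Do: an item receives at most one of the defined labels, or no label at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassifierSpec:
    # Hypothetical shape mirroring the settings described in this guide.
    name: str
    scope: str  # "trace" or "thread"
    description: str
    labels: dict  # label name -> sentence describing what triggers it
    time_limit_s: Optional[int] = None  # idle window, thread classifiers only

    def validate(self, assigned: Optional[str]) -> Optional[str]:
        """Accept one known label, or None for the "no match" outcome."""
        if assigned is not None and assigned not in self.labels:
            raise ValueError(f"{assigned!r} is not a label of {self.name}")
        return assigned


user_sentiment = ClassifierSpec(
    name="User Sentiment",
    scope="thread",
    description=(
        "Read the user-side turns of the conversation. Decide whether the user "
        "expresses positive, negative, or neutral sentiment about the "
        "assistant's responses across the thread."
    ),
    labels={  # illustrative trigger sentences
        "Positive": "The user thanks the assistant or confirms the answer helped.",
        "Negative": "The user complains, repeats a request, or says the answer was wrong.",
        "Neutral": "The user neither praises nor criticizes the responses.",
    },
    time_limit_s=600,
)
```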
Step 3 — Use the labels
Once both classifiers are enabled and have processed some data, you can pivot on the same dimensions across the platform:
- Signals page: both classifiers appear as cards. The “What’s happening” section flags spikes like “RateLimit up 3.2x” or “Negative sentiment doubled vs. the previous period.”
- Observatory traces: filter by Failure Mode = Timeout to inspect every trace that timed out in a window.
- Observatory threads: filter by User Sentiment = Negative to inspect unhappy conversations end-to-end.
- Combined investigation: start in threads filtered to User Sentiment = Negative, then drop into one of those threads and look at how many of its traces are labeled Failure Mode = Timeout. That’s the canonical “did our outages cause user frustration?” workflow.
- Dashboards: graph trace count broken down by Failure Mode label, or thread count over time broken down by User Sentiment. See Dashboards for setting up breakdown widgets.
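Assuming you already have both sets of labels as plain records (this page doesn't describe how to get them out of the platform), the combined investigation can also be run in bulk. The sketch below uses a hypothetical per-thread record holding its User Sentiment label and the Failure Mode label of each of its traces, with `None` for no match; neither the shape nor the helper names come from the platform. It counts Timeout traces per thread for one sentiment, then computes the share of those threads with at least one timeout, so Negative can be compared against Positive or Neutral.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThreadLabels:
    # Hypothetical record: one thread's User Sentiment label plus the
    # Failure Mode label of each of its traces (None means "no match").
    thread_id: str
    sentiment: Optional[str]
    trace_failure_modes: list = field(default_factory=list)


def timeouts_per_thread(threads, sentiment="Negative"):
    """For threads with the given sentiment, count the traces labeled Timeout."""
    return {
        t.thread_id: t.trace_failure_modes.count("Timeout")
        for t in threads
        if t.sentiment == sentiment
    }


def share_with_timeout(threads, sentiment):
    """Fraction of threads with the given sentiment that contain at least one Timeout trace."""
    counts = timeouts_per_thread(threads, sentiment)
    if not counts:
        return None  # no thread carries this sentiment label
    return sum(1 for n in counts.values() if n > 0) / len(counts)
```

A Negative share clearly above the Positive and Neutral shares says timeouts and unhappy users tend to co-occur; on its own it doesn't prove the outages caused the frustration.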
What Classifiers Can Do
A few capabilities worth knowing:
- Multiple labels per classifier. A classifier can have any number of labels — each trace or thread is assigned at most one of them.
- “No match” is fine. The classifier doesn’t force a label onto every item. Items that don’t fit any label are simply not signaled.
- Sample rate. Each classifier respects the trace or thread sample rate set in Project Settings → Classifiers, so you can tune classification overhead.
- Time limit (threads). Thread classifiers wait for a configurable idle window before evaluating, so you grade settled conversations instead of mid-flight ones.
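The sample-rate and time-limit rules can be read as two gates an item passes before it is classified. The sketch below is an assumption about their semantics, not the platform’s implementation: it treats the sample rate as an independent random draw per item and the time limit as the minimum idle time since a thread’s last activity. The function names are hypothetical.

```python
import random
from datetime import datetime, timedelta


def is_sampled(sample_rate: float, draw=random.random) -> bool:
    """Keep an item with probability sample_rate (0.0 keeps nothing, 1.0 keeps everything)."""
    return draw() < sample_rate


def thread_ready(last_activity: datetime, now: datetime, time_limit: timedelta) -> bool:
    """A thread is evaluated only once it has been idle for at least time_limit."""
    return now - last_activity >= time_limit


# Example: the 600-second time limit from the User Sentiment walkthrough.
TIME_LIMIT = timedelta(seconds=600)
last_turn = datetime(2024, 1, 1, 12, 0, 0)
print(thread_ready(last_turn, last_turn + timedelta(seconds=599), TIME_LIMIT))  # False
print(thread_ready(last_turn, last_turn + timedelta(seconds=600), TIME_LIMIT))  # True
```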
The LLM sees what you ingest. Error messages, status codes, metadata, tags: anything the trace or thread carries is visible to the classifier. Detecting by error code, metadata field, or sentiment therefore works without extra setup, as long as the description tells the LLM how to interpret what it sees.
Cost
Each classification logs a usage event for billing. See Project Settings → Data Usage, under the Signals line item, for live usage and projected cost. Usage limits depend on your org plan; Enterprise has no cap, but the line item is still visible.