Evaluation Rules
Evaluation rules let you attach a metric collection to incoming traces, spans, or threads without changing your application code. When data is ingested into your project, each enabled rule checks whether the trace, span, or thread matches its conditions and runs the configured metric collection automatically.
Evaluation rules are a no-code alternative to passing metric_collection
through the SDK. If your API call already supplies a metric collection, the
API value wins—rules only attach evaluations when the API does not.
Create an Evaluation Rule
To create an evaluation rule:
- Navigate to Project Settings → Evaluation Rules
- Click New Rule
- Enter a unique Name (and an optional Description)
- Pick a Data Model: Trace, Span, or Thread
- (For span rules) Optionally pick a Span Type to restrict the rule to one type of span (e.g., LLM, Tool, Retriever)
- Pick a Metric Collection to run when the rule matches
- (Optional) Configure Filters to limit which entities the rule applies to
- (Optional) Set a Sample Rate between 0 and 1 to evaluate only a fraction of matches
- (For thread rules) Set a Time Limit in seconds, the inactivity window after which the thread is evaluated
- Toggle Enabled on, then click Create Rule
You can quickly enable or disable a rule from the list view without opening the editor.
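The form fields above can be pictured as a small record with a few consistency checks. The sketch below is purely illustrative: rules are created in the UI, and the `EvaluationRule` class, its field names, and its defaults (for example, a sample rate of 1.0 when none is set, and a required time limit for thread rules) are assumptions made for this example, not part of any Confident AI API.

```python
from dataclasses import dataclass, field
from typing import Optional

DATA_MODELS = {"Trace", "Span", "Thread"}


@dataclass
class EvaluationRule:
    """Hypothetical in-memory picture of the rule form; not a real SDK type."""

    name: str
    data_model: str                     # "Trace", "Span", or "Thread"
    metric_collection: str
    description: str = ""
    span_type: Optional[str] = None     # only meaningful for Span rules
    filters: list = field(default_factory=list)
    sample_rate: float = 1.0            # assumed default: evaluate every match
    time_limit: Optional[int] = None    # seconds; only meaningful for Thread rules
    enabled: bool = True

    def __post_init__(self) -> None:
        if self.data_model not in DATA_MODELS:
            raise ValueError(f"unknown data model: {self.data_model!r}")
        if self.span_type is not None and self.data_model != "Span":
            raise ValueError("span_type applies to Span rules only")
        if not 0 <= self.sample_rate <= 1:
            raise ValueError("sample_rate must be between 0 and 1")
        if self.data_model == "Thread" and self.time_limit is None:
            raise ValueError("Thread rules need a time_limit in seconds")
```

For instance, `EvaluationRule(name="llm-quality", data_model="Span", span_type="LLM", metric_collection="My Collection", sample_rate=0.25)` passes these checks, while a Trace rule with a span type does not.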
Data Models
Each rule applies to one of three data models. The data model determines what the rule can match against and when evaluations run.
For span rules, you can additionally restrict the rule to a specific Span Type. Leave the span type blank to match every span regardless of type.
Only one enabled Thread rule can target a given metric collection at a
time—Confident AI prevents duplicate thread rules so each conversation isn’t
evaluated multiple times for the same metrics.
Filters
Filters narrow down which traces, spans, or threads a rule applies to. Use filters to scope a rule to specific environments, tags, metadata fields, or other dimensions of your data.
For example, you can configure a Trace rule that only fires when metadata.env equals cloud, or a Span rule that only matches LLM spans whose latency exceeds a threshold.
Leave Filters empty to match every entity for the chosen data model.
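To make the matching behavior concrete, here is a minimal sketch of how a list of filters could be evaluated against an ingested entity. The filter shape (`field`, `op`, `value`), the operator names, the entity layout, and the choice to combine filters with AND are all assumptions for illustration; the actual filter options are whatever the rule editor offers.

```python
from typing import Any


def get_path(entity: dict, path: str) -> Any:
    """Resolve a dotted path such as 'metadata.env'; return None if missing."""
    current: Any = entity
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical operator names, chosen only for this example.
OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "greater_than": lambda actual, expected: actual is not None and actual > expected,
}


def matches(entity: dict, filters: list) -> bool:
    """Return True when every filter holds; an empty list matches everything."""
    return all(
        OPERATORS[f["op"]](get_path(entity, f["field"]), f["value"])
        for f in filters
    )


trace = {"metadata": {"env": "cloud"}}
env_filter = [{"field": "metadata.env", "op": "equals", "value": "cloud"}]
print(matches(trace, env_filter))  # True
print(matches(trace, []))          # True: no filters means every entity matches
```

Because `all()` over an empty list is true, the empty-filter case falls out of the same code path as the filtered case.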
Sample Rate
Sample rate controls how often a matching rule actually fires. A rule with a sample rate of 0.25 evaluates roughly one in four matches. Sampling is deterministic, so the same trace, span, or thread will always make the same sample decision for a given rule.
This is useful when you want signal on metric trends without paying to evaluate every single ingested item.
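One common way to get deterministic sampling is to hash a stable identifier into the range [0, 1) and compare it with the sample rate. The sketch below shows that general technique; it is not a description of Confident AI's internal implementation, and `should_sample`, `rule_id`, and `entity_id` are hypothetical names.

```python
import hashlib


def should_sample(rule_id: str, entity_id: str, sample_rate: float) -> bool:
    """Decide deterministically whether a rule evaluates a matching entity."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    # Hash the (rule, entity) pair so each rule samples independently.
    digest = hashlib.sha256(f"{rule_id}:{entity_id}".encode("utf-8")).digest()
    # Map the first 8 bytes to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# The same pair always yields the same decision.
first = should_sample("rule-1", "trace-abc", 0.25)
second = should_sample("rule-1", "trace-abc", 0.25)
print(first == second)  # True
```

With this approach a rate of 0 never fires and a rate of 1 always fires, and re-ingesting or retrying the same item does not change whether it is evaluated.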
Time Limit (Thread Rules)
Thread rules use a Time Limit (in seconds) to decide when a multi-turn conversation is “done” and ready to evaluate. After the rule’s filters match a trace in a thread, Confident AI waits the configured number of seconds—if no new traces arrive in that window, the thread is evaluated using the rule’s metric collection.
If new traces continue to arrive, the wait window resets. This lets you evaluate threads automatically without having to call the evaluate thread function from your code.
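The inactivity window behaves like a per-thread timer that restarts on each new trace. The following is a minimal simulation of that idea, assuming that any new trace in a tracked thread resets the window; `ThreadTimer` and its methods are hypothetical names, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadTimer:
    """Tracks when each thread's inactivity window expires."""

    time_limit: float                              # seconds of inactivity
    deadlines: dict = field(default_factory=dict)  # thread_id -> deadline

    def on_trace(self, thread_id: str, now: float) -> None:
        """A trace arrived for this thread: start or reset its wait window."""
        self.deadlines[thread_id] = now + self.time_limit

    def due(self, now: float) -> list:
        """Return threads whose window has elapsed and stop tracking them."""
        ready = sorted(t for t, d in self.deadlines.items() if d <= now)
        for thread_id in ready:
            del self.deadlines[thread_id]
        return ready


timer = ThreadTimer(time_limit=60)
timer.on_trace("thread-1", now=0)
timer.on_trace("thread-1", now=30)  # new trace resets the window to t=90
print(timer.due(now=60))            # []
print(timer.due(now=90))            # ['thread-1']
```

A shorter time limit evaluates conversations sooner but risks evaluating a thread before the user has finished; a longer one delays results but captures more complete conversations.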
Interaction With API Metric Collections
When a trace or span is ingested with metric_collection set via the SDK, that value wins—the API path runs the metrics you supplied and rules do not add extra collections to that item. Rules only attach evaluations when the API call did not supply a metric collection.
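The precedence described above amounts to a single check at ingestion time. A minimal sketch, assuming a hypothetical `collections_to_run` helper and assuming that more than one matching rule may each contribute a collection when the API supplies none:

```python
from typing import Optional


def collections_to_run(
    api_metric_collection: Optional[str],
    matching_rule_collections: list,
) -> list:
    """Pick which metric collections evaluate an ingested trace or span."""
    if api_metric_collection:
        # The value supplied through the SDK wins; rules add nothing.
        return [api_metric_collection]
    # No API value: fall back to whatever the matching rules configured.
    return list(matching_rule_collections)


print(collections_to_run("From SDK", ["From Rule"]))  # ['From SDK']
print(collections_to_run(None, ["From Rule"]))        # ['From Rule']
```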
For threads, evaluation rules are the way to evaluate threads automatically—there is no equivalent inline parameter on traces that triggers a thread evaluation. Threads can still be evaluated explicitly via the evaluate thread function.