Governance Controls

The individual, assessable requirements that make up a policy.

A control is a single, measurable requirement that is automatically assessed against the real state of a project. Controls are grouped into policies, and a policy is met only when all of its controls pass.

Every assessment resolves to one of four statuses:

| Status | Meaning |
| --- | --- |
| PASS | The requirement is satisfied. |
| FAIL | The requirement is not satisfied. |
| ERROR | The assessment couldn’t run, usually due to a misconfigured control. |
| NO_DATA | There was no data to assess in the evaluated window. |
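
As a rough illustration of how these statuses roll up into a policy result, the sketch below models the four statuses and the "all controls must pass" rule. The names `Status` and `policy_is_met` are hypothetical and not part of the platform, and treating a policy with no controls as not met is an assumption made only for this example.

```python
from enum import Enum


class Status(Enum):
    """The four assessment statuses described above."""
    PASS = "PASS"
    FAIL = "FAIL"
    ERROR = "ERROR"
    NO_DATA = "NO_DATA"


def policy_is_met(control_statuses):
    """Return True only when every control in the policy resolved to PASS."""
    statuses = list(control_statuses)
    # Assumption for this sketch: a policy with no controls is not met.
    return bool(statuses) and all(s is Status.PASS for s in statuses)


# One NO_DATA control is enough to keep the policy from being met.
met = policy_is_met([Status.PASS, Status.PASS, Status.NO_DATA])
```

Under this reading, FAIL, ERROR, and NO_DATA all prevent a policy from being met, since only PASS counts as passing.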

Control types

There are four control types, each assessing a different part of your AI lifecycle.

Operational

Static configuration checks — is the project set up the way your standards require?

Runtime

Threshold-based checks over your observability metrics.

Pre-deployment (evals)

Gates on a recent evaluation test run.

Pre-deployment (red teaming)

Gates on a recent red teaming risk assessment.

Operational controls

Operational controls verify that a project is configured according to your standards. They are static checks — assessed purely from the current state of the project, with no thresholds to configure. These controls ship with the platform and can’t be created manually; you simply add the ones you need to a policy.

Operational controls cover areas such as observability, datasets, integrations, and threat detection:

| Category | Control | Passes when the project has… |
| --- | --- | --- |
| Alerts | Has scheduled alerts | At least one scheduled alert |
| Alerts | Has scheduled alerts on traces | A scheduled alert on traces |
| Alerts | Has scheduled alerts on spans | A scheduled alert on spans |
| Alerts | Has scheduled alerts on threads | A scheduled alert on threads |
| Scheduled jobs | Has scheduled eval test runs | A recurring evaluation test run |
| Scheduled jobs | Has scheduled risk assessments | A recurring red teaming risk assessment |
| Datasets | Has dataset ingestion | At least one dataset |
| Datasets | Has single-turn datasets | A single-turn dataset |
| Datasets | Has multi-turn datasets | A multi-turn dataset |
| Datasets | Has dataset versions | A versioned dataset |
| Tracing | Has logged traces | Traces logged in the last 30 days |
| Tracing | Has logged threads | Threads logged in the last 30 days |
| Tracing | Has queue ingestion | An annotation queue receiving items |
| Integrations | Has alert integrations | A notification integration (Slack, Discord, Email, PagerDuty, or Teams) |
| Integrations | Has ticketing integrations | A ticketing integration (Linear or GitHub Issues) |
| Classifiers | Has trace classifiers enabled | Trace classifiers enabled |
| Classifiers | Has thread classifiers enabled | Thread classifiers enabled |
| Metrics | Has custom metrics | At least one custom metric |
| Metrics | Has metric collections | At least one metric collection |
| Threat detection | Has trace threat detection enabled | Trace threat detection enabled |
| Threat detection | Has thread threat detection enabled | Thread threat detection enabled |

Time-bound checks (logged traces and threads) evaluate the last 30 days of activity.
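
To make the 30-day rule concrete, here is a minimal sketch of a time-bound check. The function `has_recent_activity` and its parameters are hypothetical, and the inclusive cutoff is an assumption; the platform's actual boundary handling is not documented here.

```python
from datetime import datetime, timedelta


def has_recent_activity(timestamps, now, window_days=30):
    """Return True if any timestamp falls inside the trailing window."""
    cutoff = now - timedelta(days=window_days)
    # Assumption: an item logged exactly at the cutoff still counts.
    return any(ts >= cutoff for ts in timestamps)


now = datetime(2024, 6, 30)
trace_times = [datetime(2024, 4, 1), datetime(2024, 6, 15)]
recent = has_recent_activity(trace_times, now)  # the 15 June trace is in range
```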

Runtime controls

Runtime controls assess a metric over your observability data and check it against a threshold — the same model used by alerts. They are evaluated over a trailing 24-hour window.

Configure a runtime control with:

  • Data model: Trace, Span, or Thread
  • Aggregation: the metric to compute; options depend on the data model (e.g. count, error rate, average/percentile latency, token cost, unique end users)
  • Threshold: a direction (Above or Below) and a numeric value
  • Filters: optionally narrow the data the control evaluates (environment, tags, metadata, and more)

The control fails when the aggregated metric crosses the threshold (above the value for Above, below it for Below), and resolves to NO_DATA when there’s no matching data in the window.
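
The threshold rule above can be sketched as a small function. This is an illustration rather than the platform's implementation: `Threshold` and `assess_runtime_control` are hypothetical names, the comparison is assumed to be strict, and mapping an unknown direction to ERROR is an assumption based on the status definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    direction: str  # "Above" or "Below"
    value: float


def assess_runtime_control(metric: Optional[float], threshold: Threshold) -> str:
    """Resolve a runtime control for one evaluation window.

    `metric` is the aggregated value for the window, or None when no
    data matched the control's filters.
    """
    if metric is None:
        return "NO_DATA"
    if threshold.direction == "Above":
        crossed = metric > threshold.value  # assumption: strict comparison
    elif threshold.direction == "Below":
        crossed = metric < threshold.value  # assumption: strict comparison
    else:
        # Assumption: an unrecognized direction counts as a misconfiguration.
        return "ERROR"
    return "FAIL" if crossed else "PASS"


# An error-rate SLA: fail when the 24-hour error rate goes above 5%.
sla = Threshold(direction="Above", value=0.05)
status = assess_runtime_control(0.07, sla)
```

Note that the direction names the failing side: an "Above 0.05" control passes while the metric stays at or below 0.05.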

Runtime controls are ideal for codifying production SLAs — like keeping error rates or latency within bounds — directly into a governance policy.

Pre-deployment controls (evals)

These controls gate on a recent evaluation test run. A control passes when a qualifying test run exists and matches the configured filters.

Choose how the gating run is selected:

  • Latest official run — gate on the most recent run marked official in the project, or
  • Identifier + window — gate on the latest completed run matching a given identifier within a rolling window (7, 14, 30, or 90 days)

You can also apply filters (including hyperparameters) — the selected run must match them to pass.

| Status | When |
| --- | --- |
| PASS | A qualifying run exists and matches the filters. |
| FAIL | A run exists but doesn’t match the filters. |
| NO_DATA | No qualifying run was found. |

Mark a test run as official to designate it as the source of truth for gating, instead of relying on an identifier and window.
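
The two selection modes and the resulting statuses can be sketched as follows. Everything here is illustrative: `EvalRun`, `select_latest_official`, `select_by_identifier`, and `assess_gate` are hypothetical names, every run in the list is assumed to be completed, and filters are reduced to exact-match hyperparameters for simplicity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

ALLOWED_WINDOWS = (7, 14, 30, 90)  # rolling windows, in days


@dataclass
class EvalRun:
    identifier: str
    completed_at: datetime
    official: bool = False
    hyperparameters: dict = field(default_factory=dict)


def select_latest_official(runs):
    """Pick the most recent run marked official, or None."""
    official = [r for r in runs if r.official]
    return max(official, key=lambda r: r.completed_at, default=None)


def select_by_identifier(runs, identifier, window_days, now):
    """Pick the latest run with this identifier inside the rolling window."""
    if window_days not in ALLOWED_WINDOWS:
        raise ValueError(f"window must be one of {ALLOWED_WINDOWS}")
    cutoff = now - timedelta(days=window_days)
    matching = [
        r for r in runs
        if r.identifier == identifier and r.completed_at >= cutoff
    ]
    return max(matching, key=lambda r: r.completed_at, default=None)


def assess_gate(run, required_hyperparameters):
    """Map the selected run to PASS, FAIL, or NO_DATA."""
    if run is None:
        return "NO_DATA"
    matches = all(
        run.hyperparameters.get(key) == value
        for key, value in required_hyperparameters.items()
    )
    return "PASS" if matches else "FAIL"


now = datetime(2024, 6, 30)
runs = [
    EvalRun("nightly", datetime(2024, 6, 1), hyperparameters={"model": "a"}),
    EvalRun("nightly", datetime(2024, 6, 25), hyperparameters={"model": "b"}),
    EvalRun("release", datetime(2024, 6, 10), official=True,
            hyperparameters={"model": "a"}),
]
gate = assess_gate(select_by_identifier(runs, "nightly", 7, now), {"model": "b"})
```

In this sketch only the single selected run is checked against the filters, so an older run that would have matched does not rescue a newer one that does not.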

Pre-deployment controls (red teaming)

These work exactly like the evals pre-deployment controls, but gate on a red teaming risk assessment instead of a test run. Select the gating assessment by its latest official result, or by identifier + window, and optionally apply filters.

This lets you require, for example, that the latest official risk assessment passed before a release is allowed to ship.

Versioning

Controls are versioned. Each time you change a control’s configuration, a new version is appended to its history. Assessments always run against the latest version, while older versions remain for audit purposes.
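
One way to picture this is as an append-only history, sketched below. `VersionedControl` and its methods are hypothetical and say nothing about how the platform stores versions; the sketch only shows that updates append rather than overwrite, and that the latest entry is the one assessed.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedControl:
    name: str
    versions: list = field(default_factory=list)

    def update(self, config: dict) -> int:
        """Append a new configuration and return its 1-based version number."""
        self.versions.append(dict(config))  # copy so history can't be mutated later
        return len(self.versions)

    @property
    def latest(self) -> dict:
        """The configuration that assessments run against."""
        return self.versions[-1]


control = VersionedControl("error-rate-sla")
control.update({"direction": "Above", "value": 0.05})
control.update({"direction": "Above", "value": 0.02})
# control.latest is now the 0.02 configuration; the 0.05 one stays in history.
```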
