Introduction to AI Governance

Codify your organization's AI compliance requirements into policies, and continuously enforce them across every project.

Overview

AI Governance on Confident AI lets you turn your organization’s compliance requirements into policies that are continuously enforced across your projects. A policy is a group of controls — individual, measurable requirements that are automatically assessed against the real state of each project (its datasets, traces, alerts, test runs, risk assessments, and more).

Define your standard for evaluation, observability, and red teaming once, and apply it everywhere. Every project assigned to a policy is held to the same quality bar. Governance makes that bar explicit, consistent, and automatically enforced, so every team shares one definition of what “good” looks like.

This gives compliance and engineering teams a single source of truth for answering “Is this AI application allowed to ship?” — and lets you block deployments that don’t meet your standards.

AI Governance is an enterprise feature. Contact us if you’d like it enabled for your organization.

How it works

1. Define controls

A control is a single requirement, such as “traces are being logged”, “p95 latency stays under 2s”, or “the latest official red teaming assessment passed”. Controls are assessed automatically and resolve to a status.

2. Group controls into a policy

A policy is a named group of controls — typically mapped to a compliance framework such as the EU AI Act or NIST AI RMF. A policy is met only when all of its controls pass.

3. Assign projects to a policy

Each project belongs to at most one policy. Every project assigned to a policy is assessed against all of that policy’s controls.

4. Assess and gate

Assessments run automatically on a daily schedule and on demand. You can also run them as a deploy gate in CI/CD — blocking a release unless every control passes.
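
The relationships in the steps above can be pictured with a small data model. This is only an illustrative sketch: the class names, fields, and the `assign` helper are hypothetical and are not part of the Confident AI SDK or API. It also assumes that assigning a project to a new policy replaces its previous one, which is one way to keep the "at most one policy" rule.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Control:
    """A single requirement (hypothetical shape, for illustration only)."""
    name: str


@dataclass
class Policy:
    """A named group of controls (hypothetical shape)."""
    name: str
    controls: list[Control] = field(default_factory=list)


class PolicyAssignments:
    """Maps each project to at most one policy."""

    def __init__(self) -> None:
        self._by_project: dict[str, Policy] = {}

    def assign(self, project: str, policy: Policy) -> None:
        # Assumption: re-assigning replaces the previous policy, so a
        # project never belongs to more than one policy at a time.
        self._by_project[project] = policy

    def controls_for(self, project: str) -> list[Control]:
        # A project without a policy has no controls to assess.
        policy = self._by_project.get(project)
        return list(policy.controls) if policy else []
```

Keying the mapping by project, rather than storing a project list on each policy, makes the one-policy-per-project constraint hold by construction.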

Core concepts

Control types

Controls come in four types, each covering a different slice of your AI lifecycle:

| Type | What it checks |
| --- | --- |
| Operational | Static configuration checks, e.g. datasets exist, traces are logged, alerts are configured. |
| Runtime | Threshold-based metrics over your observability data (traces, spans, threads), much like alerts. |
| Pre-deployment (evals) | Gates on a recent test run, for example requiring the latest official run to pass. |
| Pre-deployment (red teaming) | Gates on a recent risk assessment from your red teaming workflows. |

See Controls for the full breakdown of each type and how they’re configured.
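
To make the runtime type concrete, here is a minimal sketch of a threshold check such as “p95 latency stays under 2s”. The function names, the nearest-rank percentile method, and the `None` return for an empty window are assumptions made for illustration; the platform's actual aggregation may differ.

```python
from typing import Optional, Sequence


def p95(values: Sequence[float]) -> Optional[float]:
    """Nearest-rank 95th percentile, or None when there are no samples."""
    if not values:
        return None
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, giving a 1-based rank without float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_within_threshold(
    latencies_s: Sequence[float], threshold_s: float = 2.0
) -> Optional[bool]:
    """True or False when data exists; None when the window is empty."""
    value = p95(latencies_s)
    if value is None:
        return None
    return value < threshold_s
```

Returning `None` for an empty window keeps "no data" distinct from a genuine threshold breach.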

Assessment statuses

Every control assessment resolves to one of four statuses:

| Status | Meaning |
| --- | --- |
| PASS | The control’s requirement is satisfied. |
| FAIL | The requirement is not satisfied, e.g. a check failed, a threshold was breached, or the gated run didn’t match. |
| ERROR | The assessment couldn’t run, usually due to a misconfigured control. |
| NO_DATA | There was no data to assess, e.g. no metrics in the window, or no qualifying run yet. |

A policy is only considered met when every control resolves to PASS. Any FAIL, ERROR, or NO_DATA means the policy is not met.
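
The aggregation rule can be written in a few lines. The sketch below is illustrative only: the `Status` enum and `policy_met` function are hypothetical names, not SDK types, and the handling of a policy with no controls is not specified in this page.

```python
from enum import Enum
from typing import Iterable


class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    ERROR = "ERROR"
    NO_DATA = "NO_DATA"


def policy_met(statuses: Iterable[Status]) -> bool:
    """A policy is met only when every control resolved to PASS."""
    # Note: an empty iterable yields True here; how the platform treats
    # a policy with no controls is an open assumption.
    return all(status is Status.PASS for status in statuses)
```

For example, `policy_met([Status.PASS, Status.NO_DATA])` is `False`, because NO_DATA counts against the policy just as FAIL and ERROR do.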

Gating deployments

Enforce a policy in your CI/CD pipeline using the deepeval CLI (available in both Python and TypeScript), or call the public API directly. The gate assesses every control in the project’s policy and only passes if all of them pass:

$ deepeval gate

The CLI exits with code 0 only when the policy is fully met, and with a non-zero code otherwise. A non-zero exit code stops your pipeline, preventing a non-compliant deployment from shipping. The Python CLI, the TypeScript CLI, and direct API calls all go through the POST /v1/governance/assess endpoint, authenticated with your project’s API key.
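
If a pipeline step is written in Python rather than shell, the same exit-code contract can be checked with the standard library. This is a minimal sketch: `gate_passes` is a hypothetical helper, and it assumes the `deepeval` CLI is installed on the runner and already configured with your project’s API key.

```python
import subprocess
from typing import Sequence


def gate_passes(command: Sequence[str] = ("deepeval", "gate")) -> bool:
    """Run the gate command and report whether it exited with code 0."""
    # check=False: inspect the return code instead of raising on failure.
    completed = subprocess.run(list(command), check=False)
    return completed.returncode == 0
```

A deploy script could then end with `sys.exit(0 if gate_passes() else 1)` so that a failed gate stops the job before the release step runs.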

Learn more

  • Policies — group controls and assign projects
  • Controls — the four control types and how to configure them
  • Alerts — the observability primitive behind runtime controls
  • Risk Profiles — what red teaming pre-deployment controls assess against