Introduction to Red Teaming

Proactively identify vulnerabilities and safety issues in your AI applications before they reach production.

Overview

Red Teaming on Confident AI is an adversarial testing platform for AI safety and security. You can run it in two ways:

  • No-code, directly in the platform UI (best for security teams, PMs, and compliance officers), or
  • Code-driven, using the deepteam framework (best for engineers and AI red teamers).

The no-code workflow is the more comprehensive option: it leverages the platform’s framework builder to produce CVSS scores and full risk profiles. The code-driven workflow uses deepteam to orchestrate red teaming runs programmatically.
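For reference, a minimal code-driven run looks roughly like the sketch below. It follows the shape of deepteam's documented quickstart; the model_callback body is a placeholder you would replace with a call into your own application, and exact class and parameter names may vary between deepteam versions:

    from deepteam import red_team
    from deepteam.vulnerabilities import Bias
    from deepteam.attacks.single_turn import PromptInjection

    # Placeholder target: swap in a call to your own LLM application.
    async def model_callback(input: str) -> str:
        return "placeholder response"

    # Choose what to probe for (vulnerabilities) and how to probe
    # (attacks), then run the assessment.
    risk_assessment = red_team(
        model_callback=model_callback,
        vulnerabilities=[Bias(types=["race"])],
        attacks=[PromptInjection()],
    )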

Red Teaming integrates with your existing LLM evaluation and tracing workflows on Confident AI.

Key capabilities

Everything you need to operationalize AI red teaming:

Custom Frameworks

Ships with OWASP Top 10 for LLMs, NIST AI RMF, and more out of the box, or build your own with the framework builder.

CVSS Scoring

Get standardized severity scores, based on the Common Vulnerability Scoring System (CVSS), for every risk assessment you run.

Compliance Reporting

Generate reports aligned to regulatory frameworks like EU AI Act, NIST, and OWASP Top 10 for LLMs.

CI/CD Integration

Integrate security testing into your deployment pipelines for continuous assessment.
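As one way to wire this up, the sketch below gates a pipeline stage on a deepteam run by exiting non-zero when any check fails. This is an illustration, not a prescribed integration; the test_cases and status attributes used to inspect results are assumptions here, so check your deepteam version for the actual shape of the returned risk assessment:

    import sys

    from deepteam import red_team
    from deepteam.vulnerabilities import PIILeakage
    from deepteam.attacks.single_turn import PromptInjection

    async def model_callback(input: str) -> str:
        # Placeholder: route the input through your deployed application.
        return "placeholder response"

    risk_assessment = red_team(
        model_callback=model_callback,
        vulnerabilities=[PIILeakage()],
        attacks=[PromptInjection()],
    )

    # Assumed accessors (test_cases, status) -- verify against your
    # deepteam version before relying on this in a real pipeline.
    failures = [tc for tc in risk_assessment.test_cases if tc.status == "fail"]
    if failures:
        print(f"{len(failures)} red teaming check(s) failed")
        sys.exit(1)  # non-zero exit fails the pipeline stage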

Choose your workflow

Run risk assessments in the UI without writing any code, or drive them programmatically with deepteam.

Not sure which to pick?

Start with no-code: it gives you the most comprehensive assessment out of the box, including CVSS scores and the framework builder. Use code-driven when you need to orchestrate red teaming in CI/CD or develop custom attacks. Results from both workflows appear in the same dashboards.

What you can red team

Test your AI systems across all major vulnerability categories; a code-driven configuration sketch follows the categories below:

Prompt Injection & Jailbreaks

Probe for prompt manipulation, model exploitation, and adversarial bypass techniques across 50+ attack patterns.

Bias & Fairness

Assess outputs for bias across protected characteristics and demographic groups.

Content Safety

Evaluate for harmful, toxic, or inappropriate outputs including PII leakage and misinformation.
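In the code-driven workflow, these categories map onto deepteam's vulnerability and attack classes. The sketch below shows roughly how all three might be covered in a single run; the specific class names and type strings are drawn from deepteam's documentation and may differ between versions:

    from deepteam import red_team
    from deepteam.vulnerabilities import Bias, Misinformation, PIILeakage, Toxicity
    from deepteam.attacks.multi_turn import LinearJailbreaking
    from deepteam.attacks.single_turn import PromptInjection

    async def model_callback(input: str) -> str:
        # Placeholder: call your own LLM application here.
        return "placeholder response"

    risk_assessment = red_team(
        model_callback=model_callback,
        vulnerabilities=[
            Bias(types=["race", "gender"]),  # bias & fairness
            Toxicity(),                      # content safety
            PIILeakage(),                    # PII leakage
            Misinformation(),                # misinformation
        ],
        # Single-turn prompt injection plus a multi-turn jailbreak.
        attacks=[PromptInjection(), LinearJailbreaking()],
    )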

Learn the fundamentals

New to AI red teaming? These concepts will help you get the most out of your setup:

How does AI red teaming differ from traditional security testing?

AI red teaming targets vulnerabilities specific to AI systems, such as prompt injection, model poisoning, and adversarial examples, rather than infrastructure or application-level security. See our frameworks guide for more detail.

What types of AI systems can I red team?

All types: conversational AI, RAG systems, multi-agent workflows, fine-tuned models, and AI-powered APIs. Each system type has tailored attack scenarios and evaluation criteria.

Do I need security expertise?

No. The platform provides automated risk assessments, pre-built attack scenarios, and step-by-step remediation guidance. Security expertise helps with advanced features, but isn’t required to get started.