Red Team Using DeepTeam | Confident AI Docs

Overview

Confident AI’s red teaming capabilities let you test AI safety and security during development, as part of a pre-deployment workflow. They cover:

Vulnerability assessment: Systematically identify weaknesses like bias, toxicity, PII leakage, and prompt injection vulnerabilities.
Adversarial testing: Simulate real-world attacks using jailbreaking, prompt injection, and other sophisticated attack methods.
Risk profiling: Comprehensive evaluation across 40+ vulnerability types with detailed risk assessments and remediation guidance.

All vulnerabilities and attacks on DeepTeam are also available on Confident AI.

Local Red Teaming

Run red teaming locally using deepteam with full control over vulnerabilities and attacks
Support for custom vulnerabilities, attack methods, and advanced red teaming algorithms

Suitable for: Python users, development, and pre-deployment security workflows

Remote Red Teaming

Run red teaming on Confident AI platform with pre-built vulnerability frameworks
Integrated with monitoring, risk assessments, and team collaboration features

Suitable for: Non-Python users, continuous monitoring, and production safety assessments

Create a Risk Assessment

This example walks through a comprehensive safety assessment that uses adversarial attacks to identify vulnerabilities in your AI system.

You’ll need to get your API key as shown in the setup and installation section before continuing.

Running red teaming locally executes attacks on your machine and uploads results to Confident AI. This gives full control over custom vulnerabilities and attack methods.

Install DeepTeam

Install DeepTeam, Confident AI’s open-source red teaming framework:

$ pip install -U deepteam

Set Your API Key

Set your Confident AI API key so results are uploaded to the platform:

$ deepteam login

Or set it as an environment variable:

$ export CONFIDENT_API_KEY=YOUR-API-KEY
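If you rely on the environment variable, a quick check before a long run can save a wasted assessment. The snippet below is a minimal sketch using only the Python standard library. The helper name `confident_api_key_is_set` is hypothetical and not part of deepteam, and the check only covers the environment-variable route, not a key stored by `deepteam login`.

```python
import os


def confident_api_key_is_set(env=None) -> bool:
    """Return True if CONFIDENT_API_KEY is present and non-empty.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    This helper is illustrative only and is not part of deepteam.
    """
    env = os.environ if env is None else env
    return bool(env.get("CONFIDENT_API_KEY", "").strip())


# Warn early instead of discovering a missing key after the run finishes.
if not confident_api_key_is_set():
    print("CONFIDENT_API_KEY is not set in this environment.")
```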

Set Up Your Target Model

Define your AI system as a model callback function. This is the AI application you want to red team:

from deepteam.test_case import RTTurn, ToolCall

async def model_callback(input: str) -> RTTurn:
    # Replace this with your actual LLM application
    # This could be a RAG pipeline, chatbot, agent, etc.
    return RTTurn(
        role="assistant",
        content="Your agent's response here...",
        retrieval_context=["Your retrieval context here"],
        tools_called=[
            ToolCall(name="SearchDatabase")
        ]
    )

The model callback must accept a single string parameter (the adversarial input), and return an RTTurn object with role as assistant and content being your AI system’s response. You can also pass retrieval_context and tools_called in your RTTurn object when testing RAG or agentic systems. retrieval_context can be a list of strings and tools_called must be a list of ToolCall objects.
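The single-parameter requirement can be checked up front. The following is a minimal sketch using only the standard library. `check_callback_signature` and `example_callback` are hypothetical names introduced for illustration, not deepteam functions, and the check inspects only the parameter list, not the return value.

```python
import inspect


def check_callback_signature(callback) -> list:
    """Return a list of problems with the callback's parameters (empty if none).

    Illustrative helper, not part of deepteam: it only counts required
    positional parameters and does not validate what the callback returns.
    """
    required = [
        p for p in inspect.signature(callback).parameters.values()
        if p.default is inspect.Parameter.empty
        and p.kind in (
            inspect.Parameter.POSITIONAL_ONLY,
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
        )
    ]
    problems = []
    if len(required) != 1:
        problems.append(f"expected 1 required parameter, found {len(required)}")
    return problems


# Stand-in callback with the expected shape: one required string parameter.
async def example_callback(input: str):
    return f"echo: {input}"


print(check_callback_signature(example_callback))
```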

Configure vulnerabilities and attacks

Choose which vulnerabilities to test for and which attack methods to use:

from deepteam import red_team
from deepteam.vulnerabilities import Bias, Toxicity, PIILeakage
from deepteam.attacks.single_turn import PromptInjection
from deepteam.attacks.multi_turn import LinearJailbreaking

# Define vulnerabilities to test
vulnerabilities = [
    Bias(types=["race", "gender", "political"]),
    Toxicity(types=["profanity", "insults", "threats"]),
    PIILeakage(types=["direct_disclosure", "api_and_database_access"])
]

# Define attack methods
attacks = [
    PromptInjection(weight=2),  # Higher weight = more likely to be selected
    LinearJailbreaking(weight=1)
]
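To build intuition for the weights, the sketch below assumes that selection is proportional to weight, which is one common reading of "higher weight = more likely to be selected". DeepTeam's internal sampling may differ. The attack names are plain strings standing in for the attack objects, and both helper functions are hypothetical.

```python
import random

# Plain-string stand-ins for the attacks configured above, with their weights.
attack_weights = {"PromptInjection": 2, "LinearJailbreaking": 1}


def selection_probabilities(weights: dict) -> dict:
    """Convert raw weights into probabilities, assuming proportional selection."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def sample_attacks(weights: dict, k: int, seed: int = 0) -> list:
    """Draw k attack names with replacement, proportionally to their weights."""
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=k)


print(selection_probabilities(attack_weights))
print(sample_attacks(attack_weights, k=6))
```

Under this assumption, a weight of 2 against a weight of 1 means PromptInjection is picked about two thirds of the time.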

Run the red team assessment

Execute the red teaming assessment with your configured parameters:

# Run comprehensive red teaming
risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=vulnerabilities,
    attacks=attacks,
    attacks_per_vulnerability_type=3,
    max_concurrent=5
)

This runs red teaming on your model_callback using the configured vulnerabilities and attacks, then generates a risk assessment. The assessment is printed to your console and uploaded to the Confident AI platform, where you can view it in the Risk Profile section.
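To estimate the size of a run before starting it, the sketch below assumes that attacks_per_vulnerability_type means what its name suggests: that many attacks for each configured vulnerability type. The helper `expected_attack_count` is hypothetical, and the number of calls to your model may be higher because multi-turn attacks involve several exchanges.

```python
# Vulnerability types from the configuration step, keyed by vulnerability name.
configured_types = {
    "Bias": ["race", "gender", "political"],
    "Toxicity": ["profanity", "insults", "threats"],
    "PIILeakage": ["direct_disclosure", "api_and_database_access"],
}


def expected_attack_count(types_by_vulnerability: dict, attacks_per_vulnerability_type: int) -> int:
    """Estimate total attacks as (number of vulnerability types) x (attacks per type)."""
    total_types = sum(len(types) for types in types_by_vulnerability.values())
    return total_types * attacks_per_vulnerability_type


# 8 vulnerability types x 3 attacks each
print(expected_attack_count(configured_types, attacks_per_vulnerability_type=3))
```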

To upload risk assessments to the Confident AI platform, you must either run the deepteam login command from the CLI or set your API key as CONFIDENT_API_KEY in your environment.

Using Security Frameworks

Instead of configuring vulnerabilities and attacks manually, you can use pre-defined security frameworks like OWASP, NIST AI RMF, and MITRE ATLAS via deepteam:

from deepteam.frameworks import OWASPTop10
from deepteam import red_team

# Run with framework
red_team(
    model_callback=model_callback,
    framework=OWASPTop10(),
)

Results will be posted to the Confident AI platform automatically.

These are the same frameworks available in the no-code workflow (OWASP, NIST, MITRE ATLAS), but used programmatically. Note that code-driven assessments do not support the cloud framework builder or CVSS scoring.

Best Practices

Start with frameworks: Use OWASP Top 10 or NIST AI RMF for comprehensive coverage
Test early and often: Integrate red teaming into your development cycle
Focus on your use case: Customize vulnerabilities based on your application’s risks
Monitor continuously: Set up ongoing safety assessments for production systems
Document and remediate: Keep detailed records of findings and remediation efforts

Next Steps

Framework-Based Testing

Use industry-standard frameworks like OWASP Top 10 and NIST AI RMF for comprehensive security assessments

Risk Profile & Assessments

Create custom vulnerabilities and attack methods tailored to your specific use case and industry requirements

Red teaming works seamlessly with your existing LLM evaluation and tracing workflows on Confident AI.