Quickstart

5 min quickstart guide for AI red teaming

Confident AI red teaming is in private beta.

Overview

Confident AI’s red teaming capabilities let you test AI safety and security during development, as part of a pre-deployment workflow. They include:

  • Vulnerability assessment: Systematically identify weaknesses like bias, toxicity, PII leakage, and prompt injection vulnerabilities.
  • Adversarial testing: Simulate real-world attacks using jailbreaking, prompt injection, and other sophisticated attack methods.
  • Risk profiling: Comprehensive evaluation across 40+ vulnerability types with detailed risk assessments and remediation guidance.

You can either run red teaming locally or remotely on Confident AI. Both options use deepteam and give you the same functionality:

Local Red Teaming
  • Run red teaming locally using deepteam with full control over vulnerabilities and attacks
  • Support for custom vulnerabilities, attack methods, and advanced red teaming algorithms

Suitable for: Python users, development, and pre-deployment security workflows

Remote Red Teaming
  • Run red teaming on Confident AI platform with pre-built vulnerability frameworks
  • Integrated with monitoring, risk assessments, and team collaboration features

Suitable for: Non-Python users, continuous monitoring, and production safety assessments

Create a Risk Assessment

This example walks through a comprehensive safety assessment that uses adversarial attacks to identify vulnerabilities in your AI system.

You’ll need to get your API key as shown in the setup and installation section before continuing.

Running red teaming locally executes attacks on your machine and uploads results to Confident AI. This gives full control over custom vulnerabilities and attack methods.
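
If you prefer to set your API key programmatically, a minimal sketch looks like this (it assumes Confident AI reads the CONFIDENT_API_KEY environment variable; the key value is a placeholder):

import os

# Assumption: the API key is read from this environment variable.
# Replace the placeholder with the key from your Confident AI project settings.
os.environ["CONFIDENT_API_KEY"] = "your-confident-api-key"
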

1

Install DeepTeam

First, install DeepTeam by reaching out to your representative at Confident AI to get access. The open-source version does not currently support Confident AI red teaming.

DeepTeam is powered by DeepEval’s evaluation framework, so you’ll also need to set up your API keys for the underlying LLM providers.
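
For example, if the underlying LLM provider is OpenAI (an assumption; substitute whichever provider you actually use), you can export the provider key before running anything:

import os

# Assumption: OpenAI is the underlying LLM provider; swap the variable name
# and key for your own provider if you use a different one.
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
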

2

Set up your target model

Define your AI system as a model callback function. This is the system you want to red team:

async def model_callback(input: str) -> str:
    # Replace this with your actual LLM application
    # This could be a RAG pipeline, chatbot, agent, etc.
    return f"I'm a helpful AI assistant. Regarding your input: {input}"

The model callback must accept a single string parameter (the adversarial input) and return a single string (your AI system’s response); it can be async for better performance.
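
For instance, if your application is a plain OpenAI chat model, the callback might look like the sketch below (the model name and system prompt are illustrative placeholders, and the openai package is assumed to be installed):

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def model_callback(input: str) -> str:
    # Forward the adversarial input to your LLM application and
    # return its text response as a plain string.
    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": input},
        ],
    )
    return response.choices[0].message.content
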

3

Configure vulnerabilities and attacks

Choose which vulnerabilities to test for and which attack methods to use:

from deepteam import red_team
from deepteam.vulnerabilities import Bias, Toxicity, PIILeakage
from deepteam.attacks.single_turn import PromptInjection
from deepteam.attacks.multi_turn import LinearJailbreaking

# Define vulnerabilities to test
vulnerabilities = [
    Bias(types=["race", "gender", "political"]),
    Toxicity(types=["profanity", "insults", "threats"]),
    PIILeakage(types=["direct disclosure", "api and database access"])
]

# Define attack methods
attacks = [
    PromptInjection(weight=2),  # Higher weight = more likely to be selected
    LinearJailbreaking(weight=1)
]
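
The weight parameter controls how often each attack is sampled relative to the others. As a rough illustration (this is not DeepTeam’s internal code), weighted sampling with the values above picks PromptInjection about twice as often as LinearJailbreaking:

import random

# Illustrative only: with weights [2, 1], "prompt_injection" is drawn
# roughly 2/3 of the time and "linear_jailbreaking" roughly 1/3.
attack_names = ["prompt_injection", "linear_jailbreaking"]
sample = random.choices(attack_names, weights=[2, 1], k=1000)
print(sample.count("prompt_injection") / len(sample))  # ~0.67
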
4

Run the red team assessment

Execute the red teaming assessment with your configured parameters:

# Run comprehensive red teaming
risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=vulnerabilities,
    attacks=attacks,
    attacks_per_vulnerability_type=3,
    max_concurrent=5
)
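
Putting the steps together, you can run the assessment as a standalone script. The exact structure of the returned risk assessment object depends on your DeepTeam version, so the print below is just a quick way to inspect it locally; results are also uploaded to Confident AI as described above:

if __name__ == "__main__":
    risk_assessment = red_team(
        model_callback=model_callback,
        vulnerabilities=vulnerabilities,
        attacks=attacks,
        attacks_per_vulnerability_type=3,
        max_concurrent=5
    )
    # Inspect the result locally once the run completes
    print(risk_assessment)
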

Best Practices

  1. Start with frameworks: Use OWASP Top 10 or NIST AI RMF for comprehensive coverage
  2. Test early and often: Integrate red teaming into your development cycle
  3. Focus on your use case: Customize vulnerabilities based on your application’s risks
  4. Monitor continuously: Set up ongoing safety assessments for production systems
  5. Document and remediate: Keep detailed records of findings and remediation efforts

Next Steps

Red teaming works seamlessly with your existing LLM evaluation and tracing workflows on Confident AI.