Confident AI’s red teaming capabilities offer a variety of features to test AI safety and security in development for a pre-deployment workflow, offering a wide range of features for:
All vulnerabilities and attacks on DeepTeam are also available on Confident AI.
deepteam with full control over vulnerabilities and attacksSuitable for: Python users, development, and pre-deployment security workflows
Suitable for: Non-python users, continuous monitoring, and production safety assessments
This examples goes through a comprehensive safety assessment using adversarial attacks to identify vulnerabilities in your AI system.
You’ll need to get your API key as shown in the setup and installation section before continuing.
Running red teaming locally executes attacks on your machine and uploads results to Confident AI. This gives full control over custom vulnerabilities and attack methods.
Set your Confident AI API key so results are uploaded to the platform:
Or set it as an environment variable:
Define your AI system as a model callback function. This is the AI application you want to red team:
The model callback must accept a single string parameter (the adversarial
input), and return an
RTTurn
object with role as assistant and content being your AI system’s response.
You can also pass retrieval_context and tools_called in your RTTurn
object when testing RAG or agentic systems. retrieval_context can be a list
of strings and tools_called must be a list of ToolCall objects.
Choose which vulnerabilities to test for and which attack methods to use:
Execute the red teaming assessment with your configured parameters:
This will run red teaming on your model_callback using the configured vulnerabilities and attacks and generate a risk assessment which is printed onto your console and also uploaded to the Confident AI platform. You can now view these risk assessments in the Risk Profile section on the Confident AI paltform.
You need to run deepteam login command from the CLI or save your API key as
CONFIDENT_API_KEY in your env for your risk assessments to be uploaded to
the Confident AI platform.
Instead of configuring vulnerabilities and attacks manually, you can use pre-defined security frameworks like OWASP, NIST AI RMF, and MITRE ATLAS via deepteam:
Results will be posted to the Confident AI platform automatically.
These are the same frameworks available in the no-code workflow (OWASP, NIST, MITRE ATLAS), but used programmatically. Note that code-driven assessments do not support the cloud framework builder or CVSS scoring.
Use industry-standard frameworks like OWASP Top 10 and NIST AI RMF for comprehensive security assessments
Create custom vulnerabilities and attack methods tailored to your specific use case and industry requirements
Red teaming works seamlessly with your existing LLM evaluation and tracing workflows on Confident AI.