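Before starting deployment, review these requirements with your infrastructure and security teams. This page covers the technology stack, default resource sizing, Azure vCPU quotas, the Azure services the deployment provisions, network access, required permissions, and estimated costs.
Understanding these requirements up front prevents delays caused by missing approvals or insufficient quotas.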
Confident AI relies on technologies including PostgreSQL, Redis, Kubernetes, and External Secrets, and is deployed with Terraform. Your organization may require approval before adopting any of these.
Why this tech stack? PostgreSQL is the application’s source of truth. Redis provides fast caching and backs the background job queues. Kubernetes provides reliable, scalable container orchestration. External Secrets keeps credentials in Azure Key Vault, where your security team can govern them, while still making them available to pods.
Technology approval processes: Many enterprises have technology review boards or approved software lists. If PostgreSQL, Kubernetes, or Terraform aren’t already approved in your environment, initiate that process early—it can take weeks.
Default resource configurations for staging and production environments. These represent starting points—adjust based on your expected workload.
AKS worker nodes run your application containers. More nodes mean more capacity for concurrent users and evaluations. The cluster autoscaler adds nodes when pods cannot be scheduled under high load and removes nodes when they are underutilized.
The AKS system node pool runs Kubernetes system components (CoreDNS, metrics-server, etc.) on a fixed set of two nodes.
Azure Database for PostgreSQL Flexible Server stores all application data. The compute SKU determines query performance, and storage usage grows as you accumulate data.
Which service is most resource-intensive? The evaluations service (confident-evals) consumes the most CPU during evaluation runs because it processes LLM outputs and computes metrics. If evaluations are slow, scale this service before adding nodes.
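To get a rough feel for how an expected workload translates into worker nodes, the sketch below divides the total CPU requested by your pods by the CPU each node can allocate, then clamps the result to the autoscaler's minimum and maximum. This is a deliberate simplification, not the Cluster Autoscaler's algorithm, which reacts to unschedulable pods and also accounts for memory, affinity, and bin-packing. The function name, the 7,000-millicore allocatable figure, and the min/max bounds are hypothetical. AKS reserves part of each node's CPU for system daemons, so read the real value from the Allocatable section of `kubectl describe node`.

```python
import math


def estimate_worker_nodes(total_cpu_request_m: int,
                          allocatable_m_per_node: int,
                          min_nodes: int,
                          max_nodes: int) -> tuple[int, bool]:
    """Hypothetical CPU-only sizing estimate.

    Returns (node_count, capped). capped is True when demand exceeds
    max_nodes, meaning some pods would stay Pending until the limit is raised.
    """
    if allocatable_m_per_node <= 0:
        raise ValueError("allocatable_m_per_node must be positive")
    # Round up: a partially used node is still a whole node.
    needed = math.ceil(total_cpu_request_m / allocatable_m_per_node)
    # Never go below the pool minimum or above the autoscaler maximum.
    return max(min_nodes, min(max_nodes, needed)), needed > max_nodes


# Example: 20 pods requesting 1500m each, on 8-vCPU nodes assumed to
# expose about 7000m allocatable, with an assumed autoscaler range of 2..6.
print(estimate_worker_nodes(20 * 1500, 7000, min_nodes=2, max_nodes=6))
```

If the second value comes back True, either raise the pool's maximum (and the vCPU quota to match) or reduce per-pod requests, for example by tuning the evaluations service.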
Azure vCPU quotas can block deployment. Each Azure subscription has default per-region limits on vCPUs for each VM family. A typical deployment needs about 40 vCPUs in the Dv5 family: 2 system nodes × 4 vCPUs plus 4 worker nodes × 8 vCPUs.
Check your quotas before starting:
If your limit is too low, request a quota increase; approval can take anywhere from hours to days.
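The 40-vCPU figure is plain arithmetic over the node pools. The sketch below (the helper and class names are hypothetical) sums node count × vCPUs per node and compares the total with the remaining headroom in the VM family's quota. The current usage and limit per VM family in a region can be read with `az vm list-usage --location <region> --output table`. If you raise the worker pool's autoscaler maximum above the default, size the quota for that maximum, because the autoscaler cannot add nodes once the quota is exhausted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    name: str
    max_nodes: int       # use the autoscaler maximum: quota must cover peak size
    vcpus_per_node: int  # e.g. 4 for a 4-vCPU Dv5 size, 8 for an 8-vCPU size


def required_vcpus(pools: list[NodePool]) -> int:
    """Total vCPUs the pools can consume at their maximum size."""
    return sum(p.max_nodes * p.vcpus_per_node for p in pools)


def quota_shortfall(pools: list[NodePool], current_usage: int, family_limit: int) -> int:
    """vCPUs missing from the family limit (0 if there is enough headroom)."""
    return max(0, current_usage + required_vcpus(pools) - family_limit)


# Defaults from this page: 2 system nodes x 4 vCPUs + 4 worker nodes x 8 vCPUs.
pools = [NodePool("system", 2, 4), NodePool("worker", 4, 8)]
print(required_vcpus(pools))  # 40
```

For instance, a subscription already using 20 vCPUs of a 50-vCPU Dv5 limit would be 10 vCPUs short, which is the amount to request in the quota increase.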
The deployment provisions the following Azure services:
Some organizations restrict which Azure services can be used. Azure Policy assignments, including those inherited from management groups, may deny certain services. Verify the services above are allowed in your subscription before proceeding.
Common restrictions that cause issues:
Confident AI needs to reach external services. Ensure your network allows outbound HTTPS (port 443) to:
Corporate proxies and firewalls: If your organization routes traffic through a proxy or inspects HTTPS, you may need to:
Network restrictions are a common cause of deployment failures, which usually surface as timeouts or SSL/TLS errors.
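Because these failures look alike from the application's side, it can help to test each required endpoint from a machine on the same network path as the cluster and distinguish a blocked connection from a TLS problem. The sketch below uses only Python's standard library; the function name and the `ok` / `tcp_error` / `tls_error` labels are illustrative. It verifies certificates against the machine's default CA store, so a TLS-inspecting proxy whose root CA is not trusted there shows up as `tls_error`. It connects directly and does not use an HTTP proxy, so where egress is only allowed through an explicit proxy, a `tcp_error` is expected.

```python
import socket
import ssl


def probe_https(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Classify outbound HTTPS reachability as 'ok', 'tcp_error', or 'tls_error'."""
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # DNS failure, refused connection, or a firewall silently dropping packets.
        return "tcp_error"
    # Default context: verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with context.wrap_socket(raw, server_hostname=host):
            return "ok"
    except OSError:
        # ssl.SSLError and handshake timeouts are both OSError subclasses.
        # Covers untrusted inspection-proxy certificates and broken handshakes.
        return "tls_error"
    finally:
        raw.close()  # no-op if wrap_socket already took ownership of the socket
```

Call it once per endpoint your deployment must reach, for example `probe_https("<endpoint>")`; a `tls_error` on every host usually points at HTTPS inspection rather than at the individual services.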
The identity running Terraform needs the following Azure RBAC roles or equivalent permissions:
Terraform creates and manages:
Missing permissions are a common cause of deployment failures, because most organizations don’t grant broad permissions by default.
Options:
Azure costs vary by region and usage. Approximate monthly costs for always-on infrastructure:
These are estimates. Actual costs depend on:
Use Azure Cost Management after deployment to track actual spending.
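For always-on resources, a monthly estimate is quantity × hourly rate × hours, plus any flat monthly charges such as storage billed per GB-month. The sketch below uses 730 hours per month, the convention the Azure pricing calculator uses. The class and function names are hypothetical, and the rates in the example are placeholders, not Azure prices; take real rates for your region and SKUs from the Azure pricing pages.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730.0  # always-on month, as used by the Azure pricing calculator


@dataclass(frozen=True)
class LineItem:
    name: str
    quantity: float = 1.0      # nodes, instances, or GB depending on the item
    hourly_rate: float = 0.0   # per-unit hourly price (compute-style billing)
    monthly_rate: float = 0.0  # per-unit flat monthly price (e.g. per GB-month)


def monthly_cost(items: list[LineItem], hours: float = HOURS_PER_MONTH) -> float:
    """Sum quantity x (hourly_rate x hours + monthly_rate) over all items."""
    return sum(i.quantity * (i.hourly_rate * hours + i.monthly_rate) for i in items)


# Placeholder rates for illustration only; substitute your region's real prices.
items = [
    LineItem("AKS worker nodes", quantity=4, hourly_rate=0.50),
    LineItem("AKS system nodes", quantity=2, hourly_rate=0.25),
    LineItem("PostgreSQL storage (GB)", quantity=128, monthly_rate=0.10),
]
print(f"{monthly_cost(items):.2f}")
```

Usage-based charges such as data egress are not captured by this model, which is one reason Azure Cost Management remains the source of truth once the deployment is running.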
Before proceeding to Prerequisites, verify:
Once requirements are understood and approved, proceed to Prerequisites to set up your deployment environment.