Requirements
Overview
Before starting deployment, review these requirements with your infrastructure and security teams. This page covers:
- Technologies that need approval in your environment
- Resource sizing for staging and production
- Azure services that will be provisioned
- Permissions required for deployment
- Estimated costs and considerations
Understanding these requirements upfront prevents delays caused by missing approvals or insufficient quotas.
Technologies
Confident AI uses the following technologies. Your organization may require approval before deploying new technologies:
Why this tech stack? PostgreSQL is the application’s source of truth. Redis provides fast caching and manages background job queues. Kubernetes enables reliable, scalable container orchestration. External Secrets keeps credentials in Key Vault (your security team’s preferred location) while making them available to pods.
Technology approval processes: Many enterprises have technology review boards or approved software lists. If PostgreSQL, Kubernetes, or Terraform aren’t already approved in your environment, initiate that process early—it can take weeks.
Resource allocation
Staging and production each have a default resource configuration. Treat these defaults as starting points and adjust them for your expected workload.
Understanding resource sizing
AKS worker nodes run your application containers. More nodes mean more capacity for concurrent users and evaluations. The autoscaler adds nodes during high load and removes them when idle.
AKS system pool runs Kubernetes system components (CoreDNS, kube-proxy, etc.) on a fixed set of 2 nodes.
PostgreSQL Flexible Server stores all application data. The SKU affects query performance; storage grows as you accumulate data.
Which service is most resource-intensive? The evaluations service (confident-evals) consumes the most CPU during evaluation runs, because it processes LLM outputs and computes metrics. If evaluations are slow, scale this service before adding nodes.
Azure vCPU quotas can block deployment. Azure subscriptions have default limits on vCPUs per VM family. A typical deployment needs about 40 vCPUs in the Dv5 family: 2 system nodes × 4 vCPUs plus 4 worker nodes × 8 vCPUs.
Check your quotas before starting:
- Azure Portal → Subscriptions → Usage + quotas → Filter by “Standard Dv5 Family”
If your limit is low, request an increase—this can take hours to days.
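The quota arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the deployment tooling: the `NodePool` class, the function names, and the example limit of 20 vCPUs are all hypothetical, and the pool sizes assume the "2×4 system + 4×8 worker" breakdown means node count × vCPUs per node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    """Hypothetical description of one AKS node pool."""
    name: str
    node_count: int
    vcpus_per_node: int

    @property
    def vcpus(self) -> int:
        return self.node_count * self.vcpus_per_node


def required_vcpus(pools):
    """Total vCPUs the pools consume in a single VM family."""
    return sum(pool.vcpus for pool in pools)


def quota_shortfall(pools, current_usage, limit):
    """vCPUs still missing from the quota; 0 means the quota is sufficient."""
    return max(0, current_usage + required_vcpus(pools) - limit)


# Pool sizes taken from the typical deployment described above.
pools = [NodePool("system", 2, 4), NodePool("worker", 4, 8)]

print(required_vcpus(pools))                              # 40
# The limit of 20 is an illustrative value, not a documented Azure default.
print(quota_shortfall(pools, current_usage=0, limit=20))  # 20
```

Substitute the usage and limit shown in the portal for your region. A non-zero shortfall is the minimum increase to request, before any headroom for autoscaling.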
Azure services
The deployment provisions the following Azure services:
Some organizations restrict which Azure services can be used. Azure Policy or management group policies may prohibit certain services. Verify the services above are allowed in your subscription before proceeding.
Common restrictions that cause issues:
- NAT Gateway (some orgs require shared NAT infrastructure)
- Key Vault (some orgs require centrally managed vaults)
- Managed Identity creation (some orgs require pre-provisioned identities)
- Public IP allocation (some orgs restrict public IPs)
Outbound network requirements
Confident AI needs to reach external services. Ensure your network allows outbound HTTPS (port 443) to:
Corporate proxies and firewalls: If your organization routes traffic through a proxy or inspects HTTPS, you may need to:
- Allowlist the domains above
- Configure proxy settings in the deployment
- Get certificate exceptions for HTTPS inspection
Network restrictions are a common cause of deployment failures that appear as timeouts or SSL errors.
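Because these failures tend to surface as timeouts or SSL errors, it can help to probe outbound access from inside the target network before deploying. The following is a rough sketch using only the Python standard library. The function names are hypothetical, the host list is left for you to fill in with the domains your deployment needs, and the failure labels are heuristics rather than definitive diagnoses.

```python
import socket
import ssl


def classify_failure(exc):
    """Map a connection exception to a coarse, human-readable cause."""
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate rejected (possible HTTPS inspection)"
    if isinstance(exc, ssl.SSLError):
        return "TLS error"
    if isinstance(exc, socket.timeout):
        return "timeout (possible firewall drop)"
    if isinstance(exc, socket.gaierror):
        return "DNS lookup failed"
    return f"connection error: {exc}"


def check_outbound(host, port=443, timeout=5.0):
    """Attempt a TLS handshake with host:port and report the outcome."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except OSError as exc:  # ssl.SSLError and socket.timeout are OSError subclasses
        return classify_failure(exc)


def report(hosts):
    """Print one result line per host."""
    for host in hosts:
        print(f"{host}: {check_outbound(host)}")
```

Calling `report([...])` with your required domains from a machine on the same network path as the cluster gives a quick first signal. A certificate rejection usually points at HTTPS inspection, while a timeout more often suggests a blocked route or a missing proxy configuration.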
Permissions
The identity running Terraform needs the following Azure RBAC roles or equivalent permissions:
- Contributor on the subscription or target resource group
- User Access Administrator for creating role assignments
- Key Vault Administrator for managing Key Vault secrets
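One way to confirm the role list above before running Terraform is to compare it against the roles actually assigned to the deployment identity. The sketch below assumes you have already collected the assigned role names, for example from the Azure CLI's `az role assignment list` output. The `missing_roles` helper is hypothetical, and it matches names exactly, so it will not recognize a broader or custom role that grants equivalent permissions.

```python
# Role names as listed in this section.
REQUIRED_ROLES = frozenset({
    "Contributor",
    "User Access Administrator",
    "Key Vault Administrator",
})


def missing_roles(assigned):
    """Return the required roles absent from `assigned`, sorted by name."""
    present = {role.strip() for role in assigned}
    return sorted(REQUIRED_ROLES - present)


print(missing_roles(["Contributor"]))
# ['Key Vault Administrator', 'User Access Administrator']
```

An empty result means all three named roles are present. A non-empty result is the list to take to your cloud security team.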
Terraform creates and manages:
- Resource Groups, VNets, subnets, NSGs, NAT Gateways
- AKS clusters and node pools
- PostgreSQL Flexible Servers
- Storage Accounts and containers
- Key Vaults and secrets
- Managed Identities and federated credentials
- Role assignments
Permissions are a common cause of deployment failures. Most organizations don’t grant broad permissions by default.
Options:
- Use Contributor + User Access Administrator temporarily: Simplest for initial deployment. Restrict after success.
- Request specific permissions: Work with your cloud security team to create a deployment service principal with the permissions above.
- Have a platform team deploy: If you can’t get the permissions yourself, have someone who already holds them run Terraform.
Estimated costs
Azure costs vary by region and usage. Approximate monthly costs for always-on infrastructure:
These are estimates. Actual costs depend on:
- Region (East US 2 is typically cost-effective)
- Usage (more evaluations = more compute = higher cost)
- Data volume (PostgreSQL storage, blob objects)
- Reserved instances (can reduce VM costs 30-50%)
Use Azure Cost Management after deployment to track actual spending.
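To see what the 30-50% reserved-instance reduction means for a budget, the arithmetic can be written out as below. This is an illustrative sketch only: the function name is hypothetical, and the 1000 monthly figure is a made-up input, not an estimate for this deployment.

```python
def reserved_cost_range(on_demand_vm_cost, min_discount=0.30, max_discount=0.50):
    """Return (low, high) monthly VM cost after a reserved-instance discount."""
    low = on_demand_vm_cost * (1 - max_discount)
    high = on_demand_vm_cost * (1 - min_discount)
    return low, high


# 1000 is a placeholder on-demand VM cost per month, chosen for easy arithmetic.
low, high = reserved_cost_range(1000.0)
print(f"{low:.0f} to {high:.0f}")  # 500 to 700
```

The discount applies to VM compute only, so storage and other service costs should be budgeted separately.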
Pre-deployment checklist
Before proceeding to Prerequisites, verify:
- Technologies listed above are approved for use
- Azure services above can be provisioned (no Azure Policy blocks)
- Permissions available or obtainable
- vCPU quotas sufficient for desired node count
- Outbound network access available or exceptions requested
- Budget approved for estimated costs
- Security team aware of deployment plan
Next steps
Once requirements are understood and approved, proceed to Prerequisites to set up your deployment environment.