Requirements
Overview
Before starting deployment, review these requirements with your infrastructure and security teams. This page covers:
- Technologies that need approval in your environment
- Resource sizing for staging and production
- AWS services that will be provisioned
- IAM permissions required for deployment
- Estimated costs and considerations
Understanding these requirements upfront prevents delays caused by missing approvals or insufficient quotas.
Technologies
Confident AI uses the following technologies. Your organization may require approval before deploying new technologies:
Why this tech stack? PostgreSQL is the application’s source of truth. Redis provides fast caching and manages background job queues. Kubernetes enables reliable, scalable container orchestration. External Secrets keeps credentials in Secrets Manager (your security team’s preferred location) while making them available to pods.
Technology approval processes: Many enterprises have technology review boards or approved software lists. If PostgreSQL, Kubernetes, or Terraform aren’t already approved in your environment, initiate that process early—it can take weeks.
Resource allocation
The default resource configurations for staging and production environments are starting points; adjust them based on your expected workload.
Understanding resource sizing
EKS nodes run your application containers. More nodes = more capacity for concurrent users and evaluations. The autoscaler adds nodes during high load and removes them when idle.
RDS stores all application data. The instance class affects query performance; storage grows automatically as you accumulate data.
Which service is most resource-intensive? The evaluations service (confident-evals) consumes the most CPU during evaluation runs, because it processes LLM outputs and computes metrics. If evaluations are slow, scale this service before adding nodes.
EC2 service quotas can block deployment. AWS accounts have default limits on the number of vCPUs that can run concurrently. A typical staging deployment needs roughly 6 to 8 vCPUs (2 nodes × 2 vCPU, plus about 2 vCPU for system pods). Production needs more.
Check your quotas before starting:
- AWS Console → Service Quotas → Amazon EC2 → “Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances”
If your limit is low (e.g., 32 vCPUs in a new account), request an increase—this can take hours to days.
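The same check can be scripted. The following is a minimal sketch, not part of the Confident AI tooling: the helper names are illustrative, it assumes boto3 is installed with credentials configured, and it assumes L-1216C47A is the quota code for the Running On-Demand Standard instances quota (confirm the code in the Service Quotas console for your account).

```python
def required_vcpus(node_count: int, vcpus_per_node: int, system_vcpus: int = 0) -> int:
    """vCPUs the deployment will request: worker nodes plus any extra allowance."""
    return node_count * vcpus_per_node + system_vcpus


def quota_is_sufficient(required: int, quota_limit: float, in_use: int = 0) -> bool:
    """True if the required vCPUs fit into what remains of the quota."""
    return required <= quota_limit - in_use


def fetch_standard_vcpu_quota(region: str) -> float:
    """Read the applied On-Demand Standard vCPU limit from the Service Quotas API.

    Assumption: 'L-1216C47A' identifies the Running On-Demand Standard
    instances quota; verify it in the console before relying on it.
    """
    import boto3  # imported lazily so the pure helpers above work without boto3

    client = boto3.client("service-quotas", region_name=region)
    response = client.get_service_quota(ServiceCode="ec2", QuotaCode="L-1216C47A")
    return response["Quota"]["Value"]
```

For the staging breakdown above, `required_vcpus(2, 2, 2)` can be compared against `fetch_standard_vcpu_quota("us-east-1")` using `quota_is_sufficient`; pass `in_use` if other workloads already run in the account.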
AWS services
The deployment provisions the following AWS services:
Some organizations restrict which AWS services can be used. Service Control Policies (SCPs) or internal policies may prohibit certain services. Verify the services above are allowed in your AWS organization before proceeding.
Common restrictions that cause issues:
- NAT Gateway (some orgs require shared NAT infrastructure)
- KMS (some orgs require centrally managed keys)
- IAM role creation (some orgs require pre-provisioned roles)
Outbound network requirements
Confident AI needs to reach external services. Ensure your network allows outbound HTTPS (port 443) to:
Corporate proxies and firewalls: If your organization routes traffic through a proxy or inspects HTTPS, you may need to:
- Allowlist the domains above
- Configure proxy settings in the deployment
- Get certificate exceptions for HTTPS inspection
Network restrictions are a common cause of deployment failures that appear as timeouts or SSL errors.
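To tell these failure modes apart before deploying, outbound access can be probed from a machine inside the target network. The sketch below is one possible approach using only the Python standard library; the function names are illustrative, and the host list must be filled in with the domains your deployment actually requires.

```python
import socket
import ssl
from typing import Iterable, List, Tuple


def check_https(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a TCP connection and TLS handshake; return (ok, detail)."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, f"ok ({tls.version()})"
    except ssl.SSLCertVerificationError as exc:
        # Often seen when a proxy re-signs traffic with an untrusted CA.
        return False, f"certificate rejected (possible HTTPS inspection): {exc}"
    except ssl.SSLError as exc:
        return False, f"TLS error: {exc}"
    except socket.timeout:
        return False, "timed out (possible firewall block)"
    except OSError as exc:
        return False, f"connection error: {exc}"


def check_all(hosts: Iterable[str]) -> List[Tuple[str, bool, str]]:
    """Run check_https for every host and collect the results."""
    results = []
    for host in hosts:
        ok, detail = check_https(host)
        results.append((host, ok, detail))
    return results
```

A certificate rejection usually points to HTTPS inspection, while a timeout suggests the domain is not yet allowlisted. This probe connects directly, so it does not exercise an explicit HTTP proxy configuration.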
IAM permissions
The IAM user or role running Terraform needs permissions to create and manage:
- VPC, subnets, route tables, gateways
- EKS clusters and node groups
- RDS instances and subnet groups
- S3 buckets and policies
- IAM roles, policies, and OIDC providers
- Secrets Manager secrets
- ACM certificates
- EC2 instances and security groups
- KMS keys
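As a starting point for a conversation with your cloud security team, the list above can be mapped to IAM service prefixes. The sketch below is an assumption-laden illustration, not the official deployment policy: the mapping and the Sid are made up for this example, service-level wildcards are deliberately broad, and the Terraform modules may call services that are not listed here.

```python
import json

# Assumed mapping from the resource types listed above to IAM service prefixes.
SERVICE_PREFIXES = {
    "VPC, subnets, route tables, gateways": "ec2",
    "EKS clusters and node groups": "eks",
    "RDS instances and subnet groups": "rds",
    "S3 buckets and policies": "s3",
    "IAM roles, policies, and OIDC providers": "iam",
    "Secrets Manager secrets": "secretsmanager",
    "ACM certificates": "acm",
    "EC2 instances and security groups": "ec2",
    "KMS keys": "kms",
}


def build_deployment_policy() -> dict:
    """Build a broad IAM policy document covering the listed services."""
    # A set removes the duplicate 'ec2' prefix; sorting keeps the output stable.
    actions = sorted({f"{prefix}:*" for prefix in SERVICE_PREFIXES.values()})
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DeploymentSketch",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": actions,
                "Resource": "*",
            }
        ],
    }


def policy_as_json() -> str:
    """Serialize the policy so it can be reviewed or attached to a role."""
    return json.dumps(build_deployment_policy(), indent=2)
```

Treat the output as a draft to narrow down: replace the wildcards with the specific actions and resource ARNs your security team approves.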
Why so many permissions? Terraform creates a complete, self-contained infrastructure. It needs permission to create all the pieces. After deployment, ongoing operations need far fewer permissions.
IAM permissions are the #1 cause of deployment failures. Most organizations don’t grant broad permissions by default.
Options:
- Use AdministratorAccess temporarily: this is the simplest route for the initial deployment. Restrict permissions once it succeeds.
- Request specific permissions: work with your cloud security team to create a deployment role with the permissions above.
- Have a platform team deploy: if you can’t get the IAM permissions yourself, ask someone who has them to run Terraform.
IAM permission errors look like: Error: creating IAM Role: AccessDenied
Estimated costs
AWS costs vary by region and usage. Approximate monthly costs for always-on infrastructure:
These are estimates. Actual costs depend on:
- Region (us-east-1 is typically among the cheapest)
- Usage (more evaluations = more compute = higher cost)
- Data volume (RDS storage, S3 objects)
- Reserved instances (can reduce EC2/RDS costs 30-50%)
Use AWS Cost Explorer after deployment to track actual spending.
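To see what the reserved-instance range means for a budget, the discount can be applied to any on-demand figure. This is a small illustrative helper, not a pricing tool: the function name is made up, the 30-50% range is the one quoted above, and any dollar amount you pass in is your own estimate.

```python
from typing import Tuple


def reserved_cost_range(
    on_demand_monthly: float, min_pct: int = 30, max_pct: int = 50
) -> Tuple[float, float]:
    """Return (lowest, highest) monthly cost after a reserved-instance discount.

    The largest discount gives the lowest cost, and vice versa.
    """
    lowest = on_demand_monthly * (100 - max_pct) / 100
    highest = on_demand_monthly * (100 - min_pct) / 100
    return lowest, highest
```

For example, a placeholder on-demand EC2/RDS spend of 100.0 per month gives a range of (50.0, 70.0). Costs that reservations do not cover, such as S3 storage, should be added separately.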
Pre-deployment checklist
Before proceeding to Prerequisites, verify:
- Technologies listed above are approved for use
- AWS services above can be provisioned (no SCP blocks)
- IAM permissions available or obtainable
- EC2 service quotas sufficient for desired node count
- Outbound network access available or exceptions requested
- Budget approved for estimated costs
- Security team aware of deployment plan
Next steps
Once requirements are understood and approved, proceed to Prerequisites to set up your deployment environment.