Configuration

Overview

This step configures all the variables that Terraform uses to provision your infrastructure. You will:

  • Copy an environment template (staging or production)
  • Configure VPC settings (new or existing VPC)
  • Set EKS node sizing and scaling parameters
  • Configure RDS database settings
  • Provide domain URLs and authentication secrets
  • Set up ECR cross-account access credentials
  • Configure the Terraform state backend

After completing this page, your terraform.tfvars file will contain all values needed to provision infrastructure.

How Terraform configuration works

Terraform uses variables to customize deployments. Instead of editing the Terraform code directly, you provide values in a terraform.tfvars file. This keeps your configuration separate from the code, making updates easier.

The repository includes template files with sensible defaults. You copy a template and fill in your specific values.
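As an illustration of the mechanism (the file name variables.tf and the default shown here are assumptions for the sketch, not the repository's actual contents): each entry in terraform.tfvars supplies a value for a variable the module declares, overriding its default without touching the code.

```hcl
# Module side (e.g., a variables.tf): declares the variable and a default.
variable "confident_aws_region" {
  type    = string
  default = "us-east-1"
}

# terraform.tfvars (your side): overrides the default; the code stays untouched.
confident_aws_region = "eu-west-1"
```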

Setup

Navigate to the AWS Terraform directory:

$ cd confident-terraform/aws_tf

Copy the appropriate environment template:

$ # For staging/development environments
$ cp vars/staging.vars terraform.tfvars
$
$ # For production environments
$ cp vars/production.vars terraform.tfvars

What’s the difference? The staging template uses smaller instance sizes and fewer nodes to reduce costs. The production template uses larger instances and more replicas for reliability. You can adjust these after copying.

Open terraform.tfvars in your editor. The following sections explain each variable group.

Environment identification

These variables name and identify your deployment:

confident_application_name = "confidentai"
confident_environment      = "stage" # or "prod"
confident_aws_region       = "us-east-1"

Variable                     What it does
confident_application_name   Prefix for all AWS resource names (e.g., confidentai-stage-eks)
confident_environment        Must be exactly stage or prod; used in resource names and affects some defaults
confident_aws_region         AWS region where everything deploys
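For example, with the values above, resource names are prefixed as in this sketch (the `<app>-<environment>-<resource>` pattern is inferred from the confidentai-stage-eks example; the exact naming logic lives in the Terraform modules):

```shell
# Illustrative only: compose the prefix the way the example above suggests.
app="confidentai"
env="stage"
echo "${app}-${env}-eks"   # -> confidentai-stage-eks
```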

Region selection matters. Choose a region close to your users and compliant with your data residency requirements. Once deployed, you cannot easily change regions; switching later requires a full redeployment.

Organization region restrictions: Some organizations only allow deployments in specific regions. Verify your region is approved before proceeding.

VPC configuration

Option A: Create a new VPC

If you're creating a new VPC, configure the IP address ranges:

confident_vpc_enabled                = true
confident_vpc_cidr_block             = "10.0.0.0/16"
confident_availability_zones         = ["us-east-1a", "us-east-1b"]
confident_private_subnet_cidr_blocks = ["10.0.1.0/24", "10.0.2.0/24"]
confident_public_subnet_cidr_blocks  = ["10.0.101.0/24", "10.0.102.0/24"]

Understanding CIDR blocks:

CIDR notation defines IP address ranges. 10.0.0.0/16 means “all addresses from 10.0.0.0 to 10.0.255.255” (65,536 addresses). The /16 indicates how many bits are fixed.
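The address count follows directly from the prefix length: a /N block contains 2^(32-N) addresses. For instance:

```shell
# Addresses in a CIDR block = 2^(32 - prefix_length).
echo $(( 1 << (32 - 16) ))   # /16 -> 65536
echo $(( 1 << (32 - 24) ))   # /24 -> 256
```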

Setting                                Default                        What it means
confident_vpc_cidr_block               10.0.0.0/16                    The overall IP range for the VPC
confident_private_subnet_cidr_blocks   10.0.1.0/24, 10.0.2.0/24       256 IPs each for EKS worker nodes
confident_public_subnet_cidr_blocks    10.0.101.0/24, 10.0.102.0/24   256 IPs each for load balancers

CIDR conflicts cause connectivity failures. If your corporate network uses the same IP range (e.g., 10.0.x.x), you’ll have problems connecting via VPN. Common conflict-free alternatives:

  • 172.16.0.0/16 (172.16.x.x)
  • 192.168.0.0/16 (192.168.x.x)
  • 10.100.0.0/16 (10.100.x.x)

Check with your network team before choosing.
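If you want a quick local sanity check first, this bash sketch tests whether two CIDR blocks overlap. Two blocks overlap exactly when their network addresses agree under the shorter prefix's mask:

```shell
# Sketch: detect whether two IPv4 CIDR blocks overlap.

# Convert a dotted-quad address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Exit 0 if the two CIDR blocks overlap, 1 otherwise.
cidrs_overlap() {
  local net1="${1%/*}" len1="${1#*/}" net2="${2%/*}" len2="${2#*/}"
  local min=$(( len1 < len2 ? len1 : len2 ))
  local mask=$(( min == 0 ? 0 : (0xFFFFFFFF << (32 - min)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$net1") & mask )) -eq $(( $(ip_to_int "$net2") & mask )) ]
}

cidrs_overlap 10.0.0.0/16 10.0.5.0/24   && echo overlap || echo ok   # -> overlap
cidrs_overlap 10.100.0.0/16 10.0.0.0/16 && echo overlap || echo ok   # -> ok
```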

Why two availability zones?

EKS requires subnets in at least two AZs for high availability. If one AZ has an outage, your workloads continue running in the other. The defaults use us-east-1a and us-east-1b; change these if you're using a different region.

Option B: Use an existing VPC

If deploying into an existing VPC, disable VPC creation and provide the existing resource IDs:

confident_vpc_enabled            = false
external_vpc_id                  = "vpc-0abc123def456789"
external_vpc_cidr_block          = "10.0.0.0/16"
external_private_subnet_ids      = ["subnet-private1", "subnet-private2"]
external_public_subnet_ids       = ["subnet-public1", "subnet-public2"]
external_private_route_table_ids = ["rtb-private1", "rtb-private2"]
external_database_subnet_ids     = ["subnet-db1", "subnet-db2"]

Using an existing VPC requires coordination with your network team. You need:

  • Subnet IDs that have available IP addresses
  • Route tables that allow outbound internet access (for pulling images)
  • Security groups that don’t block required traffic
  • Proper tagging for EKS (see Prerequisites page)

Many existing VPCs have restrictive Network ACLs or missing NAT Gateways that will cause deployment failures.

EKS node configuration

These settings control the EC2 instances that run your Kubernetes workloads:

confident_node_instance_types     = ["m6i.xlarge"]
confident_node_group_min_size     = 2
confident_node_group_max_size     = 8
confident_node_group_desired_size = 4

Variable                            What it controls
confident_node_instance_types       EC2 instance type; determines CPU and memory per node
confident_node_group_min_size       Cluster autoscaler won't go below this
confident_node_group_max_size       Cluster autoscaler won't exceed this
confident_node_group_desired_size   How many nodes to start with

Recommended sizes:

Environment   Instance Type   vCPU   Memory   Min/Max Nodes
Staging       m6i.large       2      8 GB     2-4
Production    m6i.xlarge      4      16 GB    2-8
High volume   m6i.2xlarge     8      32 GB    4-12

Instance type availability varies by region. If you get errors about unavailable instance types during provisioning, check which instance types are available in your region and AZs.

EC2 service quotas can block deployment. AWS accounts have default limits on how many vCPUs you can run. If you’ve never used EKS before, you may hit these limits.

Check your quotas: AWS Console → Service Quotas → Amazon EC2 → “Running On-Demand Standard instances”

Request an increase if your limit is below: (desired_size × vCPUs per instance)
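For the production defaults above (4 desired nodes of m6i.xlarge at 4 vCPUs each), the arithmetic works out like this; checking against max_size as well leaves headroom for scale-out:

```shell
# Quota math for the production defaults shown above.
vcpus_per_instance=4   # m6i.xlarge
desired_size=4
max_size=8
echo "baseline: $(( desired_size * vcpus_per_instance )) vCPUs"   # -> baseline: 16 vCPUs
echo "at max:   $(( max_size * vcpus_per_instance )) vCPUs"       # -> at max:   32 vCPUs
```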

Database configuration

RDS PostgreSQL settings:

confident_psql_instance_class       = "db.t4g.large"
confident_rds_allocated_storage     = 20
confident_rds_max_allocated_storage = 100
confident_psql_db_name              = "confident_db"
confident_psql_username             = "confident_admin"

Variable                              What it controls
confident_psql_instance_class         Database instance size; affects performance
confident_rds_allocated_storage       Initial storage in GB
confident_rds_max_allocated_storage   Maximum storage in GB (auto-scales up to this)
confident_psql_db_name                Name of the database created
confident_psql_username               Master username (password is auto-generated)

You don’t set the database password. Terraform generates a secure password and stores it in AWS Secrets Manager with automatic rotation every 15 days. This is more secure than static passwords.

RDS instance class affects cost significantly. db.t4g.large costs roughly $100/month, while db.r6g.xlarge costs roughly $400/month. Start with the recommended size and upgrade based on actual performance needs.

Domain and URL configuration

confident_frontend_url = "https://app.yourdomain.com"
confident_backend_url  = "https://api.yourdomain.com"
confident_subdomain    = "yourdomain.com"

These URLs configure where Confident AI is accessible and how authentication cookies work:

Variable                 Purpose                          Example
confident_frontend_url   Full URL users type in browser   https://app.confidentai.acme.com
confident_backend_url    Full URL for API calls           https://api.confidentai.acme.com
confident_subdomain      Root domain for auth cookies     acme.com

Despite its name, confident_subdomain must be set to the root domain, not a subdomain.

Correct: confident_subdomain = "acme.com"
Wrong: confident_subdomain = "confidentai.acme.com"

Authentication cookies are scoped to this domain and must be readable by both the frontend and the backend. If you set it to the full subdomain, cookies won't be shared correctly between them.

Authentication secrets

confident_better_auth_secret          = "<your-generated-secret>"
confident_better_auth_trusted_origins = "https://app.yourdomain.com"
confident_google_client_id            = "<google-oauth-client-id>"
confident_google_client_secret        = "<google-oauth-client-secret>"

Variable                                What it's for
confident_better_auth_secret            Encrypts authentication tokens; use the value you generated in Prerequisites
confident_better_auth_trusted_origins   URLs allowed to make authenticated requests; typically your frontend URL
confident_google_client_id              Google OAuth Client ID (if using Google SSO)
confident_google_client_secret          Google OAuth Client Secret (if using Google SSO)

Trusted origins must include the protocol. Use https://app.yourdomain.com, not app.yourdomain.com. A missing protocol causes authentication to fail silently.
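If you still need to generate the auth secret, a common approach (assuming OpenSSL is installed; the Prerequisites page is authoritative for the exact method) is:

```shell
# 32 random bytes, base64-encoded, yields a 44-character secret.
confident_better_auth_secret=$(openssl rand -base64 32)
echo "${#confident_better_auth_secret}"   # -> 44
```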

External services

openai_api_key                = "<your-openai-api-key>"
confident_clickhouse_password = "<your-generated-password>"
argocd_admin_password         = "<your-generated-password>"

Variable                        What it's for
openai_api_key                  API key for running LLM evaluations
confident_clickhouse_password   Password for the analytics database
argocd_admin_password           Admin password for the ArgoCD GitOps dashboard

OpenAI API key requires sufficient quota. Evaluations can consume significant tokens. Ensure your OpenAI account has appropriate rate limits and spending caps configured.

ECR cross-account access

These credentials allow your EKS cluster to pull Confident AI container images:

ecr_aws_access_key_id     = "<provided-by-confident-ai>"
ecr_aws_secret_access_key = "<provided-by-confident-ai>"
ecr_aws_account_id        = "<provided-by-confident-ai>"
ecr_aws_region            = "us-east-1"

These values are provided by your Confident AI representative. Don’t modify them unless instructed.

Terraform state backend

Terraform tracks what resources it created in a “state file.” This should be stored remotely so multiple team members can collaborate and state isn’t lost if your laptop dies.

Edit provider.tf to configure your S3 backend:

terraform {
  backend "s3" {
    bucket       = "your-company-terraform-state"
    region       = "us-east-1"
    key          = "confident-ai/staging/terraform.tfstate"
    use_lockfile = true
    encrypt      = true
  }
}

Setting        What it does
bucket         S3 bucket name for state storage
key            Path within the bucket (use different paths for staging vs. production)
use_lockfile   Prevents concurrent modifications
encrypt        Encrypts state at rest

If the bucket doesn’t exist, create it:

$ aws s3 mb s3://your-company-terraform-state --region us-east-1
$
$ aws s3api put-bucket-versioning \
>     --bucket your-company-terraform-state \
>     --versioning-configuration Status=Enabled

Your organization may have existing Terraform state infrastructure. Many companies have:

  • Centralized state buckets managed by a platform team
  • Required bucket naming conventions
  • DynamoDB tables for state locking
  • Required bucket policies or encryption settings

Check with your infrastructure team before creating a new bucket.

Never delete or modify the state file manually. Terraform state tracks the mapping between your configuration and real AWS resources. Corrupting it can cause Terraform to lose track of resources, leading to orphaned infrastructure or accidental deletions.

Security review checklist

Before proceeding, verify these security considerations:

  • terraform.tfvars is in .gitignore (never commit secrets)
  • State bucket has versioning enabled (for recovery from mistakes)
  • State bucket is encrypted at rest
  • IAM credentials used have least-privilege permissions
  • CIDR blocks don’t conflict with corporate network
  • OpenAI API key has appropriate spending limits
$ # Add tfvars to gitignore if not already present
$ echo "terraform.tfvars" >> .gitignore
$ echo "*.tfvars" >> .gitignore
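You can confirm the ignore rules actually apply with git check-ignore. This throwaway-repo demonstration (assuming git is installed) shows the expected result:

```shell
# Demonstrate in a temporary repo that the patterns ignore terraform.tfvars.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
printf '%s\n' "terraform.tfvars" "*.tfvars" >> .gitignore
touch terraform.tfvars
git check-ignore -q terraform.tfvars && echo ignored || echo "NOT ignored"   # -> ignored
```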

Next steps

Once configuration is complete, proceed to Provisioning to create the AWS infrastructure.