Configuration

Overview

This step configures all the variables that Terraform uses to provision your infrastructure. You will:

  • Copy an environment template (staging or production)
  • Configure VNet settings (new or existing VNet)
  • Set AKS node sizing and scaling parameters
  • Configure PostgreSQL Flexible Server settings
  • Provide domain URLs and authentication secrets
  • Set up ECR cross-account access credentials
  • Configure the Terraform state backend

After completing this page, your terraform.tfvars file will contain all values needed to provision infrastructure.

How Terraform configuration works

Terraform uses variables to customize deployments. Instead of editing the Terraform code directly, you provide values in a terraform.tfvars file. This keeps your configuration separate from the code, making updates easier.

The repository includes template files with sensible defaults. You copy a template and fill in your specific values.

Setup

Navigate to the Azure Terraform directory:

cd confident-terraform/azure

Copy the appropriate environment template:

# For staging/development environments
cp vars/staging.vars terraform.tfvars

# For production environments
cp vars/production.vars terraform.tfvars

What’s the difference? Both templates use the same default instance sizes. The key difference is the environment name (stage vs. prod), which affects resource naming. You can adjust all values after copying.

Open terraform.tfvars in your editor. The following sections explain each variable group.

Environment identification

These variables name and identify your deployment:

confident_application_name = "confidentai"
confident_environment = "stage" # or "prod"
confident_azure_subscription_id = "<your-subscription-id>"
confident_azure_region = "eastus"
confident_resource_group_name = "confident"
  • confident_application_name: Prefix for all Azure resource names (e.g., confidentai-stage-aks)
  • confident_environment: Must be exactly stage or prod; used in resource names and affects some defaults
  • confident_azure_subscription_id: Azure subscription where everything deploys
  • confident_azure_region: Azure region where everything deploys
  • confident_resource_group_name: Suffix for the resource group name

Region selection matters. Choose a region close to your users and compliant with your data residency requirements. Once deployed, you cannot easily change regions—it requires a full redeployment.

Organization region restrictions: Some organizations only allow deployments in specific regions. Verify your region is approved before proceeding.

VNet configuration

Option A: Create a new VNet

If you’re creating a new VNet, configure the address spaces:

confident_vnet_enabled = true
confident_vnet_address_space = "10.0.0.0/16"
confident_aks_subnet_cidr = "10.0.1.0/24"
confident_database_subnet_cidr = "10.0.6.0/24"
confident_public_subnet_cidr = "10.0.101.0/24"
confident_private_endpoint_subnet_cidr = "10.0.7.0/24"
  • confident_vnet_address_space (default 10.0.0.0/16): The overall IP range for the VNet
  • confident_aks_subnet_cidr (default 10.0.1.0/24): 256 IPs for AKS nodes
  • confident_database_subnet_cidr (default 10.0.6.0/24): 256 IPs for PostgreSQL (delegated subnet)
  • confident_public_subnet_cidr (default 10.0.101.0/24): 256 IPs for public-facing resources
  • confident_private_endpoint_subnet_cidr (default 10.0.7.0/24): 256 IPs for the Storage private endpoint

CIDR conflicts cause connectivity failures. If your corporate network uses the same IP range (e.g., 10.0.x.x), you’ll have problems connecting via VPN. Common conflict-free alternatives:

  • 172.16.0.0/16 (172.16.x.x)
  • 192.168.0.0/16 (192.168.x.x)
  • 10.100.0.0/16 (10.100.x.x)

Check with your network team before choosing.
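As a quick first pass before that conversation, you can test a candidate range for overlap using python3’s standard ipaddress module. The two CIDRs below are example values, not your real networks:

```shell
# Check a candidate VNet range against an existing network range.
# Both CIDRs are placeholders; substitute your real values.
corp_cidr="10.0.0.0/16"     # example: existing corporate network
vnet_cidr="10.100.0.0/16"   # candidate confident_vnet_address_space

python3 -c "
import ipaddress
corp = ipaddress.ip_network('$corp_cidr')
vnet = ipaddress.ip_network('$vnet_cidr')
print('OVERLAP' if corp.overlaps(vnet) else 'OK')
"
```

This prints OK for the example pair above; any OVERLAP result means you should pick a different VNet address space.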

Option B: Use an existing VNet

If deploying into an existing VNet, disable VNet creation and provide the existing resource IDs:

confident_vnet_enabled = false
external_vnet_id = "/subscriptions/.../resourceGroups/.../providers/Microsoft.Network/virtualNetworks/..."
external_vnet_address_space = "10.0.0.0/16"
external_aks_subnet_id = "/subscriptions/.../resourceGroups/.../providers/Microsoft.Network/virtualNetworks/.../subnets/..."
external_database_subnet_id = "/subscriptions/.../resourceGroups/.../providers/Microsoft.Network/virtualNetworks/.../subnets/..."
external_public_subnet_id = "/subscriptions/.../resourceGroups/.../providers/Microsoft.Network/virtualNetworks/.../subnets/..."

Using an existing VNet requires coordination with your network team. You need:

  • Subnet IDs with available IP addresses
  • A database subnet with the Microsoft.DBforPostgreSQL/flexibleServers service delegation
  • NSGs that don’t block required traffic
  • A NAT Gateway or other outbound internet access for pulling images

Many existing VNets have restrictive NSGs or missing NAT Gateways that will cause deployment failures.

AKS node configuration

These settings control the VMs that run your Kubernetes workloads:

confident_node_vm_size = "Standard_D8s_v5"
confident_node_group_min_size = 2
confident_node_group_max_size = 8
confident_node_group_desired_size = 4
confident_kubernetes_version = "1.31"
  • confident_node_vm_size: VM size; determines CPU and memory per worker node
  • confident_node_group_min_size: Cluster autoscaler won’t go below this
  • confident_node_group_max_size: Cluster autoscaler won’t exceed this
  • confident_node_group_desired_size: How many worker nodes to start with
  • confident_kubernetes_version: AKS Kubernetes version

Recommended sizes:

  • Staging: Standard_D8s_v5 (8 vCPU, 32 GB), 2-8 nodes
  • Production: Standard_D8s_v5 (8 vCPU, 32 GB), 2-8 nodes
  • High volume: Standard_D16s_v5 (16 vCPU, 64 GB), 4-12 nodes

AKS also creates a fixed system pool with 2x Standard_D4s_v5 nodes for Kubernetes system components. This is separate from the worker pool configured above.

Azure vCPU quotas can block deployment. Azure subscriptions have default limits on how many vCPUs you can run per VM family.

Check your quotas: Azure Portal → Subscriptions → Usage + quotas → Filter by “Standard DSv5 Family”

Request an increase if your limit is below: (system_nodes × 4) + (desired_worker_nodes × vCPUs per node)
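For the defaults on this page (a fixed system pool of 2 Standard_D4s_v5 nodes plus 4 Standard_D8s_v5 workers), the arithmetic works out like this:

```shell
# vCPU quota estimate for the Standard DSv5 family.
# Values reflect the defaults on this page; adjust to your sizing.
system_nodes=2     # fixed AKS system pool
system_vcpus=4     # Standard_D4s_v5
desired_workers=4  # confident_node_group_desired_size
worker_vcpus=8     # Standard_D8s_v5

required=$(( system_nodes * system_vcpus + desired_workers * worker_vcpus ))
echo "Required vCPUs: $required"   # Required vCPUs: 40
```

If you want headroom for the autoscaler to reach its ceiling, substitute confident_node_group_max_size for the desired count (2 × 4 + 8 × 8 = 72 vCPUs with the defaults).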

Database configuration

PostgreSQL Flexible Server settings:

confident_psql_sku_name = "GP_Standard_D4s_v3"
confident_psql_storage_mb = 65536
confident_psql_db_name = "confident_db"
confident_psql_username = "confident_admin"
confident_psql_password = "<your-generated-password>"
confident_psql_version = "17"
confident_psql_backup_retention_days = 7
confident_psql_geo_redundant_backup = false
confident_psql_high_availability = true
  • confident_psql_sku_name: Database SKU; determines CPU and memory
  • confident_psql_storage_mb: Maximum storage in MB (65536 = 64 GB)
  • confident_psql_db_name: Name of the database created
  • confident_psql_username: Administrator username
  • confident_psql_password: Administrator password
  • confident_psql_version: PostgreSQL version
  • confident_psql_backup_retention_days: Number of days to retain automated backups (1-35)
  • confident_psql_geo_redundant_backup: Enable geo-redundant backups for disaster recovery
  • confident_psql_high_availability: Enable zone-redundant HA (standby in another zone)

Generate a strong database password. Use openssl rand -base64 24 to create a secure random password. This value is stored in Azure Key Vault by Terraform. Do not commit it to version control.
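As a sketch (openssl assumed available on your machine), 24 random bytes encode to a 32-character base64 string:

```shell
# Generate a 32-character random password (24 random bytes, base64-encoded).
db_password="$(openssl rand -base64 24)"
echo "${#db_password}"   # 32
```

Use the same approach for the ClickHouse and ArgoCD passwords requested later on this page.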

Zone-redundant HA is enabled by default. This creates a standby replica in a different availability zone for automatic failover. Disable it for development/testing environments to reduce costs.

Domain and URL configuration

confident_frontend_url = "https://app.yourdomain.com"
confident_backend_url = "https://api.yourdomain.com"
confident_subdomain = "yourdomain.com"
  • confident_frontend_url: Full URL users type in the browser (e.g., https://app.confidentai.acme.com)
  • confident_backend_url: Full URL for API calls (e.g., https://api.confidentai.acme.com)
  • confident_subdomain: Root domain for auth cookies (e.g., acme.com)

Despite its name, confident_subdomain must be set to the root domain, not a subdomain.

Correct: confident_subdomain = "acme.com"
Wrong: confident_subdomain = "confidentai.acme.com"

Authentication cookies are set on this root domain and must be readable by both the frontend and backend. If you set it to the full subdomain instead, the cookies won’t be shared correctly between the app and api hosts.
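A quick sanity check of this relationship, using the example values from above (it confirms both hosts fall under the configured domain, though it cannot by itself tell you whether the value is the true root):

```shell
# Cookies set on confident_subdomain are shared only with hosts under it.
subdomain="acme.com"   # example value of confident_subdomain

for url in "https://app.confidentai.acme.com" "https://api.confidentai.acme.com"; do
  host="${url#https://}"
  case "$host" in
    "$subdomain"|*".$subdomain") echo "OK: $host" ;;
    *) echo "NOT UNDER $subdomain: $host" ;;
  esac
done
```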

Authentication secrets

confident_better_auth_secret = "<your-generated-secret>"
confident_better_auth_trusted_origins = "https://app.yourdomain.com"
confident_google_client_id = "<google-oauth-client-id>"
confident_google_client_secret = "<google-oauth-client-secret>"
  • confident_better_auth_secret: Encrypts authentication tokens; use the value you generated in Prerequisites
  • confident_better_auth_trusted_origins: URLs allowed to make authenticated requests; typically your frontend URL
  • confident_google_client_id: Google OAuth Client ID (if using Google SSO)
  • confident_google_client_secret: Google OAuth Client Secret (if using Google SSO)

Trusted origins must include the protocol. Use https://app.yourdomain.com not app.yourdomain.com. Missing protocol causes authentication to fail silently.
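A small guard against the missing-protocol mistake, using the example value from above (multi-origin setups are assumed to be comma-separated):

```shell
# Flag trusted origins that are missing an explicit protocol.
origins="https://app.yourdomain.com"   # example value; comma-separated if multiple

for o in $(echo "$origins" | tr ',' ' '); do
  case "$o" in
    http://*|https://*) echo "OK: $o" ;;
    *) echo "MISSING PROTOCOL: $o" ;;
  esac
done
```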

External services

openai_api_key = "<your-openai-api-key>"
confident_clickhouse_password = "<your-generated-password>"
argocd_admin_password = "<your-generated-password>"
  • openai_api_key: API key for running LLM evaluations
  • confident_clickhouse_password: Password for the analytics database
  • argocd_admin_password: Admin password for the ArgoCD GitOps dashboard

OpenAI API key requires sufficient quota. Evaluations can consume significant tokens. Ensure your OpenAI account has appropriate rate limits and spending caps configured.

Resource naming

These variables control internal naming conventions for Azure resources. The defaults are suitable for most deployments:

confident_application_code = "cai"
confident_environment_code = "s" # "s" for stage, "p" for prod
confident_region_prefix = "azew"
confident_tags = {
  Environment = "stage"
  Project     = "ConfidentAI"
}
  • confident_application_code: Short code used in resource identifiers (e.g., node pool name)
  • confident_environment_code: s for stage, p for prod; used in resource identifiers
  • confident_region_prefix: Region abbreviation for resource identifiers
  • confident_tags: Tags applied to all Azure resources

AKS access configuration

confident_public_aks = false
confident_aks_admin_group_object_ids = []
  • confident_public_aks: When true, the AKS API is publicly accessible and the NGINX Ingress is internet-facing. When false (the default), you get a private AKS cluster with internal load balancers.
  • confident_aks_admin_group_object_ids: Azure AD group object IDs to grant AKS cluster admin access

Public AKS is only recommended for testing. Setting confident_public_aks = true makes the AKS API server and ingress accessible from the internet. Never use this in production.

Storage configuration

confident_test_cases_container = "testcases"
confident_payloads_container = "payloads"
  • confident_test_cases_container: Name suffix for the test cases blob container
  • confident_payloads_container: Name suffix for the payloads blob container

Container names are constructed as <application_name>-<environment>-<suffix> (e.g., confidentai-stage-testcases).
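With the defaults on this page, the derived names look like this:

```shell
# Derive the blob container names from the naming inputs (defaults assumed).
application_name="confidentai"
environment="stage"

for suffix in testcases payloads; do
  echo "${application_name}-${environment}-${suffix}"
done
# confidentai-stage-testcases
# confidentai-stage-payloads
```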

ClickHouse configuration

ClickHouse serves as the analytics database, deployed via the ClickHouse Operator on AKS:

confident_clickhouse_user = "default"
confident_clickhouse_password = "<your-generated-password>"
confident_clickhouse_database = "confident_db"
confident_clickhouse_operator_version = "0.0.1"
confident_clickhouse_version = "25.12"
confident_clickhouse_shards = 1
confident_clickhouse_replicas = 2
confident_clickhouse_storage_size = "500Gi"
confident_clickhouse_storage_class = "clickhouse-premium"
  • confident_clickhouse_user: ClickHouse username (keep as default)
  • confident_clickhouse_password: ClickHouse password
  • confident_clickhouse_database: Database name within ClickHouse
  • confident_clickhouse_operator_version: Helm chart version for the ClickHouse Operator
  • confident_clickhouse_version: ClickHouse server version
  • confident_clickhouse_shards: Number of shards in the cluster
  • confident_clickhouse_replicas: Replicas per shard (2 recommended for HA)
  • confident_clickhouse_storage_size: Persistent volume size per ClickHouse pod
  • confident_clickhouse_storage_class: StorageClass for ClickHouse PVCs (Azure Premium_LRS disks)

Do not change confident_clickhouse_user from default. The ClickHouse Operator expects this username. Changing it will cause connectivity failures.

ClickHouse backup

A blob container is created within the Storage Account for ClickHouse backups:

confident_clickhouse_backup_container = "chbackups"
confident_clickhouse_backup_schedule = "0 2 * * *"
  • confident_clickhouse_backup_container: Name suffix for the ClickHouse backup container
  • confident_clickhouse_backup_schedule: Cron schedule for automated backups, in UTC (the default 0 2 * * * runs daily at 02:00 UTC)

Backup lifecycle policy: Terraform configures a lifecycle policy that automatically deletes ClickHouse backup blobs older than 30 days and snapshots older than 7 days.

ECR cross-account access

These credentials allow your AKS cluster to pull Confident AI container images from AWS ECR:

ecr_aws_access_key_id = "<provided-by-confident-ai>"
ecr_aws_secret_access_key = "<provided-by-confident-ai>"
ecr_aws_account_id = "<provided-by-confident-ai>"
ecr_aws_region = "us-east-1"

These values are provided by your Confident AI representative. Don’t modify them unless instructed.

Terraform state backend

Terraform tracks what resources it created in a “state file.” This should be stored remotely so multiple team members can collaborate and state isn’t lost.

Edit provider.tf to configure your Azure Storage backend:

terraform {
  backend "azurerm" {
    resource_group_name  = "your-company-tfstate-rg"
    storage_account_name = "yourcompanytfstate"
    container_name       = "tfstate"
    key                  = "confident-ai/staging/terraform.tfstate"
  }
}
  • resource_group_name: Resource group containing the state storage account
  • storage_account_name: Storage account name for state storage
  • container_name: Blob container for state files
  • key: Path within the container (use different paths for staging vs. production)

If the storage account doesn’t exist, create it:

az group create --name your-company-tfstate-rg --location eastus

az storage account create \
  --name yourcompanytfstate \
  --resource-group your-company-tfstate-rg \
  --location eastus \
  --sku Standard_LRS \
  --encryption-services blob

az storage container create \
  --name tfstate \
  --account-name yourcompanytfstate

# Enable blob versioning so earlier state file versions can be recovered
az storage account blob-service-properties update \
  --account-name yourcompanytfstate \
  --resource-group your-company-tfstate-rg \
  --enable-versioning true

Your organization may have existing Terraform state infrastructure. Many companies have:

  • Centralized state storage accounts managed by a platform team
  • Required naming conventions
  • Required encryption settings
  • Azure Policy requirements on storage accounts

Check with your infrastructure team before creating a new storage account.

Never delete or modify the state file manually. Terraform state tracks the mapping between your configuration and real Azure resources. Corrupting it can cause Terraform to lose track of resources, leading to orphaned infrastructure or accidental deletions.

Security review checklist

Before proceeding, verify these security considerations:

  • terraform.tfvars is in .gitignore (never commit secrets)
  • State storage account has versioning enabled (for recovery from mistakes)
  • State storage account is encrypted at rest
  • Identity used has least-privilege permissions
  • CIDR blocks don’t conflict with corporate network
  • OpenAI API key has appropriate spending limits
# Add tfvars patterns to .gitignore if not already present
grep -qxF "terraform.tfvars" .gitignore || echo "terraform.tfvars" >> .gitignore
grep -qxF "*.tfvars" .gitignore || echo "*.tfvars" >> .gitignore

Next steps

Once configuration is complete, proceed to Provisioning to create the Azure infrastructure.