Configuration

Overview

This step configures all the variables that Terraform uses to provision your infrastructure. You will:

  • Copy an environment template (staging or production)
  • Configure VPC settings (new or existing VPC)
  • Set GKE node sizing and scaling parameters
  • Configure Cloud SQL for PostgreSQL settings
  • Provide domain URLs and authentication secrets
  • Set up ECR cross-account access credentials
  • Configure the Terraform state backend

After completing this page, your terraform.tfvars file will contain all values needed to provision infrastructure.

How Terraform configuration works

Terraform uses variables to customize deployments. Instead of editing the Terraform code directly, you provide values in a terraform.tfvars file. This keeps your configuration separate from the code, making updates easier.

The repository includes template files with sensible defaults. You copy a template and fill in your specific values.

Setup

Navigate to the GCP Terraform directory:

$ cd confident-terraform/gcp

Copy the appropriate environment template:

# For staging/development environments
$ cp vars/staging.vars terraform.tfvars

# For production environments
$ cp vars/production.vars terraform.tfvars

What’s the difference? Both templates use the same default instance sizes. The key difference is the environment name (stage vs. prod), which affects resource naming. You can adjust any value after copying.

Open terraform.tfvars in your editor. The following sections explain each variable group.

Environment identification

These variables name and identify your deployment:

confident_application_name = "confidentai"
confident_environment = "stage" # or "prod"
confident_gcp_project_id = "<your-project-id>"
confident_gcp_region = "us-central1"
confident_gcp_zone = "us-central1-a"

| Variable | What it does |
| --- | --- |
| confident_application_name | Prefix for all GCP resource names (e.g., confidentai-stage-gke) |
| confident_environment | Must be exactly stage or prod; used in resource names and affects some defaults |
| confident_gcp_project_id | GCP project where everything deploys |
| confident_gcp_region | GCP region where everything deploys |
| confident_gcp_zone | Primary zone within the region for zonal resources |

Region selection matters. Choose a region close to your users and compliant with your data residency requirements. Once deployed, you cannot easily change regions—it requires a full redeployment.

Organization region restrictions: Some organizations only allow deployments in specific regions (via Org Policy gcp.resourceLocations). Verify your region is approved before proceeding.

VPC configuration

Option A: Create a new VPC (recommended)

If you’re creating a new VPC, configure the address spaces:

confident_vpc_enabled = true
confident_vpc_address_space = "10.0.0.0/16"
confident_gke_subnet_cidr = "10.0.1.0/24"
confident_gke_pods_cidr = "10.4.0.0/14"
confident_gke_services_cidr = "10.0.32.0/20"
confident_database_psa_cidr = "10.0.6.0/24"
confident_public_subnet_cidr = "10.0.101.0/24"
confident_private_endpoint_subnet_cidr = "10.0.7.0/24"

| Setting | Default | What it means |
| --- | --- | --- |
| confident_vpc_address_space | 10.0.0.0/16 | The overall IP range for the VPC |
| confident_gke_subnet_cidr | 10.0.1.0/24 | 256 IPs for GKE nodes |
| confident_gke_pods_cidr | 10.4.0.0/14 | Secondary range for VPC-native pod IPs |
| confident_gke_services_cidr | 10.0.32.0/20 | Secondary range for cluster service IPs |
| confident_database_psa_cidr | 10.0.6.0/24 | PSA allocation for Cloud SQL |
| confident_public_subnet_cidr | 10.0.101.0/24 | 256 IPs for public-facing resources |
| confident_private_endpoint_subnet_cidr | 10.0.7.0/24 | 256 IPs for PSC endpoints |

CIDR conflicts cause connectivity failures. If your corporate network uses the same IP range (e.g., 10.0.x.x), you’ll have problems connecting via VPN. Commonly used alternative ranges, provided they are not already taken on your network:

  • 172.16.0.0/16 (172.16.x.x)
  • 192.168.0.0/16 (192.168.x.x)
  • 10.100.0.0/16 (10.100.x.x)

Check with your network team before choosing.
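
One way to sanity-check a candidate range against the ranges your network team reports is a small overlap test. The sketch below is a minimal POSIX shell helper, not part of the Terraform repository: the function names ip_to_int and cidr_overlaps are illustrative, and it assumes well-formed IPv4 CIDR strings without leading zeros. It treats two blocks as overlapping when they share the same network bits under the shorter of the two prefixes.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the Terraform repository.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on "." into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) when two IPv4 CIDR blocks overlap.
cidr_overlaps() {
  ip_a=$(ip_to_int "${1%/*}"); len_a=${1#*/}
  ip_b=$(ip_to_int "${2%/*}"); len_b=${2#*/}
  # Two blocks overlap exactly when they agree on the shorter prefix.
  if [ "$len_a" -lt "$len_b" ]; then len=$len_a; else len=$len_b; fi
  if [ "$len" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
  fi
  [ $(( $ip_a & $mask )) -eq $(( $ip_b & $mask )) ]
}

# Example: the default VPC range against a corporate 10.0.0.0/8 network.
if cidr_overlaps "10.0.0.0/16" "10.0.0.0/8"; then
  echo "conflict: choose a different confident_vpc_address_space"
fi
```

The same function can be run pairwise over the subnet CIDRs in terraform.tfvars to catch accidental overlaps between them.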

Option B: Use an existing VPC

If deploying into an existing VPC, disable VPC creation and provide the existing resource IDs:

confident_vpc_enabled = false
external_vpc_id = "projects/<project-id>/global/networks/<vpc-name>"
external_vpc_address_space = "10.0.0.0/16"
external_gke_subnet_id = "projects/<project-id>/regions/<region>/subnetworks/<subnet-name>"
external_database_psa_range_name = "<existing-psa-range-name>"
external_public_subnet_id = "projects/<project-id>/regions/<region>/subnetworks/<subnet-name>"

Using an existing VPC requires coordination with your network team. You need:

  • Subnet IDs with available IP addresses and secondary ranges for pods/services
  • An existing Private Service Access allocation for Cloud SQL
  • Firewall rules that don’t block required traffic
  • Cloud NAT or outbound internet access for pulling images

Many existing VPCs have restrictive firewall rules or missing Cloud NAT configurations that will cause deployment failures.

GKE node configuration

These settings control the VMs that run your Kubernetes workloads:

confident_node_machine_type = "n2-standard-8"
confident_node_group_min_size = 2
confident_node_group_max_size = 8
confident_node_group_desired_size = 4
confident_kubernetes_version = "1.31"

| Variable | What it controls |
| --- | --- |
| confident_node_machine_type | Machine type; determines CPU and memory per worker node |
| confident_node_group_min_size | Cluster autoscaler won’t go below this |
| confident_node_group_max_size | Cluster autoscaler won’t exceed this |
| confident_node_group_desired_size | How many worker nodes to start with |
| confident_kubernetes_version | GKE Kubernetes version |

Recommended sizes:

| Environment | Machine Type | vCPU | Memory | Min/Max Nodes |
| --- | --- | --- | --- | --- |
| Staging | n2-standard-8 | 8 | 32 GB | 2-8 |
| Production | n2-standard-8 | 8 | 32 GB | 2-8 |
| High volume | n2-standard-16 | 16 | 64 GB | 4-12 |

GKE also creates a fixed system pool with 2x n2-standard-4 nodes for Kubernetes system components. This is separate from the worker pool configured above.

GCP CPU quotas can block deployment. GCP projects have default limits on how many CPUs you can run per VM family per region.

Check your quotas: GCP Console → IAM & Admin → Quotas → Filter by “N2 CPUs” in your region

Request an increase if your limit is below: (system_nodes × 4) + (desired_worker_nodes × vCPUs per node)
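
As a worked example of the formula above, the following sketch computes the N2 vCPU requirement for the default settings. The function name n2_cpus_needed is illustrative. The second calculation, at confident_node_group_max_size, is an extra precaution that the formula above does not state; it assumes the autoscaler cannot add nodes once the regional quota is exhausted.

```shell
#!/bin/sh
# Hypothetical helper for illustration; the inputs are the defaults shown above.

# Print the N2 vCPUs needed: (system_nodes x 4) + (worker_nodes x vCPUs per node).
n2_cpus_needed() {
  system_nodes=$1
  worker_nodes=$2
  vcpus_per_worker=$3
  echo $(( $system_nodes * 4 + $worker_nodes * $vcpus_per_worker ))
}

# Defaults: 2 system nodes (n2-standard-4) and n2-standard-8 workers (8 vCPUs each).
echo "at desired size (4 workers): $(n2_cpus_needed 2 4 8) vCPUs"
echo "at max size (8 workers):     $(n2_cpus_needed 2 8 8) vCPUs"
```

With the defaults this gives 40 vCPUs at the desired size and 72 vCPUs if the worker pool scales to its maximum.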

Database configuration

Cloud SQL for PostgreSQL settings:

confident_psql_tier = "db-custom-4-16384"
confident_psql_disk_size_gb = 64
confident_psql_db_name = "confident_db"
confident_psql_username = "confident_admin"
confident_psql_password = "<your-generated-password>"
confident_psql_version = "POSTGRES_17"
confident_psql_backup_retention_days = 7
confident_psql_point_in_time_recovery = true
confident_psql_high_availability = true

| Variable | What it controls |
| --- | --- |
| confident_psql_tier | Cloud SQL tier; determines CPU and memory |
| confident_psql_disk_size_gb | Storage size in GB |
| confident_psql_db_name | Name of the database created |
| confident_psql_username | Administrator username |
| confident_psql_password | Administrator password |
| confident_psql_version | PostgreSQL version |
| confident_psql_backup_retention_days | Number of days to retain automated backups (1-365) |
| confident_psql_point_in_time_recovery | Enable write-ahead logging for point-in-time recovery (PITR) |
| confident_psql_high_availability | Enable regional HA (standby in another zone) |

Generate a strong database password. Use openssl rand -base64 24 to create a secure random password. This value is stored in Google Secret Manager by Terraform. Do not commit it to version control.
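
If you would rather keep the password out of terraform.tfvars entirely, one option is Terraform's standard TF_VAR_<name> environment variables. The sketch below is an assumption about workflow, not something the repository prescribes: gen_secret is an illustrative helper, and the /dev/urandom fallback is only there for machines without openssl. Values in terraform.tfvars take precedence over environment variables, so this only takes effect if confident_psql_password is removed from the file.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the Terraform repository.

# Generate a random secret: 24 random bytes encoded as 32 base64 characters.
# Falls back to /dev/urandom when openssl is not installed.
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 24
  else
    head -c 24 /dev/urandom | base64
  fi
}

# Terraform reads TF_VAR_<name> as the value of the input variable <name>.
TF_VAR_confident_psql_password=$(gen_secret)
export TF_VAR_confident_psql_password
```

The variable lives only in the current shell session, so store the generated value somewhere safe before closing the terminal.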

Regional HA is enabled by default. This creates a standby replica in a different zone for automatic failover. Disable it for development/testing environments to reduce costs.

Domain and URL configuration

confident_frontend_url = "https://app.yourdomain.com"
confident_backend_url = "https://api.yourdomain.com"
confident_subdomain = "yourdomain.com"

| Variable | Purpose | Example |
| --- | --- | --- |
| confident_frontend_url | Full URL users type in browser | https://app.confidentai.acme.com |
| confident_backend_url | Full URL for API calls | https://api.confidentai.acme.com |
| confident_subdomain | Root domain for auth cookies | acme.com |

Despite its name, confident_subdomain must be the root domain, not a subdomain.

Correct: confident_subdomain = "acme.com"
Wrong: confident_subdomain = "confidentai.acme.com"

Authentication cookies are set on this domain and must be readable by both the frontend and the backend. If you set it to a subdomain such as confidentai.acme.com, cookies won’t work correctly.
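
A quick way to catch one class of mistake here is to confirm that both URLs actually fall under the cookie domain. The sketch below uses illustrative helper names (url_host, host_under_domain) and the example values from the table above. It only checks a necessary condition: it will not flag a value like confidentai.acme.com, which this page describes as wrong even though both example hosts sit under it.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the Terraform repository.

# Extract the host from a URL by dropping the scheme, path, and port.
url_host() {
  host=${1#*://}
  host=${host%%/*}
  host=${host%%:*}
  echo "$host"
}

# Succeed when the host equals the cookie domain or is a subdomain of it.
host_under_domain() {
  case "$1" in
    "$2"|*."$2") return 0 ;;
    *) return 1 ;;
  esac
}

cookie_domain="acme.com"
for url in "https://app.confidentai.acme.com" "https://api.confidentai.acme.com"; do
  if host_under_domain "$(url_host "$url")" "$cookie_domain"; then
    echo "ok:       $url"
  else
    echo "mismatch: $url is not under $cookie_domain"
  fi
done
```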

Authentication secrets

confident_better_auth_secret = "<your-generated-secret>"
confident_better_auth_trusted_origins = "https://app.yourdomain.com"
confident_google_client_id = "<google-oauth-client-id>"
confident_google_client_secret = "<google-oauth-client-secret>"

| Variable | What it’s for |
| --- | --- |
| confident_better_auth_secret | Encrypts authentication tokens; use the value you generated in Prerequisites |
| confident_better_auth_trusted_origins | URLs allowed to make authenticated requests; typically your frontend URL |
| confident_google_client_id | Google OAuth Client ID (if using Google SSO) |
| confident_google_client_secret | Google OAuth Client Secret (if using Google SSO) |

Trusted origins must include the protocol. Use https://app.yourdomain.com, not app.yourdomain.com. A missing protocol causes authentication to fail silently.
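
Because this failure is silent, it can help to check the value before provisioning. A minimal sketch, assuming a single origin per check; origin_has_protocol is an illustrative name and not part of the repository:

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the Terraform repository.

# Succeed only when the origin starts with an explicit https:// or http:// scheme.
origin_has_protocol() {
  case "$1" in
    https://?*|http://?*) return 0 ;;
    *) return 1 ;;
  esac
}

for origin in "https://app.yourdomain.com" "app.yourdomain.com"; do
  if origin_has_protocol "$origin"; then
    echo "ok:      $origin"
  else
    echo "invalid: $origin (missing protocol)"
  fi
done
```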

External services

openai_api_key = "<your-openai-api-key>"
confident_clickhouse_password = "<your-generated-password>"
argocd_admin_password = "<your-generated-password>"

| Variable | What it’s for |
| --- | --- |
| openai_api_key | API key for running LLM evaluations |
| confident_clickhouse_password | Password for the analytics database |
| argocd_admin_password | Admin password for the ArgoCD GitOps dashboard |

OpenAI API key requires sufficient quota. Evaluations can consume significant tokens. Ensure your OpenAI account has appropriate rate limits and spending caps configured.

Resource naming

These variables control internal naming conventions for GCP resources. The defaults are suitable for most deployments:

confident_application_code = "cai"
confident_environment_code = "s" # "s" for stage, "p" for prod
confident_region_prefix = "gcuc"
confident_labels = {
  environment = "stage"
  project = "confidentai"
}

| Variable | What it controls |
| --- | --- |
| confident_application_code | Short code used in resource identifiers (e.g., node pool name) |
| confident_environment_code | s for stage, p for prod; used in resource identifiers |
| confident_region_prefix | Region abbreviation for resource identifiers |
| confident_labels | Labels applied to all GCP resources |

GKE access configuration

confident_public_gke = false
confident_gke_admin_group_emails = []

| Variable | What it controls |
| --- | --- |
| confident_public_gke | When true: GKE API is publicly accessible, NGINX Ingress is internet-facing. When false (default): private GKE cluster with internal load balancers. |
| confident_gke_admin_group_emails | Google Group emails to grant GKE cluster admin access |

Public GKE is only recommended for testing. Setting confident_public_gke = true makes the GKE API server and ingress accessible from the internet. Never use this in production.

Storage configuration

confident_test_cases_bucket = "testcases"
confident_payloads_bucket = "payloads"

| Variable | What it controls |
| --- | --- |
| confident_test_cases_bucket | Name suffix for the test cases GCS bucket |
| confident_payloads_bucket | Name suffix for the payloads GCS bucket |

Bucket names are constructed as <application_name>-<environment>-<suffix> (e.g., confidentai-stage-testcases).
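
To preview the resulting names before provisioning, the pattern can be reproduced in a few lines of shell. This is a sketch with illustrative function names; the length check reflects the 3 to 63 character limit that GCS applies to bucket names without dots. GCS bucket names are also globally unique, which this check cannot verify.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the naming pattern described above.

# Build <application_name>-<environment>-<suffix>.
bucket_name() {
  echo "$1-$2-$3"
}

# Succeed when the name fits the 3-63 character limit for GCS bucket names.
bucket_name_length_ok() {
  [ "${#1}" -ge 3 ] && [ "${#1}" -le 63 ]
}

name=$(bucket_name "confidentai" "stage" "testcases")
if bucket_name_length_ok "$name"; then
  echo "$name"
fi
```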

ClickHouse configuration

ClickHouse serves as the analytics database, deployed via the ClickHouse Operator on GKE:

confident_clickhouse_user = "default"
confident_clickhouse_password = "<your-generated-password>"
confident_clickhouse_database = "confident_db"
confident_clickhouse_operator_version = "0.0.1"
confident_clickhouse_version = "25.12"
confident_clickhouse_shards = 1
confident_clickhouse_replicas = 2
confident_clickhouse_storage_size = "500Gi"
confident_clickhouse_storage_class = "clickhouse-premium"

| Variable | What it controls |
| --- | --- |
| confident_clickhouse_user | ClickHouse username (keep as default) |
| confident_clickhouse_password | ClickHouse password (the same variable listed under External services; define it only once in terraform.tfvars) |
| confident_clickhouse_database | Database name within ClickHouse |
| confident_clickhouse_operator_version | Helm chart version for the ClickHouse Operator |
| confident_clickhouse_version | ClickHouse server version |
| confident_clickhouse_shards | Number of shards in the cluster |
| confident_clickhouse_replicas | Replicas per shard (2 recommended for HA) |
| confident_clickhouse_storage_size | Persistent volume size per ClickHouse pod |
| confident_clickhouse_storage_class | StorageClass for ClickHouse PVCs (GCP pd-ssd persistent disks) |

Do not change confident_clickhouse_user from default. The ClickHouse Operator expects this username. Changing it will cause connectivity failures.

ClickHouse backup

A GCS bucket is created for ClickHouse backups:

confident_clickhouse_backup_bucket = "chbackups"
confident_clickhouse_backup_schedule = "0 2 * * *"

| Variable | What it controls |
| --- | --- |
| confident_clickhouse_backup_bucket | Name suffix for the ClickHouse backup bucket |
| confident_clickhouse_backup_schedule | Cron schedule for automated backups (UTC) |

Backup lifecycle policy: Terraform configures a lifecycle policy that automatically deletes ClickHouse backup objects older than 30 days and snapshots older than 7 days.

ECR cross-account access

These credentials allow your GKE cluster to pull Confident AI container images from AWS ECR:

ecr_aws_access_key_id = "<provided-by-confident-ai>"
ecr_aws_secret_access_key = "<provided-by-confident-ai>"
ecr_aws_account_id = "<provided-by-confident-ai>"
ecr_aws_region = "us-east-1"

These values are provided by your Confident AI representative. Don’t modify them unless instructed.

Terraform state backend

Terraform tracks what resources it created in a “state file.” This should be stored remotely so multiple team members can collaborate and state isn’t lost.

Edit provider.tf to configure your GCS backend:

terraform {
  backend "gcs" {
    bucket = "your-company-tfstate"
    prefix = "confident-ai/staging"
  }
}

| Setting | What it does |
| --- | --- |
| bucket | GCS bucket name for state storage |
| prefix | Path within the bucket (use different prefixes for staging vs. production) |

If the bucket doesn’t exist, create it:

$ gcloud storage buckets create gs://your-company-tfstate \
    --project=<your-project-id> \
    --location=us-central1 \
    --uniform-bucket-level-access

$ gcloud storage buckets update gs://your-company-tfstate \
    --versioning

Your organization may have existing Terraform state infrastructure. Many companies have:

  • Centralized state buckets managed by a platform team
  • Required naming conventions
  • Required encryption settings (CMEK)
  • Org Policy requirements on GCS buckets

Check with your infrastructure team before creating a new bucket.

Never delete or modify the state file manually. Terraform state tracks the mapping between your configuration and real GCP resources. Corrupting it can cause Terraform to lose track of resources, leading to orphaned infrastructure or accidental deletions.

Security review checklist

Before proceeding, verify these security considerations:

  • terraform.tfvars is in .gitignore (never commit secrets)
  • State bucket has versioning enabled (for recovery from mistakes)
  • State bucket is encrypted at rest (default with Google-managed keys)
  • Identity used has least-privilege permissions
  • CIDR blocks don’t conflict with corporate network
  • OpenAI API key has appropriate spending limits

# Add tfvars to gitignore if not already present
# (the *.tfvars pattern also covers terraform.tfvars)
$ grep -qxF '*.tfvars' .gitignore 2>/dev/null || echo '*.tfvars' >> .gitignore
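
Alongside the checklist, it can be worth confirming that no template placeholder was left unfilled. The following sketch scans a tfvars file for any remaining <...> marker; find_placeholders is an illustrative name, and the demonstration runs against a throwaway file rather than your real terraform.tfvars.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the Terraform repository.

# Print every line of a tfvars file that still contains a <...> placeholder.
# Exit status is 0 when placeholders were found, 1 when the file is clean.
find_placeholders() {
  grep -n '<[^<>]*>' "$1"
}

# Demonstrate on a temporary file with one unfilled value.
tmp_file=$(mktemp)
cat > "$tmp_file" <<'EOF'
confident_gcp_project_id = "<your-project-id>"
confident_gcp_region = "us-central1"
EOF

if find_placeholders "$tmp_file"; then
  echo "Fill in the values above before provisioning."
fi
rm -f "$tmp_file"
```

Run against the real file, the check would be find_placeholders terraform.tfvars from the confident-terraform/gcp directory.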

Next steps

Once configuration is complete, proceed to Provisioning to create the GCP infrastructure.