Verification

Overview

You’ve provisioned infrastructure and deployed the application. This final step verifies everything works correctly. You will:

Confirm all infrastructure components (AKS, PostgreSQL, Storage, secrets) are healthy
Configure DNS to point to your load balancer
Test application access via the frontend and backend URLs
Run functional tests (user login, project creation, SDK connectivity)
Check health endpoints for each service
Complete a production readiness checklist

After this step, your deployment is verified and ready for users.

Infrastructure verification

Before testing the application, verify all infrastructure components are healthy.

AKS cluster health

$ kubectl get nodes

All nodes should show Ready:

NAME                                 STATUS   ROLES    AGE    VERSION
aks-system-12345678-vmss000000       Ready    <none>   1h     v1.31.x
aks-system-12345678-vmss000001       Ready    <none>   1h     v1.31.x
aks-azewcais-12345678-vmss000000     Ready    <none>   1h     v1.31.x
aks-azewcais-12345678-vmss000001     Ready    <none>   1h     v1.31.x

Nodes stuck in NotReady? Check for issues:

$ kubectl describe node <node-name>

Look at the “Conditions” section for clues. Common causes: network plugin issues, insufficient resources, or failed health checks.
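For clusters with many nodes, the same readiness check can be scripted. The following is a minimal sketch, not part of the official tooling: it assumes kubectl is on your PATH and already pointed at the cluster, and the function names `not_ready_nodes` and `fetch_nodes` are illustrative.

```python
import json
import subprocess


def not_ready_nodes(node_list: dict) -> list:
    """Return the names of nodes whose Ready condition is not "True"."""
    failing = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            failing.append(node["metadata"]["name"])
    return failing


def fetch_nodes() -> dict:
    """Run `kubectl get nodes -o json` and parse the result (needs cluster access)."""
    completed = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

Calling `not_ready_nodes(fetch_nodes())` returns an empty list when every node reports Ready; any names it returns are candidates for kubectl describe node.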

PostgreSQL connectivity

Verify the database is accessible from the cluster by checking backend logs:

$ kubectl logs deployment/confident-backend -n confident-ai | grep -i database

You should see successful connection messages, not connection refused errors.

Check PostgreSQL status in Azure Portal:

Go to Azure Portal → Azure Database for PostgreSQL → Flexible Servers
Find your server
Status should be “Available”

Key Vault secrets

Verify secrets exist and External Secrets can read them:

$ # Check External Secrets synced to Kubernetes
$ kubectl get secret confident-externalsecret -n confident-ai
$ 
$ # Check Key Vault via CLI
$ az keyvault secret list --vault-name $(terraform output -raw key_vault_name) --query '[].name' -o tsv

Storage Account

Verify the storage account and containers exist:

$ az storage container list \
>   --account-name $(terraform output -raw storage_account_name) \
>   --auth-mode login \
>   --query '[].name' -o tsv

You should see three containers (e.g., confidentai-stage-testcases, confidentai-stage-payloads, and confidentai-stage-chbackups).
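If you want to check the container list automatically, a small helper can compare the CLI output against the expected names. This is a sketch under one assumption: that the three containers always end in the suffixes shown in the example above (testcases, payloads, chbackups). Your prefix and naming may differ, and `missing_containers` is a hypothetical helper name.

```python
def missing_containers(names, required_suffixes=("testcases", "payloads", "chbackups")):
    """Return the required suffixes that no container name ends with.

    `names` is the list of lines printed by `az storage container list ... -o tsv`.
    The default suffixes are an assumption based on the example container names.
    """
    present = [name.strip() for name in names if name.strip()]
    return [
        suffix
        for suffix in required_suffixes
        if not any(name.endswith(suffix) for name in present)
    ]
```

An empty result means all three containers were found; otherwise the returned suffixes tell you which container to look for in the Terraform output.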

Storage is used for file uploads. If storage connectivity fails, users won’t be able to upload datasets or export reports. The backend uses Azure Workload Identity to access Storage—verify the confident-storage-sa service account exists.

DNS configuration

The NGINX Ingress controller has an Azure Load Balancer IP. DNS records must point to it.

Get the Load Balancer IP

$ kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

Example output:

20.85.123.45

Azure uses IP addresses, not hostnames. Unlike AWS ALBs, which are addressed by hostname, Azure Load Balancers expose a static IP address. You therefore create DNS A records (not CNAME records) pointing your domains to this IP.

Create DNS records

Add A records for each hostname you configured in the ingress:

Record Type	Name	Value
A	`app.yourdomain.com`	(LB IP)
A	`api.yourdomain.com`	(LB IP)
A	`deepeval.yourdomain.com`	(LB IP)
A	`otel.yourdomain.com`	(LB IP)

A records vs. CNAME:

A record: Points a hostname to an IP address. Used for Azure Load Balancers.
CNAME: Points a hostname to another hostname. Cannot be used for Azure LB IPs.

Corporate DNS changes may require approval. If your DNS is managed by an internal team, submit change requests for all four records. Factor in approval time—this can delay verification by hours or days.

Verify DNS propagation

After adding records, verify they resolve correctly:

$ nslookup app.yourdomain.com
$ nslookup api.yourdomain.com

You should see the Load Balancer IP in the response. If you see “NXDOMAIN” or your old values, wait for DNS propagation (typically 5-30 minutes, up to 48 hours for some providers).
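To check all four records in one pass, the lookup can be scripted with the Python standard library. This is a minimal sketch: `resolve_ipv4` and `dns_mismatches` are illustrative names, and the result reflects whatever resolver your machine uses, which may lag behind the authoritative DNS server.

```python
import socket


def resolve_ipv4(hostname: str) -> set:
    """Resolve a hostname to its IPv4 addresses; return an empty set if it fails."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def dns_mismatches(hostnames, expected_ip: str, resolver=resolve_ipv4) -> dict:
    """Map each hostname that does not resolve to expected_ip to what it resolved to."""
    problems = {}
    for host in hostnames:
        addresses = resolver(host)
        if expected_ip not in addresses:
            problems[host] = addresses
    return problems
```

For example, `dns_mismatches(["app.yourdomain.com", "api.yourdomain.com"], "20.85.123.45")` returns an empty dict once both records point at the Load Balancer IP. An empty set for a hostname means it did not resolve at all.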

Application verification

Frontend access

Open your frontend URL in a browser:

https://app.yourdomain.com

What you should see:

HTTPS (padlock icon) — certificate is working
Confident AI login page — application is serving
Google OAuth button (if configured) — SSO is set up

Certificate errors?

“NET::ERR_CERT_COMMON_NAME_INVALID”: The certificate doesn’t cover this domain. Verify the ingress TLS hosts include the right domain names.
“NET::ERR_CERT_DATE_INVALID”: The certificate is expired or not yet valid, or the clock on your machine is wrong. Check cert-manager status: kubectl get certificate -n confident-ai
“Your connection is not private”: If using self-signed certs, this is expected. For Let’s Encrypt, check that the ClusterIssuer is ready.

Backend health check

The backend exposes a health endpoint:

$ curl -s https://api.yourdomain.com/health

Expected response:

{ "status": "ok", "version": "1.2.3" }

Connection refused or timeout?

DNS not propagated: Wait and retry
NSG blocking: Check the NSG allows inbound HTTPS
Ingress not ready: Check kubectl get ingress -n confident-ai for an address
Backend not running: Check pod status with kubectl get pods -n confident-ai

Evals health check

$ curl -s https://deepeval.yourdomain.com/health

Expected: {"status":"healthy"}

OTEL collector health

$ curl -s https://otel.yourdomain.com/health

Expected: {"status":"ok"}
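The three health checks above can be combined into one script. The sketch below uses only the standard library and assumes each endpoint returns a JSON object with a "status" field, as in the expected responses shown. The names `EXPECTED_STATUS`, `fetch_status`, and `failing_endpoints` are illustrative.

```python
import json
import urllib.request

# Expected "status" value per endpoint, taken from the responses documented above.
EXPECTED_STATUS = {
    "https://api.yourdomain.com/health": "ok",
    "https://deepeval.yourdomain.com/health": "healthy",
    "https://otel.yourdomain.com/health": "ok",
}


def fetch_status(url: str, timeout: float = 10.0):
    """GET a health URL and return its JSON "status" field, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection, TLS, HTTP, and timeout errors;
        # ValueError covers invalid JSON or undecodable bytes.
        return None
    return payload.get("status") if isinstance(payload, dict) else None


def failing_endpoints(expected: dict, fetch=fetch_status) -> dict:
    """Map each URL whose status differs from the expected value to what was received."""
    failures = {}
    for url, want in expected.items():
        got = fetch(url)
        if got != want:
            failures[url] = got
    return failures
```

`failing_endpoints(EXPECTED_STATUS)` returns an empty dict when all three services respond as documented. A value of None means the request itself failed, which points at DNS, NSG, or ingress rather than the service.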

Functional testing

Test 1: User login

Navigate to https://app.yourdomain.com
Click “Sign in with Google” (or your configured auth provider)
Complete authentication
Expected: Redirected to dashboard, user session created

OAuth errors?

“redirect_uri_mismatch” — The redirect URI in Google Console doesn’t match. It must be exactly https://api.yourdomain.com/api/auth/callback/google.
“access_denied” — User not authorized. Check if OAuth app restricts to certain domains.
Infinite redirect loop — confident_subdomain may be misconfigured. Must be root domain, not full subdomain.

Test 2: Create a project

From the dashboard, click “New Project”
Enter a project name
Click Create
Expected: Project created successfully, appears in list

This verifies database connectivity and basic write operations.

Test 3: API key generation

Go to Project Settings → API Keys
Click “Generate API Key”
Copy the generated key
Expected: API key displayed (save it—it won’t be shown again)

Test 4: SDK connectivity

From your local machine (or any machine that can reach the backend):

import deepeval

deepeval.login(
    api_key="<your-generated-api-key>",
    confident_api_endpoint="https://api.yourdomain.com"
)

print(deepeval.check_connection())

Expected: True or success message

SDK can’t connect?

“Connection refused” — Backend not reachable. Check DNS and network connectivity.
“401 Unauthorized” — API key invalid. Generate a new one.
“SSL certificate verify failed” — Certificate issue. Check the URL is using HTTPS and cert is valid.

If your machine can’t reach the backend directly (internal LB), run this test from within the same network (VPN) or from a pod inside the cluster.

Test 5: Run a simple evaluation

from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France.",
)

metric = AnswerRelevancyMetric()
evaluate([test_case], [metric])

Expected: Evaluation runs and results appear in the dashboard.

Evaluation fails with API errors?

“OpenAI API error” — openai_api_key not configured or invalid
“timeout” — Network can’t reach OpenAI. Check outbound connectivity to api.openai.com
“Rate limited” — OpenAI quota exceeded. Check your OpenAI usage limits.

Your cluster needs outbound HTTPS access to OpenAI (or your configured LLM provider).

Service health checks

Run comprehensive health checks on all services:

$ # All pods should be Running
$ kubectl get pods -n confident-ai
$ 
$ # No recent crashes
$ kubectl get events -n confident-ai --field-selector type=Warning
$ 
$ # Check each deployment is ready
$ kubectl rollout status deployment/confident-backend -n confident-ai
$ kubectl rollout status deployment/confident-frontend -n confident-ai
$ kubectl rollout status deployment/confident-evals -n confident-ai
$ kubectl rollout status deployment/confident-otel -n confident-ai
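The pod check can also be automated by parsing the JSON form of the same command (kubectl get pods -n confident-ai -o json). The following is a sketch, not an official tool: `unhealthy_pods` is a hypothetical helper, and the restart threshold is an arbitrary default you should tune.

```python
def unhealthy_pods(pod_list: dict, max_restarts: int = 0) -> dict:
    """Map pod names to a short reason for pods that look unhealthy.

    `pod_list` is the parsed output of `kubectl get pods -n confident-ai -o json`.
    """
    problems = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed one-off pods (for example Jobs) are not a problem
        if phase != "Running":
            problems[name] = f"phase={phase}"
            continue
        for container in status.get("containerStatuses", []):
            if not container.get("ready", False):
                problems[name] = f"container {container.get('name')} not ready"
                break
            restarts = container.get("restartCount", 0)
            if restarts > max_restarts:
                problems[name] = f"container {container.get('name')} restarted {restarts} times"
                break
    return problems
```

Feed it `json.loads(...)` of the kubectl output; an empty dict means every pod is Running with all containers ready and no restarts above the threshold.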

View logs for troubleshooting

$ # Backend logs
$ kubectl logs -f deployment/confident-backend -n confident-ai
$ 
$ # Frontend logs
$ kubectl logs -f deployment/confident-frontend -n confident-ai
$ 
$ # Evals logs
$ kubectl logs -f deployment/confident-evals -n confident-ai

Production readiness checklist

Before announcing the deployment is ready for users, verify:

Security

HTTPS working on all endpoints (padlock icon in browser)
TLS certificate valid and not expiring soon
NSG restricts access appropriately
PostgreSQL not publicly accessible (private DNS only)
Storage Account has no public access
Key Vault has network ACLs configured
terraform.tfvars not committed to version control
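For the "not expiring soon" item, the remaining certificate lifetime can be measured with Python's ssl module. This is a minimal sketch: `fetch_not_after` and `days_until_expiry` are illustrative names, and what counts as "soon" (for example, fewer than 30 days) is your own policy choice, not something this guide prescribes.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect over TLS and return the certificate's notAfter string.

    Uses the default trust store, so an untrusted or mismatched certificate
    raises ssl.SSLError instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (default: current time) and a notAfter string."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

A typical call is `days_until_expiry(fetch_not_after("app.yourdomain.com"))`; a negative result means the certificate has already expired.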

High availability

At least 2 worker nodes running
At least 2 replicas for backend service (scale if not)
PostgreSQL has zone-redundant HA enabled (if required)
Worker pool autoscaling configured for traffic spikes
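The replica requirement can be checked from the JSON output of kubectl get deployments -n confident-ai -o json. The sketch below is illustrative only: `under_replicated` is a hypothetical helper, and applying the minimum of 2 to every listed deployment rather than just the backend is an assumption you can narrow with the `names` argument.

```python
def under_replicated(deployment_list: dict, minimum: int = 2, names=None) -> dict:
    """Map deployment names to their ready replica count when below `minimum`.

    `deployment_list` is the parsed output of
    `kubectl get deployments -n confident-ai -o json`.
    If `names` is given, only those deployments are checked.
    """
    short = {}
    for deployment in deployment_list.get("items", []):
        name = deployment["metadata"]["name"]
        if names is not None and name not in names:
            continue
        # readyReplicas is omitted from the status when no replica is ready.
        ready = deployment.get("status", {}).get("readyReplicas", 0)
        if ready < minimum:
            short[name] = ready
    return short
```

For the checklist item above, `under_replicated(data, names={"confident-backend"})` returning an empty dict means the backend has at least two ready replicas.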

Monitoring (recommended)

Azure Monitor configured for AKS
PostgreSQL metrics and alerts enabled
Container Insights enabled
Alerts configured for node health, pod restarts, errors

Operations

Team members have cluster access (see Cluster Access page)
Runbook documented for common issues
Backup strategy confirmed for PostgreSQL
Upgrade path understood for future releases

What to do if verification fails

Don’t panic. Most issues have straightforward fixes:

Identify the failing component: Use the checks above to isolate which part isn’t working
Check logs: kubectl logs for pod issues, Azure Portal for service-level issues
Review configuration: Typos in URLs, missing certificates, wrong secret names
Check network: NSGs, DNS propagation, VPN connectivity
Contact support: If stuck, reach out to your Confident AI representative with:
- What step failed
- Exact error messages
- Results of relevant kubectl get commands

Summary

You’ve completed the Confident AI deployment on Azure:

Prerequisites — Installed tools and gathered credentials
Configuration — Set up Terraform variables
Provisioning — Created Azure infrastructure
TLS Certificates — Configured cert-manager and ClusterIssuer
Cluster Access — Configured kubectl access
Kubernetes Deployment — Deployed application services
Verification — Tested everything works

Your deployment is now ready for users. Welcome to Confident AI!

Need help? Contact your Confident AI representative or email support@confident-ai.com with details about your deployment and any issues encountered.
