Verification
Overview
You’ve provisioned infrastructure and deployed the application. This final step verifies everything works correctly. You will:
- Confirm all infrastructure components (EKS, RDS, S3, secrets) are healthy
- Configure DNS to point to your load balancer
- Test application access via the frontend and backend URLs
- Run functional tests (user login, project creation, SDK connectivity)
- Check health endpoints for each service
- Complete a production readiness checklist
After this step, your deployment is verified and ready for users.
Infrastructure verification
Before testing the application, verify all infrastructure components are healthy.
EKS cluster health
All nodes should show Ready:
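`kubectl get nodes` lists each node with its status. If you prefer a scripted check, the sketch below reads `kubectl get nodes -o json` and reports any node whose Ready condition is not "True". It assumes kubectl is already configured for the cluster; the helper names are illustrative.

```python
import json
import subprocess


def kubectl_json(*args):
    """Run kubectl with JSON output and return the parsed result."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def not_ready_nodes(node_list):
    """Return names of nodes whose Ready condition is not "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(node["metadata"]["name"])
    return names


# Usage (requires cluster access):
#   print(not_ready_nodes(kubectl_json("get", "nodes")) or "all nodes Ready")
```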
Nodes stuck in NotReady? Check for issues:
Look at the “Conditions” section for clues. Common causes: network plugin issues, insufficient resources, or failed health checks.
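`kubectl describe node <node-name>` prints the Conditions section in readable form. As a rough sketch, the same information can be filtered from `kubectl get node <node-name> -o json`, keeping only the conditions that point to a problem. The rule used here (Ready should be "True", every other condition should be "False") is a simplification that fits the standard kubelet conditions.

```python
import json
import sys


def abnormal_conditions(node):
    """List (type, status, reason) for node conditions that signal a problem."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        # Ready should be "True"; pressure and unavailable conditions should be "False".
        healthy = "True" if cond.get("type") == "Ready" else "False"
        if cond.get("status") != healthy:
            problems.append(
                (cond.get("type"), cond.get("status"), cond.get("reason", ""))
            )
    return problems


def main():
    """Read one node's JSON from stdin and print its abnormal conditions."""
    for problem in abnormal_conditions(json.load(sys.stdin)):
        print(*problem)


# Usage: pipe `kubectl get node <node-name> -o json` into a script that calls main().
```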
RDS connectivity
Verify the database is accessible from the cluster by checking backend logs:
You should see successful connection messages, not connection refused errors.
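Backend logs can be fetched with `kubectl logs -n confident-ai <backend-pod>`. For long logs, a small filter like the one below can surface likely database failures. The marker strings are assumptions about typical connection errors, not the backend's actual log format, so adjust them to what your logs contain.

```python
# Substrings that commonly appear in database connection failures.
# Illustrative assumptions only; the real backend messages may differ.
DB_ERROR_MARKERS = (
    "connection refused",
    "econnrefused",
    "timed out",
    "password authentication failed",
)


def database_error_lines(log_text, markers=DB_ERROR_MARKERS):
    """Return the log lines that contain any marker (case-insensitive)."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line.lower() for marker in markers)
    ]
```

An empty result is a good sign but not proof of a working connection; the functional tests later on this page exercise the database directly.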
Check RDS status in AWS Console:
- Go to RDS → Databases
- Find your instance (named like confidentai-stage-rds)
- Status should be “Available”
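The same check can be done from the command line. The sketch below assumes the AWS CLI is installed and authenticated, and summarizes the output of `aws rds describe-db-instances`. Note that the API reports the status in lowercase ("available"). The function names are illustrative.

```python
import json
import subprocess


def describe_db_instance(identifier):
    """Call the AWS CLI and return the parsed describe-db-instances output."""
    result = subprocess.run(
        ["aws", "rds", "describe-db-instances",
         "--db-instance-identifier", identifier, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def rds_status(describe_output):
    """Map each instance identifier to its availability and exposure settings."""
    return {
        db["DBInstanceIdentifier"]: {
            "available": db.get("DBInstanceStatus") == "available",
            "publicly_accessible": bool(db.get("PubliclyAccessible", False)),
            "multi_az": bool(db.get("MultiAZ", False)),
        }
        for db in describe_output.get("DBInstances", [])
    }


# Usage (requires AWS credentials):
#   print(rds_status(describe_db_instance("confidentai-stage-rds")))
```

The `publicly_accessible` and `multi_az` fields are also useful for the production readiness checklist at the end of this page.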
Secrets Manager
Verify the secret exists and External Secrets can read it:
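A minimal sketch of the External Secrets side of this check, assuming the External Secrets Operator is in use and its ExternalSecret resources live in the confident-ai namespace. It reads `kubectl get externalsecrets -n confident-ai -o json` and reports any resource whose Ready condition is not "True".

```python
def unsynced_external_secrets(external_secret_list):
    """Return {name: reason} for ExternalSecrets whose Ready condition is not "True"."""
    failing = {}
    for item in external_secret_list.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None:
            failing[item["metadata"]["name"]] = "no status reported"
        elif ready.get("status") != "True":
            failing[item["metadata"]["name"]] = ready.get("reason", "unknown")
    return failing
```

An empty dictionary means every ExternalSecret reported a successful sync. To confirm the secret itself exists in AWS, `aws secretsmanager describe-secret --secret-id <your-secret-name>` returns its metadata without exposing the value.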
S3 bucket
Verify the bucket exists:
You should see your bucket (e.g., confidentai-stage-app-bucket).
S3 is used for file uploads. If S3 connectivity fails, users won’t be able
to upload datasets or export reports. The backend uses IRSA (IAM Roles for
Service Accounts) to access S3—verify the confident-s3-sa service account
exists.
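Both S3 checks can be scripted. The sketch below assumes you have the JSON output of `aws s3api list-buckets --output json` and of `kubectl get serviceaccount confident-s3-sa -n confident-ai -o json`. With IRSA, the IAM role is attached through the `eks.amazonaws.com/role-arn` annotation on the service account; the function names are illustrative.

```python
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def bucket_exists(list_buckets_output, bucket_name):
    """True if `bucket_name` appears in `aws s3api list-buckets` output."""
    return any(
        bucket.get("Name") == bucket_name
        for bucket in list_buckets_output.get("Buckets", [])
    )


def irsa_role_arn(service_account):
    """Return the IAM role ARN annotated on a service account, or None if unset."""
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    return annotations.get(IRSA_ANNOTATION)
```

A service account that exists but returns `None` here has no IAM role attached, so pods using it would not get S3 permissions through IRSA.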
DNS configuration
The ALB (Application Load Balancer) has been created, but DNS records must point to it.
Get the ALB hostname
Example output:
Why a long hostname? AWS ALBs have automatically generated hostnames. You create DNS records (CNAME or ALIAS) that point your friendly domains to this ALB hostname.
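To see which ALB hostname serves each configured domain, you can read the ingress status. This sketch assumes the ingresses are in the confident-ai namespace and takes the output of `kubectl get ingress -n confident-ai -o json`; the function name is illustrative.

```python
def ingress_targets(ingress_list):
    """Map each host configured in an ingress to its load balancer hostnames."""
    targets = {}
    for ingress in ingress_list.get("items", []):
        lb_entries = ingress.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        lb_hosts = [entry["hostname"] for entry in lb_entries if entry.get("hostname")]
        for rule in ingress.get("spec", {}).get("rules") or []:
            if rule.get("host"):
                targets.setdefault(rule["host"], []).extend(lb_hosts)
    return targets
```

A host that maps to an empty list means the ALB has not been provisioned for that ingress yet.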
Create DNS records
Add DNS records for each hostname you configured in the ingress:
CNAME vs. ALIAS:
- CNAME: Works with most DNS providers. Points one hostname to another.
- ALIAS (Route 53 only): Works at the zone apex (e.g., yourdomain.com without a subdomain). Recommended if using Route 53.
If your DNS provider only supports CNAME, you must use subdomains (e.g., app.yourdomain.com), not the root domain.
Corporate DNS changes may require approval. If your DNS is managed by an internal team, submit change requests for all four records. Factor in approval time—this can delay verification by hours or days.
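If your zone is hosted in Route 53, the records can be expressed as a change batch. The sketch below builds one for CNAME records and shows the shape of an ALIAS change. The hostnames, TTL, and function names are placeholders. The ALIAS variant needs the ALB's canonical hosted zone ID, which `aws elbv2 describe-load-balancers` reports as `CanonicalHostedZoneId`.

```python
import json


def cname_change_batch(hostnames, alb_hostname, ttl=300):
    """Build a Route 53 change batch that points each hostname at the ALB via CNAME."""
    return {
        "Comment": "Point application hostnames at the ALB",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": host,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": alb_hostname}],
                },
            }
            for host in hostnames
        ],
    }


def alias_change(host, alb_hostname, alb_zone_id):
    """Build a single ALIAS (type A) change; usable at the zone apex."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": host,
            "Type": "A",
            "AliasTarget": {
                "HostedZoneId": alb_zone_id,
                "DNSName": alb_hostname,
                "EvaluateTargetHealth": False,
            },
        },
    }


def write_batch(batch, path):
    """Save a change batch as JSON for use with the AWS CLI."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(batch, handle, indent=2)
```

A saved batch can be applied with `aws route53 change-resource-record-sets --hosted-zone-id <your-zone-id> --change-batch file://records.json`.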
Verify DNS propagation
After adding records, verify they resolve correctly:
You should see the ALB hostname in the response. If you see “NXDOMAIN” or your old values, wait for DNS propagation (typically 5-30 minutes, up to 48 hours for some providers).
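One common way to check resolution is `dig +short app.yourdomain.com`, which prints the CNAME target followed by the resolved IP addresses. A small helper, sketched here with an illustrative name, can compare that output with the ALB hostname while ignoring case and the trailing dot that dig adds.

```python
def points_at(dig_short_output, alb_hostname):
    """True if any line of `dig +short` output names the ALB hostname."""
    wanted = alb_hostname.rstrip(".").lower()
    answers = [
        line.strip().rstrip(".").lower()
        for line in dig_short_output.splitlines()
    ]
    return wanted in answers
```

This only works for CNAME records. An ALIAS record answers with IP addresses directly, so compare those against the addresses returned for the ALB hostname instead.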
Use a global DNS checker:
- Go to dnschecker.org
- Enter your frontend domain (e.g., app.yourdomain.com)
- Verify servers worldwide resolve to your ALB
Application verification
Frontend access
Open your frontend URL in a browser:
What you should see:
- HTTPS (padlock icon) — certificate is working
- Confident AI login page — application is serving
- Google OAuth button (if configured) — SSO is set up
Certificate errors?
- “NET::ERR_CERT_COMMON_NAME_INVALID” — The certificate doesn’t cover this domain. Verify the ACM certificate includes the right domain names.
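- “NET::ERR_CERT_DATE_INVALID” — The certificate is expired or not yet valid, or the clock on your machine is wrong. Check the validity dates in the ACM console. If the certificate was never issued, go back to the SSL Certificates step.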
- “Your connection is not private” — Mixed causes. Check certificate status in ACM console.
Backend health check
The backend exposes a health endpoint:
Expected response:
Connection refused or timeout?
- DNS not propagated: Wait and retry
- Security group blocking: Check ALB security group allows inbound HTTPS
- ALB not ready: ALB provisioning takes a few minutes after ingress creation
- Backend not running: Check pod status with kubectl get pods -n confident-ai
Evals health check
Expected: {"status":"healthy"}
OTEL collector health
Expected: {"status":"ok"}
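The health checks above can be run together from a script. This is a minimal sketch using only the Python standard library. It assumes each endpoint returns a JSON object with a "status" field, as in the expected responses shown. The URLs are deliberately left as parameters, so supply the health URLs for your own deployment.

```python
import json
import urllib.request


def fetch_body(url, timeout=10):
    """GET a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def status_is(body, expected):
    """True if the body is a JSON object whose "status" equals `expected`."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == expected


def run_checks(checks):
    """Return {url: bool} for a mapping of health URL -> expected status value."""
    results = {}
    for url, expected in checks.items():
        try:
            results[url] = status_is(fetch_body(url), expected)
        except OSError:
            # URLError, HTTP errors, timeouts and refused connections all derive from OSError.
            results[url] = False
    return results


# Usage (placeholders, not real paths):
#   run_checks({"<evals-health-url>": "healthy", "<otel-health-url>": "ok"})
```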
Functional testing
Test 1: User login
- Navigate to https://app.yourdomain.com
- Click “Sign in with Google” (or your configured auth provider)
- Complete authentication
- Expected: Redirected to dashboard, user session created
OAuth errors?
- “redirect_uri_mismatch” — The redirect URI in Google Console doesn’t match. It must be exactly https://api.yourdomain.com/api/auth/callback/google.
- “access_denied” — User not authorized. Check if OAuth app restricts to certain domains.
- Infinite redirect loop — confident_subdomain may be misconfigured. Must be root domain, not full subdomain.
Test 2: Create a project
- From the dashboard, click “New Project”
- Enter a project name
- Click Create
- Expected: Project created successfully, appears in list
This verifies database connectivity and basic write operations.
Test 3: API key generation
- Go to Project Settings → API Keys
- Click “Generate API Key”
- Copy the generated key
- Expected: API key displayed (save it—it won’t be shown again)
Test 4: SDK connectivity
From your local machine (or any machine that can reach the backend):
Expected: True or success message
SDK can’t connect?
- “Connection refused” — Backend not reachable. Check DNS and network connectivity.
- “401 Unauthorized” — API key invalid. Generate a new one.
- “SSL certificate verify failed” — Certificate issue. Check the URL is using HTTPS and cert is valid.
If your machine can’t reach the backend directly (internal ALB), run this test from within the same network (VPN) or from a pod inside the cluster.
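When the SDK cannot connect, it can help to reproduce the failure outside the SDK. The sketch below is not the SDK test itself. It sends a plain HTTPS request with the standard library and maps the resulting exception to the failure modes listed above. The category names and the URL you pass in are your own choices.

```python
import ssl
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map a request exception to one of the failure modes described above."""
    if isinstance(exc, urllib.error.HTTPError):
        return "unauthorized" if exc.code == 401 else "http_error"
    # URLError wraps the underlying error in its `reason` attribute.
    causes = (exc, getattr(exc, "reason", None))
    if any(isinstance(cause, ssl.SSLCertVerificationError) for cause in causes):
        return "certificate"
    if any(isinstance(cause, ConnectionRefusedError) for cause in causes):
        return "unreachable"
    return "unknown"


def probe(url, timeout=10):
    """Request `url` and return "ok" or a failure category."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_failure(exc)
```

A result of "unreachable" points to DNS or network problems and "certificate" points to the ACM certificate. An "unauthorized" result from an unauthenticated probe only shows that the backend is reachable and enforcing authentication.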
Test 5: Run a simple evaluation
Expected: Evaluation runs and results appear in the dashboard.
Evaluation fails with API errors?
- “OpenAI API error” — openai_api_key not configured or invalid
- “timeout” — Network can’t reach OpenAI. Check outbound connectivity to api.openai.com
- “Rate limited” — OpenAI quota exceeded. Check your OpenAI usage limits.
Your cluster needs outbound HTTPS access to OpenAI (or your configured LLM provider).
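A quick way to test outbound access is to open a TCP connection to the provider on port 443. The sketch below uses the standard library and should be run from inside the cluster (for example, from a pod) so that it exercises the same network path as the evaluation service. The function name is illustrative.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Usage: print(can_connect("api.openai.com"))
```

This confirms network reachability only; it does not validate the API key or quota.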
Service health checks
Run comprehensive health checks on all services:
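`kubectl get pods -n confident-ai` gives a quick view of every service. For a scripted version, the sketch below takes the output of `kubectl get pods -n confident-ai -o json` and lists pods that are not running or have containers that are not ready. The function name is illustrative.

```python
def unhealthy_pods(pod_list):
    """Return {pod_name: reason} for pods that are not running and ready."""
    problems = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed one-off pods are not a problem
        if phase != "Running":
            problems[name] = f"phase {phase}"
            continue
        not_ready = [
            container["name"]
            for container in status.get("containerStatuses", [])
            if not container.get("ready")
        ]
        if not_ready:
            problems[name] = "containers not ready: " + ", ".join(not_ready)
    return problems
```

An empty result means every pod in the namespace is running with all containers ready.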
View logs for troubleshooting
Log levels: By default, logs show INFO level and above. If you need more detail for debugging, you can increase verbosity through environment variables (contact Confident AI support for guidance).
Production readiness checklist
Before announcing the deployment is ready for users, verify:
Security
- HTTPS working on all endpoints (padlock icon in browser)
- ACM certificate valid and not expiring soon
- ALB security groups restrict access appropriately (internal vs. internet-facing)
- RDS not publicly accessible (should be in private subnets)
- S3 bucket has no public access
- Secrets Manager secret has automatic rotation enabled
- terraform.tfvars not committed to version control
High availability
- At least 2 nodes running in different availability zones
- At least 2 replicas for backend service (scale if not)
- RDS has multi-AZ enabled (if required)
- Node group autoscaling configured for traffic spikes
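The first two items can be checked from cluster state. This sketch assumes nodes carry the standard `topology.kubernetes.io/zone` label and takes the output of `kubectl get nodes -o json` and `kubectl get deployments -n confident-ai -o json`. The function names and the default minimum are illustrative.

```python
ZONE_LABEL = "topology.kubernetes.io/zone"


def node_zones(node_list):
    """Return the set of availability zones that contain at least one node."""
    zones = set()
    for node in node_list.get("items", []):
        zone = node.get("metadata", {}).get("labels", {}).get(ZONE_LABEL)
        if zone:
            zones.add(zone)
    return zones


def under_replicated(deployment_list, minimum=2):
    """Return {name: ready_replicas} for deployments below `minimum` ready replicas."""
    low = {}
    for deployment in deployment_list.get("items", []):
        ready = deployment.get("status", {}).get("readyReplicas") or 0
        if ready < minimum:
            low[deployment["metadata"]["name"]] = ready
    return low
```

Fewer than two zones, or a backend deployment listed by `under_replicated`, means the corresponding checklist item is not yet met.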
Monitoring (recommended)
- CloudWatch logs configured for EKS
- RDS performance insights enabled
- ALB access logs enabled
- Alerts configured for node health, pod restarts, errors
Operations
- Team members have cluster access (see Cluster Access page)
- Runbook documented for common issues
- Backup strategy confirmed for RDS
- Upgrade path understood for future releases
What to do if verification fails
Don’t panic. Most issues have straightforward fixes:
- Identify the failing component: Use the checks above to isolate which part isn’t working
- Check logs: kubectl logs for pod issues, CloudWatch for AWS services
- Review configuration: Typos in URLs, missing certificates, wrong secret names
- Check network: Security groups, DNS propagation, VPN connectivity
- Contact support: If stuck, reach out to your Confident AI representative with:
- What step failed
- Exact error messages
- Results of relevant kubectl get commands
Summary
You’ve completed the Confident AI deployment on AWS:
- Prerequisites — Installed tools and gathered credentials
- Configuration — Set up Terraform variables
- Provisioning — Created AWS infrastructure
- SSL Certificates — Validated domain ownership
- Cluster Access — Configured kubectl access
- Kubernetes Deployment — Deployed application services
- Verification — Tested everything works
Your deployment is now ready for users. Welcome to Confident AI!
Need help? Contact your Confident AI representative or email [email protected] with details about your deployment and any issues encountered.