Cluster Access
Overview
With infrastructure provisioned, you now need to configure access to the AKS cluster. This step covers:
- Updating your kubeconfig to authenticate with AKS
- Verifying cluster connectivity and node status
- Confirming Terraform-deployed resources (Helm releases, service accounts, External Secrets)
- Understanding private cluster access options (VPN, Azure Bastion, public API)
- Granting access to additional team members
After this step, you will have working kubectl access and can verify all infrastructure components are healthy.
How AKS authentication works
AKS uses Azure Active Directory (now Microsoft Entra ID) for authentication. When you run kubectl, it:
- Uses your Azure credentials to get a token
- Sends the token to the AKS API server
- AKS verifies the token against Azure AD
- If authorized, your command executes
This is why you need:
- Working Azure credentials (configured earlier)
- Your identity to be authorized in AKS
- Network connectivity to the AKS API endpoint
Terraform grants you access automatically: it creates the cluster using your credentials, and the cluster creator is an admin. Other team members need to be added separately (covered below).
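Once kubectl is configured, you can see this flow on your own machine. The sketch below is a hypothetical helper (show_aks_auth) that assumes kubectl and az are on your PATH. It prints the current kubeconfig context, the exec plugin that context runs to fetch its token (for Azure AD-integrated AKS this is typically kubelogin), and the account the Azure CLI is signed in as.

```shell
# Hypothetical helper: show how kubectl will authenticate for the current context.
show_aks_auth() {
  # Context kubectl will use for the next command.
  echo "context:  $(kubectl config current-context)"
  # --minify keeps only the current context's user entry; for Azure AD-integrated
  # AKS this is usually an exec plugin (kubelogin) that fetches the token.
  echo "exec:     $(kubectl config view --minify -o jsonpath='{.users[0].user.exec.command}')"
  # Account the Azure CLI is currently signed in as.
  echo "identity: $(az account show --query user.name -o tsv)"
}
```

If the exec command is kubelogin and it prompts for a device-code login on every run, kubelogin convert-kubeconfig -l azurecli switches the kubeconfig to reuse your existing Azure CLI login.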
Configure kubectl
Update your kubeconfig file with the AKS cluster credentials:
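A typical way to do this is az aks get-credentials. The sketch wraps it in a hypothetical helper; the resource group and cluster name are placeholders that you would take from your Terraform outputs.

```shell
# Hypothetical helper: merge AKS credentials into ~/.kube/config.
# $1 = resource group, $2 = cluster name (placeholders; read them from your Terraform outputs).
configure_kubectl() {
  # --overwrite-existing replaces a stale kubeconfig entry with the same name
  # instead of erroring on it.
  az aks get-credentials \
    --resource-group "$1" \
    --name "$2" \
    --overwrite-existing
}
```

For example: configure_kubectl my-resource-group my-aks-cluster (both names hypothetical).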
This command:
- Retrieves cluster connection information from AKS
- Adds a new context to your ~/.kube/config file
- Configures token generation using your Azure credentials
On success, az reports that the cluster context was merged into ~/.kube/config and set as your current context.
“Could not connect to the endpoint URL” error?
This usually means:
- Wrong resource group: Ensure the resource group name matches your deployment
- AKS not ready: The cluster may still be provisioning—wait a few minutes
- Network issues: Your network may block HTTPS to Azure APIs
Verify your configuration matches your Terraform outputs.
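To rule out the first two causes, you can ask Azure directly whether the cluster exists under the name and resource group you are using. This hypothetical aks_state helper assumes you are signed in to the subscription that holds the cluster.

```shell
# Hypothetical helper: report the cluster's provisioning state.
# $1 = resource group, $2 = cluster name.
aks_state() {
  # az exits non-zero with a "not found" error if either name is wrong.
  az aks show --resource-group "$1" --name "$2" \
    --query provisioningState -o tsv
}
```

Succeeded means provisioning has finished; other values (for example Creating or Updating) mean the cluster is not ready yet. Running terraform output in your Terraform directory lists the values to compare against.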
Verify cluster access
Test that you can communicate with the cluster:
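kubectl get nodes is the usual connectivity test. The hypothetical check_nodes helper below runs it and adds a count of nodes whose STATUS is exactly Ready, which makes the expected total easy to compare against.

```shell
# Hypothetical helper: list nodes and count the ones that are Ready.
check_nodes() {
  local ready
  # -o wide adds internal IPs and OS/kernel details for each node.
  kubectl get nodes -o wide
  # Column 2 of the headerless output is STATUS. Cordoned nodes show
  # "Ready,SchedulingDisabled" and are deliberately not counted.
  ready=$(kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }')
  echo "Ready nodes: $ready"
}
```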
You should see 2 system nodes plus your worker nodes (the count depends on confident_node_group_desired_size), all in Ready status.
Timeout or connection refused?
This typically means the AKS API is not accessible from your network:
If confident_public_aks = false (default): The AKS API is only accessible from within the VNet. You need VPN access or VNet peering to your corporate network. See “Private cluster access” below.
If confident_public_aks = true: The API should be publicly accessible. Check your NSG rules and network connectivity.
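To tell which case you are in, and whether the endpoint is reachable at all, the two hypothetical helpers below query the cluster's API server settings and then probe a host over HTTPS. Any HTTP response, even 401 or 403, means the network path works; a timeout points back at VPN routing, peering, or NSG rules.

```shell
# Hypothetical helper: show whether the API server is private and its host names.
# $1 = resource group, $2 = cluster name.
aks_endpoints() {
  az aks show --resource-group "$1" --name "$2" \
    --query "{private: apiServerAccessProfile.enablePrivateCluster, fqdn: fqdn, privateFqdn: privateFqdn}" \
    -o table
}

# Hypothetical helper: succeed if an HTTPS response comes back from the host.
# $1 = API host name, $2 = port (defaults to 443).
api_reachable() {
  # -k skips TLS verification because the cluster CA is not in the system trust
  # store; without -f, curl exits 0 on any HTTP status, including 401/403.
  curl -sk --max-time 5 -o /dev/null "https://$1:${2:-443}/"
}
```

For a private cluster, for example: api_reachable "$(az aks show --resource-group my-rg --name my-aks --query privateFqdn -o tsv)" (names hypothetical). Run it from the same machine and network where kubectl times out.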
Check system pods
Verify core Kubernetes components are running:
You should see pods for:
- coredns — DNS resolution within the cluster
- kube-proxy — Network routing
- azure-cni — Azure CNI networking
All pods should be Running with all containers ready.
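To surface only the problem pods, you can filter on the pod phase. This is a sketch with a hypothetical helper name. The Running phase does not guarantee that every container is ready, so the READY column of kubectl get pods -n kube-system is still worth a glance.

```shell
# Hypothetical helper: list kube-system pods that are not in the Running phase.
unhealthy_system_pods() {
  # status.phase is a supported field selector for pods; -o name prints one
  # "pod/<name>" line per match and nothing when every pod is Running.
  kubectl get pods -n kube-system --field-selector=status.phase!=Running -o name
}
```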
Verify Terraform-deployed resources
Terraform deployed several Kubernetes resources. Let’s verify they’re working correctly.
Helm releases
Check that all Helm charts installed successfully:
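helm list shows each release with its STATUS. The hypothetical helper below prints the full list and then uses helm's state filters to call out anything that did not reach deployed; release names depend on your Terraform configuration.

```shell
# Hypothetical helper: show all releases, then only the problem ones.
check_helm_releases() {
  local problems
  # Every release in every namespace, with its STATUS column.
  helm list --all-namespaces
  # --failed and --pending restrict the list to those states (pending covers
  # pending-install, pending-upgrade, pending-rollback); -q prints names only.
  problems=$(helm list --all-namespaces --failed --pending -q)
  if [ -n "$problems" ]; then
    echo "Releases needing attention:"
    echo "$problems"
  else
    echo "All releases deployed."
  fi
}
```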
Helm release shows “failed” or “pending-install”?
This sometimes happens when AKS wasn’t fully ready during the Terraform run. It is usually fixed by re-running terraform apply, which retries the failed Helm installations.
Confident AI namespace
Verify the namespace exists:
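A quick check is to read the namespace's phase. In this hypothetical helper the namespace name is a parameter, because the exact Confident AI namespace comes from your Terraform configuration.

```shell
# Hypothetical helper: print the namespace phase.
# $1 = namespace name (placeholder for the Confident AI namespace).
namespace_phase() {
  # Prints "Active" for a healthy namespace ("Terminating" while it is being
  # deleted); kubectl exits non-zero if the namespace does not exist.
  kubectl get namespace "$1" -o jsonpath='{.status.phase}'
}
```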
Service accounts
Check that the required service accounts are created:
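Besides confirming the service accounts exist, it helps to see which ones are wired to an Azure identity. Azure Workload Identity links a service account to a managed identity through the azure.workload.identity/client-id annotation; the hypothetical helper below prints that annotation next to each name (accounts without it show none). The namespace is a placeholder.

```shell
# Hypothetical helper: list service accounts with their Workload Identity client ID.
# $1 = namespace (placeholder for the Confident AI namespace).
list_workload_sas() {
  # Dots inside the annotation key are escaped so JSONPath treats the key as one field.
  kubectl get serviceaccounts -n "$1" \
    -o custom-columns='NAME:.metadata.name,CLIENT_ID:.metadata.annotations.azure\.workload\.identity/client-id'
}
```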
Expected service accounts include confident-storage-sa, which gives pods access to the Storage blob containers.
Why service accounts? Service accounts enable Azure Workload Identity,
which gives pods fine-grained Azure permissions. Instead of giving the whole
cluster access to Storage, only pods using confident-storage-sa can access
the blob containers. This follows the principle of least privilege.
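You can see the least-privilege effect directly by listing which pods actually run as a given service account; only those pods receive the linked Azure identity. The helper name pods_using_sa is hypothetical and the namespace is a placeholder.

```shell
# Hypothetical helper: list pods in namespace $1 that run as service account $2.
pods_using_sa() {
  # JSONPath filter keeps only pods whose spec.serviceAccountName matches,
  # printing one pod name per line.
  kubectl get pods -n "$1" \
    -o jsonpath="{range .items[?(@.spec.serviceAccountName==\"$2\")]}{.metadata.name}{\"\\n\"}{end}"
}
```

For example: pods_using_sa <namespace> confident-storage-sa.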
External Secrets
External Secrets Operator syncs credentials from Azure Key Vault into Kubernetes secrets. Verify that the operator is running:
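A minimal check looks at the operator's controller pod and at the secret store resources that point it at Key Vault. This hypothetical helper assumes the upstream Helm chart's default pod label and searches all namespaces, since the install namespace may differ in your deployment.

```shell
# Hypothetical helper: check the External Secrets Operator and its secret stores.
check_eso() {
  # The upstream chart labels the controller pod app.kubernetes.io/name=external-secrets.
  kubectl get pods -A -l app.kubernetes.io/name=external-secrets
  # SecretStore/ClusterSecretStore resources hold the Key Vault connection;
  # their READY column should be True.
  kubectl get clustersecretstores,secretstores -A
}
```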
Check the ExternalSecret:
Expected status: SecretSynced
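The status shown for an ExternalSecret comes from the reason on its Ready condition. The hypothetical helper below prints that reason explicitly for every ExternalSecret in a namespace (a placeholder here).

```shell
# Hypothetical helper: print each ExternalSecret with its Ready condition reason.
# $1 = namespace. SecretSynced means the last sync from Key Vault succeeded.
externalsecret_status() {
  kubectl get externalsecrets -n "$1" \
    -o custom-columns='NAME:.metadata.name,REASON:.status.conditions[?(@.type=="Ready")].reason'
}
```

To block until one ExternalSecret syncs, kubectl wait --for=condition=Ready externalsecret/<name> -n <namespace> --timeout=60s is an alternative.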
ExternalSecret shows “SecretSyncedError”?
This means it couldn’t read from Key Vault. Common causes:
- Permissions: The external-secrets-sa managed identity may not have the Key Vault Secrets User role
- Key Vault network ACLs: The Key Vault may be blocking access from the cluster
- Secret name mismatch: The ExternalSecret is looking for secrets that don’t exist in Key Vault
Check the error details:
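The sketch below maps one hypothetical helper to each cause above: the ExternalSecret's events, the secret names that actually exist in Key Vault, and the roles the identity holds on the vault. The role check assumes the vault uses Azure RBAC rather than access policies; all names passed in are placeholders.

```shell
# Hypothetical helper: show the ExternalSecret; $1 = namespace, $2 = ExternalSecret name.
externalsecret_events() {
  # The Events section at the end usually carries the Key Vault error message.
  kubectl describe externalsecret "$2" -n "$1"
}

# Hypothetical helper: $1 = Key Vault name. Lists secret names to compare
# against the keys the ExternalSecret requests.
keyvault_secret_names() {
  az keyvault secret list --vault-name "$1" --query "[].name" -o tsv
}

# Hypothetical helper: $1 = managed identity client ID, $2 = Key Vault resource ID.
# Expect "Key Vault Secrets User" among the roles printed.
keyvault_roles() {
  az role assignment list --assignee "$1" --scope "$2" \
    --include-inherited --query "[].roleDefinitionName" -o tsv
}
```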
Private cluster access
By default (confident_public_aks = false), the AKS API server is only accessible from within the VNet. This is a security best practice: the API endpoint is not exposed to the internet at all.
To access a private cluster, you need network connectivity to the VNet.
Option A: VPN to your corporate network (recommended)
If your organization has VPN connectivity to Azure (via ExpressRoute, Site-to-Site VPN, or Virtual WAN):
- Connect to your corporate VPN
- Ensure the VPN routes include the Confident AI VNet address range
- Run kubectl commands normally
This is the recommended approach for production because it uses your existing network security infrastructure.
VPN routing must include the AKS VNet. If you configured a custom address
space (e.g., 10.0.0.0/16) in Prerequisites, ensure your VPN routes include
it. Work with your network team to add the route if needed.
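VPN routes are CIDR prefixes, so checking coverage comes down to masking addresses. The POSIX shell sketch below (hypothetical helpers ip_to_int and ip_in_cidr; IPv4 only, no input validation) tests whether an address, such as the private IP your cluster's private FQDN resolves to, falls inside a routed prefix.

```shell
# Hypothetical helper: convert a dotted IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  # Split the address on dots into $1..$4.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeed if IP $1 lies inside CIDR $2 (e.g. 10.0.0.0/16).
ip_in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # Keep the top $bits bits of a 32-bit value (4294967295 = 0xFFFFFFFF).
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}
```

For example, ip_in_cidr 10.0.1.5 10.0.0.0/16 && echo covered checks a host against the example address space above.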
Option B: Azure Bastion / Jump box
If you don’t have existing VNet connectivity, you can use an Azure VM within the VNet as a jump box:
- Create a VM in the Confident AI VNet
- SSH into the VM
- Install kubectl and az CLI on the VM
- Run kubectl commands from the VM
The Terraform code includes a commented-out bastion configuration in
bastion.tf that you can enable as a starting point.
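If you create the jump box by hand rather than from bastion.tf, a minimal sketch looks like this. The VM name jumpbox and user azureuser are hypothetical; the resource group, VNet, and subnet are placeholders for the Confident AI network.

```shell
# Hypothetical helper: create a small Ubuntu jump box inside an existing VNet subnet.
# $1 = resource group, $2 = VNet name, $3 = subnet name.
create_jumpbox() {
  # --public-ip-address "" creates the VM without a public IP, so you reach it
  # through Azure Bastion or another path into the VNet.
  az vm create \
    --resource-group "$1" \
    --name jumpbox \
    --image Ubuntu2204 \
    --vnet-name "$2" \
    --subnet "$3" \
    --public-ip-address "" \
    --admin-username azureuser \
    --generate-ssh-keys
}
```

On the VM, Microsoft's install script (curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash) installs the Azure CLI on Ubuntu, and sudo az aks install-cli then installs kubectl.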
Option C: Enable public API (not recommended for production)
If you’re just testing, you can enable public API access by setting confident_public_aks = true in your tfvars and re-running Terraform. This makes the AKS API accessible from the internet.
Public AKS API is a security risk. Even though requests are still authenticated by Azure AD, a publicly reachable API endpoint increases your attack surface. Only use this for temporary testing, never for production.
Grant access to team members
The person who ran Terraform is automatically an AKS admin. To grant access to other team members:
Using Azure AD groups (recommended)
Add Azure AD group object IDs to your tfvars:
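If you only know a group's display name, the Azure CLI can look up its object ID. This hypothetical helper assumes a recent Azure CLI (Microsoft Graph based), which exposes the object ID as id; paste the result into the tfvars variable that holds admin group IDs.

```shell
# Hypothetical helper: print the object ID of an Azure AD group.
# $1 = group display name or object ID.
aad_group_id() {
  az ad group show --group "$1" --query id -o tsv
}
```

For example: aad_group_id "Confident AI Admins" (group name hypothetical).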
Then re-run terraform apply. This grants cluster admin access to all members of that Azure AD group.
Using Azure CLI
For individual users:
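A sketch of a per-user grant, assuming the cluster uses Azure RBAC for Kubernetes authorization (with Kubernetes-native RBAC, adding the user to an admin group is the usual route instead). The helper name is hypothetical. The user typically also needs the Azure Kubernetes Service Cluster User Role to run az aks get-credentials.

```shell
# Hypothetical helper: grant one user cluster-admin rights via Azure RBAC.
# $1 = user principal name, $2 = resource group, $3 = cluster name.
grant_aks_admin() {
  local cluster_id
  # Full Azure resource ID of the cluster, used as the assignment scope.
  cluster_id=$(az aks show --resource-group "$2" --name "$3" --query id -o tsv) || return
  az role assignment create \
    --assignee "$1" \
    --role "Azure Kubernetes Service RBAC Cluster Admin" \
    --scope "$cluster_id"
}
```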
Role assignments require the identity to exist in Azure AD. If you get errors, verify the user or group object ID is correct and exists in your tenant.
ArgoCD access
ArgoCD is deployed for GitOps-based deployments. You can access it once your network has connectivity to the cluster.
ArgoCD runs inside the cluster behind an internal Azure Load Balancer, so it’s only accessible via the internal network. You’ll need VPN connectivity to access the dashboard.
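To find the dashboard address and the first-login password, you can read them from the cluster. These hypothetical helpers assume Argo CD's default names (namespace argocd, service argocd-server, secret argocd-initial-admin-secret); adjust them if your deployment differs.

```shell
# Hypothetical helper: print the internal load balancer IP of the Argo CD server.
argocd_address() {
  # Reachable only from networks routed into the VNet.
  kubectl get svc argocd-server -n argocd \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Hypothetical helper: print the generated admin password.
argocd_initial_password() {
  # Argo CD stores the initial admin password base64-encoded in this secret.
  kubectl get secret argocd-initial-admin-secret -n argocd \
    -o jsonpath='{.data.password}' | base64 -d
}
```

If you have API access but no route to the load balancer, kubectl port-forward svc/argocd-server -n argocd 8080:443 tunnels the dashboard through the API server to https://localhost:8080.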
Troubleshooting
“You must be logged in to the server (Unauthorized)”
Your Azure identity isn’t authorized to access the cluster:
- Verify your credentials: az account show
- Check you’re using the same identity that ran Terraform
- If using a different identity, have an admin add you (see above)
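To see which roles your signed-in identity actually holds on the cluster, a hypothetical helper can combine the signed-in user's object ID with the cluster's resource ID. It assumes a user sign-in (az ad signed-in-user show does not work for service principals) and Azure RBAC role assignments.

```shell
# Hypothetical helper: list your role names on the cluster.
# $1 = resource group, $2 = cluster name.
my_aks_roles() {
  local me cluster_id
  # Object ID of the signed-in user.
  me=$(az ad signed-in-user show --query id -o tsv) || return
  cluster_id=$(az aks show --resource-group "$1" --name "$2" --query id -o tsv) || return
  # --include-inherited adds roles granted at resource group or subscription
  # scope; --include-groups adds roles granted to groups you belong to.
  az role assignment list --assignee "$me" --scope "$cluster_id" \
    --include-inherited --include-groups --query "[].roleDefinitionName" -o tsv
}
```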
“Unable to connect to the server: dial tcp: i/o timeout”
You have no network path to the AKS API:
- For private clusters, ensure you’re connected to VPN
- Verify the VPN routes include the VNet address range
- Check no firewall is blocking HTTPS (port 443) to Azure
Nodes show “NotReady”
Nodes take a few minutes to fully initialize. Wait 2-3 minutes after the cluster is created. If they stay NotReady:
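kubectl describe node prints each node's Conditions table. The hypothetical helper below runs it only for nodes that are not Ready and trims the output to that table, which ends where the Addresses section begins.

```shell
# Hypothetical helper: show the Conditions section of every node that is not Ready.
describe_notready_nodes() {
  # STATUS values starting with "Ready" (including Ready,SchedulingDisabled) are skipped;
  # NotReady and Unknown nodes are kept.
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }' |
  while read -r node; do
    echo "== $node"
    # </dev/null stops kubectl from consuming the node list on stdin.
    kubectl describe node "$node" </dev/null | sed -n '/^Conditions:/,/^Addresses:/p'
  done
}
```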
Look at the “Conditions” section for clues. Common causes:
- Azure CNI not configured correctly
- Node can’t reach the AKS API
- Node VM has insufficient resources
Next steps
With cluster access configured, proceed to Kubernetes Deployment to deploy the Confident AI application services.