Evaluation Models
Configure and manage the evaluation models used for running LLM-as-a-judge metrics in your project.
By default, Confident AI provides evaluation models for all evals run on the platform. You can, however, choose a different evaluation model to suit your needs.
To configure your evaluation model:
You can only select a provider if you have credentials configured for it. See the sections below to configure your providers.
Alternatively, toggle Inherit from Organization to use the model credentials configured at the organization level instead of configuring them per-project.
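The selection rule above (a provider is selectable only when credentials exist for it, and Inherit from Organization switches the credential source) can be sketched roughly as follows. This is an illustration only: `CredentialScope`, `selectable_providers`, and the placeholder key strings are hypothetical names, not part of the Confident AI API.

```python
from dataclasses import dataclass, field


@dataclass
class CredentialScope:
    """Hypothetical holder for the provider credentials stored at one level
    (a project or an organization). Not a Confident AI type."""
    api_keys: dict[str, str] = field(default_factory=dict)


def selectable_providers(
    project: CredentialScope,
    organization: CredentialScope,
    inherit_from_org: bool,
) -> list[str]:
    """Return the providers that could be selected under the rule above.

    With inheritance on, organization-level credentials are used instead of
    project-level ones; a provider without a non-empty credential is excluded.
    """
    source = organization if inherit_from_org else project
    return sorted(name for name, key in source.api_keys.items() if key)


# Placeholder values for illustration only.
project = CredentialScope({"openai": "project-key"})
organization = CredentialScope({"openai": "org-key", "anthropic": "org-key-2"})
```

Under these assumptions, `selectable_providers(project, organization, inherit_from_org=False)` lists only the providers configured on the project, while passing `inherit_from_org=True` lists those configured on the organization.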
OpenAI gpt-5 model family. OpenAI requires the org whose key handles the call to be verified before it will serve gpt-5 (and certain other gated SKUs). If you select gpt-5 and the call returns a verification error:
Use gpt-5.4 or gpt-5.4-mini instead, or contact support.
Other Confident features (Classifiers, Error Analysis, Test Run summarizers, Auto-Annotation, Executive Insights) all use gpt-5.4 / gpt-5.4-mini by default and aren’t affected unless you explicitly pick gpt-5.
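If you call a judge model from your own code, the gpt-5 fallback described above could be sketched as below. This is a minimal illustration under stated assumptions: `VerificationError`, `judge_with_fallback`, and the `call_model` callable are hypothetical stand-ins, not the OpenAI SDK or a Confident AI API. Only the model names come from this page.

```python
from typing import Callable


class VerificationError(Exception):
    """Hypothetical stand-in for the error raised when the organization
    behind the API key has not been verified for a gated model."""


# Models this page names as alternatives to gpt-5.
FALLBACK_MODELS = ("gpt-5.4", "gpt-5.4-mini")


def judge_with_fallback(
    call_model: Callable[[str, str], str],
    prompt: str,
    preferred: str = "gpt-5",
) -> tuple[str, str]:
    """Try the preferred model first, then each fallback in order.

    `call_model(model, prompt)` is an assumed callable that returns the
    judge's output or raises VerificationError. Returns the model that
    answered together with its output.
    """
    for model in (preferred, *FALLBACK_MODELS):
        try:
            return model, call_model(model, prompt)
        except VerificationError:
            continue  # gated model refused the call; try the next one
    raise RuntimeError("No model accepted the call; contact support.")
```

Only the verification failure triggers the fallback here; any other exception propagates so that unrelated errors are not masked.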
There are three categories of providers you can configure. To set up any provider, click the three-dot menu (⋮) on the right side of the row and enter your API key or configuration details.
Model Providers — Provide your API key to run evaluations:
Cloud Providers — Run evaluations using models hosted on your cloud infrastructure:
LLM Gateways — Connect a gateway to manage tag-based routing credentials: