
Run LLM Evals

POST https://api.confident-ai.com/v1/evaluate
curl -X POST https://api.confident-ai.com/v1/evaluate \
  -H "CONFIDENT_API_KEY: <PROJECT-API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "metricCollection": "Collection Name",
    "llmTestCases": [
      {
        "input": "How tall is mount everest?",
        "actualOutput": "No clue, pretty tall I guess?"
      }
    ]
  }'
200 (Single-Turn)

{
  "success": true,
  "data": {
    "id": "TEST-RUN-ID"
  },
  "deprecated": false
}

Run online evals for your test cases using the metrics in metricCollection.


Headers

CONFIDENT_API_KEY (string, required)
The API key of your Confident AI project.

Request

metricCollection (string, required)
The name of the metric collection to use for evaluation.

llmTestCases (list of objects, optional)
A list of single-turn test cases to evaluate. Set this to null when evaluating multi-turn test cases.

conversationalTestCases (list of objects, optional)
A list of multi-turn test cases to evaluate. Set this to null when evaluating single-turn test cases.

hyperparameters (map from strings to any, optional)
Any hyperparameters, such as the model or prompt, that you wish to associate with the test run.

identifier (string, optional)
A unique identifier for the test run.
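
The request fields above can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper names `build_evaluate_payload` and `build_evaluate_request` are hypothetical, the hyperparameter and identifier values are placeholders, and the check that exactly one of the two test-case lists is supplied is an assumption drawn from the field descriptions rather than documented server behavior. It only builds the request; sending it is left to the caller.

```python
import json
import urllib.request

EVALUATE_URL = "https://api.confident-ai.com/v1/evaluate"


def build_evaluate_payload(metric_collection, llm_test_cases=None,
                           conversational_test_cases=None,
                           hyperparameters=None, identifier=None):
    """Assemble the JSON body for POST /v1/evaluate (hypothetical helper)."""
    # Assumption: a run is either single-turn or multi-turn, so exactly one
    # of the two lists is given and the other is sent as null.
    if (llm_test_cases is None) == (conversational_test_cases is None):
        raise ValueError(
            "Provide exactly one of llm_test_cases or conversational_test_cases"
        )
    payload = {
        "metricCollection": metric_collection,
        "llmTestCases": llm_test_cases,
        "conversationalTestCases": conversational_test_cases,
    }
    # Optional fields are included only when supplied.
    if hyperparameters is not None:
        payload["hyperparameters"] = hyperparameters
    if identifier is not None:
        payload["identifier"] = identifier
    return payload


def build_evaluate_request(api_key, payload):
    """Wrap the payload in a POST request carrying the project API key."""
    return urllib.request.Request(
        EVALUATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "CONFIDENT_API_KEY": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = build_evaluate_payload(
    "Collection Name",
    llm_test_cases=[{
        "input": "How tall is mount everest?",
        "actualOutput": "No clue, pretty tall I guess?",
    }],
    hyperparameters={"model": "example-model"},  # placeholder value
    identifier="example-run-1",                  # placeholder value
)
request = build_evaluate_request("<PROJECT-API-KEY>", payload)
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the body easy to inspect or log before any network call is made.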

Response

This endpoint returns an object.
success (boolean)
This is true if the test cases were successfully evaluated.

data (object)
In the sample response, this object contains id, the ID of the test run.

deprecated (boolean)
This is true if this endpoint is deprecated.
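
As an illustration of consuming the response, here is a minimal Python sketch that reads the three fields. The function name `extract_test_run_id` is hypothetical, and the sketch assumes the body always has the shape of the sample response (with the test run ID under data.id); the shape of error responses is not documented here, so a falsy success is simply treated as a failure.

```python
import json
import warnings


def extract_test_run_id(response_body):
    """Return data.id from an /v1/evaluate response body (hypothetical helper)."""
    body = json.loads(response_body)
    # Assumption: anything other than a true "success" is a failed evaluation.
    if not body.get("success"):
        raise RuntimeError("Evaluation request was not successful")
    # Surface the deprecation flag without failing the call.
    if body.get("deprecated"):
        warnings.warn("The /v1/evaluate endpoint is marked deprecated",
                      DeprecationWarning)
    return body["data"]["id"]


sample_body = '{"success": true, "data": {"id": "TEST-RUN-ID"}, "deprecated": false}'
run_id = extract_test_run_id(sample_body)
```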