Run LLM Evals

POST https://api.confident-ai.com/v1/evaluate

curl -X POST https://api.confident-ai.com/v1/evaluate \
     -H "CONFIDENT_API_KEY: <PROJECT-API-KEY>" \
     -H "Content-Type: application/json" \
     -d '{
  "metricCollection": "Collection Name",
  "llmTestCases": [
    {
      "input": "How tall is mount everest?",
      "actualOutput": "No clue, pretty tall I guess?"
    }
  ]
}'

200 (Single-Turn)

{
  "success": true,
  "data": {
    "id": "TEST-RUN-ID"
  },
  "deprecated": false
}

Run online evals for your test cases using the metrics in metricCollection.
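
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_evaluate_request` and `run_evaluation` are illustrative and not part of any Confident AI SDK, and `<PROJECT-API-KEY>` is a placeholder you must replace with your own key.

```python
import json
import urllib.request

API_URL = "https://api.confident-ai.com/v1/evaluate"


def build_evaluate_request(api_key, metric_collection, llm_test_cases):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "metricCollection": metric_collection,
        "llmTestCases": llm_test_cases,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        # urllib stores header names in capitalized form internally;
        # HTTP header names are case-insensitive.
        headers={
            "CONFIDENT_API_KEY": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_evaluation(request):
    """Send the request and return the decoded JSON body (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_evaluate_request(
    api_key="<PROJECT-API-KEY>",  # placeholder, replace with a real key
    metric_collection="Collection Name",
    llm_test_cases=[
        {
            "input": "How tall is mount everest?",
            "actualOutput": "No clue, pretty tall I guess?",
        }
    ],
)
# Call run_evaluation(request) to actually submit the test cases.
```

Building the request separately from sending it makes it possible to inspect the payload and headers before any network call is made.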

Headers

CONFIDENT_API_KEY (string, required)

The API key of your Confident AI project.

Request

metricCollection (string, required)

The name of the metric collection you wish to use for evaluation.

llmTestCases (list of objects, optional)

This is a list of single-turn test cases to evaluate. If you are evaluating multi-turn test cases, this should be null.

conversationalTestCases (list of objects, optional)

This is a list of multi-turn test cases to evaluate. If you are evaluating single-turn test cases, this should be null.

hyperparameters (map from strings to any, optional)

Any hyperparameters, such as the model or prompt, that you wish to associate with the test run.

identifier (string, optional)

A unique identifier for the test run.
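
The request fields above can be assembled with a small helper. This is a sketch under assumptions: the function name `build_evaluate_payload` is hypothetical, the example hyperparameter values and identifier are made up for illustration, and the check that exactly one test-case list is supplied is a client-side reading of the "should be null" guidance rather than documented server behavior.

```python
import json


def build_evaluate_payload(
    metric_collection,
    llm_test_cases=None,
    conversational_test_cases=None,
    hyperparameters=None,
    identifier=None,
):
    """Assemble the JSON body for /v1/evaluate.

    Exactly one of llm_test_cases or conversational_test_cases must be given;
    the other is sent as null, following the field descriptions.
    """
    if (llm_test_cases is None) == (conversational_test_cases is None):
        raise ValueError(
            "Provide exactly one of llm_test_cases or conversational_test_cases"
        )
    payload = {
        "metricCollection": metric_collection,
        "llmTestCases": llm_test_cases,  # None is serialized as null
        "conversationalTestCases": conversational_test_cases,
    }
    # Optional fields are included only when supplied.
    if hyperparameters is not None:
        payload["hyperparameters"] = hyperparameters
    if identifier is not None:
        payload["identifier"] = identifier
    return payload


payload = build_evaluate_payload(
    metric_collection="Collection Name",
    llm_test_cases=[
        {
            "input": "How tall is mount everest?",
            "actualOutput": "No clue, pretty tall I guess?",
        }
    ],
    hyperparameters={"model": "example-model", "prompt": "example-prompt"},  # illustrative values
    identifier="example-run-001",  # illustrative value
)
payload_json = json.dumps(payload)
```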

Response

This endpoint returns an object.

success (boolean)

This is true if the test cases were successfully evaluated.

data (object)

deprecated (boolean)

This is true if this endpoint is deprecated.
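
A response body with these fields could be handled as sketched below. The helper name `extract_test_run_id` is hypothetical, and the sample body is the 200 example from this page; the shape of an unsuccessful response is not documented here, so the error branch is an assumption.

```python
import json
import warnings


def extract_test_run_id(response_body):
    """Return data.id from a /v1/evaluate response body given as a JSON string."""
    body = json.loads(response_body)
    if body.get("deprecated"):
        # Surface the deprecation flag without failing the call.
        warnings.warn("The /v1/evaluate endpoint reports itself as deprecated",
                      DeprecationWarning)
    if not body.get("success"):
        # Assumption: a falsy or missing "success" means the evaluation did not run.
        raise RuntimeError("Evaluation request was not successful")
    return body["data"]["id"]


sample_body = '{"success": true, "data": {"id": "TEST-RUN-ID"}, "deprecated": false}'
test_run_id = extract_test_run_id(sample_body)
```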