Quickstart

A 5-minute quickstart guide for Confident AI's Evals API

Overview

Confident AI's Evals API allows you to run online evaluations on test cases, traces, spans, and threads. This 5-minute quickstart walks you through running your first evaluation:

  • Creating a metric collection
  • Using the /v1/evaluate endpoint to create a test run

Run Your First Eval

Here’s a step-by-step guide on how to run your first online evaluation using the Evals API.

1. Get your API key

Create a free account at https://app.confident-ai.com, and get your Project API Key.

Make sure you’re not copying your organization API key.
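To keep the key out of your scripts, a common pattern is to read it from an environment variable. Here is a minimal sketch; the CONFIDENT_API_KEY variable name is our own convention, not something mandated by the API:

```python
import os

def load_api_key(env_var="CONFIDENT_API_KEY"):
    # Read the Project API Key from the environment; the variable name
    # here is a convention of this example, not required by Confident AI.
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} to your Confident AI Project API Key")
    return key
```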

2. Create a metric collection

You can create a metric collection containing the metric you wish to run evals with using the POST /v1/metric-collections endpoint. Note that all metric collections must have a unique name within your project.

POST /v1/metric-collections

curl -X POST https://api.confident-ai.com/v1/metric-collections \
  -H "CONFIDENT_API_KEY: <PROJECT-API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Collection Name",
    "multiTurn": false,
    "metricSettings": [
      {
        "metric": {
          "name": "Answer Relevancy"
        },
        "threshold": 0.8
      }
    ]
  }'
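If you are calling the API from Python instead of curl, the same request body can be built as a plain dictionary. This is a minimal sketch mirroring the curl example above; the helper name is our own:

```python
def build_metric_collection_payload(name, metric_name, threshold, multi_turn=False):
    # Mirror the request body from the curl example above. Metric collection
    # names must be unique within your project.
    return {
        "name": name,
        "multiTurn": multi_turn,
        "metricSettings": [
            {"metric": {"name": metric_name}, "threshold": threshold},
        ],
    }
```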
3. Create a test run

To run an evaluation, provide the name of the metric collection and a list of "llmTestCases" in your request body to run single-turn evaluations.

POST /v1/evaluate

curl -X POST https://api.confident-ai.com/v1/evaluate \
  -H "CONFIDENT_API_KEY: <PROJECT-API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "metricCollection": "Collection Name",
    "llmTestCases": [
      {
        "input": "How tall is mount everest?",
        "actualOutput": "No clue, pretty tall I guess?"
      }
    ]
  }'
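The same request can be sent from Python with the standard library. This is a sketch mirroring the curl example above; the function names are our own, and the send step requires network access and a valid Project API Key:

```python
import json
import urllib.request

EVALUATE_URL = "https://api.confident-ai.com/v1/evaluate"

def build_evaluate_payload(collection_name, test_cases):
    # Mirror the request body from the curl example above.
    return {"metricCollection": collection_name, "llmTestCases": test_cases}

def run_evaluation(api_key, payload):
    # Send the POST request; requires network access and a valid key.
    req = urllib.request.Request(
        EVALUATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"CONFIDENT_API_KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```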

🎉 Congratulations! You just successfully ran your first evaluation on Confident AI via the Evals API.

The /v1/evaluate API endpoint will create a test run on Confident AI and return the following response:

Response

{
  "success": true,
  "data": {
    "id": "TEST-RUN-ID"
  },
  "deprecated": false
}
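Programmatically, you will usually want to pull the test run ID out of this response so you can reference the run later. A minimal sketch (the helper name is our own):

```python
import json

def extract_test_run_id(response_text):
    # Parse the /v1/evaluate response body and return the test run ID.
    body = json.loads(response_text)
    if not body.get("success"):
        raise RuntimeError(f"Evaluation failed: {body}")
    return body["data"]["id"]
```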
4. Verify the test run on the UI

After running an eval using our Evals API, your test results will be automatically stored on the Confident AI platform in a comprehensive report format. You can also look up the test results using the TEST-RUN-ID from the API response.

Test Reports on Confident AI

Next Steps

Now that you’ve run your first online evaluation, explore these next steps to go deeper with Confident AI:

  • Custom Datasets — Create custom datasets using the datasets endpoint.
  • Prompt Templates — Iterate and version your LLM prompts directly through the prompts endpoint.
  • Human Annotations — Annotate your evaluations with the annotation endpoint to enable human-in-the-loop feedback that guides metric tuning and reinforces quality.