Quickstart

5 min quickstart guide for Confident AI's Evals API

Overview

Confident AI’s Evals API allows you to run online evaluations on test cases, traces, spans, and threads. This 5-minute quickstart walks you through running your first evaluation:

  • Creating a metric collection
  • Using the /v1/evaluate endpoint to create a test run

Run Your First Eval

Here’s a step-by-step guide on how to run your first online evaluation using the Evals API.

1

Get your API key

Create a free account at https://app.confident-ai.com, and get your Project API Key.

Make sure you’re not copying your organization API key.
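If you'd rather not hard-code the key into the snippets below, one common pattern is to read it from an environment variable. This is a convention of this sketch, not a requirement of the API, and the environment variable name used here is just an example:

```python
import os

# Read the Project API Key from an environment variable, falling back to a
# placeholder. The variable name is this sketch's convention, not an API
# requirement.
api_key = os.environ.get("CONFIDENT_API_KEY", "<PROJECT-API-KEY>")

headers = {
    "CONFIDENT_API_KEY": api_key,
    "Content-Type": "application/json",
}
print(sorted(headers))
```

You can then reuse the same `headers` dict for every request in this guide.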

2

Create a metric collection

You can create a metric collection containing the metric you wish to run evals with using the POST /v1/metric-collections endpoint. Note that all metric collections must have a unique name within your project.

POST
/v1/metric-collections
import requests

url = "https://api.confident-ai.com/v1/metric-collections"

payload = {
    "name": "Collection Name",
    "multiTurn": False,
    "metricSettings": [
        {
            "metric": { "name": "Answer Relevancy" },
            "threshold": 0.8
        }
    ]
}
headers = {
    "CONFIDENT_API_KEY": "<PROJECT-API-KEY>",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())
3

Create test run

To run an evaluation, provide the name of your metric collection and a list of "llmTestCases" in your request body to run single-turn evaluations.

POST
/v1/evaluate
import requests

url = "https://api.confident-ai.com/v1/evaluate"

payload = {
    "metricCollection": "Collection Name",
    "llmTestCases": [
        {
            "input": "How tall is mount everest?",
            "actualOutput": "No clue, pretty tall I guess?"
        }
    ]
}
headers = {
    "CONFIDENT_API_KEY": "<PROJECT-API-KEY>",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())
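Because "llmTestCases" is a list, you can batch several test cases into a single /v1/evaluate call. A minimal sketch of building such a payload (the question/answer pairs below are illustrative, not from the docs):

```python
# Illustrative (input, actualOutput) pairs for a batched evaluation.
cases = [
    ("How tall is Mount Everest?", "8,849 meters above sea level."),
    ("Who wrote Hamlet?", "William Shakespeare."),
]

# Assemble the same payload shape as the single-case example above.
payload = {
    "metricCollection": "Collection Name",
    "llmTestCases": [{"input": q, "actualOutput": a} for q, a in cases],
}
print(len(payload["llmTestCases"]))  # 2
```

All cases in one request end up in the same test run, so their results are reported together.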

🎉 Congratulations! You just successfully ran your first evaluation on Confident AI via the Evals API.

The /v1/evaluate API endpoint will create a test run on Confident AI and return the following response:

Response
{
  "success": true,
  "data": {
    "id": "TEST-RUN-ID"
  },
  "deprecated": false
}
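If you want to keep the test run ID around (for example, to find the run later), you can pull it out of the response body. A small helper, assuming the response shape shown above:

```python
def get_test_run_id(body: dict) -> str:
    """Extract the test run ID from a /v1/evaluate response body."""
    # The endpoint returns {"success": true, "data": {"id": "..."}}.
    if not body.get("success"):
        raise RuntimeError(f"Evaluation did not succeed: {body}")
    return body["data"]["id"]

# Using the example response shown above:
example = {"success": True, "data": {"id": "TEST-RUN-ID"}, "deprecated": False}
print(get_test_run_id(example))  # TEST-RUN-ID
```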
4

Verify test run on the UI

After running an eval using the Evals API, your test results are automatically stored on the Confident AI platform in a comprehensive report format. You can also look up a specific test run using the TEST-RUN-ID from the API response.

Test Reports on Confident AI

Next Steps

Now that you’ve run your first online evaluation, explore these next steps to go deeper with Confident AI:

  • Custom Datasets — Create custom datasets using the datasets endpoint.
  • Prompt Templates — Iterate and version your LLM prompts directly through the prompts endpoint.
  • Human Annotations — Annotate your evaluations to enable human-in-the-loop feedback to guide metric tuning and reinforce quality with the annotation endpoint.