Quickstart
Overview
Confident AI’s Evals API allows you to run online evaluations on test cases, traces, spans, and threads. This 5-minute quickstart walks you through running your first evaluation:
- Creating a metric collection
- Using the /v1/evaluate endpoint to create a test run
Run Your First Eval
Here’s a step-by-step guide on how to run your first online evaluation using the Evals API.
Get your API key
Create a free account at https://app.confident-ai.com, and get your Project API Key.
Make sure you’re not copying your organization API key.
Create a metric collection
You can create a metric collection containing the metric you wish to run evals with using the POST /v1/metric-collections endpoint. Note that all metric collections must have a unique name within your project.
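For example, here’s a minimal sketch using Python’s requests library. The base URL, the auth header name, and the shape of the "metrics" field are assumptions for illustration; consult the API reference for the authoritative request schema.

```python
import os

import requests

API_BASE = "https://api.confident-ai.com"  # assumed base URL for the Evals API

response = requests.post(
    f"{API_BASE}/v1/metric-collections",
    # Assumed auth header name; use your Project API Key, not your org key.
    headers={"CONFIDENT_API_KEY": os.environ["CONFIDENT_API_KEY"]},
    json={
        # "name" must be unique within your project (per the docs above);
        # the "metrics" field and its shape are assumptions for illustration.
        "name": "My Metric Collection",
        "metrics": [{"name": "Answer Relevancy"}],
    },
)
response.raise_for_status()
print(response.json())
```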
Create test run
To run a single-turn evaluation, provide the name of your metric collection and a list of "llmTestCases" in your request body.
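A minimal sketch of the request, again in Python. Aside from "llmTestCases", the field names here ("metricCollection", "input", "actualOutput") and the auth details are assumptions; check the API reference for the exact schema.

```python
import os

import requests

API_BASE = "https://api.confident-ai.com"  # assumed base URL for the Evals API

response = requests.post(
    f"{API_BASE}/v1/evaluate",
    headers={"CONFIDENT_API_KEY": os.environ["CONFIDENT_API_KEY"]},  # assumed header
    json={
        # "metricCollection" is a hypothetical field name referencing the
        # metric collection created earlier; "llmTestCases" comes from the
        # docs above, and the per-test-case fields are assumptions.
        "metricCollection": "My Metric Collection",
        "llmTestCases": [
            {
                "input": "What is the capital of France?",
                "actualOutput": "Paris is the capital of France.",
            }
        ],
    },
)
response.raise_for_status()
print(response.json())
```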
The /v1/evaluate endpoint will create a test run on Confident AI and return a response containing your test run’s details, including its TEST-RUN-ID.
🎉 Congratulations! You just successfully ran your first evaluation on Confident AI via the Evals API.
Verify test run on the UI
After running an eval using our Evals API, your test results will be automatically stored on the Confident AI platform in a comprehensive report format. You can also locate a specific test run using the TEST-RUN-ID from the API response.
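To work with a specific test run programmatically, you can read the TEST-RUN-ID out of the /v1/evaluate response. Since the exact response schema isn’t reproduced here, "testRunId" below is a hypothetical field name:

```python
# "response" is the requests.Response from the POST /v1/evaluate call above.
# "testRunId" is a hypothetical field name for the TEST-RUN-ID in the response.
test_run_id = response.json().get("testRunId")
print(f"Test run ID: {test_run_id}")
```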
Next Steps
Now that you’ve run your first online evaluation, explore these next steps to go deeper with Confident AI:
- Custom Datasets — Create custom datasets using the datasets endpoint.
- Prompt Templates — Iterate and version your LLM prompts directly through the prompts endpoint.
- Human Annotations — Use the annotation endpoint to annotate your evaluations, enabling human-in-the-loop feedback that guides metric tuning and reinforces quality.