The RESTful Evals API enables organizations to offload evaluations, ingest LLM traces, and manage datasets, prompt versions, and more on Confident AI.
All evaluations run using the Evals API are powered by DeepEval, the open-source LLM evaluation framework.
The Evals API offers the same functionality as the UI, but with more low-level control than clicking around:
Start building your own LLMOps pipeline with the Evals API.
Run your first remote LLM evaluation.
Learn how authentication works in the Evals API.
Understand core data models and how they connect.
Understand conventions such as response formats and status codes.
Access a full suite of endpoints to manage evaluations, datasets, prompts, traces, and more.
The Evals API provides more low-level control than the DeepEval client and provides benefits that DeepEval alone doesn’t offer:
Managed Infrastructure: Serverless evaluations on our managed servers, error handling and retries for metric failures, cost management and billing optimization, and automatic scaling based on evaluation volume.
Platform Dashboard: Visual results for each customer dataset, historical tracking and trends, team collaboration features, and custom analytics dashboards.
The Evals API and platform serve different use cases in your LLM application development workflow:
Platform (Dashboard): Use when your engineering teams need to improve an LLM application. It provides visual test case creation, interactive evaluation results, team collaboration features, and built-in dashboards.
Evals API: Use when building an LLM application that needs to automate evaluations for different customers, run evaluations programmatically, build custom dashboards, integrate into existing workflows, or scale across multiple customer environments.
Both approaches use the same underlying evaluation engine, so you can start with the platform for development and use the API for production automation.
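As a rough illustration of what "running evaluations programmatically for different customers" could look like, here is a minimal Python sketch that builds one evaluation request per customer without sending it. Everything specific in it is an assumption made for illustration and is not taken from this page: the base URL, the endpoint path, the auth header name, the payload field names, and the helper `build_evaluation_request` are all hypothetical placeholders. Check the API reference for the real values.

```python
import json
import urllib.request

# Assumptions (not confirmed by this page): the base URL, endpoint path,
# header name, and payload field names below are hypothetical placeholders.
BASE_URL = "https://api.example.com"   # hypothetical base URL
EVALUATE_PATH = "/v1/evaluate"         # hypothetical endpoint path
API_KEY_HEADER = "X-Api-Key"           # hypothetical auth header name


def build_evaluation_request(api_key, metric_collection, test_cases):
    """Build (but do not send) a POST request for a remote evaluation."""
    payload = {
        "metricCollection": metric_collection,  # hypothetical field name
        "llmTestCases": test_cases,             # hypothetical field name
    }
    return urllib.request.Request(
        url=BASE_URL + EVALUATE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


# One request per customer, each carrying that customer's own test cases.
customers = {
    "acme": [{"input": "What is your refund policy?", "actualOutput": "30 days."}],
    "globex": [{"input": "Do you ship abroad?", "actualOutput": "Yes, worldwide."}],
}
requests_by_customer = {
    name: build_evaluation_request("YOUR_API_KEY", "support-quality", cases)
    for name, cases in customers.items()
}

for name, req in requests_by_customer.items():
    # Only inspect the request here; nothing is sent over the network.
    print(name, req.get_method(), req.full_url)
```

To actually send a request, you would pass it to `urllib.request.urlopen` and parse the JSON response. Building the requests separately from sending them keeps the per-customer logic easy to test and makes it simple to add your own retry or batching layer later.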