Introduction

Welcome to Confident AI's Evals API reference.

What is the Evals API?

The RESTful Evals API enables organizations to offload evaluations, ingest LLM traces, and manage datasets, prompt versions, and more on Confident AI. It allows you to:

  • Run metrics remotely on Confident AI, without managing the infrastructure overhead (see the sketch after this list)
  • Keep a centralized admin dashboard for all ingested evals, traces, datasets, prompts, and more
  • Manage user annotations and manipulate LLM traces
  • And most importantly, build your own custom LLMOps pipeline
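
To make "offloading evaluations" concrete, here is a minimal sketch of a remote evaluation request. The base URL, route, and payload shape are illustrative assumptions, not the actual schema; consult the endpoint reference below for the real routes.

```python
import os
import requests

# Assumed base URL and hypothetical route/payload, for illustration only
API_BASE = "https://api.confident-ai.com"
headers = {"Authorization": f"Bearer {os.environ['CONFIDENT_API_KEY']}"}

# Ask Confident AI to run a metric remotely against a single test case
response = requests.post(
    f"{API_BASE}/v1/evaluations",  # hypothetical endpoint
    headers=headers,
    json={
        "metrics": ["answer_relevancy"],
        "test_cases": [
            {
                "input": "What is your return policy?",
                "actual_output": "You can return items within 30 days.",
            }
        ],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```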

All evaluations run using the Evals API are powered by DeepEval, the open-source LLM evaluation framework.

Evaluation metrics via the Evals API are 100% powered by ⭐ DeepEval 💯

DeepEval is one of the most widely adopted LLM evaluation frameworks in the world, with over 10k GitHub stars and 20 million daily evaluations.
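
Because the same engine powers both, an evaluation you run locally with DeepEval translates directly to the Evals API. Below is a minimal local sketch; the class and metric names follow DeepEval's public API, but verify them against the version you install (DeepEval's default metrics also assume an LLM judge key such as OPENAI_API_KEY is set).

```python
# pip install deepeval
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# A single-turn test case: the user's input and your LLM app's output
test_case = LLMTestCase(
    input="What is your return policy?",
    actual_output="You can return items within 30 days of purchase.",
)

# Runs the metric locally; the Evals API runs this same engine remotely
evaluate(test_cases=[test_case], metrics=[AnswerRelevancyMetric(threshold=0.7)])
```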

[Star history chart: ⭐ DeepEval Star Growth ⭐]

Key Capabilities

The Evals API offers the same functionality as the platform UI, but with more low-level control than clicking around in the dashboard:

  • Comprehensive single-turn, multi-turn LLM testing
  • Experiment with different versions of prompts and models
  • Detect unexpected breaking changes through evals
  • LLM tracing to debug and monitor in production (see the sketch after this list)
  • Track product analytics and user stats
  • Include a human-in-the-loop to flag what needs improvement
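
As an example of the tracing capability, here is a hedged sketch of ingesting a single trace. The route and field names are assumptions for illustration; the tracing endpoints in this reference define the actual schema.

```python
import os
import requests

# Assumed base URL and hypothetical route/fields, for illustration only
API_BASE = "https://api.confident-ai.com"
headers = {"Authorization": f"Bearer {os.environ['CONFIDENT_API_KEY']}"}

# A trace for one request through an LLM app, with per-step spans
trace = {
    "name": "rag-pipeline",
    "input": "What is your return policy?",
    "output": "You can return items within 30 days.",
    "spans": [
        {"name": "retrieval", "durationMs": 120},
        {"name": "generation", "durationMs": 850},
    ],
}

resp = requests.post(
    f"{API_BASE}/v1/traces",  # hypothetical endpoint
    headers=headers,
    json=trace,
    timeout=30,
)
resp.raise_for_status()
```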

Get Started

Start building your own LLMOps pipeline with the Evals API.

Main Endpoints

Access a full suite of endpoints to manage evaluations, datasets, prompts, traces, and more.

FAQs

The Evals API provides more low-level control than the DeepEval client and offers benefits that DeepEval alone doesn't:

Managed Infrastructure: Serverless evaluations on our managed servers, error handling for metric failures and retries, cost management and billing optimization, automatic scaling based on evaluation volume.

Platform Dashboard: Visual results for each customer dataset, historical tracking and trends, team collaboration features, custom analytics dashboards.

The Evals API and platform serve different use cases in your LLM application development workflow:

Platform (Dashboard): Use when your engineering teams need to improve an LLM application. It provides visual test case creation, interactive evaluation results, team collaboration features, and built-in dashboards.

Evals API: Use when building an LLM application that needs to automate evaluations for different customers, run evaluations programmatically, build custom dashboards, integrate into existing workflows, or scale across multiple customer environments.

Both approaches use the same underlying evaluation engine, so you can start with the platform for development and use the API for production automation.
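
As a sketch of that production automation, a custom dashboard might pull evaluation results per customer. The route, query parameters, and response fields below are illustrative assumptions only:

```python
import os
import requests

# Assumed base URL and hypothetical route/params, for illustration only
API_BASE = "https://api.confident-ai.com"
headers = {"Authorization": f"Bearer {os.environ['CONFIDENT_API_KEY']}"}

# Fetch recent evaluation runs for each customer environment
for customer_id in ["acme", "globex"]:
    resp = requests.get(
        f"{API_BASE}/v1/evaluations",  # hypothetical endpoint
        headers=headers,
        params={"dataset": f"support-bot-{customer_id}", "limit": 10},
        timeout=30,
    )
    resp.raise_for_status()
    for run in resp.json().get("results", []):
        print(customer_id, run.get("id"), run.get("status"))
```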

The Evals API is best suited for:

  1. Organizations that need to scale evaluations across multiple customers or environments while maintaining visibility into results.
  2. Users who aren't working with Python or TypeScript. If you are working in Python or TypeScript, using DeepEval as your client library is highly recommended.