Metric Collection

Metric collections let you group metric runs together on Confident AI.

Overview

A metric collection on Confident AI is a collection of metrics and their respective settings. It is what allows you to run evaluations remotely. This can be for either:

  • Evals in development, through the Evals APIs
  • Evals for LLM tracing, through the means of online or offline evals

Metric collections are strictly used for remote evals. Each is identified by a unique name and requires no code to manage.

Both single and multi-turn metric collections are supported.

Local Evals
  • Run evaluations locally using deepeval with full control over metrics
  • Support for custom metrics, DAG, and advanced evaluation algorithms

Suitable for: Python users, development, and pre-deployment workflows
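To make the contrast concrete, here is a minimal sketch of what a local metric looks like conceptually. It uses only the standard library; the class and field names (`measure`, `is_successful`, `threshold`, `score`) mirror the general shape of a deepeval-style metric but are illustrative, not the actual library API.

```python
from dataclasses import dataclass

# Illustrative stand-in for an LLM test case: in a real local eval this
# would carry the model input, actual output, expected output, etc.
@dataclass
class TestCase:
    input: str
    actual_output: str
    expected_output: str

class ExactMatchMetric:
    """Toy local metric: scores 1.0 when the actual output exactly
    matches the expected output, 0.0 otherwise."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.score = None

    def measure(self, test_case: TestCase) -> float:
        self.score = 1.0 if test_case.actual_output == test_case.expected_output else 0.0
        return self.score

    def is_successful(self) -> bool:
        return self.score is not None and self.score >= self.threshold

metric = ExactMatchMetric(threshold=0.5)
metric.measure(TestCase("What is 2+2?", "4", "4"))
print(metric.is_successful())  # True
```

Running locally gives you full control over this logic; with remote evals, the equivalent configuration lives in a metric collection on the platform instead.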

Remote Evals
  • Run evaluations on Confident AI platform with pre-built metrics
  • Integrated with monitoring, datasets, and team collaboration features

Suitable for: non-Python users, and online + offline evals for tracing in production

Create a Metric Collection

You can create a single or multi-turn metric collection under Project > Metrics > Collections. All you need to do is provide it with a unique name, select the appropriate metrics, and edit their settings (if required).

Metric Collection for Remote Evals

You can use metric collections for any remote evals in Confident AI.

Understanding Metric Collections

Metric collections and metrics are connected indirectly via metric settings, which specify each metric's threshold, strictness, and other options for a given collection.

Metric Collection: A group of metrics that you wish to evaluate together (either for a test run or online evaluation).

Metric Settings: Configuration options for how a metric within a metric collection should be evaluated, including the threshold, strictness, and whether to include reasoning.

When you run remote evals by providing a metric collection name, Confident AI fetches the metrics and their settings for that collection, then uses these configurations to run the evals.
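Conceptually, this lookup-then-evaluate flow can be sketched as below. All names, fields, and the in-memory registry are illustrative assumptions for this sketch, not the actual Confident AI API or storage model.

```python
from dataclasses import dataclass, field

@dataclass
class MetricSettings:
    # Per-collection configuration for one metric (fields illustrative).
    threshold: float = 0.5
    strict: bool = False
    include_reason: bool = True

@dataclass
class MetricCollection:
    # A named group of metrics, each paired with its own settings.
    name: str
    metrics: dict = field(default_factory=dict)

# Pretend registry standing in for collections stored on the platform.
COLLECTIONS = {
    "production-chatbot": MetricCollection(
        name="production-chatbot",
        metrics={
            "Answer Relevancy": MetricSettings(threshold=0.7),
            "Faithfulness": MetricSettings(threshold=0.8, strict=True),
        },
    )
}

def resolve_collection(name: str) -> MetricCollection:
    """Step 1 of a remote eval: look up the collection by its unique
    name to retrieve its metrics and their settings."""
    if name not in COLLECTIONS:
        raise KeyError(f"No metric collection named {name!r}")
    return COLLECTIONS[name]

collection = resolve_collection("production-chatbot")
for metric_name, settings in collection.metrics.items():
    # Step 2: each metric is evaluated with its own per-collection settings.
    print(metric_name, settings.threshold, settings.strict)
```

The key design point the sketch illustrates: settings belong to the collection, not to the metric itself, so the same metric can run with different thresholds or strictness in different collections.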