Metric Collection
Overview
A metric collection on Confident AI is a collection of metrics and their respective settings. It is what allows you to run evaluations remotely. This can be for either:
- Evals in development, through the Evals API
- Evals for LLM tracing, through online or offline evals
Metric collections are strictly used for remote evals. Each is identified by a unique name and requires no code to manage.
Local evals (deepeval)
- Run evaluations locally using deepeval with full control over metrics
- Support for custom metrics, DAG, and advanced evaluation algorithms
- Suitable for: Python users, development, and pre-deployment workflows
Remote evals (Confident AI)
- Run evaluations on the Confident AI platform with pre-built metrics
- Integrated with monitoring, datasets, and team collaboration features
- Suitable for: non-Python users, online + offline evals for tracing in prod
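For contrast with the remote workflow this page covers, here is a minimal sketch of a local eval using deepeval's evaluate function and AnswerRelevancyMetric; the input and output strings are placeholders. Notice that metrics and their settings live in code here, whereas a metric collection moves them onto the platform.

```python
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# A single test case: what your LLM app received and what it returned
test_case = LLMTestCase(
    input="What are your return policies?",
    actual_output="You can return any item within 30 days of purchase.",
)

# For local evals, metrics and their settings (e.g. threshold) live in code
evaluate(
    test_cases=[test_case],
    metrics=[AnswerRelevancyMetric(threshold=0.7)],
)
```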
Create a Metric Collection
You can create a single or multi-turn metric collection under Project > Metrics > Collections. All you need to do is provide it with a unique name, select the appropriate metrics, and edit their settings (if required).
You can use metric collections for any remote evals in Confident AI:
- Running single or multi-turn E2E testing via the Evals API (see the sketch after this list)
- Running single or multi-turn online/offline evals during LLM tracing
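As an illustration of the first item above, the sketch below runs a remote E2E eval by sending test cases and a metric collection name to the Evals API. The endpoint path, header name, and payload field names here are assumptions made for illustration only; consult the Evals API reference for the actual contract.

```python
import os

import requests

# NOTE: the endpoint path, header name, and payload fields below are
# hypothetical, shown only to illustrate the flow; check the Evals API docs.
API_KEY = os.environ["CONFIDENT_API_KEY"]

response = requests.post(
    "https://api.confident-ai.com/v1/evaluate",  # hypothetical endpoint
    headers={"CONFIDENT_API_KEY": API_KEY},      # hypothetical header
    json={
        # The metric collection is referenced by its unique name;
        # metrics and their settings are resolved server-side.
        "metricCollection": "My Collection",
        "testCases": [
            {
                "input": "What are your return policies?",
                "actualOutput": "You can return any item within 30 days.",
            }
        ],
    },
)
response.raise_for_status()
print(response.json())
```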
Understanding Metric Collections
Metric collections and metrics are connected indirectly via metric settings, which specify each metric's threshold, strictness, and other options within a given collection.
- Metric Collection: A group of metrics that you wish to evaluate together (either for a test run or online evaluation).
- Metric Settings: Configuration options for how a metric within a metric collection should be evaluated, including the threshold, strictness, and whether to include reasoning.
When you run remote evals by providing a metric collection name, Confident AI fetches the metrics and their settings for that collection, then uses these configurations to run evals.
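Conceptually, you can picture a metric collection as a named group of metric settings, one entry per metric. The structure below is an illustrative sketch, not the actual Confident AI schema; all field names are assumptions.

```python
# Illustrative shape only — not the actual Confident AI schema.
# Each metric appears in a collection through its metric settings,
# so the same metric can have different thresholds in different collections.
metric_collection = {
    "name": "My Collection",   # unique name used to reference it remotely
    "multiTurn": False,        # single-turn vs multi-turn collection
    "metricSettings": [
        {
            "metric": "Answer Relevancy",
            "threshold": 0.7,      # minimum passing score
            "strictMode": False,   # binary 0/1 scoring when True
            "includeReason": True, # attach reasoning to each score
        },
        {
            "metric": "Faithfulness",
            "threshold": 0.8,
            "strictMode": True,
            "includeReason": False,
        },
    ],
}
```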