Custom Metrics

Create custom metrics for your specific use case

Overview

Custom metrics are among the most important metrics for testing LLM apps because they let you evaluate against criteria specific to your use case. You can create and use custom metrics in one of two ways:

  • Locally, to run evals on your machine before sending test results to Confident AI. Best for code-driven evals.
  • Remotely, to run evals directly on Confident AI. Best for no-code evaluation workflows.
Local Evals
  • Run evaluations locally using deepeval with full control over metrics
  • Support for custom metrics, DAG, and advanced evaluation algorithms

Suitable for: Python users, development, and pre-deployment workflows

Remote Evals
  • Run evaluations on the Confident AI platform with pre-built metrics
  • Integrated with monitoring, datasets, and team collaboration features

Suitable for: Non-Python users, and online and offline evals for tracing in production

Available Custom Metrics

There are two types of custom metrics you can create:

  • G-Eval: LLM-as-a-judge metrics defined using natural language criteria. G-Eval is the most common approach for custom metrics since it requires no coding and can evaluate nuanced, subjective criteria like tone, helpfulness, or domain-specific correctness.
  • Code-Evals: Programmatic metrics written in Python directly on Confident AI using the deepeval framework. Use code-based metrics when you need deterministic logic, external API calls, or complex computations that can’t be expressed in natural language.

Running custom metrics locally gives you code-level control over your metrics, but local metrics are limited to Python users of deepeval and are not available for online or offline evals in production.
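To make "deterministic logic" concrete, here is a minimal sketch of the kind of check a code-based metric might perform. It is written in plain Python and does not use deepeval's actual metric interface; the `RequiredKeysMetric` and `MetricResult` names, the `measure` method, and the scoring rule are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class MetricResult:
    """Hypothetical result container (not a deepeval type)."""
    score: float
    passed: bool
    reason: str


class RequiredKeysMetric:
    """Hypothetical deterministic metric: scores the fraction of
    required keys present in a JSON-object output."""

    def __init__(self, required_keys, threshold=1.0):
        self.required_keys = list(required_keys)
        self.threshold = threshold  # minimum passing score

    def measure(self, actual_output: str) -> MetricResult:
        # Outputs that are not valid JSON objects score zero.
        try:
            data = json.loads(actual_output)
        except json.JSONDecodeError:
            return MetricResult(0.0, False, "output is not valid JSON")
        if not isinstance(data, dict):
            return MetricResult(0.0, False, "output is not a JSON object")

        missing = [k for k in self.required_keys if k not in data]
        total = len(self.required_keys)
        score = 1.0 if total == 0 else (total - len(missing)) / total
        reason = (
            "all required keys present"
            if not missing
            else f"missing keys: {missing}"
        )
        return MetricResult(score, score >= self.threshold, reason)


metric = RequiredKeysMetric(["name", "email"])
result = metric.measure('{"name": "Ada"}')
print(result.score, result.passed, result.reason)
```

Because the same input always yields the same score, a check like this is a natural fit for code rather than for an LLM judge, which is better suited to subjective criteria such as tone.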

How It Works

Custom metrics follow a simple workflow:

  1. Create a metric — Define your custom metric either locally using deepeval or remotely via the Confident AI UI. For G-Eval, you’ll provide natural language criteria; for Code-Evals, you’ll write Python code directly on the platform.
  2. Add to a metric collection — Group your metric into a metric collection where you can configure settings like threshold (minimum passing score) and strictness (how harshly to penalize failures).
  3. Run evaluations — Execute your metrics either locally via deepeval or remotely through the Confident AI platform.
  4. View results — Analyze scores, reasoning, and pass/fail status in Confident AI’s dashboard.
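The four steps above can be sketched in plain Python as follows. This is an illustration of the workflow's shape only, not the deepeval or Confident AI API: `CollectionEntry`, `run_collection`, and both scoring functions are hypothetical, and the strictness setting is left out because its exact behavior is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Step 1: create metrics (hypothetical scoring functions returning 0.0 to 1.0).
def conciseness(_input_text: str, output: str) -> float:
    """1.0 if the output is at most 80 characters, else 0.0."""
    return 1.0 if len(output) <= 80 else 0.0


def term_coverage(input_text: str, output: str) -> float:
    """Fraction of longer input words that reappear in the output."""
    terms = {w.strip("?.,!").lower() for w in input_text.split()}
    terms = {t for t in terms if len(t) > 3}
    if not terms:
        return 1.0
    lowered = output.lower()
    return sum(1 for t in terms if t in lowered) / len(terms)


# Step 2: group metrics into a collection, each with its own threshold.
@dataclass
class CollectionEntry:
    name: str
    score_fn: Callable[[str, str], float]
    threshold: float  # minimum passing score


collection: List[CollectionEntry] = [
    CollectionEntry("Conciseness", conciseness, threshold=1.0),
    CollectionEntry("Term Coverage", term_coverage, threshold=0.5),
]


# Step 3: run every metric in the collection against one test case.
def run_collection(
    entries: List[CollectionEntry], input_text: str, actual_output: str
) -> List[Dict]:
    results = []
    for entry in entries:
        score = entry.score_fn(input_text, actual_output)
        results.append(
            {
                "metric": entry.name,
                "score": score,
                "threshold": entry.threshold,
                "passed": score >= entry.threshold,
            }
        )
    return results


# Step 4: view results (printed here; on the platform this is the dashboard).
results = run_collection(
    collection,
    "What is your refund policy?",
    "Our refund policy allows returns within 30 days.",
)
for row in results:
    print(row)
```

Keeping the threshold on the collection entry rather than inside the scoring function mirrors the description above, where pass/fail settings are configured when a metric is added to a collection.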

Next Steps

Now that you know your options, choose how you want to create custom metrics:

G-Eval

Create LLM-as-a-judge metrics using natural language criteria. Best for evaluating nuanced, subjective qualities.

Code-Evals

Write Python code directly on Confident AI for deterministic logic or complex computations.