Custom metrics are among the most important metrics for testing LLM apps because they let you evaluate against criteria specific to your use case. You can create and use custom metrics in one of two ways:
Locally with deepeval, with full control over metrics. Suitable for: Python users, development, and pre-deployment workflows.
Remotely on the Confident AI platform. Suitable for: non-Python users, online and offline evals for tracing in production.
There are two types of custom metrics you can create:
deepeval framework. Use code-based metrics when you need deterministic logic, external API calls, or complex computations that can’t be expressed in natural language.
Running custom metrics locally gives you code-level control over your metrics, but this option is limited to Python users using deepeval and is not available for online or offline evals in production.
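To make "deterministic logic" concrete, here is a minimal sketch of a code-based metric whose score comes from plain Python rather than from an LLM judge. It deliberately uses only the standard library so it runs anywhere. `SampleCase` and `JsonKeysMetric` are hypothetical names invented for illustration and are not part of deepeval or Confident AI. The `measure` / `is_successful` shape is an assumption about what a scoring interface typically looks like, so check the deepeval documentation for the actual base class and required methods before adapting it.

```python
import json
from dataclasses import dataclass


@dataclass
class SampleCase:
    """Hypothetical stand-in for an LLM test case (not deepeval's class)."""
    input: str
    actual_output: str


class JsonKeysMetric:
    """Deterministic metric: fraction of required keys present in a JSON output."""

    def __init__(self, required_keys, threshold=1.0):
        self.required_keys = list(required_keys)
        self.threshold = threshold
        self.score = None
        self.reason = None

    def measure(self, test_case):
        # Parse the model output; anything that is not a JSON object scores 0.
        try:
            parsed = json.loads(test_case.actual_output)
        except json.JSONDecodeError:
            parsed = None
        if not isinstance(parsed, dict):
            self.score = 0.0
            self.reason = "Output is not a JSON object."
            return self.score

        # Score is the share of required keys that appear in the output.
        missing = [k for k in self.required_keys if k not in parsed]
        total = len(self.required_keys)
        self.score = (total - len(missing)) / total if total else 1.0
        self.reason = (
            f"Missing keys: {missing}" if missing else "All required keys present."
        )
        return self.score

    def is_successful(self):
        # A metric passes once it has been measured and meets the threshold.
        return self.score is not None and self.score >= self.threshold


if __name__ == "__main__":
    metric = JsonKeysMetric(["name", "email"])
    case = SampleCase(input="Extract the contact.", actual_output='{"name": "Ada"}')
    print(metric.measure(case), metric.is_successful(), metric.reason)
```

Because the score is computed from the output alone, the same test case always yields the same result, which is the main reason to prefer a code-based metric over natural language criteria for checks like schema conformance.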
Custom metrics follow a simple workflow:
Create your metric locally with deepeval or remotely via the Confident AI UI. For G-Eval, you’ll provide natural language criteria; for Code-Evals, you’ll write Python code directly on the platform.
Use your metric locally with deepeval or remotely through the Confident AI platform.
Now that you know your options, it’s time to select your preferences for creating custom metrics: