Faithfulness
Faithfulness is a single-turn metric to evaluate RAG generators
Overview
The faithfulness metric is a single-turn RAG metric that uses LLM-as-a-judge to assess whether your generator’s answers rely solely on the retrieved context, without hallucinating or providing misinformation.
Required Parameters
These are the parameters you must supply in your test case to run evaluations for the faithfulness metric (a minimal sketch follows this list):
- input: The input query you supply to your RAG application.
- actual_output: The final output your RAG application's generator produces.
- retrieval_context: The retrieved contexts your retriever returns for a given input, sorted by rank.
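Here is a minimal sketch of such a test case in deepeval; the query, answer, and context strings are made-up placeholders for your RAG application's real data:

```python
from deepeval.test_case import LLMTestCase

# Placeholder values; substitute the real inputs and outputs of your RAG pipeline
test_case = LLMTestCase(
    input="When does my premium plan renew?",  # the user's query
    actual_output="Your premium plan renews on the 1st of each month.",  # generator's answer
    retrieval_context=[  # retrieved chunks, highest rank first
        "Premium plans renew on the first day of every month.",
        "Billing questions can be escalated to support.",
    ],
)
```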
How Is It Calculated?
The faithfulness metric first extracts individual claims from the actual output using an LLM, then uses the same LLM to check how many claims are supported by the retrieved context.
A claim is considered truthful if it does not contradict any facts presented in the retrieval context.
The final score is the proportion of truthful claims found in the actual output.
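Put together, the score reduces to a simple ratio (this restates the description above rather than deepeval's internal prompt logic):

Faithfulness = Number of Truthful Claims / Total Number of Claims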
Create Locally
You can create the FaithfulnessMetric in deepeval as follows:
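Below is a minimal sketch; the test case contents, threshold, and model name are illustrative placeholders:

```python
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Placeholder test case; use the outputs of your own RAG pipeline
test_case = LLMTestCase(
    input="What is the return policy?",
    actual_output="You have 30 days to return an unused item.",
    retrieval_context=["Unused items can be returned within 30 days of delivery."],
)

metric = FaithfulnessMetric(
    threshold=0.5,        # minimum passing score
    model="gpt-4o",       # or any custom model of type DeepEvalBaseLLM
    include_reason=True,  # attach an LLM-generated explanation to the score
)

metric.measure(test_case)
print(metric.score)   # value between 0 and 1
print(metric.reason)  # populated when include_reason=True
```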
Here’s a list of parameters you can configure when creating a FaithfulnessMetric:
- threshold: A float representing the minimum passing threshold.
- model: A string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM.
- include_reason: A boolean to enable the inclusion of a reason for its evaluation score.
- async_mode: A boolean to enable concurrent execution within the measure() method.
- strict_mode: A boolean to enforce a binary metric score: 1 for perfection, 0 otherwise.
- verbose_mode: A boolean to print the intermediate steps used to calculate the metric score.
- evaluation_template: An instance of FaithfulnessTemplate, which allows you to override the default prompts used to compute the FaithfulnessMetric score.
The FaithfulnessMetric can be used for both single-turn E2E and component-level testing; an end-to-end sketch is shown below.
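For the E2E case, the same metric can be passed to evaluate() alongside your test cases; the test case values below are placeholders:

```python
from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Placeholder test case standing in for one end-to-end run of your RAG app
test_case = LLMTestCase(
    input="Which regions does the service cover?",
    actual_output="The service currently covers North America and Europe.",
    retrieval_context=["The service is available in North America and Europe."],
)

# Evaluates the faithfulness of every supplied test case end to end
evaluate(test_cases=[test_case], metrics=[FaithfulnessMetric()])
```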
Create Remotely
If you are not using deepeval in Python, or you want to run evals remotely on Confident AI, you can use the faithfulness metric by adding it to a single-turn metric collection. This allows you to use the faithfulness metric for:
- Single-turn E2E testing
- Single-turn component-level testing
- Online and offline evals for traces and spans