Hallucination

Hallucination is a single-turn metric that determines whether your LLM is hallucinating false information.

Overview

The hallucination metric is a single-turn safety metric that uses LLM-as-a-judge to assess whether your LLM’s output is truthful and free from false or hallucinated information.

The hallucination metric needs an actual output and context in the test case to perform evaluations.

Required Parameters

These are the parameters you must supply in your test case to run evaluations for the hallucination metric:

input
string, required

The input you supplied to your LLM application.

actual_output
string, required

The final output your LLM application generates.

context
list of strings, required

A list of strings containing context that can be used to answer the input, usually the text of supporting documents.
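
To make the requirements concrete, here is a minimal sketch of a test case that supplies all three parameters. The input, output, and context values are invented for illustration:

from deepeval.test_case import LLMTestCase

# The context is treated as ground truth; the actual output will be
# checked against it for contradictions.
test_case = LLMTestCase(
    input="Where was Marie Curie born?",
    actual_output="Marie Curie was born in Paris, France.",
    context=["Marie Curie was born in Warsaw, Poland, in 1867."],
)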

How Is It Calculated?

The hallucination metric uses an LLM to identify contradictions between the actual output and the provided context, treating the context as ground truth.


\text{Hallucination} = \frac{\text{Number of Contradicted Contexts}}{\text{Number of Contexts}}

The final score is the proportion of provided contexts that the actual output contradicts.
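
For example, if the judge finds that the actual output contradicts two of four provided context strings, the hallucination score is 2/4 = 0.5.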

Create Locally

You can create the HallucinationMetric in deepeval as follows:

from deepeval.metrics import HallucinationMetric

metric = HallucinationMetric()

Here’s a list of parameters you can configure when creating a HallucinationMetric:

threshold
number, defaults to 0.5

A float representing the maximum passing threshold.

Unlike other metrics, the threshold for the HallucinationMetric is a maximum instead of a minimum threshold.

model
string | Object, defaults to gpt-4.1

A string specifying which of OpenAI’s GPT models to use OR any custom LLM model of type DeepEvalBaseLLM.

include_reason
boolean, defaults to true

A boolean to enable the inclusion of a reason for the evaluation score.

async_mode
boolean, defaults to true

A boolean to enable concurrent execution within the measure() method.

strict_mode
boolean, defaults to false

A boolean to enforce a binary metric score: 0 for perfection, 1 otherwise.

verbose_mode
boolean, defaults to false

A boolean to print the intermediate steps used to calculate the metric score.
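
As an illustrative sketch, these parameters can be passed together when constructing the metric; the values below are arbitrary examples rather than recommendations:

from deepeval.metrics import HallucinationMetric

metric = HallucinationMetric(
    threshold=0.3,        # maximum passing score; lower is stricter
    model="gpt-4.1",      # judge model used to detect contradictions
    include_reason=True,  # attach a natural-language reason to the score
    strict_mode=False,    # keep the continuous 0-1 score
    verbose_mode=True,    # print intermediate evaluation steps
)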

This can be used for both single-turn E2E and component-level testing.
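
For instance, here is a sketch of single-turn E2E usage, reusing the test case and metric defined above:

from deepeval import evaluate

# Score a single test case directly...
metric.measure(test_case)
print(metric.score, metric.reason)

# ...or run an end-to-end evaluation over a list of test cases.
evaluate(test_cases=[test_case], metrics=[metric])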

Create Remotely

If you are not using deepeval in Python, or you want to run evals remotely on Confident AI, you can use the hallucination metric by adding it to a single-turn metric collection. This allows you to use the hallucination metric for:

  • Single-turn E2E testing
  • Single-turn component-level testing
  • Online and offline evals for traces and spans