Answer Relevancy

Answer relevancy is a single-turn metric for evaluating RAG generators.

Overview

The answer relevancy metric uses LLM-as-a-judge to assess whether your RAG generator's output is relevant to the given input. It is a single-turn metric designed specifically for evaluating RAG QA, not RAG pipelines in general.

When using the answer relevancy metric, the input of a test case should contain just the query, not the entire prompt.

Required Parameters

These are the parameters you must supply in your test case to run evaluations for the answer relevancy metric:

input
string · Required

The input query you supply to your RAG application.

actual_output
string · Required

The final output produced by your RAG application's generator.
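
As a minimal sketch, a test case with only the required fields might look like this (the query and output values are hypothetical, assuming deepeval's LLMTestCase):

from deepeval.test_case import LLMTestCase

# Only the query goes in `input`; only the generator's final output goes in `actual_output`.
test_case = LLMTestCase(
    input="What are your return policies?",
    actual_output="You can return any item within 30 days for a full refund.",
)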

How Is It Calculated?

The answer relevancy metric first breaks down the actual output of a test case into distinct statements, then calculates the proportion of those statements that are relevant to the given input.

$$\text{Answer Relevancy} = \frac{\text{Number of Relevant Statements}}{\text{Total Number of Statements}}$$

The final score is the proportion of relevant statements found in the actual output.
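
For illustration only, suppose the actual output decomposes into four statements and three of them are judged relevant to the input; the arithmetic (not deepeval's internal implementation) would be:

# Hypothetical worked example of the scoring arithmetic
relevant_statements = 3
total_statements = 4
answer_relevancy_score = relevant_statements / total_statements  # 0.75, which passes a 0.5 threshold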

Create Locally

You can create the AnswerRelevancyMetric in deepeval as follows:

from deepeval.metrics import AnswerRelevancyMetric

metric = AnswerRelevancyMetric()

Here’s a list of parameters you can configure when creating an AnswerRelevancyMetric:

threshold
number · Defaults to 0.5

A float to represent the minimum passing threshold.

model
string | Object · Defaults to gpt-4.1

A string specifying which of OpenAI’s GPT models to use, or any custom LLM of type DeepEvalBaseLLM.

include_reason
boolean · Defaults to true

A boolean to enable the inclusion of a reason alongside the evaluation score.

async_mode
boolean · Defaults to true

A boolean to enable concurrent execution within the measure() method.

strict_mode
boolean · Defaults to false

A boolean to enforce a binary metric score: 1 for perfection, 0 otherwise.

verbose_mode
boolean · Defaults to false

A boolean to print the intermediate steps used to calculate the metric score.

evaluation_template
AnswerRelevancyTemplate · Defaults to deepeval's template

An instance of the AnswerRelevancyTemplate class, which allows you to override the default prompts used to compute the AnswerRelevancyMetric score.

The metric can be used for both single-turn end-to-end (E2E) and component-level testing, as shown in the sketch below.
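
Putting the parameters together, a hedged sketch of a configured evaluation might look like the following (the threshold, query, and output values are illustrative):

from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Configure the metric with a stricter passing threshold than the default
metric = AnswerRelevancyMetric(
    threshold=0.7,
    model="gpt-4.1",
    include_reason=True,
    verbose_mode=False,
)

test_case = LLMTestCase(
    input="What are your operating hours?",
    actual_output="We are open from 9am to 5pm on weekdays.",
)

# Run the metric on a single test case and inspect the result
metric.measure(test_case)
print(metric.score)   # float between 0 and 1
print(metric.reason)  # explanation, since include_reason=True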

Create Remotely

If you are not using deepeval in Python, or you want to run evals remotely on Confident AI, you can use the answer relevancy metric by adding it to a single-turn metric collection. This allows you to use the answer relevancy metric for:

  • Single-turn E2E testing
  • Single-turn component-level testing
  • Online and offline evals for traces and spans