Contextual Relevancy
Contextual Relevancy is a single-turn metric used to evaluate a RAG retriever.
Overview
The contextual relevancy metric is a single-turn RAG metric that uses LLM-as-a-judge to evaluate whether all retrieved context is relevant to the input query.
The input of a test case should not contain the entire prompt, but just the query when using the contextual relevancy metric.
Required Parameters
These are the parameters you must supply in your test case to run evaluations for the contextual relevancy metric:
The input query you supply to your RAG application.
The actual output your RAG application generates for a given input.
The retrieved context your retriever outputs for a given input sorted by their rank.
How Is It Calculated?
The contextual relevancy metric first extracts independent statements from all retrieved context using an LLM, then uses the same LLM to determine how many of those statements are relevant to the input query.
The final score is the proportion of statements in the retrieval context that are relevant to the input.
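As a sketch of the final scoring step: assuming the judge LLM has already labeled each extracted statement (the statements and labels below are hypothetical), the score is simply the relevant fraction:

```python
# Hypothetical judge output: (statement, is_relevant) pairs extracted
# from the retrieval context for the query "What is the capital of France?".
judged_statements = [
    ("Paris is the capital of France.", True),
    ("Paris has been France's capital since the 6th century.", True),
    ("Baguettes are a staple of French cuisine.", False),
    ("France hosts the Tour de France every year.", False),
]

# Contextual relevancy = relevant statements / total statements.
relevant = sum(1 for _, is_relevant in judged_statements if is_relevant)
score = relevant / len(judged_statements)
print(score)  # 0.5
```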
Create Locally
You can create the ContextualRelevancyMetric in deepeval as follows:
Here’s a list of parameters you can configure when creating a ContextualRelevancyMetric:
A float to represent the minimum passing threshold.
A string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM.
A boolean to enable the inclusion of a reason for its evaluation score.
A boolean to enable concurrent execution within the measure() method.
A boolean to enforce a binary metric score: 1 for perfection, 0 otherwise.
A boolean to print the intermediate steps used to calculate the metric score.
An instance of a ContextualRelevancyTemplate object, which allows you to override the default prompts used to compute the ContextualRelevancyMetric score.
This can be used for both single-turn E2E and component-level testing.
Create Remotely
If you are not using deepeval in Python, or want to run evals remotely on Confident AI, you can use the contextual relevancy metric by adding it to a single-turn metric collection. This allows you to use the contextual relevancy metric for:
- Single-turn E2E testing
- Single-turn component-level testing
- Online and offline evals for traces and spans