Contextual Precision

Contextual Precision is a single-turn metric used to evaluate a RAG retriever.

Overview

The contextual precision metric is a single-turn RAG metric that uses LLM-as-a-judge to evaluate how well your retriever ranks the retrieved context based on the input query.

When using the contextual precision metric, the input of a test case should contain only the query, not the entire prompt.

Required Parameters

These are the parameters you must supply in your test case to run evaluations for the contextual precision metric (a sketch of how they map onto a test case follows this list):

input
stringRequired

The input query you supply to your RAG application.

expected_output
stringRequired

The ideal output you expect your RAG application to generate for a given input.

retrieval_context
list of stringRequired

The retrieved context your retriever outputs for a given input, sorted by rank (most relevant first).
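Concretely, these parameters map onto fields of a deepeval LLMTestCase. The sketch below is illustrative only: the string values are placeholders, and actual_output is included simply because a test case normally carries the generated answer as well, even though it is not listed as a required parameter for this metric.

from deepeval.test_case import LLMTestCase

# Placeholder values showing how the required parameters map onto a test case
test_case = LLMTestCase(
    input="What is the capital of France?",  # the query sent to your RAG application
    actual_output="The capital of France is Paris.",  # the generated answer (not required by this metric)
    expected_output="Paris is the capital of France.",  # the ideal answer for this input
    retrieval_context=[  # retrieved nodes, sorted by rank
        "Paris is the capital and largest city of France.",
        "France is a country in Western Europe.",
    ],
)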

How Is It Calculated?

The contextual precision metric uses an LLM to judge whether each node in the retrieval context is relevant to the input, and then checks that relevant nodes are ranked above irrelevant ones. The final score is calculated using the following equation:


\text{Contextual Precision} = \frac{1}{\text{Number of Relevant Nodes}} \sum_{k=1}^{n} \left( \frac{\text{Number of Relevant Nodes Up to Position } k}{k} \times r_k \right)

k - the position (rank) of a node in the retrieval context

n - number of nodes in the retrieval context

rₖ - the binary relevance of the kth node. 1 if relevant, 0 otherwise.

A high contextual precision score indicates that relevant nodes are ranked above irrelevant ones, i.e., the retrieved nodes are ordered by their relevance to the input.
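As a quick sanity check of the formula, the snippet below computes the score by hand for a hypothetical retrieval context of three nodes whose judged relevance, in ranked order, is [relevant, irrelevant, relevant]:

# Hypothetical relevance verdicts r_k for a 3-node retrieval context, in ranked order
relevance = [1, 0, 1]

num_relevant = sum(relevance)  # 2 relevant nodes
score = 0.0
relevant_so_far = 0
for k, r_k in enumerate(relevance, start=1):
    relevant_so_far += r_k
    score += (relevant_so_far / k) * r_k  # precision at position k, counted only when node k is relevant

score /= num_relevant
print(score)  # (1/1 * 1 + 1/2 * 0 + 2/3 * 1) / 2 ≈ 0.83

Had the two relevant nodes been ranked first ([1, 1, 0]), the score would have been 1.0; the irrelevant node sitting at rank 2 is what pulls the score down.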

Create Locally

You can create the ContextualPrecisionMetric in deepeval as follows:

from deepeval.metrics import ContextualPrecisionMetric

metric = ContextualPrecisionMetric()
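A minimal usage sketch is shown below, assuming the test case fields described under Required Parameters; the values are placeholders and the exact constructor arguments may vary slightly between deepeval versions:

from deepeval.metrics import ContextualPrecisionMetric
from deepeval.test_case import LLMTestCase

# Configure the judge: threshold, model, and include_reason are the most common knobs
metric = ContextualPrecisionMetric(
    threshold=0.7,
    model="gpt-4.1",
    include_reason=True,
)

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
    expected_output="Paris is the capital of France.",
    retrieval_context=[
        "Paris is the capital and largest city of France.",
        "France is a country in Western Europe.",
    ],
)

metric.measure(test_case)
print(metric.score)   # float between 0 and 1
print(metric.reason)  # populated when include_reason=True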

Here’s a list of parameters you can configure when creating a ContextualPrecisionMetric:

threshold
numberDefaults to 0.5

A float to represent the minimum passing threshold.

model
string | ObjectDefaults to gpt-4.1

A string specifying which of OpenAI’s GPT models to use OR any custom LLM model of type DeepEvalBaseLLM.

include_reason
booleanDefaults to true

A boolean to enable the inclusion of a reason for its evaluation score.

async_mode
booleanDefaults to true

A boolean to enable concurrent execution within the measure() method.

strict_mode
booleanDefaults to false

A boolean to enforce a binary metric score: 1 for perfection, 0 otherwise.

verbose_mode
booleanDefaults to false

A boolean to print the intermediate steps used to calculate the metric score.

evaluation_template
ContextualPrecisionTemplateDefaults to deepeval's template

An instance of ContextualPrecisionTemplate object, which allows you to override the default prompts used to compute the ContextualPrecisionMetric score.

The ContextualPrecisionMetric can be used for both single-turn E2E and component-level testing.

Create Remotely

If you are not using deepeval in Python, or want to run evals remotely on Confident AI, you can use the contextual precision metric by adding it to a single-turn metric collection. This allows you to use the contextual precision metric for:

  • Single-turn E2E testing
  • Single-turn component-level testing
  • Online and offline evals for traces and spans