Turn Relevancy | Confident AI Docs

Overview

The turn relevancy metric is a multi-turn metric that uses LLM-as-a-judge to evaluate whether your chatbot’s responses are relevant to the corresponding user inputs at each turn in the conversation.

Required Parameters

These are the parameters you must supply in your test case to run evaluations for turn relevancy metric:

turns

list of TurnRequired

A list of Turns as exchanges between user and assistant.

Parameters of Turn:

role

user | assistantRequired

The role of the person speaking, it’s either user or assistant

content

stringRequired

The content provided by the role for the turn

How Is It Calculated?

The turn relevancy metric loops over all the turns to find the assistant turns and uses an LLM to see if the corresponding turn’s content is relevant to the previous user turn’s content.

\text{Turn Relevancy} = \frac{\text{Number of Assistant Turns with Relevant Assistant Content}}{\text{Total Number of Assistant Turns}}

The final score is the proportion of assisant turns that give relevant output in the conversation.

Create Locally

You can create the TurnRelevancyMetric in deepeval as follows:

1 from deepeval.metrics import TurnRelevancyMetric
2 
3 metric = TurnRelevancyMetric()

Here’s a list of parameters you can configure when creating a TurnRelevancyMetric:

threshold

numberDefaults to 0.5

A float to represent the minimum passing threshold.

window_size

numberDefaults to 10

An integer which defines the size of the sliding window of turns used during evaluation.

model

string | ObjectDefaults to gpt-4.1

A string specifying which of OpenAI’s GPT models to use OR any custom LLM model of type DeepEvalBaseLLM.

include_reason

booleanDefaults to true

A boolean to enable the inclusion a reason for its evaluation score.

async_mode

booleanDefaults to true

A boolean to enable concurrent execution within the measure() method.

strict_mode

booleanDefaults to false

A boolean to enforce a binary metric score: 0 for perfection, 1 otherwise.

verbose_mode

booleanDefaults to false

A boolean to print the intermediate steps used to calculate the metric score.

This can be used for multi-turn E2E

Create Remotely

For users not using deepeval python, or want to run evals remotely on Confident AI, you can use the turn relevancy metric by adding it to a single-turn metric collection. This will allow you to use turn relevancy metric for:

Multi-turn E2E testing
Online and offline evals for traces and spans