Role Adherence

Role Adherence is a multi-turn metric to determine if your chatbot adheres to a specified role

Overview

The role adherence metric is a multi-turn metric that uses LLM-as-a-judge to evaluate whether your chatbot consistently maintains its pre-determined role throughout the conversation.

Required Parameters

These are the parameters you must supply in your test case to run evaluations for the role adherence metric:

chatbot_role
stringRequired

The role your chatbot has to adhere to throughout the conversation.

turns
list of TurnRequired

A list of Turns as exchanges between user and assistant.

Parameters of Turn:

role
user | assistantRequired

The role of the speaker; either user or assistant.

content
stringRequired

The content provided by the role for the turn
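To make the shape of these parameters concrete, here is a minimal sketch of a conversation using plain Python dicts. The dict format is purely illustrative (deepeval represents these as Turn objects), and the example role and contents are made up:

```python
# Hypothetical example of a chatbot_role and a list of turns.
# Each turn records who spoke ("user" or "assistant") and what was said.
chatbot_role = "a polite medieval knight who speaks in archaic English"

turns = [
    {"role": "user", "content": "Good day! Can you help me find an inn?"},
    {"role": "assistant", "content": "Good morrow, traveller! Pray, follow yonder path."},
    {"role": "user", "content": "Thanks. How far is it?"},
    {"role": "assistant", "content": "But a league hence, good sir."},
]

# Every turn must use one of the two allowed roles.
assert all(t["role"] in ("user", "assistant") for t in turns)
```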

How Is It Calculated?

The role adherence metric iterates over each assistant turn and uses an LLM to check if the content adheres to the specified chatbot_role.


$$\text{Role Adherence} = \frac{\text{Number of Assistant Turns that Adhered to Chatbot Role in Conversation}}{\text{Total Number of Assistant Turns in Conversation}}$$

The final score is the proportion of assistant turns that adhere to the role specified in the conversation.
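The scoring step can be sketched in plain Python. Here the per-turn LLM judgement is stubbed out as a list of booleans (one verdict per assistant turn), since in practice those verdicts come from the evaluation model:

```python
def role_adherence_score(verdicts: list[bool]) -> float:
    """Proportion of assistant turns judged to adhere to the chatbot role.

    `verdicts` holds one boolean per assistant turn: True if the LLM judge
    found the turn in character, False otherwise.
    """
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)

# Example: 3 of 4 assistant turns stayed in character.
print(role_adherence_score([True, True, False, True]))  # 0.75
```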

Create Locally

You can create the RoleAdherenceMetric in deepeval as follows:

from deepeval.metrics import RoleAdherenceMetric

metric = RoleAdherenceMetric()

Here’s a list of parameters you can configure when creating a RoleAdherenceMetric:

threshold
numberDefaults to 0.5

A float to represent the minimum passing threshold.

model
string | ObjectDefaults to gpt-4.1

A string specifying which of OpenAI’s GPT models to use OR any custom LLM model of type DeepEvalBaseLLM.

include_reason
booleanDefaults to true

A boolean to enable the inclusion of a reason for the evaluation score.

async_mode
booleanDefaults to true

A boolean to enable concurrent execution within the measure() method.

strict_mode
booleanDefaults to false

A boolean to enforce a binary metric score: 1 for perfection, 0 otherwise.

verbose_mode
booleanDefaults to false

A boolean to print the intermediate steps used to calculate the metric score.

This metric can be used for multi-turn end-to-end (E2E) testing.

Create Remotely

If you are not using deepeval in Python, or want to run evals remotely on Confident AI, you can use the role adherence metric by adding it to a multi-turn metric collection. This will allow you to use the role adherence metric for:

  • Multi-turn E2E testing
  • Online and offline evals for traces and spans