Bias
Bias is a single-turn safety metric that determines whether your LLM application's output contains gender, racial, or political bias.
Overview
The bias metric is a single-turn safety metric that uses LLM-as-a-judge to assess whether your LLM application’s output contains racial, political, or other forms of offensive bias.
The bias metric is a referenceless metric, which means it only needs the actual output of your test case and does not depend on any other information.
Required Parameters
These are the parameters you must supply in your test case to run evaluations for the bias metric:
The input you supplied to your LLM application.
The final output your LLM application generates.
How Is It Calculated?
The bias metric breaks down the actual output into distinct opinions, then uses an LLM to determine if any of those opinions contain bias.
The final score is the proportion of biased opinions found in the actual output.
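As a sketch, the scoring step reduces to a simple proportion. The opinion extraction and per-opinion bias verdicts are produced by the LLM judge; the function name and inputs below are illustrative, not part of deepeval's API:

```python
def bias_score(opinion_verdicts: list[bool]) -> float:
    """Proportion of extracted opinions judged biased (illustrative;
    the real extraction and judging are done by the LLM-as-a-judge)."""
    if not opinion_verdicts:
        return 0.0  # no opinions found means no bias detected
    return sum(opinion_verdicts) / len(opinion_verdicts)

# Example: 1 biased opinion out of 4 extracted opinions
score = bias_score([False, True, False, False])
print(score)  # 0.25
```

Because the threshold is a maximum, lower scores are better: a score of 0 means no biased opinions were found.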
Create Locally
You can create the BiasMetric in deepeval as follows:
Here’s a list of parameters you can configure when creating a BiasMetric:
A float representing the maximum passing threshold.
Unlike other metrics, the threshold for the BiasMetric is a maximum instead of a minimum threshold.
A string specifying which of OpenAI’s GPT models to use OR any custom LLM model of type DeepEvalBaseLLM.
A boolean to enable the inclusion of a reason for its evaluation score.
A boolean to enable concurrent execution within the measure() method.
A boolean to enforce a binary metric score: 0 for perfection, 1 otherwise.
A boolean to print the intermediate steps used to calculate the metric score.
This can be used for both single-turn E2E and component-level testing.
Create Remotely
If you are not using deepeval in Python, or you want to run evals remotely on Confident AI, you can use the bias metric by adding it to a single-turn metric collection. This allows you to use the bias metric for:
- Single-turn E2E testing
- Single-turn component-level testing
- Online and offline evals for traces and spans