Task Completion

Task Completion is a single-turn metric that scores how well an agent completes its task.

Overview

The task completion metric is a single-turn metric that uses LLM-as-a-judge to assess whether your LLM agent successfully completes the given task based on its entire trace.

Important Note

The task completion metric analyzes your agent’s full trace to determine task success, so tracing must be set up before you use it.

How Is It Calculated?

The task completion metric uses an LLM to extract the task and outcome from each step in the trace, then uses the same LLM to determine if the task was satisfied based on the outcome.


\text{Task Completion} = \text{Alignment Score}(\text{Task}, \text{Outcome})

The final score is the alignment of task and outcome as extracted from the trace.
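The two-stage calculation described above can be sketched in plain Python. This is an illustrative outline only, not deepeval's implementation: the Step shape, the prompt wording, and the helper names (extract_task_and_outcome, alignment_score, task_completion) are all assumptions, and the judge is any callable that takes a prompt and returns the LLM's text reply.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One recorded step of an agent trace (hypothetical shape)."""
    name: str
    input: str
    output: str


# Assumption: the judge is a callable mapping a prompt to the LLM's reply.
Judge = Callable[[str], str]


def extract_task_and_outcome(trace: List[Step], judge: Judge) -> Tuple[str, str]:
    """Stage 1: ask the judge to state the task and the outcome from the trace."""
    rendered = "\n".join(
        f"[{s.name}] in={s.input!r} out={s.output!r}" for s in trace
    )
    task = judge(f"State the task the agent was asked to do:\n{rendered}")
    outcome = judge(f"State what the agent actually achieved:\n{rendered}")
    return task, outcome


def alignment_score(task: str, outcome: str, judge: Judge) -> float:
    """Stage 2: ask the same judge how well the outcome satisfies the task."""
    reply = judge(
        "Score from 0 to 1 how well the outcome satisfies the task.\n"
        f"Task: {task}\nOutcome: {outcome}"
    )
    # Clamp to [0, 1] in case the judge replies with an out-of-range number.
    return max(0.0, min(1.0, float(reply)))


def task_completion(trace: List[Step], judge: Judge) -> float:
    """Task Completion = Alignment Score(Task, Outcome)."""
    task, outcome = extract_task_and_outcome(trace, judge)
    return alignment_score(task, outcome, judge)
```

Passing the judge in as a callable keeps the sketch testable with a stub; in practice the same LLM would serve both stages, as the description above states.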

Create Locally

You can create the TaskCompletionMetric in deepeval as follows:

from deepeval.metrics import TaskCompletionMetric

metric = TaskCompletionMetric()

Here’s a list of parameters you can configure when creating a TaskCompletionMetric:

threshold
number, defaults to 0.5

A float to represent the minimum passing threshold.

task
string

A string representing the task to be completed. If no task is supplied, it is automatically inferred from the trace.

model
string | Object, defaults to gpt-4.1

A string specifying which of OpenAI’s GPT models to use, or a custom LLM of type DeepEvalBaseLLM.

include_reason
boolean, defaults to true

A boolean to enable the inclusion of a reason for the evaluation score.

async_mode
boolean, defaults to true

A boolean to enable concurrent execution within the measure() method.

strict_mode
boolean, defaults to false

A boolean to enforce a binary metric score: 1 for perfection, 0 otherwise.

verbose_mode
boolean, defaults to false

A boolean to print the intermediate steps used to calculate the metric score.
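To make the interaction between threshold and strict_mode concrete, here is a small sketch of the pass/fail decision. It is not deepeval's code: resolve_score is a hypothetical helper, and it assumes that a score passes when it is greater than or equal to the threshold, and that strict mode collapses any score below a perfect 1 to 0.

```python
from typing import Tuple


def resolve_score(
    raw_score: float, threshold: float = 0.5, strict_mode: bool = False
) -> Tuple[float, bool]:
    """Return (final_score, passed) for a judge score in [0, 1].

    Hypothetical helper: assumes passing means score >= threshold, and
    that strict mode yields 1 only for a perfect score, 0 otherwise.
    """
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError("raw_score must be between 0 and 1")
    if strict_mode:
        # Binary outcome: the threshold is effectively raised to 1.
        final = 1.0 if raw_score >= 1.0 else 0.0
        return final, final == 1.0
    return raw_score, raw_score >= threshold


# Usage: the same raw score passes by default but fails in strict mode.
default_result = resolve_score(0.7)
strict_result = resolve_score(0.7, strict_mode=True)
```

Under these assumptions a raw score of 0.7 clears the default threshold of 0.5, while strict mode reduces it to 0 and fails it.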

Create Remotely

If you are not using the deepeval Python package, or want to run evals remotely on Confident AI, you can use the task completion metric by adding it to a single-turn metric collection. This allows you to use the task completion metric for:

  • Single-turn E2E testing
  • Online and offline evals for traces