Introduction

Learn how domain experts can contribute to AI testing

Overview

Confident AI allows end users and internal annotators to leave human annotations on monitored traces, spans, and threads. It provides a centralized place for even non-technical teams to:

  • Annotate datasets
  • Keep track of end user feedback
  • Align metrics with human judgement
  • Leave annotations for other stakeholders to review internally

Without real humans giving feedback to an LLM system, evals are no better than vibe-coding.

Human-in-the-loop is one of the most important workflows in an LLM evaluation pipeline. This is because LLM evals automate and scale human judgement rather than replace it.

Two-Rating System

You can mix and match two rating systems on Confident AI:

Thumbs Up/Down

Either 0 (thumbs down) or 1 (thumbs up), nothing else.

Five Star Rating

Ranges from 1 to 5, inclusive.

You’ll learn how to configure both rating systems via the Evals API or UI in the following sections.
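As a minimal illustration of the two scales, here is a sketch of how a client could represent and validate ratings before submitting an annotation. The enum and helper names are hypothetical and not part of Confident AI's SDK.

```python
from enum import Enum


class RatingSystem(Enum):
    THUMBS = "thumbs"        # binary: 0 (thumbs down) or 1 (thumbs up)
    FIVE_STAR = "five_star"  # integer from 1 to 5, inclusive


def validate_rating(system: RatingSystem, rating: int) -> int:
    """Reject any value outside the chosen rating system's range."""
    if system is RatingSystem.THUMBS and rating in (0, 1):
        return rating
    if system is RatingSystem.FIVE_STAR and 1 <= rating <= 5:
        return rating
    raise ValueError(f"{rating} is not a valid {system.value} rating")


# Example usage
validate_rating(RatingSystem.THUMBS, 1)       # OK
validate_rating(RatingSystem.FIVE_STAR, 4)    # OK
# validate_rating(RatingSystem.FIVE_STAR, 0)  # raises ValueError
```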

Two Ways to Leave Annotations

There are two ways to leave annotations on Confident AI:

Single vs Multi-Turn

A single-turn annotation is left on a trace or span, while a multi-turn annotation is left on a thread. The only difference between the two is that a single-turn annotation accepts an optional expected output, while a multi-turn one accepts an optional expected outcome.
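To make the difference concrete, below is a hedged sketch of what the two annotation payloads could look like when sent over HTTP. The base URL, endpoint path, and field names (`traceUuid`, `threadUuid`, `expectedOutput`, `expectedOutcome`) are assumptions for illustration only; refer to the Evals API reference in the following sections for the actual schema.

```python
import requests

API_KEY = "<your-confident-api-key>"        # placeholder
BASE_URL = "https://api.confident-ai.com"   # assumed base URL for illustration
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# Single-turn: annotate a trace (or span), optionally supplying an expected output
single_turn_annotation = {
    "traceUuid": "trace-123",          # hypothetical identifier
    "rating": 1,                       # thumbs up/down system
    "explanation": "Answer was factually correct.",
    "expectedOutput": "Paris is the capital of France.",  # optional, single-turn only
}

# Multi-turn: annotate a thread, optionally supplying an expected outcome
multi_turn_annotation = {
    "threadUuid": "thread-456",        # hypothetical identifier
    "rating": 4,                       # five-star system
    "explanation": "Resolved the issue, but took too many turns.",
    "expectedOutcome": "User's billing issue is resolved.",  # optional, multi-turn only
}

# Endpoint path below is illustrative, not a documented route
requests.post(f"{BASE_URL}/v1/annotations", json=single_turn_annotation, headers=headers)
requests.post(f"{BASE_URL}/v1/annotations", json=multi_turn_annotation, headers=headers)
```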