Introduction
Overview
Confident AI lets end users and internal annotators leave human annotations on monitored traces, spans, and threads. It provides a centralized place for even non-technical teams to:
- Annotate datasets
- Keep track of end user feedback
- Align metrics with human judgement
- Leave annotations for other stakeholders to review internally
Without real humans giving feedback to an LLM system, evals are no better than vibe-coding.
Human-in-the-loop is one of the most important workflows in an LLM evaluation pipeline. This is because LLM evals automate and scale human judgements, not replace them.
Two Rating Systems
You can mix and match two rating systems on Confident AI:
- Binary: either 0 or 1, nothing else.
- Five-star: ranges from 1 to 5, inclusive.
You’ll learn how to configure both rating systems via the Evals API or UI in the following sections.
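As a rough illustration of the two scales, the sketch below validates a rating before attaching it to an annotation payload. The field names (`scale`, `rating`) are illustrative placeholders, not the actual Evals API schema:

```python
# Minimal sketch of the two rating scales. Field names are placeholders,
# not the actual Evals API schema.

def build_rating(scale: str, value: int) -> dict:
    """Validate a rating against one of the two supported scales."""
    if scale == "binary":
        # Binary ratings: either 0 or 1, nothing else.
        if value not in (0, 1):
            raise ValueError("Binary ratings must be 0 or 1")
    elif scale == "five-star":
        # Five-star ratings: integers from 1 to 5, inclusive.
        if not 1 <= value <= 5:
            raise ValueError("Five-star ratings must be between 1 and 5")
    else:
        raise ValueError(f"Unknown rating scale: {scale}")
    return {"scale": scale, "rating": value}

print(build_rating("binary", 1))     # {'scale': 'binary', 'rating': 1}
print(build_rating("five-star", 4))  # {'scale': 'five-star', 'rating': 4}
```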
Two Ways to Leave Annotations
There are two ways to leave annotations on Confident AI:
Via the Evals API:
- Must be sent through the Evals API
- Available in both the Python and TypeScript DeepEval SDKs
- Suitable for: those with custom, user-facing feedback collection UIs (a sketch follows this list)

On the UI:
- Can only be left on the UI
- Available on traces, spans, and threads
- Suitable for: internal domain experts, QA teams, and PMs who need to surface judgements to stakeholders
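For the Evals API route, here is a minimal sketch of what forwarding end user feedback from a custom UI might look like. The endpoint URL, header, and payload fields below are assumptions for illustration only; consult the Evals API reference for the real schema:

```python
import os
import requests

# Hypothetical endpoint and payload shape for illustration only; the real
# Evals API route and field names may differ.
CONFIDENT_API_URL = "https://api.confident-ai.com/v1/feedback"  # placeholder URL

def send_user_feedback(trace_id: str, rating: int, explanation: str = "") -> None:
    """Forward an end user's rating on a monitored trace to Confident AI."""
    response = requests.post(
        CONFIDENT_API_URL,
        headers={"CONFIDENT_API_KEY": os.environ["CONFIDENT_API_KEY"]},  # placeholder auth header
        json={
            "traceUuid": trace_id,       # the monitored trace being rated
            "rating": rating,            # binary (0/1) or five-star (1-5)
            "explanation": explanation,  # optional free-form comment
        },
        timeout=10,
    )
    response.raise_for_status()

# Example: a thumbs-up from your app's feedback widget
send_user_feedback(trace_id="trace-123", rating=1, explanation="Helpful answer")
```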
Single vs Multi-Turn
Single-turn annotation refers to an annotation left on a trace or span, while multi-turn refers to an annotation left on a thread. The only difference between the two is that a single-turn annotation accepts an optional expected output, while a multi-turn one accepts an optional expected outcome.
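To make the difference concrete, here is a rough sketch of the two annotation shapes. The field names are illustrative placeholders rather than the exact API schema; the only field that changes is the optional expected output (single-turn) versus expected outcome (multi-turn):

```python
# Illustrative annotation payloads; field names are placeholders only.

single_turn_annotation = {
    "traceUuid": "trace-123",   # or a spanUuid for span-level annotations
    "rating": 4,                # five-star scale in this example
    "expectedOutput": "Refund policy allows returns within 30 days.",  # optional
}

multi_turn_annotation = {
    "threadUuid": "thread-456",  # annotation applies to the whole thread
    "rating": 0,                 # binary scale in this example
    "expectedOutcome": "Agent resolves the billing issue without escalation.",  # optional
}
```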