Introduction
Overview
Confident AI lets end users and internal annotators leave human annotations on monitored traces, spans, and threads. It provides a centralized place for even non-technical teams to:
- Annotate datasets
- Keep track of end user feedback
- Align metrics with human judgement
- Leave annotations for other stakeholders to review internally
Without real humans giving feedback to an LLM system, evals are no better than vibe-coding.
Human-in-the-loop is one of the most important workflows in an LLM evaluation pipeline. This is because LLM evals automate and scale human judgements, not replace them.
Two Rating Systems
You can mix and match two rating systems on Confident AI:
- Binary: either 0 or 1, nothing else.
- Five-star: ranges from 1 to 5, inclusive.
You’ll learn how to configure both rating systems via the Evals API or UI in the following sections.
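As a rough illustration of the two scales, the sketch below validates a rating before attaching it to an annotation payload. The field names (`scale`, `rating`) are illustrative placeholders, not the actual Evals API schema:

```python
# Minimal sketch of the two rating scales. Field names are placeholders,
# not the actual Evals API schema.

def build_rating(scale: str, value: int) -> dict:
    """Validate a rating against one of the two supported scales."""
    if scale == "binary":
        # Binary ratings: either 0 or 1, nothing else.
        if value not in (0, 1):
            raise ValueError("Binary ratings must be 0 or 1")
    elif scale == "five-star":
        # Five-star ratings: integers from 1 to 5, inclusive.
        if not 1 <= value <= 5:
            raise ValueError("Five-star ratings must be between 1 and 5")
    else:
        raise ValueError(f"Unknown rating scale: {scale}")
    return {"scale": scale, "rating": value}

print(build_rating("binary", 1))     # {'scale': 'binary', 'rating': 1}
print(build_rating("five-star", 4))  # {'scale': 'five-star', 'rating': 4}
```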
Two Ways to Leave Annotations
There are two ways to leave annotations on Confident AI:
Via the Evals API:
- Must be sent through the Evals API
- Available in both the Python and TypeScript DeepEval SDKs
- Suitable for: those with custom, user-facing feedback collection UIs (a sketch follows this list)

On the UI:
- Can only be left on the UI
- Available on traces, spans, and threads
- Suitable for: internal domain experts, QA teams, and PMs who need to surface judgements to stakeholders
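For the Evals API route, here is a minimal sketch of what forwarding end user feedback from a custom UI might look like. The endpoint URL, header, and payload fields below are assumptions for illustration only; consult the Evals API reference for the real schema:

```python
import os
import requests

# Hypothetical endpoint and payload shape for illustration only; the real
# Evals API route and field names may differ.
CONFIDENT_API_URL = "https://api.confident-ai.com/v1/feedback"  # placeholder URL

def send_user_feedback(trace_id: str, rating: int, explanation: str = "") -> None:
    """Forward an end user's rating on a monitored trace to Confident AI."""
    response = requests.post(
        CONFIDENT_API_URL,
        headers={"CONFIDENT_API_KEY": os.environ["CONFIDENT_API_KEY"]},  # placeholder auth header
        json={
            "traceUuid": trace_id,       # the monitored trace being rated
            "rating": rating,            # binary (0/1) or five-star (1-5)
            "explanation": explanation,  # optional free-form comment
        },
        timeout=10,
    )
    response.raise_for_status()

# Example: a thumbs-up from your app's feedback widget
send_user_feedback(trace_id="trace-123", rating=1, explanation="Helpful answer")
```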
Single vs Multi-Turn
Single-turn annotation refers to an annotation left on a trace or span, while multi-turn refers to an annotation left on a thread. The only difference between the two is that a single-turn annotation accepts an optional expected output, while a multi-turn one accepts an optional expected outcome.
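To make the difference concrete, here is a rough sketch of the two annotation shapes. The field names are illustrative placeholders rather than the exact API schema; the only field that changes is the optional expected output (single-turn) versus expected outcome (multi-turn):

```python
# Illustrative annotation payloads; field names are placeholders only.

single_turn_annotation = {
    "traceUuid": "trace-123",   # or a spanUuid for span-level annotations
    "rating": 4,                # five-star scale in this example
    "expectedOutput": "Refund policy allows returns within 30 days.",  # optional
}

multi_turn_annotation = {
    "threadUuid": "thread-456",  # annotation applies to the whole thread
    "rating": 0,                 # binary scale in this example
    "expectedOutcome": "Agent resolves the billing issue without escalation.",  # optional
}
```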