Single-Turn Evals (No-Code)
Evaluate one-shot interactions like Q&A, summarization, and classification.
Single-turn evaluations test one input → one output interactions. These are use cases where each request is independent and doesn’t rely on conversation history, such as Q&A, summarization, and classification.
Single-turn evals treat your AI app as a black box — only the output, tools called, and retrieval context matter for evaluation.
To run a single-turn evaluation, you need:
input and optionally expected_output, context, etc.
If you completed the Quickstart, you already have both of these ready.
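For readers who think in code, a dataset row can be pictured as a record with one required field and a few optional ones. The sketch below is illustrative only: the Golden class and the missing_fields helper are hypothetical names, not part of Confident AI, and only the field names input, expected_output, and context come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Golden:
    """Hypothetical stand-in for one dataset row (not a Confident AI class)."""
    input: str                              # required: what is sent to the AI app
    expected_output: Optional[str] = None   # optional: reference answer
    context: Optional[List[str]] = None     # optional: supporting ground truth


def missing_fields(golden: Golden) -> List[str]:
    """Return the names of optional fields that were left unset."""
    return [
        name
        for name in ("expected_output", "context")
        if getattr(golden, name) is None
    ]


goldens = [
    Golden(input="What is the capital of France?", expected_output="Paris"),
    Golden(input="Summarize the refund policy."),
]

for g in goldens:
    print(g.input, "-> missing:", missing_fields(g))
```

Rows that leave optional fields unset may still be usable; whether they are depends on which metrics you choose to run.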
No-code evals follow a simple 4-step process:
Here’s a visual representation of the data flow during evaluation:
The “AI app” shown in the diagram can be anything from a single-prompt system to a multi-prompt system, or any full AI app reachable over the internet. More on this in later sections.
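The data flow can also be pictured as a short loop: each input is sent to the AI app once, and the resulting output is scored on its own. This is a minimal sketch under assumed names (run_single_turn_eval, echo_app, and non_empty are all hypothetical), not how the platform is implemented.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases, for illustration only.
AIApp = Callable[[str], str]           # input -> actual output
Metric = Callable[[str, str], float]   # (input, actual output) -> score


def run_single_turn_eval(
    inputs: List[str],
    ai_app: AIApp,
    metrics: Dict[str, Metric],
) -> List[dict]:
    """Call the app once per input and score each output independently."""
    results = []
    for text in inputs:
        # The app is treated as a black box: only its output is inspected.
        actual_output = ai_app(text)
        scores = {name: metric(text, actual_output) for name, metric in metrics.items()}
        results.append({"input": text, "actual_output": actual_output, "scores": scores})
    return results


def echo_app(text: str) -> str:
    """Toy app that upper-cases its input."""
    return text.upper()


def non_empty(_input: str, output: str) -> float:
    """Toy metric: 1.0 if the output contains any non-whitespace text."""
    return 1.0 if output.strip() else 0.0


test_run = run_single_turn_eval(["hello", "summarize this"], echo_app, {"non_empty": non_empty})
```

Because no state is carried between iterations, the inputs can be processed in any order, which is what makes an evaluation single-turn.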
To evaluate a dataset, click the Evaluate button at the top right of the dataset page.
Select how to generate actual outputs:
For single-prompt systems, select a prompt template that Confident AI will use to call your configured LLM provider.
You’ll need an existing prompt for this to work. If you haven’t created one yet, you can do so in Prompt Studio.
Click Run Evaluation and wait for it to complete. You’ll be redirected to your test run dashboard showing:
Once you have two or more test runs, you can compare them side-by-side to identify regressions.
Name your test runs with identifiers (e.g., “gpt-4o baseline”, “claude-3.5 v2”) to make regression comparisons easier to track.
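To make the idea of a regression concrete, the sketch below compares per-test-case scores from two named runs and lists the cases that got worse. The find_regressions function, the case IDs, and all scores are invented for illustration; the dashboard performs this comparison for you.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return test case IDs whose score dropped by more than `tolerance`."""
    return sorted(
        case_id
        for case_id, base_score in baseline.items()
        if case_id in candidate and base_score - candidate[case_id] > tolerance
    )


# Made-up scores keyed by test case ID, for illustration only.
runs = {
    "gpt-4o baseline": {"case-1": 0.9, "case-2": 0.8, "case-3": 0.7},
    "claude-3.5 v2": {"case-1": 0.95, "case-2": 0.6, "case-3": 0.7},
}

regressions = find_regressions(runs["gpt-4o baseline"], runs["claude-3.5 v2"], tolerance=0.05)
print(regressions)
```

A small tolerance keeps minor score fluctuations from being flagged as regressions.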