Manage Datasets | Confident AI Docs

Overview

A dataset is a list of goldens, either single-turn or multi-turn, and forms the basis of any evaluation workflow in development. In this section, you’ll learn how to work with goldens in datasets, including:

Understanding the golden structure for single and multi-turn datasets
Uploading goldens via CSV on the platform
Assigning different team members to review and finalize goldens

If you haven’t already, familiarize yourself with what goldens are.

Create A Dataset

Create a dataset under Project > Datasets (select either the single-turn or multi-turn tab, depending on the type of dataset you wish to create):

Create Dataset on Confident AI

Golden Structure

Understanding the golden structure is essential before uploading your data. Goldens are the building blocks of datasets, and their structure differs slightly between single-turn and multi-turn evaluations:

Single-Turn

Multi-Turn

Field	Type	Description
Input	Text	Required. The input query that will be used to invoke your AI app.
Expected Output	Text	The ideal output for a given input.
Context	List of text	Static supporting context relevant to your use case.
Expected Tools	List of tools	The ideal list of tools that should be called.
Additional Metadata	Key-value pairs	Custom metadata for generating test cases.
Comments	Text	Any notes or comments about this golden.

Avoid pre-populating Actual Output, Retrieval Context, or Tools Called for single-turn goldens, and Turns for multi-turn goldens. These fields are meant to be populated dynamically during evaluation.
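
As a rough illustration of the single-turn fields in the table above, the sketch below models a golden as a local Python dataclass. The class name `SingleTurnGolden` and the snake_case attribute names are assumptions made for this example and are not the platform's own types. Actual Output, Retrieval Context, and Tools Called are deliberately left out, since they are populated during evaluation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SingleTurnGolden:
    """Hypothetical local model of a single-turn golden (illustration only)."""

    input: str  # Required: the query used to invoke your AI app
    expected_output: Optional[str] = None  # Ideal output for the input
    context: List[str] = field(default_factory=list)  # Static supporting context
    expected_tools: List[Dict[str, Any]] = field(default_factory=list)  # Ideal tool calls
    additional_metadata: Dict[str, Any] = field(default_factory=dict)  # Custom key-value pairs
    comments: Optional[str] = None  # Free-form notes

    def __post_init__(self) -> None:
        # Input is the only field marked as required in the table.
        if not self.input.strip():
            raise ValueError("Input is required for every golden")


golden = SingleTurnGolden(
    input="What is your refund policy?",
    expected_output="Refunds are available within 30 days of purchase.",
    context=["Refund window: 30 days."],
)
```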

Upload Goldens via CSV

You can upload both single-turn and multi-turn goldens stored in CSV files to datasets. The golden fields you map to CSV headers differ slightly between the two types.
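
If your goldens currently live in code, one way to prepare a file for upload is to write them out with Python's standard `csv` module. This is a minimal sketch: the header names `input`, `expected_output`, and `comments` are assumptions chosen for the example, and you would map them to golden fields during upload. List-valued fields such as Context are omitted because their CSV encoding is not described here.

```python
import csv
import io
from typing import Dict, List

# Hypothetical header names; map each one to a golden field when uploading.
FIELDNAMES = ["input", "expected_output", "comments"]

rows: List[Dict[str, str]] = [
    {
        "input": "What is your refund policy?",
        "expected_output": "Refunds are available within 30 days of purchase.",
        "comments": "Covers the standard refund window.",
    },
    {
        "input": "Do you ship internationally?",
        "expected_output": "Yes, to most countries.",
        "comments": "",
    },
]


def goldens_to_csv(goldens: List[Dict[str, str]]) -> str:
    """Serialize text-only golden rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(goldens)
    return buffer.getvalue()


csv_text = goldens_to_csv(rows)

# To save the result for upload, write csv_text to a file, for example:
# with open("goldens.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

The `csv` module handles quoting automatically, so commas inside an expected output do not break the column layout.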

Upload Goldens via CSV

Other Actions

Beyond creating and uploading, you can also:

Add Images — drag and drop images into text fields for multi-modal goldens
Edit Non-Text Columns — modify structured fields like Context, Expected Tools, and Tools Called
Add Custom Columns — extend goldens with additional metadata fields
Assign Goldens — delegate review to team members
(Un)finalize Goldens — enable or disable goldens for testing
Duplicate Dataset — create a copy of an existing dataset
Delete Dataset — permanently remove a dataset

Adding Images

Datasets on Confident AI are multi-modal by nature — images are natively supported alongside text. You can add images to goldens by dragging and dropping them directly into any text field, including Input, Expected Output, Context, and other list-of-text fields.

When you upload an image, Confident AI stores it and generates a public URL. This URL is embedded in your golden’s text fields using a special format: [DEEPEVAL:IMAGE:uuid]. When you pull the dataset for evaluation, you can parse these into an evaluatable format.
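
To make the placeholder format concrete, here is a minimal sketch that splits a text field into plain-text and image segments using a regular expression. It assumes the token looks exactly like `[DEEPEVAL:IMAGE:<identifier>]` with no nested brackets. The helper name `split_text_and_images` is hypothetical and is not part of any SDK; the official parsing utilities may behave differently.

```python
import re
from typing import List, Tuple

# Assumed token shape: [DEEPEVAL:IMAGE:<identifier>], identifier has no "]".
IMAGE_PATTERN = re.compile(r"\[DEEPEVAL:IMAGE:([^\]]+)\]")


def split_text_and_images(text: str) -> List[Tuple[str, str]]:
    """Return ordered ("text", value) and ("image", identifier) segments."""
    parts: List[Tuple[str, str]] = []
    cursor = 0
    for match in IMAGE_PATTERN.finditer(text):
        # Keep any plain text that precedes this image token.
        if match.start() > cursor:
            parts.append(("text", text[cursor:match.start()]))
        parts.append(("image", match.group(1)))
        cursor = match.end()
    # Keep any trailing text after the last token.
    if cursor < len(text):
        parts.append(("text", text[cursor:]))
    return parts


segments = split_text_and_images("Describe [DEEPEVAL:IMAGE:abc-123] please")
```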

Learn how to parse multi-modal goldens into an evaluatable format when pulling datasets in code.

Edit Non-Text Columns

Some golden fields require structured data rather than plain text. This is mostly relevant for single-turn datasets; of the fields below, multi-turn datasets only have Context.

Field	Type	Description
Context	List of strings	Static supporting context for your use case
Retrieval Context	List of strings	Retrieved text chunks from a retrieval system
Expected Tools	List of `ToolCall`	The ideal tools that should be called
Tools Called	List of `ToolCall`	The actual tools that were called during execution

A ToolCall object has the following structure:

{
  "name": "get_weather",
  "description": "Get weather for a location",
  "reasoning": "User asked about the weather in San Francisco",
  "output": "Sunny, 72°F",
  "input_parameters": { "location": "San Francisco" }
}
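
If you build these objects in code before pasting them into the platform, a small helper can keep the shape consistent. The sketch below mirrors the five keys shown above in a local dataclass and serializes a list of them to JSON. The class name `ToolCallRecord` is hypothetical, and treating `name` as the only required key is an assumption, since the structure above does not say which keys are optional.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToolCallRecord:
    """Hypothetical local mirror of the ToolCall structure shown above."""

    name: str  # Assumed to be the only required key
    description: Optional[str] = None
    reasoning: Optional[str] = None
    output: Optional[Any] = None
    input_parameters: Dict[str, Any] = field(default_factory=dict)


expected_tools = [
    ToolCallRecord(
        name="get_weather",
        description="Get weather for a location",
        input_parameters={"location": "San Francisco"},
    )
]

# Serialize the list so it can be reviewed or pasted into a structured field.
payload = json.dumps([asdict(tool) for tool in expected_tools], indent=2)
```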

Add Custom Columns

Add custom columns to your dataset to store additional metadata. Custom columns appear as new fields on each golden and can be used for passing dynamic values during evaluation.

A custom column name must not match any of the default fields:

Single-Turn

Multi-Turn

Input
Expected Output
Context
Expected Tools
Additional Metadata
Comments
Actual Output
Retrieval Context
Tools Called
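
Before adding columns in bulk, you could check candidate names against the default fields listed above. This is a minimal sketch: the function name `is_valid_custom_column` is hypothetical, and comparing names case-insensitively with collapsed whitespace is a conservative assumption rather than documented platform behavior. The set covers only the fields listed above.

```python
# Default fields copied from the list above.
DEFAULT_FIELDS = {
    "Input",
    "Expected Output",
    "Context",
    "Expected Tools",
    "Additional Metadata",
    "Comments",
    "Actual Output",
    "Retrieval Context",
    "Tools Called",
}


def _normalize(name: str) -> str:
    """Collapse whitespace and ignore case (an assumption, chosen to be strict)."""
    return " ".join(name.split()).casefold()


_RESERVED = {_normalize(name) for name in DEFAULT_FIELDS}


def is_valid_custom_column(name: str) -> bool:
    """Return True if the name is non-empty and does not clash with a default field."""
    normalized = _normalize(name)
    return bool(normalized) and normalized not in _RESERVED


candidates = ["Difficulty", "context", "Customer Tier"]
accepted = [name for name in candidates if is_valid_custom_column(name)]
```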

Add Custom Column

Assign Goldens

Assign goldens to different team members for review and annotation.

Assign Goldens for Annotation

(Un)finalize Goldens

Mark goldens as finalized to lock them from further edits, or unfinalize to allow changes. Finalizing is useful when you’ve reviewed and approved goldens for use in evaluations.

Duplicate Dataset

Create a copy of an existing dataset. Useful when you want to create variations or preserve a snapshot before making changes.

Delete Dataset

Permanently remove a dataset from the platform:

Delete Dataset on Confident AI

This action cannot be undone. All goldens or conversational goldens in the dataset will be permanently deleted.

Next Steps

Now that you know how to manage datasets on the platform, learn how to use them for evaluations or work with them programmatically in your code.

Experiments

Use datasets to compare AI apps side-by-side with statistical rigor.

Single-Turn Evals

Run evaluations on your dataset without writing code.

Pull Datasets

Pull datasets locally to use them in code-driven evaluations.

Automate Goldens in Code

Programmatically push goldens to datasets via the Evals API.