Manage Datasets

Learn the core functions of a dataset and how to manipulate the goldens within it.

Overview

A dataset is a list of goldens and forms the basis of any evaluation workflow in development. Every dataset is either single-turn or multi-turn. In this section, you’ll learn to manipulate goldens in datasets, including:

  • Understanding the golden structure for single and multi-turn datasets
  • Uploading goldens via CSV on the platform
  • Assigning different team members to review and finalize goldens

If you haven’t already, familiarize yourself with what goldens are.

Create A Dataset

You can create a dataset under Project > Datasets. Select either the single-turn or multi-turn tab, depending on the type of dataset you wish to create.

Golden Structure

Understanding the golden structure is essential before uploading your data. Goldens are the building blocks of datasets, and their structure differs slightly between single-turn and multi-turn evaluations:

Field | Type | Description
Input | Text | Required. The input query that will be used to invoke your AI app.
Expected Output | Text | The ideal output for a given input.
Context | List of text | Static supporting context relevant to your use case.
Expected Tools | List of tools | The ideal list of tools that should be called.
Additional Metadata | Key-value pairs | Custom metadata for generating test cases.
Comments | Text | Any notes or comments about this golden.

Avoid pre-populating Actual Output, Retrieval Context, or Tools Called for single-turn goldens, and Turns for multi-turn goldens. These fields are meant to be populated dynamically during evaluation.
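To make the field list and the warning above concrete, here is a minimal Python sketch of a single-turn golden. `GoldenSketch`, `RUNTIME_FIELDS`, and `prepopulated_runtime_fields` are hypothetical names used only for illustration; they are not the platform's or any library's actual classes. The snake_case attribute names are likewise an assumption derived from the field labels.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class GoldenSketch:
    """Hypothetical stand-in for a single-turn golden (illustration only)."""

    # Fields you author up front. Only the input is required.
    input: str
    expected_output: Optional[str] = None
    context: List[str] = field(default_factory=list)
    expected_tools: List[Dict[str, Any]] = field(default_factory=list)
    additional_metadata: Dict[str, Any] = field(default_factory=dict)
    comments: Optional[str] = None

    # Fields meant to be filled in dynamically during evaluation.
    actual_output: Optional[str] = None
    retrieval_context: List[str] = field(default_factory=list)
    tools_called: List[Dict[str, Any]] = field(default_factory=list)


# Attribute names (assumed) for the fields that should start out empty.
RUNTIME_FIELDS = ("actual_output", "retrieval_context", "tools_called")


def prepopulated_runtime_fields(golden: GoldenSketch) -> List[str]:
    """Return the names of runtime fields that already hold a value."""
    return [name for name in RUNTIME_FIELDS if getattr(golden, name)]
```

A check like `prepopulated_runtime_fields(golden)` returning a non-empty list could serve as a pre-upload warning that a golden carries values the evaluation run is expected to produce.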

Upload Goldens via CSV

You can upload both single-turn and multi-turn goldens stored in CSV files to datasets. The golden fields that you map to CSV headers differ slightly between the two.
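As a rough illustration of preparing such a file, the sketch below writes single-turn goldens to CSV text using Python's standard `csv` module. The header names in `HEADERS` and the helper `goldens_to_csv` are assumptions made for this example; you map your own headers to golden fields during upload, so any header names should work in practice. List-valued fields such as Context are left out because their CSV encoding is not described here.

```python
import csv
import io
from typing import Dict, List

# Assumed header names for illustration; they are mapped to golden fields on upload.
HEADERS = ["input", "expected_output", "comments"]


def goldens_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize golden rows to CSV text, rejecting rows without an input."""
    for index, row in enumerate(rows):
        # Input is the only required golden field.
        if not (row.get("input") or "").strip():
            raise ValueError(f"row {index}: 'input' is required")

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=HEADERS,
        extrasaction="ignore",  # silently drop keys that are not in HEADERS
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()
```

For example, `goldens_to_csv([{"input": "Hi", "expected_output": "Hello"}])` produces a header line followed by one data row with an empty comments cell.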

Other Actions

Beyond creating and uploading, you can also:

  • Add Images — drag and drop images into text fields for multi-modal goldens
  • Edit Non-Text Columns — modify structured fields like Context, Expected Tools, and Tools Called
  • Add Custom Columns — extend goldens with additional metadata fields
  • Assign Goldens — delegate review to team members
  • (Un)finalize Goldens — enable or disable goldens for testing
  • Duplicate Dataset — create a copy of an existing dataset
  • Delete Dataset — permanently remove a dataset

Adding Images

Datasets on Confident AI are multi-modal by nature — images are natively supported alongside text. You can add images to goldens by dragging and dropping them directly into any text field, including Input, Expected Output, Context, and other list-of-text fields.

When you upload an image, Confident AI stores it and generates a public URL. This URL is embedded in your golden’s text fields using a special format: [DEEPEVAL:IMAGE:uuid]. When you pull the dataset for evaluation, you can parse these into an evaluatable format.

Add Images to Goldens

Learn how to parse multi-modal goldens into an evaluatable format when pulling datasets in code.
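For intuition about the placeholder format, the following Python sketch splits a text field into plain-text and image segments. It is not the official parsing utility: `IMAGE_TOKEN` and `split_text_and_images` are hypothetical names, and the pattern simply captures whatever sits between `[DEEPEVAL:IMAGE:` and the closing `]` without validating that it is a well-formed UUID.

```python
import re
from typing import List, Tuple

# Matches placeholders of the form [DEEPEVAL:IMAGE:<id>] and captures <id>.
IMAGE_TOKEN = re.compile(r"\[DEEPEVAL:IMAGE:([^\]]+)\]")


def split_text_and_images(text: str) -> List[Tuple[str, str]]:
    """Split text into ordered ("text", value) and ("image", id) segments."""
    segments: List[Tuple[str, str]] = []
    cursor = 0
    for match in IMAGE_TOKEN.finditer(text):
        # Keep any plain text that precedes this placeholder.
        if match.start() > cursor:
            segments.append(("text", text[cursor:match.start()]))
        segments.append(("image", match.group(1)))
        cursor = match.end()
    # Keep any trailing text after the last placeholder.
    if cursor < len(text):
        segments.append(("text", text[cursor:]))
    return segments
```

Keeping the segments in order preserves where each image appeared relative to the surrounding text, which matters when the text refers to "the image above" or similar.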

Edit Non-Text Columns

Some golden fields require structured data rather than plain text. This is mostly relevant for single-turn datasets — multi-turn datasets only have Context.

Field | Type | Description
Context | List of strings | Static supporting context for your use case
Retrieval Context | List of strings | Retrieved text chunks from a retrieval system
Expected Tools | List of ToolCall | The ideal tools that should be called
Tools Called | List of ToolCall | The actual tools that were called during execution

A ToolCall object has the following structure:

{
  "name": "get_weather",
  "description": "Get weather for a location",
  "reasoning": "User asked about the weather in San Francisco",
  "output": "Sunny, 72°F",
  "input_parameters": { "location": "San Francisco" }
}
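If you prepare tool calls in code before pasting them into the platform, a small Python model of this structure can catch malformed entries early. The sketch below is an assumption-laden illustration: `ToolCallSketch` and `tool_call_from_json` are hypothetical names, and treating `name` as the only required key is a guess rather than a documented rule.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToolCallSketch:
    """Hypothetical mirror of the ToolCall JSON structure (illustration only)."""

    name: str
    description: Optional[str] = None
    reasoning: Optional[str] = None
    output: Any = None
    input_parameters: Dict[str, Any] = field(default_factory=dict)


def tool_call_from_json(raw: str) -> ToolCallSketch:
    """Parse one ToolCall JSON object into a ToolCallSketch."""
    data = json.loads(raw)
    # Assumption: a tool call is only meaningful with a non-empty name.
    if not isinstance(data, dict) or not data.get("name"):
        raise ValueError("a tool call needs a non-empty 'name'")
    return ToolCallSketch(
        name=data["name"],
        description=data.get("description"),
        reasoning=data.get("reasoning"),
        output=data.get("output"),
        input_parameters=data.get("input_parameters") or {},
    )
```

Parsing the example object above with `tool_call_from_json` yields a `ToolCallSketch` whose `input_parameters` is the dictionary `{"location": "San Francisco"}`.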

Add Custom Columns

Add custom columns to your dataset to store additional metadata. Custom columns appear as new fields on each golden and can be used for passing dynamic values during evaluation.

A custom column cannot share its name with any of the default fields:

  • Input
  • Expected Output
  • Context
  • Expected Tools
  • Additional Metadata
  • Comments
  • Actual Output
  • Retrieval Context
  • Tools Called
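When column names are generated by a script, a quick check against the reserved names above can be sketched in Python as follows. `DEFAULT_FIELDS` and `is_valid_custom_column` are hypothetical helpers, and comparing names case-insensitively with whitespace collapsed is an assumption; the platform may apply a stricter or looser comparison.

```python
# Default golden fields that a custom column name must not reuse.
DEFAULT_FIELDS = (
    "Input",
    "Expected Output",
    "Context",
    "Expected Tools",
    "Additional Metadata",
    "Comments",
    "Actual Output",
    "Retrieval Context",
    "Tools Called",
)


def _normalize(name: str) -> str:
    """Collapse runs of whitespace and ignore case (assumed comparison rule)."""
    return " ".join(name.split()).casefold()


_RESERVED = {_normalize(name) for name in DEFAULT_FIELDS}


def is_valid_custom_column(name: str) -> bool:
    """Return True if the name is non-empty and not a default field."""
    normalized = _normalize(name)
    return bool(normalized) and normalized not in _RESERVED
```

Under these assumptions, a name such as "Customer Tier" passes the check, while "Input" or " tools  called " does not.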

Assign Goldens

Assign goldens to different team members for review and annotation.

(Un)finalize Goldens

Mark goldens as finalized to lock them from further edits, or unfinalize to allow changes. Finalizing is useful when you’ve reviewed and approved goldens for use in evaluations.

Duplicate Dataset

Create a copy of an existing dataset. This is useful when you want to create variations or preserve a snapshot before making changes.

Delete Dataset

Deleting a dataset removes it permanently from the platform.

This action cannot be undone. All goldens or conversational goldens in the dataset will be permanently deleted.

Next Steps

Now that you know how to manage datasets on the platform, learn how to use them for evaluations or work with them programmatically in your code.