
Manage Datasets

Learn the core functions of a dataset and how to manipulate the goldens within it.

Overview

A dataset, either single-turn or multi-turn, is a list of goldens and forms the basis of any evaluation workflow in development. In this section, you’ll learn how to manipulate goldens in datasets, including:

  • Understanding the golden structure for single and multi-turn datasets
  • Uploading goldens via CSV on the platform
  • Assigning different team members to review and finalize goldens

If you haven’t already, familiarize yourself with what goldens are.

Create A Dataset

Create a dataset under Project > Datasets, selecting either the single-turn or multi-turn tab depending on the type of dataset you want:

Create Dataset on Confident AI

Golden Structure

Understanding the golden structure is essential before uploading your data. Goldens are the building blocks of datasets, and their structure differs slightly between single-turn and multi-turn evaluations:

Single-Turn

| Field | Type | Description |
|---|---|---|
| Input | Text | Required. The input query that will be used to invoke your AI app. |
| Expected Output | Text | The ideal output for a given input. |
| Context | List of text | Static supporting context relevant to your use case. |
| Expected Tools | List of tools | The ideal list of tools that should be called. |
| Additional Metadata | Key-value pairs | Custom metadata for generating test cases. |
| Comments | Text | Any notes or comments about this golden. |

Avoid pre-populating Actual Output, Retrieval Context, or Tools Called for single-turn goldens, and Turns for multi-turn goldens. These fields are meant to be populated dynamically during evaluation.
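
To make the single-turn structure concrete, the sketch below models a golden as a plain Python dataclass. It is an illustration only: `GoldenSketch` and its snake_case field names are assumptions chosen to mirror the fields listed above, not the platform's or DeepEval's own class.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class GoldenSketch:
    """Hypothetical stand-in for a single-turn golden (not a library class)."""

    input: str  # the only required field
    expected_output: Optional[str] = None
    context: List[str] = field(default_factory=list)
    expected_tools: List[Dict[str, Any]] = field(default_factory=list)
    additional_metadata: Dict[str, Any] = field(default_factory=dict)
    comments: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject goldens whose required input is empty or whitespace only.
        if not self.input.strip():
            raise ValueError("input is required and cannot be empty")


golden = GoldenSketch(
    input="What is the refund policy?",
    expected_output="Refunds are available within 30 days of purchase.",
    context=["Our policy allows refunds within 30 days."],
    comments="Covers the most common billing question.",
)
print(golden)
```

Actual Output, Retrieval Context, and Tools Called are deliberately left out of the sketch, since those fields are meant to be filled in during evaluation rather than stored on the golden.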

Upload Goldens via CSV

You can upload both single-turn and multi-turn goldens stored in CSV files to datasets. The fields you map to CSV headers differ slightly between the two.

Upload Goldens via CSV
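
As a rough illustration of what a single-turn CSV might look like, the sketch below writes two goldens with Python's standard `csv` module. The header names and the `|` separator used to pack the Context list into one cell are assumptions made for this example; you map your own headers to golden fields at upload time, so check how the upload dialog expects list values to be separated.

```python
import csv
import io
from typing import Any, Dict, List

CONTEXT_DELIMITER = "|"  # assumption: not a documented platform default

rows: List[Dict[str, Any]] = [
    {
        "input": "What is the refund policy?",
        "expected_output": "Refunds are available within 30 days.",
        "context": ["Refunds are allowed within 30 days.", "Gift cards are final sale."],
    },
    {
        "input": "How do I reset my password?",
        "expected_output": "Use the 'Forgot password' link on the login page.",
        "context": [],
    },
]


def goldens_to_csv(goldens: List[Dict[str, Any]]) -> str:
    """Serialize goldens to CSV text with one row per golden."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["input", "expected_output", "context"])
    writer.writeheader()
    for g in goldens:
        writer.writerow(
            {
                "input": g["input"],
                "expected_output": g.get("expected_output", ""),
                # Lists cannot live in a CSV cell directly, so join them into one string.
                "context": CONTEXT_DELIMITER.join(g.get("context", [])),
            }
        )
    return buffer.getvalue()


csv_text = goldens_to_csv(rows)
print(csv_text)
```

The `csv` module takes care of quoting, so inputs containing commas or quotes still land in a single cell.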

Other Actions

Beyond creating and uploading, you can also:

  • Add Images — drag and drop images into text fields for multi-modal goldens
  • Edit Non-Text Columns — modify structured fields like Context, Expected Tools, and Tools Called
  • Add Custom Columns — extend goldens with additional metadata fields
  • Assign Goldens — delegate review to team members
  • (Un)finalize Goldens — enable or disable goldens for testing
  • Duplicate Dataset — create a copy of an existing dataset
  • Delete Dataset — permanently remove a dataset

Adding Images

Datasets on Confident AI are multi-modal by nature — images are natively supported alongside text. You can add images to goldens by dragging and dropping them directly into any text field, including Input, Expected Output, Context, and other list-of-text fields.

When you upload an image, Confident AI stores it and generates a public URL. This URL is embedded in your golden’s text fields using a special format: [DEEPEVAL:IMAGE:uuid]. When you pull the dataset for evaluation, you can parse these into an evaluatable format.

Add Images to Goldens

Learn how to parse multi-modal goldens into an evaluatable format when pulling datasets in code.
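
To show what the embedded placeholders look like in practice, here is a minimal sketch that splits a text field into ordered text and image parts with a regular expression. It is not the parser DeepEval ships; the helper name `split_text_and_images` is hypothetical, and the pattern assumes the identifier inside `[DEEPEVAL:IMAGE:...]` never contains a closing bracket.

```python
import re
from typing import List, Tuple

# Assumption: placeholders look exactly like "[DEEPEVAL:IMAGE:<id>]".
IMAGE_PLACEHOLDER = re.compile(r"\[DEEPEVAL:IMAGE:([^\]]+)\]")


def split_text_and_images(text: str) -> List[Tuple[str, str]]:
    """Return ("text", chunk) and ("image", id) parts in their original order."""
    parts: List[Tuple[str, str]] = []
    cursor = 0
    for match in IMAGE_PLACEHOLDER.finditer(text):
        if match.start() > cursor:
            parts.append(("text", text[cursor:match.start()]))
        parts.append(("image", match.group(1)))
        cursor = match.end()
    if cursor < len(text):
        parts.append(("text", text[cursor:]))
    return parts


example = "Describe this chart: [DEEPEVAL:IMAGE:1234-abcd] in one sentence."
print(split_text_and_images(example))
```

Keeping the parts in order preserves where each image sits relative to the surrounding text, which matters when the field is later handed to a multi-modal model.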

Edit Non-Text Columns

Some golden fields require structured data rather than plain text. This is mostly relevant for single-turn datasets — multi-turn datasets only have Context.

| Field | Type | Description |
|---|---|---|
| Context | List of strings | Static supporting context for your use case |
| Retrieval Context | List of strings | Retrieved text chunks from a retrieval system |
| Expected Tools | List of ToolCall | The ideal tools that should be called |
| Tools Called | List of ToolCall | The actual tools that were called during execution |

A ToolCall object has the following structure:

{
  "name": "get_weather",
  "description": "Get weather for a location",
  "reasoning": "User asked about the weather in San Francisco",
  "output": "Sunny, 72°F",
  "input_parameters": { "location": "San Francisco" }
}

Edit Non-Text Columns
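
If you prepare tool calls outside the platform, a quick shape check can catch typos before you paste them in. The sketch below mirrors the ToolCall structure shown above with a standard-library dataclass. `ToolCallSketch` and `parse_tool_call` are hypothetical names, and treating `name` as the only required key is an assumption rather than a documented rule.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

ALLOWED_KEYS = {"name", "description", "reasoning", "output", "input_parameters"}


@dataclass
class ToolCallSketch:
    """Hypothetical stand-in mirroring the ToolCall JSON structure."""

    name: str
    description: Optional[str] = None
    reasoning: Optional[str] = None
    output: Any = None
    input_parameters: Dict[str, Any] = field(default_factory=dict)


def parse_tool_call(raw: str) -> ToolCallSketch:
    """Parse a JSON string and reject missing names or unexpected keys."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("name"), str):
        raise ValueError("a tool call needs a string 'name'")  # assumed requirement
    unknown = set(data) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    return ToolCallSketch(**data)


raw = """
{
  "name": "get_weather",
  "description": "Get weather for a location",
  "reasoning": "User asked about the weather in San Francisco",
  "output": "Sunny, 72°F",
  "input_parameters": { "location": "San Francisco" }
}
"""
print(parse_tool_call(raw))
```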

Add Custom Columns

Add custom columns to your dataset to store additional metadata. Custom columns appear as new fields on each golden and can be used for passing dynamic values during evaluation.

Custom column names must not match any of the default fields. For single-turn datasets, these are:

  • Input
  • Expected Output
  • Context
  • Expected Tools
  • Additional Metadata
  • Comments
  • Actual Output
  • Retrieval Context
  • Tools Called
Add Custom Column
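
When generating column names in a script, a small pre-check against the default field names avoids collisions. The sketch below uses the single-turn list above; `is_valid_custom_column` is a hypothetical helper, and comparing names case-insensitively with underscores treated as spaces is a deliberately strict assumption, not a documented platform rule.

```python
RESERVED_SINGLE_TURN_FIELDS = {
    "Input",
    "Expected Output",
    "Context",
    "Expected Tools",
    "Additional Metadata",
    "Comments",
    "Actual Output",
    "Retrieval Context",
    "Tools Called",
}


def _normalize(name: str) -> str:
    """Lowercase, treat underscores as spaces, and collapse repeated whitespace."""
    return " ".join(name.replace("_", " ").split()).lower()


_RESERVED_NORMALIZED = {_normalize(f) for f in RESERVED_SINGLE_TURN_FIELDS}


def is_valid_custom_column(name: str) -> bool:
    """Return True if the name is non-empty and does not clash with a default field."""
    normalized = _normalize(name)
    return bool(normalized) and normalized not in _RESERVED_NORMALIZED


for candidate in ["Customer Tier", "expected_output", "Tools Called", ""]:
    print(repr(candidate), is_valid_custom_column(candidate))
```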

Assign Goldens

Assign goldens to different team members for review and annotation.

Assign Goldens for Annotation

(Un)finalize Goldens

Mark goldens as finalized to lock them from further edits, or unfinalize to allow changes. Finalizing is useful when you’ve reviewed and approved goldens for use in evaluations.

Duplicate Dataset

Create a copy of an existing dataset. Useful when you want to create variations or preserve a snapshot before making changes.

Duplicate Dataset

Delete Dataset

Permanently remove a dataset from the platform:

Delete Dataset on Confident AI

This action cannot be undone. All goldens or conversational goldens in the dataset will be permanently deleted.

Schedule Dataset Evals

Confident AI lets you schedule automated evals on your datasets. To set one up:

Step 1: Choose a Dataset

  1. Navigate to the Datasets tab in the sidebar
  2. Choose any single-turn or multi-turn dataset you wish to schedule evals for

You’ll be redirected to the dataset editor page, where you can review and edit your dataset and its goldens.

Step 2: Create a Schedule

  1. Navigate to the Automations tab in the sidebar.
  2. Click Add Schedule and choose your configuration
  3. Click Create Schedule.
Creating a dataset eval schedule on Confident AI

This creates a schedule with the configuration you chose and reruns the evals with that configuration at the interval you specified.

Next Steps

Now that you know how to manage datasets on the platform, learn how to use them for evaluations or work with them programmatically in your code.

Experiments

Use datasets to compare AI apps side-by-side with statistical rigor.

Single-Turn Evals

Run evaluations on your dataset without writing code.

Pull Datasets

Pull datasets locally to use them in code-driven evaluations.

Automate Goldens in Code

Programmatically push goldens to datasets via the Evals API.