Push, Queue, and Annotate Goldens

Overview

A dataset, whether single-turn or multi-turn, is a list of goldens and forms the basis of any evaluation workflow in development. In this section, you’ll learn how to manage goldens in datasets, including:

Uploading goldens via CSV on the platform
Building an automated golden ingestion pipeline via the Evals API
Assigning team members to review and finalize goldens

If you haven’t already, familiarize yourself with what goldens are.

If you don’t have a dataset yet, create one under Project > Datasets:

Create Dataset on Confident AI

Upload Goldens via CSV

You can upload both single-turn and multi-turn goldens from CSV files to a dataset. The only difference is which golden fields you map to the CSV headers.
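If your goldens currently live in code or another system, Python’s standard csv module is one way to produce an uploadable file. The sketch below writes single-turn goldens to CSV text; the header names input and expected_output are placeholders chosen for illustration, since you map headers to golden fields during the upload itself.

```python
import csv
import io

# Placeholder header names: you map these to golden fields when uploading.
FIELDNAMES = ["input", "expected_output"]


def goldens_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize single-turn golden rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        # Values containing commas are quoted automatically by the csv module.
        writer.writerow(row)
    return buffer.getvalue()


csv_text = goldens_to_csv(
    [
        {"input": "How tall is Mt. Everest?", "expected_output": "About 8,849 meters."},
        {"input": "What is the capital of France?", "expected_output": "Paris."},
    ]
)
print(csv_text)
```

Writing to a StringIO keeps the example self-contained; in practice you would open a file with newline="" and pass it to csv.DictWriter instead.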

Upload Goldens via CSV

Manage Goldens via Evals API

If you’d rather upload goldens programmatically, use Confident AI’s Evals API. You can either push goldens, which adds them in the “finalized” state, or queue them, which adds them as “unfinalized”.

Only finalized goldens will be pulled for evaluation.
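To make the push/queue distinction concrete, here is a toy in-memory model. It is not Confident AI’s implementation or API: ToyStore and its methods are hypothetical and only mirror the behavior described in this section. Pushed goldens are finalized, queued goldens are unfinalized, a missing dataset is created on first write, and only finalized goldens are returned when pulling for evaluation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGolden:
    input: str
    finalized: bool


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for datasets keyed by alias."""

    datasets: dict[str, list[ToyGolden]] = field(default_factory=dict)

    def push(self, alias: str, inputs: list[str]) -> None:
        # Pushed goldens are "finalized"; the dataset is created if missing.
        self.datasets.setdefault(alias, []).extend(ToyGolden(i, True) for i in inputs)

    def queue(self, alias: str, inputs: list[str]) -> None:
        # Queued goldens are "unfinalized" and await review.
        self.datasets.setdefault(alias, []).extend(ToyGolden(i, False) for i in inputs)

    def pull(self, alias: str) -> list[str]:
        # Only finalized goldens are returned for evaluation.
        return [g.input for g in self.datasets.get(alias, []) if g.finalized]


store = ToyStore()
store.push("qa", ["How tall is Mt. Everest?"])
store.queue("qa", ["Who painted the Mona Lisa?"])
print(store.pull("qa"))  # ['How tall is Mt. Everest?']
```

Both goldens are stored, but the queued one stays invisible to evaluation until it is finalized.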

You can push to or manage datasets in any project by configuring that project’s CONFIDENT_API_KEY:

For default usage, set CONFIDENT_API_KEY as an environment variable.
To target a specific project, pass a confident_api_key directly when creating the EvaluationDataset.

from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset(confident_api_key="confident_us...")
dataset.delete(alias="YOUR-DATASET-ALIAS")

When both are provided, the confident_api_key passed to EvaluationDataset always takes precedence over the environment variable.
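The precedence rule can be restated as a small helper. resolve_api_key is hypothetical and not part of deepeval; it only encodes the documented order: an explicitly passed key wins, otherwise the CONFIDENT_API_KEY environment variable is used. Returning None when neither is set is an assumption made for this sketch.

```python
import os
from typing import Optional


def resolve_api_key(confident_api_key: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper mirroring the documented precedence rule."""
    # A non-empty key passed explicitly always takes precedence...
    if confident_api_key:
        return confident_api_key
    # ...otherwise fall back to the environment variable (None if unset).
    return os.environ.get("CONFIDENT_API_KEY")


os.environ["CONFIDENT_API_KEY"] = "env-key"
print(resolve_api_key())                # env-key
print(resolve_api_key("explicit-key"))  # explicit-key
```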

Push Goldens

If the dataset does not already exist, Confident AI will create it for you.


For single-turn datasets, push single-turn goldens:

main.py

from deepeval.dataset import EvaluationDataset, Golden

goldens = [Golden(input="How tall is Mt. Everest?")]
dataset = EvaluationDataset(goldens=goldens)

dataset.push(alias="YOUR-DATASET-ALIAS")

For multi-turn datasets, push multi-turn goldens:


main.py

from deepeval.dataset import EvaluationDataset, ConversationalGolden
from deepeval.test_case import Turn

goldens = [
    ConversationalGolden(
        scenario="Angry user asking for a refund.",
        turns=[Turn(role="user", content="Give me my money!")]
    )
]
dataset = EvaluationDataset(goldens=goldens)

dataset.push(alias="YOUR-DATASET-ALIAS")
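In an ingestion pipeline, conversations often arrive as raw (role, content) pairs from logs. The sketch below normalizes such pairs into plain dictionaries shaped like the Turn(role=..., content=...) arguments above and rejects unknown roles. It uses only the standard library so it runs without deepeval; to_turn_dicts is a hypothetical helper, and the allowed roles ("user" and "assistant") are an assumption about what Turn accepts.

```python
# Assumption: Turn accepts only these roles; adjust if your deepeval version differs.
ALLOWED_ROLES = {"user", "assistant"}


def to_turn_dicts(messages: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Hypothetical helper: convert (role, content) pairs into Turn-shaped dicts."""
    turns = []
    for role, content in messages:
        # Normalize casing and whitespace before validating the role.
        role = role.strip().lower()
        if role not in ALLOWED_ROLES:
            raise ValueError(f"Unsupported role: {role!r}")
        turns.append({"role": role, "content": content})
    return turns


log = [("User", "Give me my money!"), ("assistant", "I can help with a refund.")]
turns = to_turn_dicts(log)
print(turns[0])  # {'role': 'user', 'content': 'Give me my money!'}
```

Each resulting dictionary can then be unpacked into a turn with Turn(**d).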

Queue Goldens

If the dataset does not already exist, Confident AI will create it for you.


For single-turn datasets, queue single-turn goldens:

main.py

from deepeval.dataset import EvaluationDataset, Golden

goldens = [Golden(input="How tall is Mt. Everest?")]
dataset = EvaluationDataset()

dataset.queue(alias="YOUR-DATASET-ALIAS", goldens=goldens)

For multi-turn datasets, queue multi-turn goldens:


main.py

from deepeval.dataset import EvaluationDataset, ConversationalGolden
from deepeval.test_case import Turn

goldens = [
    ConversationalGolden(
        scenario="Angry user asking for a refund.",
        turns=[Turn(role="user", content="Give me my money!")]
    )
]
dataset = EvaluationDataset()

dataset.queue(alias="YOUR-DATASET-ALIAS", goldens=goldens)
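For the automated ingestion pipeline mentioned in the overview, one way to choose between pushing and queueing is to route each incoming record by whether it has already been reviewed. The sketch below is a hypothetical, stdlib-only routing step; the record shape and its reviewed flag are assumptions. The two resulting lists would then feed the push and queue calls shown above.

```python
from typing import Any


def route_records(records: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Hypothetical routing step: reviewed records are pushed, the rest queued."""
    to_push: list[str] = []
    to_queue: list[str] = []
    for record in records:
        # Assumed record shape: {"input": str, "reviewed": bool}; missing flag = unreviewed.
        target = to_push if record.get("reviewed", False) else to_queue
        target.append(record["input"])
    return to_push, to_queue


records = [
    {"input": "How tall is Mt. Everest?", "reviewed": True},
    {"input": "Who painted the Mona Lisa?"},
]
to_push, to_queue = route_records(records)
print(to_push)   # ['How tall is Mt. Everest?']
print(to_queue)  # ['Who painted the Mona Lisa?']
```

Routing unreviewed records to the queue keeps them out of evaluations until a team member finalizes them.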

Custom Dataset Columns

You can add custom columns to a dataset to hold additional data for each golden, as long as their names don’t clash with any of the default field names (e.g. “Input”, “Actual Output”, etc.).

Create Custom Columns

You can also create custom columns through the Evals API when pushing or queueing goldens by setting the custom_column_key_values field on single-turn or multi-turn goldens:


main.py

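from deepeval.dataset import Golden, ConversationalGolden

# Custom columns sit alongside the usual fields (other fields omitted here)
golden = Golden(input="How tall is Mt. Everest?", custom_column_key_values={"Key": "Value"})
multiturn_golden = ConversationalGolden(scenario="Angry user asking for a refund.", custom_column_key_values={"Key": "Value"})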
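Because custom column names must not clash with the default field names, a pipeline can check keys before pushing or queueing. The reserved set below contains only the two defaults named on this page (“Input” and “Actual Output”) and is deliberately incomplete, so extend it with the full list of default fields your dataset shows. find_clashing_columns is a hypothetical helper, and comparing names case-insensitively is a conservative assumption.

```python
# Partial, assumed list: only the defaults named on this page. Extend as needed.
RESERVED_COLUMNS = {"Input", "Actual Output"}


def find_clashing_columns(custom_column_key_values: dict[str, str]) -> list[str]:
    """Hypothetical check: return custom keys that collide with default field names."""
    reserved = {name.casefold() for name in RESERVED_COLUMNS}
    # Compare case-insensitively to catch near-duplicates such as "input".
    return [key for key in custom_column_key_values if key.casefold() in reserved]


print(find_clashing_columns({"Source": "support-tickets", "input": "oops"}))  # ['input']
```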

Assign Goldens for Annotation

You can also assign goldens to different team members for review and annotation.

Assign Goldens for Annotation

Delete Dataset

You can delete a dataset directly on the platform.

This action cannot be undone. All goldens or conversational goldens in the dataset will be permanently deleted.

Delete Dataset on Confident AI

You can also delete a dataset programmatically through the Evals API:


main.py

from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()
dataset.delete(alias="YOUR-DATASET-ALIAS")