Push, Queue, and Annotate Goldens

Learn the core functions of a dataset and how to manipulate the goldens within it.

Overview

A dataset, whether single-turn or multi-turn, is a list of goldens and forms the basis of any evaluation workflow in development. In this section, you’ll learn how to manipulate goldens in datasets, including:

  • Uploading goldens via CSV on the platform
  • Building an automated golden ingestion pipeline via the Evals API
  • Assigning different team members to review and finalize goldens

If you haven’t already, familiarize yourself with what goldens are.

If you don’t have a dataset yet, create one under Project > Datasets:

Create Dataset on Confident AI

Upload Goldens via CSV

You can upload both single-turn and multi-turn goldens stored in CSVs to datasets. The only difference is the set of golden fields you map to CSV headers, which varies slightly between the two.
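As a rough illustration, a single-turn CSV could be prepared with Python’s standard `csv` module before uploading it on the platform. The header names `input` and `expected_output` and the sample rows are assumptions chosen for this sketch, not required names; you map whatever headers your file has to golden fields during upload.

```python
import csv
import io

# Hypothetical rows; the header names are assumptions for illustration
# and are mapped to golden fields during upload on the platform.
rows = [
    {"input": "How tall is Mt. Everest?", "expected_output": "About 8,849 metres."},
    {"input": "What is the capital of France?", "expected_output": "Paris."},
]


def goldens_to_csv(rows):
    """Serialize a list of row dicts into CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["input", "expected_output"])
    writer.writeheader()
    writer.writerows(rows)  # values containing commas are quoted automatically
    return buffer.getvalue()


csv_text = goldens_to_csv(rows)
```

Saving `csv_text` to a file such as `goldens.csv` (opened with `newline=""`) gives you something you can upload through the dataset page.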


Manage Goldens via Evals API

If you prefer to upload goldens programmatically, you can use Confident AI’s Evals API. You can either push goldens in the “finalized” state, or queue them so that they are marked “unfinalized”.

Only finalized goldens will be pulled for evaluation.

Push goldens

If the dataset does not already exist, Confident AI will create it for you.

For single-turn datasets, push single-turn goldens:

main.py
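from deepeval.dataset import EvaluationDataset, Golden

# Build the dataset locally from a list of single-turn goldens
goldens = [Golden(input="How tall is Mt. Everest?")]
dataset = EvaluationDataset(goldens=goldens)

# Upload the goldens to Confident AI in the finalized state
dataset.push(alias="YOUR-DATASET-ALIAS")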

For multi-turn datasets, push multi-turn goldens:

main.py
from deepeval.dataset import EvaluationDataset, ConversationalGolden
from deepeval.test_case import Turn

# Each multi-turn golden describes a scenario and its opening turns
goldens = [
    ConversationalGolden(
        scenario="Angry user asking for a refund.",
        turns=[Turn(role="user", content="Give me my money!")],
    )
]
dataset = EvaluationDataset(goldens=goldens)

# Upload the goldens to Confident AI in the finalized state
dataset.push(alias="YOUR-DATASET-ALIAS")

Queue goldens

If the dataset does not already exist, Confident AI will create it for you.

For single-turn datasets, queue single-turn goldens:

main.py
from deepeval.dataset import EvaluationDataset, Golden

goldens = [Golden(input="How tall is Mt. Everest?")]
dataset = EvaluationDataset()

# Queue the goldens so they arrive unfinalized and await review
dataset.queue(alias="YOUR-DATASET-ALIAS", goldens=goldens)

For multi-turn datasets, queue multi-turn goldens:

main.py
from deepeval.dataset import EvaluationDataset, ConversationalGolden
from deepeval.test_case import Turn  # required for the turns below

goldens = [
    ConversationalGolden(
        scenario="Angry user asking for a refund.",
        turns=[Turn(role="user", content="Give me my money!")],
    )
]
dataset = EvaluationDataset()

# Queue the goldens so they arrive unfinalized and await review
dataset.queue(alias="YOUR-DATASET-ALIAS", goldens=goldens)
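Pushing and queueing can be combined into a small ingestion step. The sketch below is one possible arrangement rather than a prescribed pipeline: it assumes raw records arrive as dictionaries with an `input` key and a hypothetical boolean `reviewed` key, pushes the reviewed ones as finalized goldens, and queues the rest. The helper names `split_by_review_status` and `ingest` are invented for illustration; only `push` and `queue` come from the examples above.

```python
def split_by_review_status(records):
    """Split raw records into inputs ready to push and inputs to queue.

    Each record is a dict with an "input" key and an optional boolean
    "reviewed" key (a hypothetical field used only by this sketch).
    """
    to_push, to_queue, seen = [], [], set()
    for record in records:
        text = (record.get("input") or "").strip()
        if not text or text in seen:
            continue  # skip empty and duplicate inputs
        seen.add(text)
        (to_push if record.get("reviewed") else to_queue).append(text)
    return to_push, to_queue


def ingest(records, alias):
    """Push reviewed inputs as finalized goldens and queue the rest."""
    # Imported lazily so the helper above works without deepeval installed
    from deepeval.dataset import EvaluationDataset, Golden

    to_push, to_queue = split_by_review_status(records)
    if to_push:
        finalized = [Golden(input=text) for text in to_push]
        EvaluationDataset(goldens=finalized).push(alias=alias)
    if to_queue:
        unfinalized = [Golden(input=text) for text in to_queue]
        EvaluationDataset().queue(alias=alias, goldens=unfinalized)
```

Calling `ingest(records, alias="YOUR-DATASET-ALIAS")` from a scheduled job would then send each batch to the dataset, with unreviewed goldens left for a team member to finalize.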

Custom Dataset Columns

You can add custom columns to a dataset to hold additional data for each golden, as long as their names don’t clash with any of the existing default field names (e.g. “Input”, “Actual Output”, etc.).

Create Custom Columns

You can also set custom column values through the Evals API when pushing or queueing goldens by including the `custom_column_key_values` field in single-turn or multi-turn goldens:

main.py
from deepeval.dataset import Golden, ConversationalGolden

# Custom column values are passed as a dictionary keyed by column name
golden = Golden(input="...", custom_column_key_values={"Key": "Value"})
multiturn_golden = ConversationalGolden(scenario="...", custom_column_key_values={"Key": "Value"})
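Because custom column names must not clash with default field names, it can help to check keys locally before pushing or queueing. The sketch below is a minimal, assumption-laden check: the reserved set contains only the two names mentioned above, and the case-insensitive comparison is a conservative guess rather than documented behavior. The helper name `find_clashing_keys` is hypothetical.

```python
# Assumed set of default field names; only "Input" and "Actual Output" are
# named in the text above, so extend this to match your dataset's columns.
DEFAULT_FIELDS = {"input", "actual output"}


def find_clashing_keys(custom_columns, reserved=DEFAULT_FIELDS):
    """Return custom column keys that collide with default field names.

    Comparison is case-insensitive and ignores surrounding whitespace,
    which is a conservative assumption rather than documented behavior.
    """
    return sorted(k for k in custom_columns if k.strip().lower() in reserved)


custom = {"Reviewer": "alice", "Input": "oops", "Difficulty": "hard"}
clashes = find_clashing_keys(custom)
print(clashes)  # ['Input']
```

An empty result means the dictionary should be safe to pass as `custom_column_key_values` under these assumptions.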

Assign Goldens for Annotation

You can also assign goldens to different team members for review and annotation.
