Automate Dataset Management

Programmatically push goldens to datasets via the Evals API.

Overview

This section covers how to programmatically manage goldens in datasets using the Evals API:

  • Push single and multi-turn goldens to datasets
  • Set finalized=True to make goldens available for evaluation, or finalized=False to queue for review
  • Include custom column values when pushing goldens
  • Delete datasets programmatically

Only finalized goldens will be pulled for evaluation.

Push Goldens

Push goldens to a dataset. If the dataset does not already exist, Confident AI will create it for you.

For single-turn datasets:

main.py
from deepeval.dataset import EvaluationDataset, Golden

goldens = [Golden(input="How tall is Mt. Everest?")]
dataset = EvaluationDataset(goldens=goldens)

# Push as finalized (ready for evaluation)
dataset.push(alias="YOUR-DATASET-ALIAS", finalized=True)

# Or push as unfinalized (queued for review)
dataset.push(alias="YOUR-DATASET-ALIAS", finalized=False)
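
In practice, goldens often come from a file rather than being written inline. The sketch below shows one possible way to turn CSV text into goldens before pushing. The helper names rows_to_golden_kwargs and push_csv, the column names input and expected_output, and the default of finalized=False are assumptions made for illustration; they are not part of deepeval.

```python
import csv
import io


def rows_to_golden_kwargs(csv_text: str) -> list[dict]:
    """Convert CSV text into keyword arguments for Golden (hypothetical helper).

    Assumes an 'input' column and an optional 'expected_output' column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kwargs_list = []
    for row in reader:
        text = (row.get("input") or "").strip()
        if not text:
            continue  # skip rows that have no input
        kwargs = {"input": text}
        expected = (row.get("expected_output") or "").strip()
        if expected:
            kwargs["expected_output"] = expected
        kwargs_list.append(kwargs)
    return kwargs_list


def push_csv(csv_text: str, alias: str, finalized: bool = False) -> int:
    """Push every usable CSV row as a golden and return how many were pushed."""
    # Imported lazily so the parsing helper works without deepeval installed.
    from deepeval.dataset import EvaluationDataset, Golden

    goldens = [Golden(**kwargs) for kwargs in rows_to_golden_kwargs(csv_text)]
    EvaluationDataset(goldens=goldens).push(alias=alias, finalized=finalized)
    return len(goldens)
```

Defaulting to finalized=False here is a design choice: bulk imports land in the review queue unless the caller explicitly marks them ready for evaluation.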

For multi-turn datasets:

main.py
from deepeval.dataset import EvaluationDataset, ConversationalGolden
from deepeval.test_case import Turn

goldens = [
    ConversationalGolden(
        scenario="Angry user asking for a refund.",
        turns=[Turn(role="user", content="Give me my money!")],
    )
]
dataset = EvaluationDataset(goldens=goldens)

dataset.push(alias="YOUR-DATASET-ALIAS", finalized=True)

Add Custom Columns

You can include custom column values when pushing goldens. If a custom column does not already exist on the dataset, Confident AI will create it for you.

main.py
from deepeval.dataset import Golden, ConversationalGolden

golden = Golden(
    input="How tall is Mt. Everest?",
    custom_column_key_values={"difficulty": "easy", "category": "geography"},
)

multiturn_golden = ConversationalGolden(
    scenario="User asking for a refund.",
    custom_column_key_values={"sentiment": "angry", "priority": "high"},
)
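
When the source data is a flat record, the custom columns have to be separated from the golden's own fields. A minimal sketch of that split follows. It assumes the only built-in fields in the record are input and expected_output, and that custom column values should be strings as in the example above. The helper split_custom_columns is hypothetical.

```python
# Fields passed to Golden directly (an assumption for this sketch).
CORE_FIELDS = {"input", "expected_output"}


def split_custom_columns(row: dict) -> dict:
    """Return Golden kwargs, moving every non-core key into custom_column_key_values."""
    kwargs = {key: value for key, value in row.items() if key in CORE_FIELDS}
    custom = {
        key: str(value)  # cast to str, matching the string values shown above
        for key, value in row.items()
        if key not in CORE_FIELDS and value is not None
    }
    if custom:
        kwargs["custom_column_key_values"] = custom
    return kwargs
```

The result can be unpacked into a golden, for example Golden(**split_custom_columns(row)).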

Versioning Datasets

Datasets support immutable, named versions so you can pin evaluation runs to a specific snapshot of goldens.

  • Create a version to snapshot the current state of the dataset.
  • Push without specifying version to add goldens to the latest version (or unversioned, if the dataset has no versions yet).
  • Push with version=... to add goldens to a specific version.
  • Pull without version to read the latest version. Pull with version=... to read a specific version.
  • Get versions to list all snapshots, newest first.

Create a version

main.py
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()
version = dataset.create_version(alias="YOUR-DATASET-ALIAS")
# version -> "00.00.01"

The first call to create_version backfills every existing unversioned golden onto the new version. Subsequent calls snapshot all goldens from the previous version (with new IDs) and auto-increment the version number.
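
To make these rules concrete, here is a toy in-memory model of the versioning behavior described in this section. It is not Confident AI's implementation. The class ToyDataset and the function bump are hypothetical, and the increment rule (only the last component goes up) is an assumption based on the "00.00.01" example.

```python
import itertools


def bump(version: str) -> str:
    # Assumption: only the last component increments; rollover is not modeled.
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1:02d}"


class ToyDataset:
    """In-memory model of the versioning rules (illustration only)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.goldens = []   # each golden: {"id", "input", "version"}
        self.versions = []  # oldest first

    def add(self, text, version=None):
        # Omitted version: use the latest one, or stay unversioned if none exist.
        if version is None and self.versions:
            version = self.versions[-1]
        self.goldens.append({"id": next(self._ids), "input": text, "version": version})

    def create_version(self):
        if not self.versions:
            new = "00.00.01"
            # First version: backfill every unversioned golden onto it.
            for golden in self.goldens:
                if golden["version"] is None:
                    golden["version"] = new
        else:
            previous = self.versions[-1]
            new = bump(previous)
            # Later versions: copy the previous version's goldens under new IDs.
            copies = [
                {"id": next(self._ids), "input": g["input"], "version": new}
                for g in self.goldens
                if g["version"] == previous
            ]
            self.goldens.extend(copies)
        self.versions.append(new)
        return new
```

In this model, goldens that belong to an earlier version are never modified by a later create_version call, which is what makes pinning an evaluation run to a version safe.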

List versions

main.py
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()
versions = dataset.get_versions(alias="YOUR-DATASET-ALIAS")
for v in versions:
    print(v.version, v.id)

Push and pull a specific version

main.py
from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset(goldens=[Golden(input="...", expected_output="...")])

# Push goldens onto version 00.00.01
dataset.push(alias="YOUR-DATASET-ALIAS", version="00.00.01")

# Pull a specific version
dataset.pull(alias="YOUR-DATASET-ALIAS", version="00.00.01")
print(dataset._version)  # -> "00.00.01"

When version is omitted, push and pull operate on the latest version. If the dataset has no versions yet, push leaves goldens unversioned and pull returns those unversioned goldens with version: null.
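
The resolution rule above can be sketched as a small client-side helper. It takes the newest-first list of versions and an optional requested version, and returns the version that a pull would read. The function resolve_version is hypothetical, and raising on an unknown version is a choice made for this sketch; how the API itself responds to an unknown version is not covered here.

```python
from types import SimpleNamespace


def resolve_version(versions, wanted=None):
    """Return the version string a pull would target, or None if unversioned.

    `versions` is newest first; each item exposes a `.version` attribute.
    """
    names = [v.version for v in versions]
    if wanted is None:
        # Omitted version: latest if any exist, otherwise unversioned goldens.
        return names[0] if names else None
    if wanted not in names:
        raise ValueError(f"Unknown dataset version: {wanted!r}")
    return wanted


# Stand-in objects shaped like the items printed in "List versions".
example = [
    SimpleNamespace(version="00.00.02", id="b"),
    SimpleNamespace(version="00.00.01", id="a"),
]
print(resolve_version(example))
print(resolve_version(example, "00.00.01"))
```

With a real dataset, the list returned by dataset.get_versions(alias="YOUR-DATASET-ALIAS") could be passed in place of the stand-in objects.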

Delete Dataset

Delete a dataset programmatically via the Evals API.

This action cannot be undone. All goldens or conversational goldens in the dataset will be permanently deleted.

main.py
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()
dataset.delete(alias="YOUR-DATASET-ALIAS")
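
Because deletion is permanent, scripts may want a guard in front of the call. One possible pattern is sketched below: the caller must repeat the alias exactly before anything is deleted. The wrapper delete_dataset and its confirm parameter are hypothetical and not part of deepeval. The delete function is injected so the guard can be exercised without touching a real dataset.

```python
def delete_dataset(alias: str, confirm: str, delete_fn) -> bool:
    """Call delete_fn(alias=alias) only if `confirm` matches `alias` exactly.

    Returns True when the delete call was made, False when it was skipped.
    """
    if confirm != alias:
        return False
    delete_fn(alias=alias)
    return True
```

A call might look like delete_dataset("YOUR-DATASET-ALIAS", confirm=typed_alias, delete_fn=EvaluationDataset().delete), where typed_alias is whatever the operator entered.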

Switching Projects

You can push or manage datasets in any project by configuring a CONFIDENT_API_KEY.

  • For default usage, set CONFIDENT_API_KEY as an environment variable.
  • To target a specific project, pass a confident_api_key directly when creating the EvaluationDataset.

main.py
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset(confident_api_key="confident_us...")

When both are provided, the confident_api_key passed to EvaluationDataset always takes precedence over the environment variable.
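
The precedence rule can be modeled in a few lines. This is a sketch of the documented behavior, not deepeval's internal code. The function effective_api_key is hypothetical, and treating an empty string as "not provided" is an assumption.

```python
import os


def effective_api_key(explicit=None, environ=None):
    """Return the key that would be used: explicit argument first, then the env var."""
    environ = os.environ if environ is None else environ
    if explicit:
        return explicit
    return environ.get("CONFIDENT_API_KEY")
```

A helper like this can be useful in scripts that push to several projects, to log which source the key came from before any data is sent.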

Next Steps

Now that you know how to push goldens, learn how to pull them for evaluation.