Pull Datasets for Testing
Overview
In the previous section, we learnt how to curate datasets by managing goldens directly on the platform or via Confident's Evals API. In this section, we will learn how to:
- Pull datasets for LLM testing
- Associate datasets with test runs
- Use the evals_iterator() to run evals on datasets (for Python users)
Pull Goldens via Evals API
Datasets are either single-turn or multi-turn: pulling a single-turn dataset gives you single-turn goldens, and pulling a multi-turn dataset gives you multi-turn goldens.
You are responsible for mapping single-turn goldens to single-turn test cases, and multi-turn goldens to multi-turn test cases (see the sketch below).
Pulling goldens via the Evals API will only pull finalized goldens by default.
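As a minimal Python sketch, pulling goldens and mapping them to single-turn test cases with deepeval could look like the following. The alias "My Dataset" and your_llm_app are placeholders for your own dataset alias and LLM application:

```python
from deepeval.dataset import EvaluationDataset
from deepeval.test_case import LLMTestCase

# Pull finalized goldens from Confident AI ("My Dataset" is a placeholder alias)
dataset = EvaluationDataset()
dataset.pull(alias="My Dataset")

# Map each single-turn golden to a single-turn test case
test_cases = [
    LLMTestCase(
        input=golden.input,
        actual_output=your_llm_app(golden.input),  # your_llm_app is a placeholder
        expected_output=golden.expected_output,
    )
    for golden in dataset.goldens
]
```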
Using Evals Iterator
Typically, you would just provide your dataset as a list of test cases for evaluation. However, if you're running single-turn, end-to-end OR component-level evaluations, and are using deepeval in Python, you can use the evals_iterator() instead:
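Here is a rough sketch of what that loop can look like, assuming the dataset has already been pulled as above and your_llm_app is a placeholder for your own traced LLM application:

```python
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()
dataset.pull(alias="My Dataset")  # placeholder alias

# Run evals on the dataset: invoke your traced LLM app once per golden
for golden in dataset.evals_iterator():
    your_llm_app(golden.input)  # your_llm_app is a placeholder for your traced app
```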
You’ll need to trace your LLM app to make this work. Read this section on running single-turn end-to-end evals with tracing to learn more.