Introduction

Generate synthetic goldens from your own data sources

Overview

Synthetic data generation lets you automatically create high-quality goldens from your existing data sources: documents stored in Google Drive, messages in Slack channels, pages in Notion, or files in SharePoint. Instead of writing goldens manually one by one, you can connect a data source and let Confident AI generate evaluation-ready goldens at scale.

How It Works

At a high level, synthetic data generation follows three steps:

1. Connect a Data Source

Navigate to Project Settings > Data Sources and connect an external source such as Google Drive, Slack, Notion, or SharePoint. Each source type requires its own set of credentials.

2. Create a Generation Config

Under Datasets > Automations, create a generation configuration that points to your data source. You can control parameters like the maximum number of goldens generated per context chunk.

3. Generate

Click Generate and Confident AI will pull documents from your data source, chunk them into contexts, and use an LLM to synthesize goldens, complete with inputs, expected outputs, and context fields.
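
To make the chunk-then-synthesize flow above concrete, here is a minimal conceptual sketch in Python. It is not Confident AI's implementation or API. Every name in it (`Golden`, `GenerationConfig`, `chunk_document`, `generate_goldens`, `fake_llm`) is hypothetical. The fixed-size character chunking and the stubbed LLM call are assumptions chosen only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Golden:
    # Fields mirror those named in the text: input, expected output, context.
    input: str
    expected_output: str
    context: List[str]


@dataclass
class GenerationConfig:
    # Hypothetical parameter names; the platform's real config may differ.
    max_goldens_per_context: int = 2
    chunk_size: int = 200  # characters per context chunk (assumed unit)


def chunk_document(text: str, chunk_size: int) -> List[str]:
    """Split a document into fixed-size character chunks, dropping empty ones."""
    chunks = [text[i:i + chunk_size].strip() for i in range(0, len(text), chunk_size)]
    return [c for c in chunks if c]


def fake_llm(chunk: str, n: int) -> List[Tuple[str, str]]:
    """Stand-in for an LLM call: returns n placeholder (input, expected output) pairs."""
    return [(f"Question {i + 1} about: {chunk[:20]}", f"Answer {i + 1}") for i in range(n)]


def generate_goldens(
    documents: List[str],
    config: GenerationConfig,
    synthesize: Callable[[str, int], List[Tuple[str, str]]],
) -> List[Golden]:
    """Chunk each document into contexts, then synthesize goldens per context."""
    goldens: List[Golden] = []
    for doc in documents:
        for chunk in chunk_document(doc, config.chunk_size):
            pairs = synthesize(chunk, config.max_goldens_per_context)
            # Enforce the per-context cap even if the synthesizer returns more.
            for question, answer in pairs[: config.max_goldens_per_context]:
                goldens.append(
                    Golden(input=question, expected_output=answer, context=[chunk])
                )
    return goldens


if __name__ == "__main__":
    docs = ["Refunds are issued within 14 days of purchase. " * 10]
    for golden in generate_goldens(docs, GenerationConfig(), fake_llm):
        print(golden.input, "->", golden.expected_output)
```

In this sketch, the per-context cap bounds the dataset size at (number of chunks) × (maximum goldens per context), which is why that parameter is a useful lever when generating at scale.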

Supported Data Sources

Next Steps