AI Connections | Confident AI Docs

AI Connections let you run evaluations directly on the platform by connecting to your AI app via an HTTPS endpoint. Instead of writing code, you can trigger evaluations with a click of a button—Confident AI will call your endpoint with data from your goldens and parse the response.

Setting Up an AI Connection

To create an AI connection:

Navigate to Project Settings → AI Connections
Click New AI Connection
Give it a unique identifying name
Click Save

Your AI connection won’t be usable yet—you still need to configure the endpoint, payload, and at minimum the actual output key path.

Configuration Parameters

There are several parameters you’ll need to configure in order for your AI connection to work.

Name

Give your AI connection a unique name to identify it within your project.

AI App Endpoint

Your AI app must be accessible via an HTTPS endpoint that accepts POST requests and returns a JSON response containing the actual output of your AI app.
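As a rough illustration of that contract, the sketch below uses only Python's standard-library WSGI interface. The `generate_answer` function, the `input` request field, and the `output` response key are assumptions made for the example, and HTTPS termination (for instance via a reverse proxy) is assumed to happen in front of it.

```python
import json


def generate_answer(user_input: str) -> str:
    # Hypothetical stand-in for your real model call.
    return f"Echo: {user_input}"


def application(environ, start_response):
    """Minimal WSGI app: accepts POST with a JSON body, returns JSON."""
    json_header = ("Content-Type", "application/json")

    if environ.get("REQUEST_METHOD") != "POST":
        body = json.dumps({"error": "POST required"}).encode("utf-8")
        start_response("405 Method Not Allowed", [json_header])
        return [body]

    # Read and decode the JSON payload sent by the caller.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length)
    payload = json.loads(raw) if raw else {}

    # "input" and "output" are example field names, not required ones.
    answer = generate_answer(payload.get("input", ""))
    body = json.dumps({"output": answer}).encode("utf-8")
    start_response("200 OK", [json_header])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` serves it over plain HTTP. With this response shape, the actual output key path would be ["output"].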

Payload

Configure the JSON payload that Confident AI sends to your endpoint. You can customize this to match your API’s expected structure using values from your goldens.

Available variables:

Variable	Description	Type
`golden.input`	The input from your golden	string
`golden.actual_output`	The actual output from your golden	string
`golden.expected_output`	The expected output from your golden	string
`golden.retrieval_context`	The retrieval context from your golden	string[]
`golden.context`	The context from your golden	string[]
`golden.expected_tools`	The expected tools from your golden	ToolCall[]
`golden.tools_called`	The tools called from your golden	ToolCall[]
`golden.additional_metadata`	Additional metadata from your golden	object
`conversationalGolden.turns`	Turn history for multi-turn evals	Turn[]
`conversationalGolden.context`	Context for conversational goldens	string[]
`conversationalGolden.scenario`	Scenario for conversational goldens	string
`conversationalGolden.expected_outcome`	Expected outcome for conversational goldens	string
`conversationalGolden.user_description`	User description for conversational goldens	string
`conversationalGolden.additional_metadata`	Additional metadata for conversational goldens	object
`prompts`	A dictionary of prompts	object
`testCaseId`	Unique identifier for linking traces to test cases	string

Example payload:

{
  "input": golden.input,
  "context": golden.context,
  "conversationalContext": conversationalGolden.context,
  "prompts": prompts,
  "turns": conversationalGolden.turns
}

The custom payload feature lets you structure the request to match your existing API contract—no need to modify your AI app to accept a specific format.

Use golden.* variables for single-turn evaluations and conversationalGolden.* variables for multi-turn evaluations. See Prompts for details on how to use the prompts dictionary.
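To make the variable substitution concrete, here is a small illustration of what a payload template resolves to for one golden. This is not Confident AI's implementation: `render_payload`, the dotted-string template form, and the sample golden are all hypothetical and exist only to show what your endpoint should expect to receive.

```python
from typing import Any


def resolve_variable(name: str, variables: dict[str, Any]) -> Any:
    """Look up a dotted variable name such as 'golden.input'."""
    value: Any = variables
    for part in name.split("."):
        value = value[part]
    return value


def render_payload(template: dict[str, str], variables: dict[str, Any]) -> dict[str, Any]:
    """Replace each variable name in the template with its value."""
    return {key: resolve_variable(name, variables) for key, name in template.items()}


# Hypothetical golden and template for a single-turn evaluation.
variables = {
    "golden": {
        "input": "What is the refund policy?",
        "context": ["Refunds within 30 days."],
    },
    "testCaseId": "tc-123",
}
template = {
    "question": "golden.input",
    "docs": "golden.context",
    "testCaseId": "testCaseId",
}

request_body = render_payload(template, variables)
```

In this sketch the endpoint receives `question`, `docs`, and `testCaseId` keys, which is the point of the custom payload: the request follows your API's field names rather than the golden's.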

Headers

Add any headers required by your endpoint, such as API keys, authentication tokens, or content type specifications. These headers are sent with every request to your AI app.
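On the receiving side, your endpoint can use one of these headers to reject unauthenticated calls. The sketch below assumes a hypothetical `x-api-key` header and a shared secret stored in an `AI_CONNECTION_API_KEY` environment variable; neither name comes from Confident AI.

```python
import hmac
import os
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Check a hypothetical 'x-api-key' header against the expected secret."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    provided = normalized.get("x-api-key", "")
    if not expected_key or not provided:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(provided.encode("utf-8"), expected_key.encode("utf-8"))


# Assumed environment variable name; load the secret however your app does.
EXPECTED_KEY = os.environ.get("AI_CONNECTION_API_KEY", "")
```

An endpoint would typically return a 401 response when `is_authorized` is false, which then surfaces as a failed request on the platform side.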

Prompts

Associate prompt versions with your AI connection. When running evaluations, these prompts will be attributed to each test run, letting you trace results back to the prompts used.

The prompts variable in your payload is a dictionary where each key maps to an object containing alias and version:

{
  "system": { "alias": "system-prompt", "version": "1.0.0" },
  "assistant": { "alias": "assistant-prompt", "version": "2.1.0" }
}

Here’s an example of how your Python endpoint might handle the prompts dictionary:

from deepeval.prompt import Prompt

# Assumes `app` is a FastAPI-style application and `llm` is your model client,
# both defined elsewhere in your codebase.

@app.post("/generate")
def generate(request: dict):
    # Look up each prompt's alias and version by its key
    system_info = request["prompts"]["system"]
    assistant_info = request["prompts"]["assistant"]

    # Create each Prompt, then pull the requested version into it
    system_prompt = Prompt(alias=system_info["alias"])
    system_prompt.pull(version=system_info["version"])
    assistant_prompt = Prompt(alias=assistant_info["alias"])
    assistant_prompt.pull(version=assistant_info["version"])

    # Use the prompts in your generation
    response = llm.generate(
        system=system_prompt.text,
        assistant=assistant_prompt.text,
        user=request["input"]
    )

    return {"output": response}

For more details on working with prompts, see Prompt Versioning.

Actual Output Key Path

A list of strings or integers representing the path to the actual_output value in your JSON response. Use strings for JSON keys and integers for array indices. This is required for evaluation to work.

For example, if your endpoint returns:

{
  "response": {
    "output": "Hello, world!"
  }
}

Set the key path to ["response", "output"].

For nested arrays, use integers to specify the array index. For example, if your endpoint returns:

{
  "response": {
    "output": {
      "content": [{ "text": "Hello, world!" }]
    }
  }
}

Set the key path to ["response", "output", "content", 0, "text"].

Key paths support both JSON keys (strings) and list indices (integers).
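A key path is essentially a sequence of lookups applied one after another. The helper below is a minimal sketch of that idea, useful for checking a path against a sample response before saving it; `resolve_key_path` is a hypothetical name, not part of any Confident AI or DeepEval API.

```python
from typing import Any, Sequence, Union

KeyPath = Sequence[Union[str, int]]


def resolve_key_path(data: Any, path: KeyPath) -> Any:
    """Follow string keys into objects and integer indices into lists."""
    current = data
    for step in path:
        if isinstance(step, int) and not isinstance(current, list):
            raise TypeError(f"Expected a list before index {step!r}")
        if isinstance(step, str) and not isinstance(current, dict):
            raise TypeError(f"Expected an object before key {step!r}")
        current = current[step]  # KeyError / IndexError if the step is missing
    return current


sample_response = {
    "response": {"output": {"content": [{"text": "Hello, world!"}]}}
}
actual_output = resolve_key_path(
    sample_response, ["response", "output", "content", 0, "text"]
)
```

If the path does not match the response, the helper raises instead of returning a value, which mirrors the kind of misconfiguration you would want to catch before running an evaluation.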

Retrieval Context Key Path

A list of strings or integers representing the path to the retrieval_context value in your JSON response. Use strings for JSON keys and integers for array indices. This is optional and only needed if you’re using RAG metrics. The value must be a list of strings.

For example, if your endpoint returns:

{
  "response": {
    ...
    "retrieval_context": ["context1", "context2"]
  }
}

Set the key path to ["response", "retrieval_context"].

Tool Call Key Path

A list of strings or integers representing the path to the tools_called value in your JSON response. Use strings for JSON keys and integers for array indices. This is optional and only needed if you’re using metrics that require a tool call parameter. The value must be a list of ToolCall.

For example, if your endpoint returns:

{
  "response": {
    ...
    "tools_called": [
      {
        "name": "get_weather",
        "description": "Get weather for a location",
        "reasoning": "User asked about the weather in San Francisco",
        "output": "Sunny, 72°F",
        "inputParameters": {"location": "San Francisco"}
      }
    ]
  }
}

Set the key path to ["response", "tools_called"].

For more information on the structure of a tool call, refer to the official DeepEval documentation.
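If your app already records tool invocations internally, producing this shape is mostly a matter of serialization. The sketch below assumes a hypothetical `ToolInvocation` dataclass in your own code; only the JSON field names are taken from the example response above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolInvocation:
    """Hypothetical in-app record of one tool call."""
    name: str
    description: Optional[str] = None
    reasoning: Optional[str] = None
    output: Any = None
    input_parameters: dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> dict[str, Any]:
        # Field names mirror the example response above.
        return {
            "name": self.name,
            "description": self.description,
            "reasoning": self.reasoning,
            "output": self.output,
            "inputParameters": self.input_parameters,
        }


def build_response(answer: str, calls: list[ToolInvocation]) -> dict[str, Any]:
    """Wrap the answer and tool calls so ["response", "tools_called"] resolves."""
    return {
        "response": {
            "output": answer,
            "tools_called": [call.to_json() for call in calls],
        }
    }
```

Returning `build_response(...)` from your endpoint keeps the actual output at ["response", "output"] and the tool calls at ["response", "tools_called"].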

Request Timeout

Set the maximum time (in seconds) that Confident AI will wait for your endpoint to respond before timing out. This prevents evaluations from hanging indefinitely if your AI app is slow or unresponsive.

Minimum: 1 second
Default: 60 seconds

If your AI app performs complex operations or calls external services, you may need to increase the timeout to avoid premature failures.

Max Concurrency

Set the maximum number of requests that Confident AI will send to your endpoint concurrently. This helps prevent overwhelming your AI app during large evaluation runs.

Minimum: 1
Default: 20

Max Retries

Set the maximum number of times Confident AI will retry a failed request to your endpoint. This helps handle transient errors without failing the entire evaluation.

Minimum: 0
Default: 0
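
To see how the request timeout, max concurrency, and max retries settings might interact, the following is a client-side sketch built on `asyncio`. It is not how Confident AI is implemented; `call_endpoint` is a hypothetical stand-in for the HTTP call, and the scheduling shown is just one plausible reading of the three settings.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def call_with_limits(
    call_endpoint: Callable[[str], Awaitable[str]],
    inputs: list[str],
    timeout_s: float = 60.0,
    max_concurrency: int = 20,
    max_retries: int = 0,
) -> list[Optional[str]]:
    """Call the endpoint once per input; None marks a request that never succeeded."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(item: str) -> Optional[str]:
        async with semaphore:  # at most max_concurrency calls in flight
            for _attempt in range(max_retries + 1):  # first try plus retries
                try:
                    return await asyncio.wait_for(call_endpoint(item), timeout_s)
                except Exception:
                    continue  # timeout or error: try again if attempts remain
            return None

    return list(await asyncio.gather(*(one(item) for item in inputs)))


attempts: dict[str, int] = {}


async def flaky_endpoint(item: str) -> str:
    # Hypothetical endpoint that fails the first time it sees each input.
    attempts[item] = attempts.get(item, 0) + 1
    if attempts[item] == 1:
        raise RuntimeError("transient failure")
    return item.upper()
```

In this model, a request can occupy a concurrency slot for up to roughly timeout * (retries + 1) seconds, so raising max retries trades a longer worst-case run for resilience to transient errors.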

Linking Test Cases to Traces

When running evaluations through an AI Connection, you can link each test case to its corresponding trace for full observability. This is done by including testCaseId in your payload (enabled by default) and passing it to your tracing setup.

Include testCaseId in your payload configuration and make sure your AI app's endpoint accepts it:

{
  "input": golden.input,
  "testCaseId": testCaseId
}

Then, pass the testCaseId to your tracing implementation:

Python

from deepeval.tracing import observe, update_current_trace

# Assumes `app` is a FastAPI-style application and `llm` is your model client,
# both defined elsewhere in your codebase.

@observe()
def llm_app(input: str, test_case_id: str):
    # Link the trace to the test case
    update_current_trace(test_case_id=test_case_id)

    # Your generation logic here
    response = llm.generate(user=input)
    return response

@app.post("/generate")
def generate(request: dict):
    # Read the ID that Confident AI sends in the payload
    test_case_id = request.get("testCaseId")
    response = llm_app(request["input"], test_case_id)
    return {"output": response}

Once linked, you can view the full trace for each test case directly from the evaluation results, making it easy to debug failures and understand model behavior.

Testing Your Connection

After configuring your AI connection, click Ping Endpoint to verify everything is set up correctly. You should receive a 200 status response. If not, check the error message and adjust your configuration accordingly.
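
Before clicking Ping Endpoint, it can help to send a similar request yourself. The sketch below uses Python's standard `urllib`; the URL, header, and payload are placeholders, and it only approximates what the platform sends.

```python
import json
import urllib.request


def build_request(url: str, payload: dict, headers: dict[str, str]) -> urllib.request.Request:
    """Build a JSON POST request similar to what an AI connection would send."""
    all_headers = {"Content-Type": "application/json", **headers}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=all_headers, method="POST")


def ping(request: urllib.request.Request, timeout_s: float = 60.0) -> tuple[int, dict]:
    """Send the request and return (status code, parsed JSON body)."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status, json.loads(response.read())


# Placeholder values: replace with your endpoint, headers, and a sample payload.
request = build_request(
    "https://example.com/generate",
    {"input": "Hello"},
    {"x-api-key": "replace-me"},
)
```

Calling `ping(request)` against your real endpoint should return status 200 and a JSON body in which your configured key paths resolve.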