AI Connections

Connect your AI app to run evaluations directly on the platform without code.

AI Connections let you run evaluations directly on the platform by connecting to your AI app via an HTTPS endpoint. Instead of writing code, you can trigger evaluations with a click of a button—Confident AI will call your endpoint with data from your goldens and parse the response.


Setting Up an AI Connection

To create an AI connection:

  1. Navigate to Project Settings → AI Connections
  2. Click New AI Connection
  3. Give it a unique identifying name
  4. Click Save

Your AI connection won’t be usable yet—you still need to configure the endpoint, payload, and at minimum the actual output key path.

Configuration Parameters

You’ll need to configure several parameters for your AI connection to work.

Name

Give your AI connection a unique name to identify it within your project.

AI App Endpoint

Your AI app must be accessible via an HTTPS endpoint that accepts POST requests and returns a JSON response containing the actual output of your AI app.
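
As a minimal sketch of this contract (the route is shown as a plain handler function here; `generate_answer` is a hypothetical stand-in for your own AI app logic, not part of the platform):

```python
def generate_answer(user_input: str) -> str:
    # Hypothetical placeholder -- replace with your real model call.
    return f"Echo: {user_input}"

def handle_generate(request_body: dict) -> dict:
    # Confident AI POSTs the payload you configure, e.g. {"input": "..."};
    # your endpoint must return JSON containing the actual output.
    output = generate_answer(request_body["input"])
    return {"output": output}
```

In a real deployment this handler would sit behind an HTTPS POST route (for example in FastAPI or Flask); the essential requirement is simply POST in, JSON with the actual output out.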

Payload

Configure the payload that gets sent to your endpoint when Confident AI calls it. JSON mode lets you map available variables into a JSON structure, while the Code editor lets you write a Python function for conditional logic, data transformation, or full programmatic control over the request body.

In JSON mode, you can nest values to match your endpoint’s expected structure.

Available variables:

| Variable | Description | Type |
| --- | --- | --- |
| golden.input | The input from your golden | string |
| golden.actual_output | The actual output from your golden | string |
| golden.expected_output | The expected output from your golden | string |
| golden.retrieval_context | The retrieval context from your golden | string[] |
| golden.context | The context from your golden | string[] |
| golden.expected_tools | The expected tools from your golden | ToolCall[] |
| golden.tools_called | The tools called from your golden | ToolCall[] |
| golden.additional_metadata | Additional metadata from your golden | object |
| conversationalGolden.turns | Turn history for multi-turn evals | Turn[] |
| conversationalGolden.context | Context for conversational goldens | string[] |
| conversationalGolden.scenario | Scenario for conversational goldens | string |
| conversationalGolden.expected_outcome | Expected outcome for conversational goldens | string |
| conversationalGolden.user_description | User description for conversational goldens | string |
| conversationalGolden.additional_metadata | Additional metadata for conversational goldens | object |
| prompts | A dictionary of prompts | object |
| hyperparameters | A dictionary of hyperparameter key-value pairs | object |
| testCaseId | Unique identifier for linking traces to test cases | string |
| turnId | Unique identifier for linking traces to turns | string |
| state | An object to keep state for multi-turn simulations | object |

Use golden.* variables for single-turn evaluations and conversationalGolden.* variables for multi-turn evaluations. See Prompts for details on how to use the prompts dictionary, and Hyperparameters for passing hyperparameters to your endpoint.

Example payload:

```json
{
  "input": golden.input,
  "context": golden.context,
  "conversationalContext": conversationalGolden.context,
  "prompts": prompts,
  "hyperparameters": hyperparameters,
  "turns": conversationalGolden.turns
}
```

The custom payload feature lets you structure the request to match your existing API contract—no need to modify your AI app to accept a specific format.

Headers

Add any custom headers required by your endpoint as key-value pairs, such as API keys or content type specifications. These headers are sent with every request to your AI app.

Authorization

Configure authentication for requests to your AI app endpoint. The Authorization tab has two sections: Secrets Manager and Authentication.

Secrets Manager

A secrets manager lets you securely retrieve authentication credentials at runtime from a cloud vault, instead of storing them directly on the platform.

To enable a secrets manager:

  1. Toggle the secrets manager on
  2. Select a provider (e.g. Azure Key Vault)
  3. Enter your Vault URL (e.g., https://your-vault.vault.azure.net)
  4. Enter your Tenant ID, Client ID, and Client Secret to authenticate to the vault

For self-hosted deployments, the secrets manager is always enabled and uses managed identities for authentication, so no secrets provider credentials are required.

Authentication

Select an authentication type from the dropdown:

| Type | Description |
| --- | --- |
| None | No authentication is applied |
| Auth0 | Exchanges client credentials for a Bearer token via Auth0’s OAuth2 client credentials flow |
| HMAC | Computes an HMAC-SHA256 signature of the request payload and sends it as a header |

Auth0 requires the following fields:

| Field | Description |
| --- | --- |
| Auth0 Domain | Your Auth0 tenant domain (e.g., your-tenant.auth0.com) |
| Audience | The API identifier this token is authorized to access |
| Client ID / Client ID Name | Your Auth0 application client ID, or the name of the secret in your vault if using a secrets manager |
| Client Secret / Client Secret Name | Your Auth0 application client secret, or the name of the secret in your vault if using a secrets manager |

HMAC requires the following fields:

| Field | Description |
| --- | --- |
| Header Key | The HTTP header name where the signature is sent (e.g., X-Signature) |
| Signature Prefix | An optional prefix prepended to the signature (e.g., sha256=) |
| Secret Key / Secret Name | The signing key, or the name of the secret in your vault if using a secrets manager |

You can use a secrets manager with Auth0 to store your client credentials in a key vault. Instead of entering the actual Client ID and Client Secret, provide the names of the secrets in your vault and they will be retrieved at runtime.
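
If you select HMAC, your endpoint will typically want to recompute the signature and compare it against the header. A sketch using Python’s standard library, assuming the example values above (header X-Signature, prefix sha256=); this mirrors the signing scheme described, not Confident AI’s internal implementation:

```python
import hashlib
import hmac

def sign(secret_key: str, body: bytes, prefix: str = "sha256=") -> str:
    # HMAC-SHA256 over the raw request payload, hex-encoded,
    # with the optional signature prefix prepended.
    digest = hmac.new(secret_key.encode(), body, hashlib.sha256).hexdigest()
    return prefix + digest

def verify(secret_key: str, body: bytes, received_signature: str) -> bool:
    # Recompute and compare in constant time to avoid timing attacks.
    expected = sign(secret_key, body)
    return hmac.compare_digest(expected, received_signature)
```

Your endpoint would read the signature from the configured header (e.g., X-Signature) and reject the request when verification fails.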

Prompts

Associate prompt versions with your AI connection. When running evaluations, these prompts will be attributed to each test run, letting you trace results back to the prompts used.

The prompts variable in your payload is a dictionary where each key maps to an object containing alias and version:

```json
{
  "system": { "alias": "system-prompt", "version": "1.0.0" },
  "assistant": { "alias": "assistant-prompt", "version": "2.1.0" }
}
```

Here’s an example of how your Python endpoint might handle the prompts dictionary:

```python
from deepeval.prompt import Prompt

@app.post("/generate")
def generate(request: dict):
    # Pull different prompt versions using their keys
    system_info = request["prompts"]["system"]
    assistant_info = request["prompts"]["assistant"]

    system_prompt = Prompt(alias=system_info["alias"]).pull(version=system_info["version"])
    assistant_prompt = Prompt(alias=assistant_info["alias"]).pull(version=assistant_info["version"])

    # Use the prompts in your generation
    response = llm.generate(
        system=system_prompt.text,
        assistant=assistant_prompt.text,
        user=request["input"]
    )

    return {"output": response}
```

For more details on working with prompts, see Prompt Versioning.

Hyperparameters

Define optional hyperparameters as string key-value pairs. These are sent to your endpoint as part of the payload and are also logged in test runs and experiments, making it easy to track which configuration was used for each evaluation.

Hyperparameters are useful for passing model configuration values like temperature, model_name, or max_tokens to your AI app without hardcoding them into your endpoint. Since they’re logged alongside test run and experiment results, you can compare how different hyperparameter values affect evaluation outcomes.

The hyperparameters variable in your payload is a dictionary where both keys and values are strings:

```json
{
  "temperature": "0.7",
  "model": "gpt-4o",
  "max_tokens": "1024"
}
```

Here’s an example of how your Python endpoint might use hyperparameters:

```python
@app.post("/generate")
def generate(request: dict):
    hyperparameters = request.get("hyperparameters", {})

    response = llm.generate(
        model=hyperparameters.get("model", "gpt-4o"),
        temperature=float(hyperparameters.get("temperature", "0.7")),
        max_tokens=int(hyperparameters.get("max_tokens", "1024")),
        user=request["input"]
    )

    return {"output": response}
```

Hyperparameter values are always strings. Cast them to the appropriate type (e.g., float, int) in your endpoint as needed.

Actual Output Key Path

A list of strings or integers representing the path to the actual_output value in your JSON response. Use strings for JSON keys and integers for array indices. This is required for evaluation to work.

For example, if your endpoint returns:

```json
{
  "response": {
    "output": "Hello, world!"
  }
}
```

Set the key path to ["response", "output"].

For nested arrays, use integers to specify the array index. For example, if your endpoint returns:

```json
{
  "response": {
    "output": {
      "content": [{ "text": "Hello, world!" }]
    }
  }
}
```

Set the key path to ["response", "output", "content", 0, "text"].

Key paths support both JSON keys (strings) and list indices (integers).
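
The lookup a key path performs can be mirrored in a few lines of Python. This is a sketch of the traversal for illustration, not the platform’s implementation:

```python
from functools import reduce

def extract(key_path: list, response: dict):
    # Walk the JSON response one segment at a time:
    # strings index into objects, integers index into arrays.
    return reduce(lambda node, key: node[key], key_path, response)

# The nested-array example from above:
response = {
    "response": {
        "output": {
            "content": [{"text": "Hello, world!"}]
        }
    }
}

extract(["response", "output", "content", 0, "text"], response)  # "Hello, world!"
```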

Retrieval Context Key Path

A list of strings or integers representing the path to the retrieval_context value in your JSON response. Use strings for JSON keys and integers for array indices. This is optional and only needed if you’re using RAG metrics. The value must be a list of strings.

For example, if your endpoint returns:

```json
{
  "response": {
    ...
    "retrieval_context": ["context1", "context2"]
  }
}
```

Set the key path to ["response", "retrieval_context"].

Tool Call Key Path

A list of strings or integers representing the path to the tools_called value in your JSON response. Use strings for JSON keys and integers for array indices. This is optional and only needed if you’re using metrics that require a tool call parameter. The value must be a list of ToolCall.

For example, if your endpoint returns:

```json
{
  "response": {
    ...
    "tools_called": [
      {
        "name": "get_weather",
        "description": "Get weather for a location",
        "reasoning": "User asked about the weather in San Francisco",
        "output": "Sunny, 72°F",
        "inputParameters": {"location": "San Francisco"}
      }
    ]
  }
}
```

Set the key path to ["response", "tools_called"].

For more information on the structure of a tool call, refer to the official DeepEval documentation.
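
As an illustration of the shape shown above, here is how an endpoint-side helper might build that list. The ToolCallRecord dataclass is an illustrative stand-in for the fields in the example, not DeepEval’s actual ToolCall class:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCallRecord:
    # Mirrors the fields in the example response above (illustrative only).
    name: str
    description: str = ""
    reasoning: str = ""
    output: str = ""
    input_parameters: dict = field(default_factory=dict)

def build_tools_called(records: list[ToolCallRecord]) -> list[dict]:
    # Serialize tool calls into the JSON structure the key path
    # ["response", "tools_called"] expects to find.
    return [
        {
            "name": r.name,
            "description": r.description,
            "reasoning": r.reasoning,
            "output": r.output,
            "inputParameters": r.input_parameters,
        }
        for r in records
    ]
```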

Request Timeout

Set the maximum time (in seconds) that Confident AI will wait for your endpoint to respond before timing out. This helps prevent evaluations from hanging indefinitely if your AI connection is slow or unresponsive.

  • Minimum: 1 second
  • Default: 60 seconds

If your AI app performs complex operations or calls external services, you may need to increase the timeout to avoid premature failures.

Max Concurrency

Set the maximum number of concurrent requests that Confident AI will send to your endpoint at the same time. This helps prevent overwhelming your AI app during large evaluation runs.

  • Minimum: 1
  • Default: 20

Max Retries

Set the maximum number of times Confident AI will retry a failed request to your endpoint. This helps handle transient errors without failing the entire evaluation.

  • Minimum: 0
  • Default: 0

Configure Multi-Turn State

During multi-turn simulations, Confident AI calls your endpoint once per turn. You can use state to persist information—like a thread ID or session—across turns so your AI app can maintain context throughout the conversation.

On the first turn, the state variable in your payload will be empty since no prior state exists. If your endpoint returns a state object and the state key path successfully extracts it, that state will be included in the state payload variable from the second turn onwards.

Payload

To enable multi-turn state, include state in your payload configuration so it gets sent to your endpoint on each turn:

```json
{
  "input": golden.input,
  "state": state
}
```

Here’s an example of how your endpoint might handle state:

```python
@app.post("/generate")
def generate(request: dict):
    state = request.get("state", {})

    if not state:
        thread_id = create_new_thread()
    else:
        thread_id = state["threadId"]

    response = llm.generate(
        thread_id=thread_id,
        user=request["input"]
    )

    return {
        "output": response,
        "state": {"threadId": thread_id}
    }
```

State Key Path

The state key path works just like the actual output key path—a list of strings or integers representing the path to the state object in your JSON response. This tells Confident AI where to extract state from your endpoint’s response so it can be passed back on the next turn.

For example, if your endpoint returns:

```json
{
  "output": "Hello! How can I help?",
  "state": {
    "threadId": "abc-123"
  }
}
```

Set the state key path to ["state"].

State is only relevant for multi-turn evaluations (simulations). For single-turn evaluations, you can ignore this setting entirely.

Linking Test Cases to Traces

For single-turn evaluations, you can link each test case to its corresponding trace for full observability. This is done by including testCaseId in your payload (enabled by default) and passing it to your tracing setup.

Include testCaseId in your payload configuration and make sure your endpoint accepts it:

```json
{
  "input": golden.input,
  "testCaseId": testCaseId
}
```

Then, pass the testCaseId to your tracing implementation:

```python
from deepeval.tracing import observe, update_current_trace

@observe()
def llm_app(input: str, test_case_id: str):
    update_current_trace(test_case_id=test_case_id)

    response = llm.generate(user=input)
    return response

@app.post("/generate")
def generate(request: dict):
    test_case_id = request.get("testCaseId")
    response = llm_app(request["input"], test_case_id)
    return {"output": response}
```

Once linked, you can view the full trace for each test case directly from the evaluation results, making it easy to debug failures and understand model behavior.

Linking Turns to Traces

For multi-turn evaluations, Confident AI calls your endpoint once per turn. Each turn has its own turnId that you can pass to your tracing setup. This links each turn’s trace to the specific turn in the conversation, letting you view traces per-turn from the evaluation results.

Include turnId in your payload configuration:

```json
{
  "input": golden.input,
  "turnId": turnId,
  "state": state
}
```

Then, pass the turnId to your tracing implementation:

```python
from deepeval.tracing import observe, update_current_trace

@observe()
def llm_app(input: str, turn_id: str):
    update_current_trace(turn_id=turn_id)

    response = llm.generate(user=input)
    return response

@app.post("/generate")
def generate(request: dict):
    turn_id = request.get("turnId")
    response = llm_app(request["input"], turn_id)
    return {"output": response}
```

With turnId linked, you can click “Open trace” on any individual turn in the evaluation results to see the full trace for that specific turn.

Testing Your Connection

After configuring your AI connection, click Ping Endpoint to verify everything is set up correctly. You should receive a 200 status response. If not, check the error message and adjust your configuration accordingly.