Test Cases

Creating test cases in your traces to run evaluations on-the-fly

Overview

Confident AI allows you to run evaluations on your spans and traces, which requires you to set test case parameters in update_current_span and update_current_trace.

Each metric requires different test case parameters. For detailed information on required test case parameters for each metric, refer to the official DeepEval documentation.

Test Case Parameters

Both update_current_span and update_current_trace accept 7 OPTIONAL test case parameters:

  • input: The input to your LLM app
  • output: The output of your LLM app
  • expected_output: The expected output of your LLM app
  • retrieval_context: A list of strings representing the retrieved text chunks from a retrieval system
  • context: A list of strings representing the ideal retrieved text chunks provided from a retrieval system
  • tools_called: A list of ToolCall objects representing the tools called by your LLM app
  • expected_tools: A list of ToolCall objects representing the expected tools to be called by the LLM app

The input and output parameters accept values of Any type for visualization purposes, but we recommend setting them as strings if you plan to run evaluations.
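To illustrate the expected types, here is a small sketch; the TestCaseParams and ToolCallSketch classes below are hypothetical stand-ins for demonstration only. In DeepEval you pass these values directly as keyword arguments to update_current_span or update_current_trace, and ToolCall is imported from deepeval.test_case:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class ToolCallSketch:
    # Hypothetical stand-in for deepeval.test_case.ToolCall
    name: str
    input_parameters: Dict[str, Any] = field(default_factory=dict)

@dataclass
class TestCaseParams:
    # Hypothetical container mirroring the seven optional parameters
    input: Optional[Any] = None                    # prefer str for evaluations
    output: Optional[Any] = None                   # prefer str for evaluations
    expected_output: Optional[str] = None
    retrieval_context: Optional[List[str]] = None  # actual retrieved chunks
    context: Optional[List[str]] = None            # ideal retrieved chunks
    tools_called: Optional[List[ToolCallSketch]] = None
    expected_tools: Optional[List[ToolCallSketch]] = None

params = TestCaseParams(
    input="What is the weather in San Francisco?",
    output="It is sunny.",
    tools_called=[ToolCallSketch(name="web_search", input_parameters={"query": "SF weather"})],
)
```

Any parameter you leave out simply stays unset; metrics that need it will report it as missing.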

Set Span Test Case Parameters

You can set span-level test case parameters in the update_current_span function:

main.py
from deepeval.tracing import observe, update_current_span
from deepeval.test_case import ToolCall

@observe()
def tool_calling_agent(query: str):
    update_current_span(
        input=query,
        output="Agent response",
        tools_called=[ToolCall(name="web_search", input_parameters={"query": query})],
    )
    return "Agent response"

tool_calling_agent("What is the weather in San Francisco?")

Set Trace Test Case Parameters

You can set trace-level test case parameters in the update_current_trace function.

You can call update_current_trace multiple times, from anywhere inside a function decorated with @observe. This is useful when a parameter only becomes available in a specific part of your code.

main.py
from openai import OpenAI
from deepeval.tracing import observe, update_current_trace

client = OpenAI()

@observe()
def retriever(query: str):
    retrieved_chunks = ["chunk1", "chunk2"]
    update_current_trace(retrieval_context=retrieved_chunks)
    return "\n".join(retrieved_chunks)

@observe()
def llm_app(query: str):
    retrieval_context = retriever(query)
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query + "\n\n" + retrieval_context}],
    ).choices[0].message.content
    update_current_trace(input=query, output=res)
    return res

llm_app("What is the weather typically like in San Francisco?")
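Conceptually, successive update_current_trace calls accumulate into a single trace-level test case: each call sets only the parameters it passes and leaves the rest untouched. The sketch below illustrates that merge behavior; TraceSketch is a hypothetical class for demonstration, not DeepEval's implementation:

```python
from typing import Any, Dict

class TraceSketch:
    """Hypothetical trace object showing how successive partial
    updates can accumulate into one set of test case parameters."""
    def __init__(self) -> None:
        self.test_case: Dict[str, Any] = {}

    def update(self, **params: Any) -> None:
        # Only the parameters passed in this call are set;
        # previously set parameters are preserved.
        self.test_case.update(params)

trace = TraceSketch()
trace.update(retrieval_context=["chunk1", "chunk2"])        # e.g. set inside retriever
trace.update(input="What's the weather?", output="Sunny.")  # e.g. set inside llm_app
# trace.test_case now holds all three parameters
```

This is why it is safe to set retrieval_context inside the retriever and input/output at the top level of your app, as in the example above.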