Test Cases

Creating test cases in your traces to run evaluations on-the-fly

Overview

Confident AI allows you to run evaluations on your spans and traces, which requires you to set test case parameters in update_current_span and update_current_trace.

Each metric requires different test case parameters. For detailed information on required test case parameters for each metric, refer to the official DeepEval documentation.

Test Case Parameters

Both update_current_span and update_current_trace accept 7 OPTIONAL test case parameters:

  • input: The input to your LLM app
  • output: The output of your LLM app
  • expected_output: The expected output of your LLM app
  • retrieval_context: A list of strings representing the retrieved text chunks from a retrieval system
  • context: A list of strings representing the ideal retrieved text chunks provided from a retrieval system
  • tools_called: A list of ToolCall objects representing the tools called by your LLM app
  • expected_tools: A list of ToolCall objects representing the expected tools to be called by the LLM app

The input and output parameters accept values of Any type for visualization purposes, but we recommend setting them as strings if you plan to run evaluations.
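To illustrate the expected types, here is a small sketch; the TestCaseParams and ToolCallSketch classes below are hypothetical stand-ins for demonstration only. In DeepEval you pass these values directly as keyword arguments to update_current_span or update_current_trace, and ToolCall is imported from deepeval.test_case:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class ToolCallSketch:
    # Hypothetical stand-in for deepeval.test_case.ToolCall
    name: str
    input_parameters: Dict[str, Any] = field(default_factory=dict)

@dataclass
class TestCaseParams:
    # Hypothetical container mirroring the seven optional parameters
    input: Optional[Any] = None                    # prefer str for evaluations
    output: Optional[Any] = None                   # prefer str for evaluations
    expected_output: Optional[str] = None
    retrieval_context: Optional[List[str]] = None  # actual retrieved chunks
    context: Optional[List[str]] = None            # ideal retrieved chunks
    tools_called: Optional[List[ToolCallSketch]] = None
    expected_tools: Optional[List[ToolCallSketch]] = None

params = TestCaseParams(
    input="What is the weather in San Francisco?",
    output="It is sunny.",
    tools_called=[ToolCallSketch(name="web_search", input_parameters={"query": "SF weather"})],
)
```

Any parameter you leave out simply stays unset; metrics that need it will report it as missing.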

Set Span Test Case Parameters

You can set span-level test case parameters in the update_current_span function:

main.py
from deepeval.tracing import observe, update_current_span
from deepeval.test_case import ToolCall

@observe()
def tool_calling_agent(query: str):
    update_current_span(
        input=query,
        output="Agent response",
        tools_called=[ToolCall(name="web_search", input_parameters={"query": query})],
    )
    return "Agent response"

tool_calling_agent("What is the weather in San Francisco?")

Set Trace Test Case Parameters

You can set trace-level test case parameters in the update_current_trace function.

You can call update_current_trace multiple times, from anywhere inside a function decorated with @observe. This is useful when a parameter only becomes available in a specific part of your code.

main.py
from openai import OpenAI
from deepeval.tracing import observe, update_current_trace

client = OpenAI()

@observe()
def retriever(query: str):
    retrieved_chunks = ["chunk1", "chunk2"]
    update_current_trace(retrieval_context=retrieved_chunks)
    return "\n".join(retrieved_chunks)

@observe()
def llm_app(query: str):
    retrieval_context = retriever(query)
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query + "\n\n" + retrieval_context}],
    ).choices[0].message.content
    update_current_trace(input=query, output=res)
    return res

llm_app("What is the weather typically like in San Francisco?")
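Conceptually, successive update_current_trace calls accumulate into a single trace-level test case: each call sets only the parameters it passes and leaves the rest untouched. The sketch below illustrates that merge behavior; TraceSketch is a hypothetical class for demonstration, not DeepEval's implementation:

```python
from typing import Any, Dict

class TraceSketch:
    """Hypothetical trace object showing how successive partial
    updates can accumulate into one set of test case parameters."""
    def __init__(self) -> None:
        self.test_case: Dict[str, Any] = {}

    def update(self, **params: Any) -> None:
        # Only the parameters passed in this call are set;
        # previously set parameters are preserved.
        self.test_case.update(params)

trace = TraceSketch()
trace.update(retrieval_context=["chunk1", "chunk2"])        # e.g. set inside retriever
trace.update(input="What's the weather?", output="Sunny.")  # e.g. set inside llm_app
# trace.test_case now holds all three parameters
```

This is why it is safe to set retrieval_context inside the retriever and input/output at the top level of your app, as in the example above.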