Model and Prompt Insights

Learn how to find the best models, prompts, and other configurations for your LLM app

Overview

When running evaluations, it’s important to log hyperparameters to track which configurations were used for each test run. This allows you to compare different versions of prompts, models, and other settings to make data-driven decisions about your LLM application.

This is currently only supported for end-to-end evaluation.
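
For example, here is a minimal sketch of how you might compare two model configurations by running one evaluation per configuration and logging the model name as a hyperparameter. The generate_answer function is a hypothetical stand-in for your LLM app, and the model names are illustrative:

from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval import evaluate

def generate_answer(question: str, model: str) -> str:
    # Hypothetical stand-in for your LLM app's generation call
    return "..."

question = "What are your shipping options?"

for model in ["gpt-4", "gpt-4o-mini"]:
    evaluate(
        test_cases=[
            LLMTestCase(input=question, actual_output=generate_answer(question, model))
        ],
        metrics=[AnswerRelevancyMetric()],
        # Each run is logged with its own "Model" hyperparameter, so the two
        # test runs can be compared side by side on Confident AI
        hyperparameters={"Model": model},
    )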

Prompt and Model Insights

Prompts

Prompts are a type of hyperparameter. In Confident AI, you can log a Prompt as a hyperparameter when running an evaluation to track and compare each prompt version's performance:

main.py
from deepeval.prompt import Prompt
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval import evaluate

# Pull your prompt version
prompt = Prompt(alias="your-prompt-alias")
prompt.pull()

# Run an evaluation with the prompt as a hyperparameter
evaluate(
    test_cases=[LLMTestCase(input="...", actual_output="...")],
    metrics=[AnswerRelevancyMetric()],
    hyperparameters={"System Prompt": prompt}
)

You should NEVER log the interpolated version of your prompt template. This will create a unique version for each variable substitution, making it impossible to meaningfully compare prompt performance. Always log the Prompt instance itself.
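
To make the distinction concrete, here is a minimal sketch; the f-string below is just a stand-in for however you substitute variables into your pulled template:

from deepeval.prompt import Prompt
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval import evaluate

prompt = Prompt(alias="your-prompt-alias")
prompt.pull()

user_question = "How do I reset my password?"
# Stand-in for however you substitute variables into your pulled template
interpolated_text = f"Answer the user's question: {user_question}"

evaluate(
    test_cases=[LLMTestCase(input=user_question, actual_output="...")],
    metrics=[AnswerRelevancyMetric()],
    hyperparameters={"System Prompt": prompt},  # log the Prompt instance
    # hyperparameters={"System Prompt": interpolated_text},  # avoid: one "version" per input
)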

Models and Other Parameters

You can also log model and other parameter information as hyperparameters to track which model versions or configurations were used for a test run:

main.py
from deepeval.metrics import AnswerRelevancyMetric
from deepeval import evaluate

# `dataset` is an existing EvaluationDataset with test cases
evaluate(
    test_cases=dataset.test_cases,
    metrics=[AnswerRelevancyMetric()],
    hyperparameters={
        "Model": "gpt-4",
    }
)
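
Hyperparameter keys are arbitrary labels, so you can log any other configuration you want to compare across runs alongside the model. A sketch with illustrative values:

from deepeval.prompt import Prompt
from deepeval.metrics import AnswerRelevancyMetric
from deepeval import evaluate

prompt = Prompt(alias="your-prompt-alias")
prompt.pull()

# `dataset` is an existing EvaluationDataset, as above
evaluate(
    test_cases=dataset.test_cases,
    metrics=[AnswerRelevancyMetric()],
    hyperparameters={
        "Model": "gpt-4",
        "Temperature": "0.7",                         # illustrative sampling setting
        "Embedding Model": "text-embedding-3-small",  # illustrative retrieval setting
        "System Prompt": prompt,
    },
)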

In CI/CD

If you're using deepeval in Python, you can log hyperparameters in CI/CD using the @deepeval.log_hyperparameters() decorator on a function that returns a dictionary whose keys are strings and whose values are strings or Prompt instances:

test_llm_app.py
from deepeval.prompt import Prompt
import deepeval

prompt = Prompt(alias="your-prompt-alias")
prompt.pull()

@deepeval.log_hyperparameters()
def hyperparameters():
    return {"System Prompt": prompt, "Model": "YOUR-MODEL-NAME"}
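
In the same test file, you would also define your test functions as usual. A minimal sketch using deepeval's assert_test, with illustrative inputs and outputs:

from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What are your shipping options?",
        actual_output="We offer standard and express shipping.",  # your LLM app's output
    )
    assert_test(test_case, [AnswerRelevancyMetric()])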

Now, every time you run deepeval test run, each test file will be run as unit tests while logging your hyperparameters as part of the test run. Go to the next section to learn more about unit testing in CI/CD.