Model and Prompt Insights
Overview
When running evaluations, it’s important to log hyperparameters to track which configurations were used for each test run. This allows you to compare different versions of prompts, models, and other settings to make data-driven decisions about your LLM application.
This is currently only supported for end-to-end evaluation.
Prompts
Prompts are a type of hyperparameter. In Confident AI, you can log them alongside your other hyperparameters to compare each prompt's performance when running an evaluation:
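Here is a minimal Python sketch of what this can look like. It assumes a prompt managed on Confident AI under the alias "System Prompt", and that your version of deepeval's evaluate() accepts a hyperparameters argument; the alias, test case contents, and metric choice are placeholders:

```python
from deepeval import evaluate
from deepeval.prompt import Prompt
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# Pull the prompt version you want to evaluate (alias is hypothetical)
prompt = Prompt(alias="System Prompt")
prompt.pull()

test_case = LLMTestCase(
    input="What is DeepEval?",
    actual_output="DeepEval is an open-source LLM evaluation framework.",
)

# Log the Prompt instance itself as a hyperparameter,
# never the interpolated prompt string (see the note below)
evaluate(
    test_cases=[test_case],
    metrics=[AnswerRelevancyMetric()],
    hyperparameters={"System Prompt": prompt},
)
```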
You should NEVER log the interpolated version of your prompt template. This will create a unique version for each variable substitution, making it impossible to meaningfully compare prompt performance. Always log the Prompt instance itself.
Models and Others
You can also log model and other parameter information as hyperparameters to track which model versions or configurations were used for a test run:
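A sketch in Python, again assuming evaluate() accepts a hyperparameters dictionary. The keys are arbitrary labels you choose; the specific model name, temperature, and the assumption that numeric values are accepted alongside strings and Prompt instances are illustrative only:

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric

evaluate(
    test_cases=[test_case],           # test case from the previous example
    metrics=[AnswerRelevancyMetric()],
    hyperparameters={
        "Model": "gpt-4o",            # hypothetical model name
        "Temperature": 0.7,
        "Max Tokens": 1024,
        "System Prompt": prompt,      # Prompt instance from the previous example
    },
)
```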
In CI/CD
For Python users of deepeval, you can log hyperparameters in CI/CD using the @deepeval.log_hyperparameters() decorator on top of a function that returns a dictionary of strings and Prompt instances:
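A minimal sketch of such a test file is shown below. The file name, prompt alias, test case contents, and metric are placeholders, and the decorator usage assumes the dictionary-returning form described above:

```python
# test_llm_app.py (hypothetical file name, collected by deepeval test run)
import deepeval
from deepeval import assert_test
from deepeval.prompt import Prompt
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

prompt = Prompt(alias="System Prompt")  # hypothetical alias
prompt.pull()


def test_llm_app():
    test_case = LLMTestCase(
        input="What is DeepEval?",
        actual_output="DeepEval is an open-source LLM evaluation framework.",
    )
    assert_test(test_case, [AnswerRelevancyMetric()])


@deepeval.log_hyperparameters()
def hyperparameters():
    # The returned dictionary is logged as hyperparameters for this test run
    return {"Model": "gpt-4o", "System Prompt": prompt}
```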
Now, every time you run deepeval test run, every test file will be run as a unit test while logging hyperparameters as part of your test run. Go to the next section to learn more about unit testing in CI/CD.