For those new to DeepEval: it provides a Pythonic way to run offline evaluations on your LLM pipelines so you can launch into production with confidence. In short, it is a testing suite for LLMs.
In this product update, we include a number of improvements such as:
Developers building Retrieval Augmented Generation (RAG) applications with tools like LlamaIndex want an easy way to quickly measure the performance of their RAG pipeline.
This is now achievable in just 1 line of code.
Under the hood, it uses ChatGPT to automatically create n query-answer pairs. It uses a simple ChatGPT prompt, takes in the original context and feeds it into an LLMTestCase. The LLMTestCase abstraction is one of the building blocks of DeepEval and is what allows the performance of these RAG pipelines to be measured.
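To make the flow above concrete, here is a minimal sketch of generating n query-answer pairs from a context and wrapping each one in a test case. Everything in it is an assumption for illustration: `SketchTestCase` is a hypothetical stand-in for LLMTestCase, `create_synthetic_cases` is not DeepEval's function, and `fake_llm` replaces the ChatGPT call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SketchTestCase:
    """Hypothetical stand-in for DeepEval's LLMTestCase (not the real class)."""
    query: str
    expected_output: str
    context: str


def build_prompt(context: str) -> str:
    # A simple prompt asking the model for one question-answer pair about the context.
    return (
        "Write one question that can be answered from the text below, "
        "then its answer, formatted as 'Q: ...' and 'A: ...'.\n\n" + context
    )


def parse_pair(reply: str) -> Tuple[str, str]:
    # Split a 'Q: ... A: ...' reply at the first 'A:' marker.
    question, _, answer = reply.partition("A:")
    return question.replace("Q:", "", 1).strip(), answer.strip()


def create_synthetic_cases(
    context: str, n: int, llm: Callable[[str], str]
) -> List[SketchTestCase]:
    # Ask the model n times and wrap every generated pair in a test case.
    cases = []
    for _ in range(n):
        query, answer = parse_pair(llm(build_prompt(context)))
        cases.append(SketchTestCase(query=query, expected_output=answer, context=context))
    return cases


def fake_llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call (assumption for this sketch).
    return "Q: What does DeepEval test?\nA: LLM pipelines."


cases = create_synthetic_cases("DeepEval is a testing suite for LLMs.", 3, fake_llm)
print(len(cases), cases[0].query)
```

In a real pipeline, `fake_llm` would be swapped for a call to your chat model, and the parsing step would need to be more forgiving of formatting drift in the replies.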
Interested in finding out more? Read about how to run this here.
Once you have created synthetic data, you can easily add or remove individual entries. You can see a sample screenshot of the dashboard for reviewing synthetic data.
The best part? The dashboard runs entirely in Python and can be self-hosted. This is done simply by running:
When reviewing the dataset, you can easily delete or add rows depending on what data you think is important for your evaluation.
Custom metric logging has been one of the most loved ❤️ features. Now, you can do it with DeepEval and save the results directly to the Confident AI dashboard.
You can define a custom metric in just a few lines of code:
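As a rough illustration of the idea, the sketch below defines a length-based metric with a `measure` step and an `is_successful` check. The base class `SketchMetric` and its method names are assumptions made for this example; DeepEval's own metric interface may differ, and nothing here logs to the Confident AI dashboard.

```python
from abc import ABC, abstractmethod


class SketchMetric(ABC):
    """Hypothetical base class; DeepEval's own metric interface may differ."""

    @abstractmethod
    def measure(self, output: str) -> float:
        """Compute and return a score for the given output."""

    @abstractmethod
    def is_successful(self) -> bool:
        """Report whether the most recent score passes."""


class LengthMetric(SketchMetric):
    """Passes when the output has at least `minimum_length` characters."""

    def __init__(self, minimum_length: int = 3):
        self.minimum_length = minimum_length
        self.score = 0.0

    def measure(self, output: str) -> float:
        # The score is simply the number of characters in the output.
        self.score = float(len(output))
        return self.score

    def is_successful(self) -> bool:
        return self.score >= self.minimum_length


metric = LengthMetric(minimum_length=5)
metric.measure("Hello, world")
print(metric.is_successful())
```

Keeping scoring (`measure`) separate from the pass/fail decision (`is_successful`) lets the raw score be recorded even when a test fails.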
🧠 Developer Experience Improvements
While building this, we added a new LLMTestCase abstraction designed to provide flexibility when running these tests. We recommend reading more about it here for those looking to dive into the framework.
We also added two new ways to run tests:
These Pythonic abstractions are intended to make it easier to run tests, log them to the server (if the API key is set), and treat them as independent Pytest tests.
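One way to picture this is a non-raising runner paired with an asserting wrapper, with logging that only happens when an API key is present. This split is an assumption for illustration, not DeepEval's documented API: `run_check`, `assert_check`, the `SKETCH_EVAL_API_KEY` variable, and the in-memory `logged_results` list are all hypothetical names.

```python
import os
from typing import Callable, Dict, List

# Hypothetical environment variable name, used only for this sketch.
API_KEY_ENV = "SKETCH_EVAL_API_KEY"

# Stands in for a remote dashboard; a real client would send results to a server.
logged_results: List[Dict] = []


def run_check(output: str, scorer: Callable[[str], float], threshold: float) -> Dict:
    """Score an output and return the result without raising."""
    score = scorer(output)
    result = {"output": output, "score": score, "success": score >= threshold}
    # Log only when an API key is configured.
    if os.environ.get(API_KEY_ENV):
        logged_results.append(result)
    return result


def assert_check(output: str, scorer: Callable[[str], float], threshold: float) -> None:
    """Same as run_check, but raise AssertionError on failure."""
    result = run_check(output, scorer, threshold)
    assert result["success"], f"score {result['score']} is below {threshold}"


def test_answer_is_long_enough():
    # pytest collects functions named test_*; a failed assert marks the test as failed.
    assert_check("Paris is the capital of France.", lambda text: float(len(text)), 10.0)
```

Saved in a file such as `test_sketch.py`, the last function would be picked up by `pytest` as an independent test, while `run_check` can be called from a script or notebook when a failure should not stop execution.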
DeepEval is iterating fast and adding a number of metrics. Our ambitious roadmap includes adding guardrails, improving synthetic data creation, and making significant improvements to our dashboard.
Subscribe to our weekly newsletter to stay confident in the AI systems you build.