In this article, I'm going to go through all the top LLM benchmarks currently used and why they matter.
In this article, we'll learn everything there is to know about LLM testing, including best practices and methods to test LLMs.
In this article, we'll walk through how to fine-tune and evaluate a LLaMA-2 model using Hugging Face and DeepEval.
In this tutorial, we'll walk through how to set up a full testing suite for RAG applications using DeepEval.

In this article, I'll walk through everything you need to know about LLM evaluation metrics, with code samples.

In this article, I'll show how benchmarking can help you choose the right LLM for your use case.
In this article, I'll teach you how to create your own text summarization metric.
In this article, I'll share how JudgmentalGPT, our in-house evaluator, was built using OpenAI's Assistants.
In this interactive tutorial, I'll show you how to become a Midjournalist and create the images you imagine.
In this article, we will demystify how to evaluate LLM applications and RAG pipelines the right way.