Confident AI Blog - Resources to help teams stay confident in AI

How I raised Confident AI's $2.2M seed round in 5 days

Announcing Confident AI's seed round, with participation from a bunch of great investors.

Jeffrey Ip

Mar 19, 2025 · 8 min read

How I Built Deterministic LLM Evaluation Metrics for DeepEval

In this article, I'm sharing how I built DeepEval's latest deterministic, LLM-powered custom metric.

Jeffrey Ip

Feb 9, 2025 · 9 min read

LLM Agent Evaluation: Assessing Tool Use, Task Completion, Agentic Reasoning, and More

In this article, I'll share the principles of LLM agent evaluation and show you how to do it using DeepEval.

Kritin Vongthongsri

Jan 27, 2025 · 14 min read

The People's Choice of Top LLM Evaluation Tools in 2025

In this article, we'll bring you a hand-picked, carefully curated list of the top LLM evaluation tools on the market.

Jeffrey Ip

Jan 15, 2025 · 6 min read

What is LLM Observability? - The Ultimate LLM Observability Guide

In this article, I'll share what you should definitely look for in your next LLM Observability solution.

Kritin Vongthongsri

Oct 29, 2024 · 9 min read

Top LLM Chatbot Evaluation Metrics: Conversation Testing Techniques

In this article, you'll learn about LLM red teaming and how it can be carried out using DeepTeam.

Jeffrey Ip

Oct 5, 2024 · 10 min read

LLM-as-a-Judge Simply Explained: The Complete Guide to Run LLM Evals at Scale

In this article, I'll break down what LLM judges are and go through why they are the best for LLM evaluation.

Jeffrey Ip

Sep 1, 2024 · 13 min read

Evaluating LLM Systems: Essential Metrics, Benchmarks, and Best Practices

In this article, you'll learn how to evaluate LLM systems using LLM evaluation metrics and benchmark datasets.

Jeffrey Ip

Jun 24, 2024 · 16 min read

Using LLMs for Synthetic Data Generation: The Definitive Guide

In this article, I'll show you everything you need to know about generating realistic synthetic datasets using LLMs.

Kritin Vongthongsri

May 9, 2024 · 12 min read

How to Build an LLM Evaluation Framework, from Scratch

In this article, you're going to learn how to build the world's most robust and scalable LLM evaluation framework.

Jeffrey Ip

Apr 5, 2024 · 9 min read