Confident AI Blog - Resources to help teams stay confident in AI

AI Agent Evaluation: Metrics, Traces, Human Review, and Workflows

A practical guide to evaluating AI agents with LLM metrics and tracing—plus when human review matters, how it calibrates judges, and workflows that combine CI, sampling, and production signals.

Jeffrey Ip

Oct 7, 2025 · 20 min read

RAG Evaluation Metrics: Assessing Answer Relevancy, Faithfulness, Contextual Relevancy, And More

This article covers everything you'll need for RAG evaluation, including metrics and best practices.

Jeffrey Ip

Jun 3, 2025 · 9 min read

LLM Evals Framework That Predicts ROI: A Step-by-Step Guide

Most LLM evals fail because their metrics don't predict ROI. Build outcome-based evals that correlate with business KPIs.

Jeffrey Ip

May 2, 2025 · 16 min read

G-Eval Simply Explained: LLM-as-a-Judge for LLM Evaluation

This article covers everything you need to know about G-Eval, so that anyone can easily evaluate LLM apps on any task-specific criteria.

Kritin Vongthongsri

Apr 30, 2025 · 14 min read

How I Built Deterministic LLM Evaluation Metrics for DeepEval

In this article, I share how I built DeepEval's latest deterministic, LLM-powered custom metric.

Jeffrey Ip

Feb 9, 2025 · 9 min read

LLM Guardrails for Data Leakage, Prompt Injection, and More

In this article, you'll learn everything you need to know about LLM guardrails and how to use them for LLM security.

Jeffrey Ip

Jan 26, 2025 · 15 min read

How to Jailbreak LLMs One Step at a Time: Top Techniques and Strategies

In this article, I'll show you how to jailbreak your LLM application to test it for vulnerabilities.

Kritin Vongthongsri

Oct 30, 2024 · 16 min read

Top LLM Chatbot Evaluation Metrics: Conversation Testing Techniques

In this article, you'll learn about LLM red teaming and how it can be carried out using DeepTeam.

Jeffrey Ip

Oct 5, 2024 · 10 min read

LLM-as-a-Judge Simply Explained: The Complete Guide to Run LLM Evals at Scale

In this article, I'll demystify what LLM judges are and go through why they are the best for LLM evaluation.

Jeffrey Ip

Sep 1, 2024 · 13 min read

LLM Red Teaming: The Complete Step-By-Step Guide To LLM Safety

In this article, you'll learn about LLM red teaming and how it can be carried out using DeepTeam.

Kritin Vongthongsri

Jun 29, 2024 · 16 min read