Confident AI Blog - Resources to help teams stay confident in AI

LLM Arena-as-a-Judge: LLM-Evals for Comparison-Based Regression Testing

In this article, you'll learn everything about running LLM Arena-as-a-judge as a novel way to regression test LLMs.

Jeffrey Ip

Jul 6, 2025 · 10 min read
RAG Evaluation Metrics: Assessing Answer Relevancy, Faithfulness, Contextual Relevancy, And More

This article covers everything you'll need for RAG evaluation, including metrics and best practices.

Jeffrey Ip

Jun 3, 2025 · 9 min read
LLM Evals Framework That Predicts ROI: A Step-by-Step Guide

Most LLM evals fail because their metrics don't predict ROI. Build outcome-based evals that correlate with business KPIs.

Jeffrey Ip

May 2, 2025 · 16 min read
G-Eval Simply Explained: LLM-as-a-Judge for LLM Evaluation

This article explains G-Eval so that anyone can easily evaluate LLM apps on any task-specific criteria.

Kritin Vongthongsri

Apr 30, 2025 · 14 min read
Top LLM Evaluators for Testing LLM Systems at Scale

In this article, we'll go through the top LLM evaluators in 2025, including G-Eval and other LLM-as-a-judge methods.

Jeffrey Ip

Apr 21, 2025 · 15 min read
How I raised Confident AI's $2.2M seed round in 5 days

Announcing Confident AI's seed round, with participation from a bunch of great investors.

Jeffrey Ip

Mar 19, 2025 · 8 min read
How I Built Deterministic LLM Evaluation Metrics for DeepEval

In this article, I'm sharing how I've built DeepEval's latest deterministic, LLM-powered, custom metric.

Jeffrey Ip

Feb 9, 2025 · 9 min read
LLM Agent Evaluation: Assessing Tool Use, Task Completion, Agentic Reasoning, and More

In this article, I'll share the principles of LLM agent evaluation and show you how to do it using DeepEval.

Kritin Vongthongsri

Jan 27, 2025 · 14 min read
LLM Guardrails for Data Leakage, Prompt Injection, and More

In this article, you'll learn everything you need to know about LLM guardrails and how to use them for LLM security.

Jeffrey Ip

Jan 26, 2025 · 15 min read
OWASP Top 10 2025 for LLM Applications: What’s new? Risks, and Mitigation Techniques

In this article, we'll go through what the OWASP Top 10 is, as well as what's new in the latest 2025 guidelines.

Kritin Vongthongsri

Jan 18, 2025 · 14 min read