LLM Arena-as-a-Judge: LLM-Evals for Comparison-Based Regression Testing

In this article, you'll learn everything about running LLM Arena-as-a-judge as a novel way to regression test LLMs.

Jeffrey Ip

Jul 6, 2025

10 min read

RAG Evaluation Metrics: Assessing Answer Relevancy, Faithfulness, Contextual Relevancy, And More

This article covers everything you'll need for RAG evaluation, including metrics and best practices.

Jeffrey Ip

Jun 3, 2025

9 min read

LLM Evals Framework That Predicts ROI: A Step-by-Step Guide

Most LLM evals fail because their metrics don't predict ROI. Build outcome-based evals that correlate with business KPIs.

Jeffrey Ip

May 2, 2025

16 min read

G-Eval Simply Explained: LLM-as-a-Judge for LLM Evaluation

This article covers everything about G-Eval, so that anyone can easily evaluate LLM apps on any task-specific criteria.

Kritin Vongthongsri

Apr 30, 2025

14 min read

Top LLM Evaluators for Testing LLM Systems at Scale

In this article, we'll go through the top LLM evaluators in 2025, including G-Eval and other LLM-as-a-judge methods.

Jeffrey Ip

Apr 21, 2025

15 min read

How I raised Confident AI's $2.2M seed round in 5 days

Announcing Confident AI's seed round, with participation from a bunch of great investors.

Jeffrey Ip

Mar 19, 2025

8 min read

How I Built Deterministic LLM Evaluation Metrics for DeepEval

In this article, I'm sharing how I've built DeepEval's latest deterministic, LLM-powered, custom metric.

Jeffrey Ip

Feb 9, 2025

9 min read

LLM Agent Evaluation: Assessing Tool Use, Task Completion, Agentic Reasoning, and More

In this article, I'll share the principles of LLM agent evaluation and show you how to do it using DeepEval.

Kritin Vongthongsri

Jan 27, 2025

14 min read

LLM Guardrails for Data Leakage, Prompt Injection, and More

In this article, you'll learn everything you need to know about LLM guardrails and how to use them for LLM security.

Jeffrey Ip

Jan 26, 2025

15 min read

OWASP Top 10 2025 for LLM Applications: What’s new? Risks, and Mitigation Techniques

In this article, we'll go through what the OWASP Top 10 is, as well as what's new in the latest 2025 guidelines.

Kritin Vongthongsri

Jan 18, 2025

14 min read
