All Stories

Red-Teaming LLMs: The Complete Step-By-Step Guide

In this article, you'll learn about LLM red teaming and how it can be carried out using DeepEval.

Kritin Vongthongsri

Evaluating LLM Systems: Essential Metrics, Benchmarks, and Best Practices

In this article, you'll learn how to evaluate LLM systems using LLM evaluation metrics and benchmark datasets.

Using LLMs for Synthetic Data Generation: The Definitive Guide

In this article, I'll show you everything you need to know about generating realistic synthetic datasets using LLMs.

Kritin Vongthongsri

How to Build an LLM Evaluation Framework, from Scratch

In this article, you're going to learn how to build the world's most robust and scalable LLM evaluation framework.

LLM Benchmarks: Everything on MMLU, HellaSwag, BBH, and Beyond

In this article, I'm going to go through all the top LLM benchmarks currently used and why they matter.

Kritin Vongthongsri

LLM Testing in 2024: Top Methods and Strategies

In this article, we'll cover everything there is to know about LLM testing, including best practices and methods to test LLMs.

The Ultimate Guide to Fine-Tune LLaMA 3, With LLM Evaluations

In this article, we'll walk through how to fine-tune and evaluate a LLaMA 3 model using Hugging Face and DeepEval.

RAG Evaluation: The Definitive Guide to Unit Testing RAG in CI/CD

In this tutorial, we'll walk through how to set up a full testing suite for RAG applications using DeepEval.

LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide

In this article, I'll walk through everything you need to know about LLM evaluation metrics, with code samples.

An Introduction to LLM Benchmarking

In this article, I'll show how benchmarking can help you choose the right LLM for your use case.

A Step-By-Step Guide to Evaluating an LLM Text Summarization Task

In this article, I'll teach you how to create your own text summarization metric.

Why OpenAI Assistants is a Big Win for LLM Evaluation

In this article, I'll share how JudgmentalGPT, our in-house evaluator, was built using OpenAI's Assistants.

Become a Prompt Artist: Understanding the Midjourney LLM

In this interactive tutorial, I'll show you how to become a Midjournalist and create the images you imagine.

How to Evaluate LLM Applications: The Complete Guide

In this article, we'll break down how to evaluate LLM applications and RAG pipelines the right way.

Why we replaced Pinecone with PGVector

Do you really need a dedicated vector database for your Generative AI application? Our experience says not always.

What is Retrieval Augmented Generation (RAG)?

In this article, we're going to dive deep into the RAG rabbit hole.

A Gentle Introduction to LLM Evaluation

In this article, we'll introduce the ways in which you can carry out automated LLM evaluation.

How to build a PDF QA chatbot using OpenAI and ChromaDB

In this article, you'll learn how to build a RAG-based chatbot on your PDFs using OpenAI and ChromaDB.

Building a customer support chatbot using GPT-3.5 and LlamaIndex

In this article, you'll learn how to create a customer support chatbot using GPT-3.5 and LlamaIndex.

Generating synthetic data with LLMs - Part 1

LLMs make synthetic data easy to leverage, but how exactly can we make this generated data relevant and useful?
