Confident AI
Blog
Github
Documentation
Pricing
Stay Confident
Subscribe to our weekly newsletter to stay confident in the AI systems you build.
All Stories
LLM Chatbot Evaluation Explained: Top Metrics and Testing Techniques
In this article, I'll share how to evaluate LLM chatbots using the latest LLM conversational metrics.
Jeffrey Ip
Leveraging LLM-as-a-Judge for Automated and Scalable Evaluation
In this article, I'll explain what LLM judges are and why they are the best option for LLM evaluation.
Jeffrey Ip
Top LLM Security Vulnerabilities & Risks You Must Not Miss: OWASP Top 10, and How to Detect Them
In this article, I'll go through the major pillars of LLM security and ways to mitigate these risks at scale.
Kritin Vongthongsri
Red Teaming LLMs: The Ultimate Step-by-Step LLM Red Teaming Guide
In this article, you'll learn about LLM red teaming and how it can be carried out using DeepEval.
Kritin Vongthongsri
Evaluating LLM Systems: Essential Metrics, Benchmarks, and Best Practices
In this article, you'll learn how to evaluate LLM systems using LLM evaluation metrics and benchmark datasets.
Jeffrey Ip
Using LLMs for Synthetic Data Generation: The Definitive Guide
In this article, I'll show you everything you need to know about generating realistic synthetic datasets using LLMs.
Kritin Vongthongsri
How to Build an LLM Evaluation Framework, from Scratch
In this article, you're going to learn how to build the world's most robust and scalable LLM evaluation framework.
Jeffrey Ip
LLM Benchmarks Explained: Everything on MMLU, HellaSwag, BBH, and Beyond
In this article, I'm going to go through all the top LLM benchmarks currently used and why they matter.
Kritin Vongthongsri
LLM Testing in 2024: Top Methods and Strategies
In this article, we'll learn everything there is to know about LLM testing, including best practices and methods for testing LLMs.
Jeffrey Ip
The Ultimate Guide to Fine-Tune LLaMA 3, With LLM Evaluations
In this article, we'll walk through how to fine-tune and evaluate a LLaMA-2 model using Hugging Face and DeepEval.
Jeffrey Ip
RAG Evaluation: The Definitive Guide to Unit Testing RAG in CI/CD
In this tutorial, we'll walk through how to set up a full testing suite for RAG applications using DeepEval.
Jeffrey Ip
LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide
In this article, I'll walk through everything you need to know about LLM evaluation metrics, with code samples.
Jeffrey Ip
An Introduction to LLM Benchmarking
In this article, I'll show how benchmarking can help you choose the right LLM for your use case.
Jeffrey Ip
A Step-By-Step Guide to Evaluating an LLM Text Summarization Task
In this article, I'll teach you how to create your own text summarization metric.
Jeffrey Ip
Why OpenAI Assistants is a Big Win for LLM Evaluation
In this article, I'll share how JudgmentalGPT, our in-house evaluator, was built using OpenAI's Assistants.
Jeffrey Ip
Become a Prompt Artist: Understanding the Midjourney LLM
In this interactive tutorial, I'll show you how to become a Midjournalist and create the images you imagine.
Jeffrey Ip
How to Evaluate LLM Applications: The Complete Guide
In this article, we'll break down how to evaluate LLM applications and RAG pipelines the right way.
Jeffrey Ip
Why we replaced Pinecone with PGVector
Do you really need a dedicated vector database for your Generative AI application? Our experience says not always.
Jeffrey Ip
What is Retrieval Augmented Generation (RAG)?
In this article, we're going to dive deep into the RAG rabbit hole.
Jeffrey Ip
A Gentle Introduction to LLM Evaluation
In this article, we'll introduce the ways in which you can carry out automated LLM evaluation.
Jeffrey Ip
How to build a PDF QA chatbot using OpenAI and ChromaDB
In this article, you'll learn how to build a RAG-based chatbot on your PDFs using OpenAI and ChromaDB.
Jeffrey Ip
Building a customer support chatbot using GPT-3.5 and LlamaIndex
In this article, you'll learn how to create a customer support chatbot using GPT-3.5 and LlamaIndex.
Jeffrey Ip
Generating synthetic data with LLMs - Part 1
LLMs make synthetic data easy to leverage, but how exactly can we make these generated data relevant and useful?
Jeffrey Ip
Latest articles
Oct 8, 2024
LLM Chatbot Evaluation Explained: Top Metrics and Testing Techniques
Sep 24, 2024
Leveraging LLM-as-a-Judge for Automated and Scalable Evaluation
Oct 9, 2024
Top LLM Security Vulnerabilities & Risks You Must Not Miss: OWASP Top 10, and How to Detect Them
Oct 9, 2024
Red Teaming LLMs: The Ultimate Step-by-Step LLM Red Teaming Guide
Sep 1, 2024
Evaluating LLM Systems: Essential Metrics, Benchmarks, and Best Practices
Oct 9, 2024
Using LLMs for Synthetic Data Generation: The Definitive Guide
Start using the data retrieval platform of the future.