In this tutorial, we're going to use GPT-3.5 provided by the OpenAI API. GPT-3.5 is a machine learning model and is like a super-smart computer buddy made by OpenAI. It's been trained with tons of data from the internet so it can chat, answer questions, and help with all sorts of language tasks.
But, you might wonder, can raw, out-of-the-box GPT-3.5 answer customer support questions that are specific to my own internal data?
Unfortunately, the answer is no 😔 because as you may know, GPT models have only been trained on public data up until 2021. This is precisely why we need open source frameworks like lLamaIndex! These frameworks help connect your internal data sources with GPT-3.5, so your chatbot can output tailored responses based on data that regular ChatGPT don't know about 😊 (PS. if you want to learn how to use the raw OpenAI API to build a chatbot instead of a framework like lLamaIndex, here is another great tutorial.)
Pretty cool, huh? Lets begin.
First, I'll guide you through how to set up a project for your chatbot.
Create the project folder and a python virtual environment by running the code below in terminal:
Your terminal should now start something like this:
Run the following code to install lLamaIndex:
Note that we don't require openai because lLamaIndex already provides a wrapper to call the OpenAI API under the hood.
Create a new main.py file - the entry point to your chatbot, and chatbot.py - your chatbot implementation.
Create a new `data.txt` file in a new `data` folder that would contain fake data on MadeUpCompany:
This file will contain the data that your chatbot is going to base its responses on. Luckily for us, ChatGPT prepared some fake information on MadeUpCompany 😌 Paste the following text in `data.txt`:
Lastly, navigate back to customer-support-chatbot containing main.py, and set your OpenAI API key as an environment variable. In your terminal, paste in this with your own API key (get yours here if you don't already have one):
All done! Let's start coding.
To begin, we first have to chunk and index the text we have in `data.txt` to a format that's readable for GPT-3.5. So you might wonder, what do you mean by "readable"? 🤯
Well, GPT-3.5 has something called a context limit, which refers to how much text the model can "see" or consider at one time. Think of it like the model's short-term memory. If you give it a really long paragraph or a big conversation history, it might reach its limit and not be able to add much more to it. If you hit this limit, you might have to shorten your text so the model can understand and respond properly.
In addition, GPT-3.5 performs worse if you supply it with way too much text, kind of how someone loses focus if you tell too long a story. This is exactly where lLamaIndex shines 🦄 llamaindex helps us breakdown large bodies of text into chunks that can be consumed by GPT-3.5 🥳
In a few lines of code, we can build our chatbot using lLamaIndex. Everything from chunking the text from data.txt, to calling the OpenAI APIs, is handled by lLamaIndex. Paste in the following code in chatbot.py:
And the following in main.py:
Now try it for yourself by running the code below:
Feel free to switch out the text in data/data.txt with your own knowledge base!
You might start to run into situations where the chatbot isn't performing as well as you hope for certain questions/inputs. Luckily there are several ways to improve your chatbot 😊
The quality of output from your chatbot is directly affected by the size of text chunks (scroll down for a better explanation why).
In chatbot.py, Add service_context = ServiceContext.from_defaults(chunk_size=1000) to VectorStoreIndex to alter the chunk size:
Play around with the size parameter to find what works best :)
Depending on your data, you might benefit from supply a lesser/greater number of text chunks to GPT-3.5. Here's how you can do it through query_engine = index.as_query_engine(similarity_top_k=5) in chatbot.py:
By now, you might have ran into the problem of eyeballing your chatbot output. You make a small configurational change, such as changing the number of retrieved text chunks, run main.py, type in the same old query, and wait 5 seconds to see if the result has gotten any better 😰 Sounds familiar?
The problem becomes worse if you want to inspect outputs from not just one, but several different queries. Here is a great read on how you can build your own evaluation framework in less than 20 minutes, but if you'd prefer to not reinvent the wheel, consider using a free open source packages like DeepEval. It helps you evaluate your chatbot so you don't have to do it yourself 😌
Since I'm slightly biased as the cofounder of Confident AI (which is the company behind DeepEval), I'm going to go ahead and show you how DeepEval can help with evaluating your chatbot (no but seriously, we offer unit testing for chatbots, have a stellar developer experience, and a free platform for you to holistically view your chatbot's performance 🥵)
Install by running the code:
Create a new test file:
Paste in the following code:
Run the test:
Your test should have passed! Let's breakdown what happened. The variable query mimics a user input, and output is what your chatbot outputs based on this query. The variable context contains the relevant information from your knowledge base, and FactualConsistencyMetric(minimum_score=0.7) is an out-of-the-box metric provided by DeepEval for your to assess how factually correct your chatbot's output is based on the provided context. This score ranges from 0 - 1, which the minimum_score=0.7 ultimately determines if your test have passed or not.
Add more tests to stop wasting time on fixing breaking changes to your chatbot 😙
The chatbot we just built actually relies on an architecture called Retrieval Augmented Generation (RAG). Retrieval Augmented Generation is a way to make GPT-3.5 smarter by letting it pull in fresh or specific information from an outside source, in this case data.txt. So, when you ask it something, it can give you a more current and relevant answer.
In the previous two sections, we looked at how tweaking two parameters — text chunk size and number of text chunks used can impact the quality of answer you get from GPT-3.5. This is because when you ask your chatbot a question, lLamaIndex retrieves the most relevant text chunks from data.txt, which GPT-3.5 will use to generate a data augmented answer.
In this article, you've learnt:
- What OpenAI GPT-3.5 is,
- How to build a simple chatbot on your own data using lLamaIndex,
- how to improve the quality of your chatbot,
- how to evaluate your chatbot using Deepeval
- what is RAG and how it works,
This tutorial walks you through an example of a chatbot you can build using lLamaIndex and GPT-3.5. With lLamaIndex, you can create powerful personalized chatbots useful in various applications, such as customer support, user onboarding, sales enablement, and more 🥳
The source code for this tutorial is available here:
Thank you for reading!
Subscribe to our weekly newsletter to stay confident in the AI systems you build.
In this article, I'll share how JudgmentalGPT, our in-house evaluator was built using OpenAI's Assistants.
In this interactive tutorial, I'll show you how to become a Midjournalist to create image you image.