LlamaIndex | Confident AI Docs

Overview

LlamaIndex is an LLM framework that makes it easy to build knowledge agents from complex data. Confident AI allows you to trace and evaluate LlamaIndex agents in just a few lines of code.

Tracing Quickstart

Install Dependencies

Run the following command to install the required packages:

$ pip install -U deepeval llama-index

Set Up Your Confident AI Key

$ deepeval login

Configure LlamaIndex

Call instrument_llama_index with LlamaIndex’s root dispatcher to enable Confident AI’s LlamaIndexHandler.

main.py

import asyncio
from llama_index.llms.openai import OpenAI
from llama_index.core.agent import FunctionAgent
import llama_index.core.instrumentation as instrument

from deepeval.integrations.llama_index import instrument_llama_index

# Enable Confident AI tracing on LlamaIndex's root dispatcher
instrument_llama_index(instrument.get_dispatcher())

# Tool exposed to the agent; the docstring describes its purpose
def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b

agent = FunctionAgent(
    tools=[multiply],
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="You are a helpful assistant that can perform calculations.",
)

async def llm_app(input: str):
    return await agent.run(input)

asyncio.run(llm_app("What is 3 * 12?"))

Now whenever you use LlamaIndex, DeepEval will collect LlamaIndex traces and publish them to Confident AI.

You can directly view the traces on Confident AI by clicking on the link in the output printed in the console.
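
The multiply tool in the quickstart is passed to the agent as a plain Python function, so its name, signature, and docstring are what describe it to the LLM. The sketch below previews that metadata using only the standard library. The describe_tool helper is hypothetical and only approximates what a tool wrapper reads; it is not LlamaIndex's own implementation.

```python
import inspect


def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


def describe_tool(fn) -> dict:
    """Hypothetical helper: collect the metadata a tool wrapper would typically read."""
    return {
        "name": fn.__name__,
        "signature": str(inspect.signature(fn)),
        "description": inspect.getdoc(fn) or "",
    }


info = describe_tool(multiply)
print(info["name"])         # function name
print(info["signature"])    # parameter and return annotations
print(info["description"])  # docstring text
```

A clear docstring and complete type hints give the model more to work with when it decides whether to call the tool.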

Evals Usage

Online evals

Online evals run on Confident AI’s servers and evaluate every incoming trace from your LlamaIndex agent. This approach is recommended if your agent is in production.

Create metric collection

Create a metric collection on Confident AI with the metrics you wish to use to evaluate your LlamaIndex agent.

Your metric collection should only contain metrics that don’t require retrieval_context, context, expected_output, or expected_tools for evaluation.
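
One way to check a planned collection against this rule is to list the test case fields each metric needs and drop any metric that relies on one of the four unavailable fields. The sketch below does this with a hand-written table. The REQUIRED_FIELDS mapping and the usable_for_tracing helper are illustrative assumptions, not part of DeepEval's API, and the field lists should be confirmed against each metric's documentation.

```python
# Fields that are not available when evaluating LlamaIndex traces (per the note above).
UNAVAILABLE = {"retrieval_context", "context", "expected_output", "expected_tools"}

# Illustrative, hand-maintained table of the fields each metric needs (assumption).
REQUIRED_FIELDS = {
    "Answer Relevancy": {"input", "actual_output"},
    "Bias": {"input", "actual_output"},
    "Faithfulness": {"input", "actual_output", "retrieval_context"},
    "Hallucination": {"input", "actual_output", "context"},
}


def usable_for_tracing(metric_names):
    """Return the metrics whose required fields avoid every unavailable field."""
    return [
        name for name in metric_names
        if not (REQUIRED_FIELDS[name] & UNAVAILABLE)
    ]


print(usable_for_tracing(REQUIRED_FIELDS))
```

With this table, only the two metrics that need just input and actual_output remain.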

Run evals

Confident AI supports online evals for LlamaIndex’s FunctionAgent, ReActAgent and CodeActAgent. Import the agent class from deepeval.integrations.llama_index instead of from LlamaIndex, and pass the name of your metric collection as the metric_collection argument.

main.py

import asyncio
from llama_index.llms.openai import OpenAI
import llama_index.core.instrumentation as instrument
from deepeval.integrations.llama_index import instrument_llama_index
# DeepEval's FunctionAgent replaces the LlamaIndex class of the same name
from deepeval.integrations.llama_index import FunctionAgent

instrument_llama_index(instrument.get_dispatcher())

def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b

agent = FunctionAgent(
    tools=[multiply],
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="You are a helpful assistant that can perform calculations.",
    # Name of the metric collection created on Confident AI
    metric_collection="<your-metric-collection-name>",
)

async def llm_app(input: str):
    return await agent.run(input)

asyncio.run(llm_app("What is 3 * 12?"))

All incoming traces will now be evaluated using metrics from your metric collection.

End-to-end evals

Running end-to-end evals on your LlamaIndex agent evaluates your agent locally, and is the recommended approach if your agent is in a development or testing environment.

Create metric

from deepeval.metrics import AnswerRelevancyMetric

answer_relevancy_metric = AnswerRelevancyMetric(
    threshold=0.7,
    model="gpt-4o-mini",
    include_reason=True
)

Similar to online evals, you can only run end-to-end evals on metrics that don’t require retrieval_context, context, expected_output, or expected_tools for evaluation.
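
The threshold=0.7 argument in the metric above sets the minimum score a test case must reach to pass. As a rough sketch, assuming scores fall between 0 and 1 and that a score equal to the threshold counts as a pass, the comparison looks like this. The passes helper is hypothetical and only mirrors that comparison; it is not DeepEval code.

```python
def passes(score: float, threshold: float = 0.7) -> bool:
    """Hypothetical helper: a score at or above the threshold passes (assumption)."""
    return score >= threshold


# Compare a few example scores against the default threshold of 0.7
for score in (0.55, 0.70, 0.92):
    print(score, "pass" if passes(score) else "fail")
```

Raising the threshold makes the metric stricter, and lowering it makes the metric more lenient.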

Run evals

Pass your metrics to the agent through the metrics argument. Then use the dataset’s evals_iterator to invoke your LlamaIndex agent once for each golden.

Asynchronous

main.py

import asyncio
from llama_index.llms.openai import OpenAI
import llama_index.core.instrumentation as instrument
from deepeval.integrations.llama_index import instrument_llama_index
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.integrations.llama_index import FunctionAgent
from deepeval.dataset import EvaluationDataset, Golden

instrument_llama_index(instrument.get_dispatcher())
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.7, model="gpt-4o-mini", include_reason=True)

def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b

agent = FunctionAgent(
    tools=[multiply],
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="You are a helpful assistant that can perform calculations.",
    # Metrics to run locally against each trace
    metrics=[answer_relevancy_metric],
)

async def llm_app(input: str):
    return await agent.run(input)

# Each golden supplies one input for the agent
dataset = EvaluationDataset(
    goldens=[Golden(input="What is 3 * 12?"), Golden(input="What is 4 * 13?")]
)

# Create one agent task per golden and pass it to the dataset for evaluation
for golden in dataset.evals_iterator():
    task = asyncio.create_task(llm_app(golden.input))
    dataset.evaluate(task)

This will automatically generate a test run with evaluated traces using inputs from your dataset.

View on Confident AI

You can view the evals on Confident AI by clicking on the link in the output printed in the console.