LangChain

Use Confident AI for LLM observability and evals for LangChain

Overview

LangChain is a framework for building LLM applications. Confident AI provides a CallbackHandler to trace and evaluate LangChain applications.

Tracing Quickstart

1. Install Dependencies

Run the following command to install the required packages:

pip install -U deepeval langchain langchain-openai

2. Setup Confident AI Key

Log in to Confident AI using your Confident API key.

export CONFIDENT_API_KEY="<your-confident-api-key>"
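If you prefer to configure the key from within Python rather than your shell, you can set the same environment variable in-process before using DeepEval's integration. This is a minimal sketch; the export command above is the documented approach.

# Sketch: set the Confident AI key in-process instead of via `export`.
# The CONFIDENT_API_KEY variable name comes from the shell command above.
import os

os.environ["CONFIDENT_API_KEY"] = "<your-confident-api-key>"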
3. Configure LangChain

Pass DeepEval’s CallbackHandler to your LangChain application’s invoke method via the callbacks config.

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

from deepeval.integrations.langchain import CallbackHandler

@tool
def multiply(a: int, b: int) -> int:
    """Returns the product of two numbers"""
    return a * b

llm = ChatOpenAI(model="gpt-4o-mini")

agent_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that can perform mathematical operations."),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

agent = create_tool_calling_agent(llm, [multiply], agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[multiply], verbose=True)

result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={"callbacks": [CallbackHandler()]},
)

DeepEval’s CallbackHandler extends LangChain’s BaseCallbackHandler (or LangChain.js’ BaseCallbackHandler if you are using TypeScript).
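Because it is a standard LangChain callback handler, you are not limited to agent executors. As a minimal sketch, you can attach it to a bare chat model call and the invocation will be traced the same way:

from langchain_openai import ChatOpenAI
from deepeval.integrations.langchain import CallbackHandler

llm = ChatOpenAI(model="gpt-4o-mini")

# Any LangChain runnable accepts callbacks via its invoke config,
# so a single chat model call can be traced as well.
response = llm.invoke(
    "What is 8 multiplied by 6?",
    config={"callbacks": [CallbackHandler()]},
)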

4. Run LangChain

Invoke your application by executing the script:

python main.py

You can view the traces directly on Confident AI by clicking the link printed in the console output.

Advanced Features

Set trace attributes

Confident AI’s advanced LLM tracing features let you set attributes on each trace when invoking your LangChain application.

For example, thread_id and user_id are used to group related traces together, and are useful for chat apps, agents, or any multi-turn interactions. You can learn more about threads here.

You can set these attributes in the CallbackHandler when invoking your LangChain application.

result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={
        "callbacks": [CallbackHandler(thread_id="123")]
    },
)
name (str): The name of the trace.

tags (List[str]): String labels that help you group related traces.

metadata (Dict): Attach any metadata to the trace.

thread_id (str): Supply the thread or conversation ID to view and evaluate conversations.

user_id (str): Supply the user ID to enable user analytics.

Each attribute is optional, and works the same way as the native tracing features on Confident AI.
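A single handler can carry several of these attributes at once. The sketch below combines the parameters listed above with placeholder values, assuming the agent_executor from the quickstart:

from deepeval.integrations.langchain import CallbackHandler

# Sketch: all attribute values below are illustrative placeholders.
handler = CallbackHandler(
    name="multiply-agent",              # name of the trace
    tags=["math", "production"],        # string labels for grouping traces
    metadata={"app_version": "1.2.0"},  # arbitrary metadata attached to the trace
    thread_id="chat-123",               # groups multi-turn interactions
    user_id="user-456",                 # enables user analytics
)

result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={"callbacks": [handler]},
)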

Logging prompts

If you are managing prompts on Confident AI and wish to log them, pass your Prompt object to the language model instance’s metadata parameter.

from langchain_openai import ChatOpenAI
from deepeval.prompt import Prompt

prompt = Prompt(alias="<prompt-alias>")
prompt.pull(version="00.00.01")

llm = ChatOpenAI(
    model="gpt-4o-mini",
    metadata={"prompt": prompt}
)

Logging prompts lets you attribute specific prompts to the LLM spans in your traces. Be sure to pull the prompt before logging it; otherwise the prompt will not be visible on Confident AI.
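Putting the two together, the prompt-annotated model can be used anywhere in your LangChain application. The sketch below reuses the multiply tool and agent_prompt from the quickstart; the alias and version are placeholders:

from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from deepeval.prompt import Prompt
from deepeval.integrations.langchain import CallbackHandler

# Pull the managed prompt first so it is visible on Confident AI.
prompt = Prompt(alias="<prompt-alias>")
prompt.pull(version="00.00.01")

# Attach the prompt to the model, then trace the invocation as usual.
# (multiply and agent_prompt are assumed to be defined as in the quickstart.)
llm = ChatOpenAI(model="gpt-4o-mini", metadata={"prompt": prompt})
agent = create_tool_calling_agent(llm, [multiply], agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[multiply], verbose=True)

result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={"callbacks": [CallbackHandler()]},
)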

Evals Usage

Online evals

If your LangChain application is in production and you still want to run evaluations on your traces, use online evals, which run evaluations on all incoming traces on Confident AI’s servers.

1. Create metric collection

Create a metric collection on Confident AI with the metrics you wish to use to evaluate your LangChain agent, and copy the name of the metric collection.


The current LangChain integration only supports metrics that evaluate the input and actual output, in addition to the Task Completion metric.

2. Run evals

Set the metric_collection name in the CallbackHandler to evaluate components of your LangChain application.

The agent executor is the top-level component of your LangChain application, and an ideal component to evaluate with the Task Completion metric.

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

from deepeval.integrations.langchain import CallbackHandler

@tool
def multiply(a: int, b: int) -> int:
    """Returns the product of two numbers"""
    return a * b

llm = ChatOpenAI(model="gpt-4o-mini")
agent_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that can perform mathematical operations."),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)
agent = create_tool_calling_agent(llm, [multiply], agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[multiply], verbose=True)

result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={
        "callbacks": [
            CallbackHandler(metric_collection="<metric_collection_name>")
        ]
    },
)

All incoming traces will now be evaluated using metrics from your metric collection.
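Since metric_collection is just another CallbackHandler argument, you can presumably combine it with the trace attributes described above for production traffic. A sketch with placeholder values, assuming the agent_executor from the step above:

from deepeval.integrations.langchain import CallbackHandler

# Sketch: online evals plus trace attributes on the same handler.
handler = CallbackHandler(
    metric_collection="<metric_collection_name>",  # metrics run server-side on each trace
    thread_id="chat-123",                          # placeholder conversation ID
    user_id="user-456",                            # placeholder user ID
)

result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={"callbacks": [handler]},
)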

End-to-end evals

End-to-end evals evaluate your LangChain agent locally, and are the recommended approach if your agent is in a development or testing environment.

1. Create metric

from deepeval.metrics import TaskCompletionMetric

task_completion = TaskCompletionMetric(
    threshold=0.7,
    model="gpt-4o-mini",
    include_reason=True
)

Similar to online evals, you can only run end-to-end evals on LangChain using TaskCompletionMetric.

2. Run evals

Provide your metrics to the CallbackHandler. Then, use the dataset’s evals_iterator to invoke your LangChain agent for each golden.

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from deepeval.metrics import TaskCompletionMetric
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.integrations.langchain import CallbackHandler

@tool
def multiply(a: int, b: int) -> int:
    """Returns the product of two numbers"""
    return a * b

llm = ChatOpenAI(model="gpt-4o-mini")
agent_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that can perform mathematical operations."),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)
agent = create_tool_calling_agent(llm, [multiply], agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[multiply], verbose=True)

task_completion_metric = TaskCompletionMetric(threshold=0.7, model="gpt-4o-mini", include_reason=True)

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What is 3 * 12?"),
        Golden(input="What is 8 * 6?"),
    ]
)

def llm_agent_eval(golden: Golden):
    result = agent_executor.invoke(
        {"input": golden.input},
        config={
            "callbacks": [CallbackHandler(metrics=[task_completion_metric])]
        },
    )
    return result

for golden in dataset.evals_iterator():
    llm_agent_eval(golden)

This will automatically generate a test run with evaluated traces using inputs from your dataset.

View on Confident AI

You can view the evals on Confident AI by clicking the link printed in the console output.