LangChain

Use Confident AI for LLM observability and evals for LangChain

Overview

LangChain is a framework for building LLM applications. Confident AI provides a callback handler in both its Python and TypeScript SDKs to trace and evaluate your LangChain applications automatically.

The callback handler captures the following spans from your LangChain application:

  • LLM spans — model name, provider, input messages, output content, tool calls made by the model, and token usage
  • Tool spans — tool name, input parameters, and output; also aggregated at the trace level as tools_called
  • Retriever spans — query input and retrieved document output
  • Chain spans — inputs and outputs for top-level chains (used to set trace-level input/output)
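
To make the relationship between spans and trace-level fields concrete, here is a minimal sketch that uses only the standard library. The `Span` and `Trace` classes are hypothetical illustrations, not Confident AI's actual data model. The sketch assumes the first chain span recorded is the top-level chain.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Span:
    """Hypothetical span record; 'kind' is one of llm, tool, retriever, chain."""
    kind: str
    name: str
    input: Any = None
    output: Any = None


@dataclass
class Trace:
    """Hypothetical trace that derives trace-level fields from its spans."""
    spans: List[Span] = field(default_factory=list)

    @property
    def tools_called(self) -> List[str]:
        # Tool spans are rolled up to the trace level.
        return [s.name for s in self.spans if s.kind == "tool"]

    @property
    def root_chain(self) -> Optional[Span]:
        # Assumption: the first chain span is the top-level chain.
        return next((s for s in self.spans if s.kind == "chain"), None)

    @property
    def input(self) -> Any:
        root = self.root_chain
        return root.input if root else None

    @property
    def output(self) -> Any:
        root = self.root_chain
        return root.output if root else None


trace = Trace(spans=[
    Span("chain", "AgentExecutor", {"input": "What is 8 multiplied by 6?"}, {"output": "48"}),
    Span("llm", "gpt-4o-mini", "What is 8 multiplied by 6?"),
    Span("tool", "multiply", {"a": 8, "b": 6}, 48),
])
print(trace.tools_called, trace.input, trace.output)
```

In this model the trace input and output come from the chain span, while the tool span contributes only to `tools_called`.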

Tracing Quickstart

1

Install Dependencies

Run the following command to install the required packages:

pip install -U deepeval langchain langchain-openai
2

Setup Confident AI Key

Log in to Confident AI by setting your Confident AI API key as an environment variable.

export CONFIDENT_API_KEY="<your-confident-api-key>"
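
If you want your script to fail early when the key is missing, a small check like the following may help. This is a sketch, not part of DeepEval: `require_confident_api_key` is a hypothetical helper, and it only reads the `CONFIDENT_API_KEY` variable named above.

```python
import os
from typing import Mapping


def require_confident_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Confident AI key from the environment, or raise if it is unset."""
    key = env.get("CONFIDENT_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CONFIDENT_API_KEY is not set; export it before running your script."
        )
    return key
```

Calling `require_confident_api_key()` at the top of `main.py` would surface a missing key before any LangChain call is made.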
3

Configure LangChain

Pass DeepEval’s CallbackHandler to your LangChain application’s invoke method through the callbacks entry of its config.

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

from deepeval.integrations.langchain import CallbackHandler


@tool
def multiply(a: int, b: int) -> int:
    """Returns the product of two numbers"""
    return a * b


llm = ChatOpenAI(model="gpt-4o-mini")

agent_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that can perform mathematical operations."),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

agent = create_tool_calling_agent(llm, [multiply], agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[multiply], verbose=True)

# Attach the handler for this invocation so the run is traced.
result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={"callbacks": [CallbackHandler()]},
)

DeepEval’s CallbackHandler extends LangChain’s BaseCallbackHandler in Python, and LangChain.js’s BaseCallbackHandler in TypeScript.
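
For intuition about how a callback handler turns framework events into spans, the sketch below uses a plain Python stand-in. It does not import LangChain and is not DeepEval's implementation. The method names `on_tool_start` and `on_tool_end` mirror hooks on LangChain's BaseCallbackHandler, but the signatures are simplified, and `RecordingHandler` and its string `run_id` are assumptions made for illustration.

```python
from typing import Any, Dict, List


class RecordingHandler:
    """Simplified stand-in for a callback handler that records tool spans."""

    def __init__(self) -> None:
        self.open_spans: Dict[str, Dict[str, Any]] = {}
        self.finished: List[Dict[str, Any]] = []

    def on_tool_start(self, serialized: Dict[str, Any], input_str: str,
                      *, run_id: str, **kwargs: Any) -> None:
        # Open a span keyed by run_id when the tool starts.
        self.open_spans[run_id] = {
            "type": "tool",
            "name": serialized.get("name"),
            "input": input_str,
        }

    def on_tool_end(self, output: Any, *, run_id: str, **kwargs: Any) -> None:
        # Close the matching span and attach the tool output.
        span = self.open_spans.pop(run_id)
        span["output"] = output
        self.finished.append(span)


handler = RecordingHandler()
handler.on_tool_start({"name": "multiply"}, "{'a': 8, 'b': 6}", run_id="run-1")
handler.on_tool_end(48, run_id="run-1")
print(handler.finished)
```

A real handler would implement the equivalent start and end hooks for LLM, retriever, and chain events as well.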

4

Run LangChain

Invoke your application by executing the script:

python main.py

You can directly view the traces on Confident AI by clicking on the link in the output printed in the console.

Advanced Features

Set trace attributes

Confident AI’s tracing lets you set attributes on each trace when you invoke your LangChain application.

For example, thread_id and user_id are used to group related traces together, and are useful for chat apps, agents, or any multi-turn interactions. You can learn more about threads here.
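
As a rough picture of what grouping by thread_id means, the sketch below collects trace inputs into conversations. The trace records are made-up dictionaries used only for illustration; they are not the format Confident AI stores.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical trace records, each tagged with the thread it belongs to.
traces = [
    {"thread_id": "123", "user_id": "u1", "input": "What is 8 multiplied by 6?"},
    {"thread_id": "123", "user_id": "u1", "input": "And divided by 2?"},
    {"thread_id": "456", "user_id": "u2", "input": "What is 3 multiplied by 3?"},
]


def group_by_thread(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Collect trace inputs per thread_id, preserving arrival order."""
    threads: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        threads[record["thread_id"]].append(record["input"])
    return dict(threads)


threads = group_by_thread(traces)
print(threads)
```

Traces that share a thread_id end up in the same conversation, which is why a stable ID per chat session matters.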

You can set these attributes in the CallbackHandler when invoking your LangChain application.

result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={
        "callbacks": [CallbackHandler(thread_id="123")]
    },
)

The following attributes are supported, listed as Python / TypeScript:

  • name / name (str / string): The name of the trace.
  • tags / tags (List[str] / string[]): Tags are string labels that help you group related traces.
  • metadata / metadata (Dict / Record<string, any>): Attach any metadata to the trace.
  • thread_id / threadId (str / string): Supply the thread or conversation ID to view and evaluate conversations.
  • user_id / userId (str / string): Supply the user ID to enable user analytics.
  • test_case_id / testCaseId (str / string): Attach a test case ID to associate this trace with a specific test case.
  • turn_id / turnId (str / string): Supply a turn ID to identify individual turns in a multi-turn conversation.
  • metrics / metrics (List[BaseMetric] / BaseMetric[]): A list of metrics to run against the root span of this trace. Used for offline (development) evaluations.
  • metric_collection / metricCollection (str / string): The name of a metric collection on Confident AI to use for online (production) evaluations.

Each attribute is optional, and works the same way as the native tracing features on Confident AI. Python uses snake_case (e.g. thread_id) and TypeScript uses camelCase (e.g. threadId).
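
Because every attribute is optional, one convenient pattern is to build the keyword arguments once and drop the ones you did not set. The sketch below assumes nothing beyond the attribute names listed above. `trace_attributes` and `to_camel_case` are hypothetical helpers, and the final step of passing the result to CallbackHandler is shown only as a comment so the block runs without DeepEval installed.

```python
from typing import Any, Dict, List, Optional


def trace_attributes(
    name: Optional[str] = None,
    tags: Optional[List[str]] = None,
    metadata: Optional[Dict[str, Any]] = None,
    thread_id: Optional[str] = None,
    user_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Return only the attributes that were actually provided."""
    candidates = {
        "name": name,
        "tags": tags,
        "metadata": metadata,
        "thread_id": thread_id,
        "user_id": user_id,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def to_camel_case(snake: str) -> str:
    """Convert a snake_case attribute name to its camelCase form."""
    head, *rest = snake.split("_")
    return head + "".join(part.capitalize() for part in rest)


attrs = trace_attributes(thread_id="123", user_id="user-42", tags=["math"])
# With DeepEval installed, these could be passed as CallbackHandler(**attrs).

# The same attributes under the TypeScript naming convention.
ts_attrs = {to_camel_case(key): value for key, value in attrs.items()}
print(attrs, ts_attrs)
```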

Logging prompts

If you are managing prompts on Confident AI and wish to log them, pass your Prompt object to the language model instance’s metadata parameter.

from langchain_openai import ChatOpenAI
from deepeval.prompt import Prompt

prompt = Prompt(alias="<prompt-alias>")
prompt.pull(version="00.00.01")

llm = ChatOpenAI(
    model="gpt-4o-mini",
    metadata={"prompt": prompt}
)

Logging prompts lets you attribute specific prompts to LangChain LLM spans. Be sure to pull the prompt before logging it, otherwise the prompt will not be visible on Confident AI.

Evals Usage

Online evals

If your LangChain application is in production and you still want to run evaluations on your traces, use online evals. They let you run evaluations on all incoming traces on Confident AI’s servers.

1

Create metric collection

Create a metric collection on Confident AI with the metrics you wish to use to evaluate your LangChain application. Copy the name of the metric collection.

The current LangChain integration supports only metrics that evaluate the Input and Actual Output, plus the Task Completion metric.

2

Run evals

Set metric_collection to the name of your metric collection to evaluate a component of your LangChain application.

The example below evaluates the agent executor, the top-level component of your LangChain application, which makes it a natural fit for the Task Completion metric.

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

from deepeval.integrations.langchain import CallbackHandler


@tool
def multiply(a: int, b: int) -> int:
    """Returns the product of two numbers"""
    return a * b


llm = ChatOpenAI(model="gpt-4o-mini")
agent_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that can perform mathematical operations."),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)
agent = create_tool_calling_agent(llm, [multiply], agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[multiply], verbose=True)

result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={
        "callbacks": [
            CallbackHandler(metric_collection="<metric_collection_name>")
        ]
    },
)

All incoming traces will now be evaluated using metrics from your metric collection.

View on Confident AI

You can view the evals on Confident AI by clicking on the link in the output printed in the console.