Portkey

Portkey AI serves as a unified interface for interacting with LLMs.

Overview

Confident AI lets you trace and evaluate Portkey LLM calls, whether standalone or used as a component within a larger application.

Tracing Quickstart

1. Install Dependencies

Run the following command to install the required packages:

$pip install -U deepeval portkey-ai
2. Set Up Confident AI Key

Log in to Confident AI using your Confident API key.

$deepeval login
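
If you are running in a non-interactive environment such as CI, you can typically skip the browser login and supply the key through an environment variable instead. This assumes your deepeval version reads the CONFIDENT_API_KEY variable:

$export CONFIDENT_API_KEY=<your-confident-api-key>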
3. Configure Portkey

To begin tracing your Portkey LLM calls as a component in your application, import OpenAI from deepeval.openai and point it at Portkey's gateway by passing PORTKEY_GATEWAY_URL as the base_url.

main.py
from deepeval.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL

portkey = OpenAI(
    base_url=PORTKEY_GATEWAY_URL,
    api_key="<PORTKEY_API_KEY>"
)

response = portkey.chat.completions.create(
    model="@slug/<model>",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Portkey"}
    ],
)

DeepEval’s OpenAI client traces the chat.completions.create method.
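
As in the later examples, you can print the completion content to confirm the call was routed through the Portkey gateway:

print(response.choices[0].message.content)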

4. Run Portkey

Invoke your application by executing the script:

$python main.py

You can view the traces directly on Confident AI by clicking the link printed in the console output.

Advanced Usage

Logging prompts

If you are managing prompts on Confident AI and wish to log them, pass your Prompt to the trace context around the create call.

main.py
from portkey_ai import PORTKEY_GATEWAY_URL

from deepeval.openai import OpenAI
from deepeval.prompt import Prompt
from deepeval.tracing import trace

portkey = OpenAI(
    base_url=PORTKEY_GATEWAY_URL,
    api_key="<PORTKEY_API_KEY>"
)

prompt = Prompt(alias="my_prompt")
prompt.pull(version="00.00.01")

with trace(prompt=prompt):
    response = portkey.chat.completions.create(
        model="@slug/<model>",
        messages=[
            {"role": "system", "content": prompt.interpolate(name="John")},  # string system prompt
            {"role": "user", "content": "What is Portkey"}
        ],
    )

print(response.choices[0].message.content)

This is an example of using STRING type prompt interpolation.

Logging threads

Threads are used to group related traces together, and are useful for chat apps, agents, or any multi-turn interactions. Learn more about threads here. You can set the thread_id in the trace context.

main.py
from deepeval.openai import OpenAI
from deepeval.tracing import trace

from portkey_ai import PORTKEY_GATEWAY_URL

portkey = OpenAI(
    base_url=PORTKEY_GATEWAY_URL,
    api_key="<PORTKEY_API_KEY>"
)

with trace(thread_id="test_thread_id_1"):
    response = portkey.chat.completions.create(
        model="@slug/<model>",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is Portkey"}
        ],
    )

print(response.choices[0].message.content)
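
Because traces that share a thread_id are grouped into one thread, a multi-turn conversation simply reuses the same identifier across calls. Here is a minimal sketch building on the portkey client above; the thread identifier is a hypothetical value:

THREAD_ID = "user_123_support_chat"  # hypothetical conversation identifier

# First turn
with trace(thread_id=THREAD_ID):
    first = portkey.chat.completions.create(
        model="@slug/<model>",
        messages=[{"role": "user", "content": "What is Portkey"}],
    )

# Second turn, traced into the same thread on Confident AI
with trace(thread_id=THREAD_ID):
    second = portkey.chat.completions.create(
        model="@slug/<model>",
        messages=[
            {"role": "user", "content": "What is Portkey"},
            {"role": "assistant", "content": first.choices[0].message.content},
            {"role": "user", "content": "How does it work with Confident AI?"},
        ],
    )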


Evals Usage

Online evals

If your Portkey application is in production and you still want to run evaluations on your traces, use online evals. They let you run evaluations on all incoming traces on Confident AI’s server.

1. Create metric collection

Create a metric collection on Confident AI with the metrics you wish to use to evaluate your Portkey calls. Copy the name of the metric collection.

2. Run evals

Set the llm_metric_collection name in the trace context when invoking your OpenAI client to evaluate LLM spans.

main.py
from deepeval.openai import OpenAI
from deepeval.tracing import trace

from portkey_ai import PORTKEY_GATEWAY_URL

client = OpenAI(
    base_url=PORTKEY_GATEWAY_URL,
    api_key="<PORTKEY_API_KEY>"
)

with trace(llm_metric_collection="test_collection_1"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, how are you?"},
        ],
    )

End-to-end evals

Confident AI allows you to run end-to-end evals on your OpenAI client to evaluate your Portkey calls directly. This is recommended if you are testing your Portkey calls in isolation.

1. Create metric

from deepeval.metrics import AnswerRelevancyMetric

answer_relevancy = AnswerRelevancyMetric(
    threshold=0.7,
    model="gpt-4o-mini",
    include_reason=True
)

You can only run end-to-end evals on Portkey using metrics that evaluate input, output, or tools_called. You can pass parameters like expected_output, expected_tools, context and retrieval_context to the trace context.
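
For instance, a reference answer and retrieved chunks can be attached to the trace context alongside the metrics. The snippet below is a minimal sketch with placeholder values; the full end-to-end loop follows in the next step:

from deepeval.metrics import AnswerRelevancyMetric
from deepeval.tracing import trace

with trace(
    llm_metrics=[AnswerRelevancyMetric()],
    expected_output="Portkey is an AI gateway.",                      # reference answer for this golden
    retrieval_context=["Portkey routes requests to LLM providers."],  # chunks your app retrieved
):
    ...  # your portkey.chat.completions.create(...) call goes here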

2. Run evals

Replace your OpenAI client with DeepEval’s. Then, use the dataset’s evals_iterator to invoke your OpenAI client for each golden. Remember to set base_url and api_key to the Portkey gateway URL and your Portkey API key.

main.py
from deepeval.openai import OpenAI
from deepeval.metrics import AnswerRelevancyMetric, BiasMetric
from deepeval.dataset import EvaluationDataset
from deepeval.tracing import trace

from portkey_ai import PORTKEY_GATEWAY_URL

client = OpenAI(
    base_url=PORTKEY_GATEWAY_URL,
    api_key="<PORTKEY_API_KEY>"
)

dataset = EvaluationDataset()
dataset.pull("your-dataset-alias")

for golden in dataset.evals_iterator():
    with trace(
        llm_metrics=[AnswerRelevancyMetric(), BiasMetric()],
        expected_output=golden.expected_output,
    ):
        client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": golden.input}
            ],
        )

This will automatically generate a test run with evaluated Portkey traces using inputs from your dataset.

Using Portkey in component-level evals

You can also evaluate Portkey calls through component-level evals. This approach is recommended if you are testing your Portkey calls as a component in a larger application system.

1. Create metric

from deepeval.metrics import AnswerRelevancyMetric

answer_relevancy = AnswerRelevancyMetric(
    threshold=0.7,
    model="gpt-4o-mini",
    include_reason=True
)

As with end-to-end evals, you can only use metrics that evaluate input, output, or tools_called.

2. Run evals

Replace your OpenAI client with DeepEval’s. Then, use the dataset’s evals_iterator to invoke your LLM application for each golden.

Make sure that each function or method in your LLM application is decorated with @observe.

from deepeval.openai import OpenAI
from deepeval.tracing import observe, trace
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric

from portkey_ai import PORTKEY_GATEWAY_URL

client = OpenAI(
    base_url=PORTKEY_GATEWAY_URL,
    api_key="<PORTKEY_API_KEY>"
)

@observe()
def generate_response(input: str, expected_output: str) -> str:
    with trace(
        llm_metrics=[AnswerRelevancyMetric()],
        expected_output=expected_output,
    ):
        response = client.chat.completions.create(
            model="gpt-4.1",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": input},
            ],
        )
    return response.choices[0].message.content

# Create dataset
dataset = EvaluationDataset()
dataset.pull("your-dataset-alias")

# Run component-level evaluation
for golden in dataset.evals_iterator():
    generate_response(golden.input, golden.expected_output)
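
If the Portkey call is only one stage of a larger pipeline, the same pattern extends naturally: decorate every component with @observe so each shows up as its own span, and keep the trace context around the LLM call. The sketch below assumes a hypothetical retrieve_documents helper standing in for a real retrieval step:

from deepeval.openai import OpenAI
from deepeval.tracing import observe, trace
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric

from portkey_ai import PORTKEY_GATEWAY_URL

client = OpenAI(base_url=PORTKEY_GATEWAY_URL, api_key="<PORTKEY_API_KEY>")

@observe()
def retrieve_documents(query: str) -> list:
    # Hypothetical retrieval component; a real app would query a vector store here
    return ["Portkey is an AI gateway that routes requests to LLM providers."]

@observe()
def rag_pipeline(input: str, expected_output: str) -> str:
    documents = retrieve_documents(input)
    with trace(
        llm_metrics=[AnswerRelevancyMetric()],
        expected_output=expected_output,
    ):
        response = client.chat.completions.create(
            model="gpt-4.1",
            messages=[
                {"role": "system", "content": "Answer using only this context: " + " ".join(documents)},
                {"role": "user", "content": input},
            ],
        )
    return response.choices[0].message.content

dataset = EvaluationDataset()
dataset.pull("your-dataset-alias")

for golden in dataset.evals_iterator():
    rag_pipeline(golden.input, golden.expected_output)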