AgentCore

Use Confident AI for LLM observability and evals for Amazon AgentCore

Overview

Amazon AgentCore is AWS's managed runtime for deploying and scaling AI agents. Confident AI lets you trace and evaluate AgentCore agents in just a few lines of code.

The integration works via OpenTelemetry: instrument_agentcore() registers an AgentCoreSpanInterceptor and a ContextAwareSpanProcessor on the global TracerProvider. The interceptor translates AWS Bedrock / Strands / Traceloop OTel spans into Confident AI spans, and the processor ships them to Confident AI in real time.
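
To make the two roles concrete, here is a dependency-free sketch of the interceptor/processor split. It is not deepeval's implementation: the translate_span and ShippingProcessor names, the dictionary span shape, and the output field names are all assumptions made for illustration. Only the gen_ai.* attribute keys come from the OTel GenAI semantic conventions.

```python
from typing import Any, Callable, Dict, List

Span = Dict[str, Any]


def translate_span(otel_span: Span) -> Span:
    """Hypothetical interceptor step: map OTel GenAI attributes to a flat record."""
    attrs = otel_span.get("attributes", {})
    return {
        "name": otel_span.get("name", "unknown"),
        "model": attrs.get("gen_ai.request.model"),
        "input_tokens": attrs.get("gen_ai.usage.input_tokens", 0),
        "output_tokens": attrs.get("gen_ai.usage.output_tokens", 0),
    }


class ShippingProcessor:
    """Hypothetical processor step: forward each finished span immediately."""

    def __init__(self, export: Callable[[Span], None]) -> None:
        self._export = export

    def on_end(self, otel_span: Span) -> None:
        # Translate first, then hand the result to the exporter callback.
        self._export(translate_span(otel_span))


# An in-memory list stands in for the network exporter.
shipped: List[Span] = []
processor = ShippingProcessor(export=shipped.append)
processor.on_end({
    "name": "chat amazon.nova-lite-v1:0",
    "attributes": {
        "gen_ai.request.model": "amazon.nova-lite-v1:0",
        "gen_ai.usage.input_tokens": 12,
        "gen_ai.usage.output_tokens": 30,
    },
})
print(shipped[0]["model"])
```

In the real integration both pieces are registered for you by instrument_agentcore(), so nothing like this needs to be written by hand.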

Tracing Quickstart

If you are in the EU region, set the OTel endpoint to the EU URL as shown below:

$ export CONFIDENT_OTEL_URL="https://eu.otel.confident-ai.com"
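
If you would rather set the endpoint from Python than from the shell, the same variable can be set through os.environ. This sketch assumes the variable is read when instrument_agentcore() runs, so it is set before that call; the text above does not say exactly when the value is read.

```python
import os

# Same variable name and URL as the shell command above.
os.environ["CONFIDENT_OTEL_URL"] = "https://eu.otel.confident-ai.com"

# Assumption: call instrument_agentcore() only after this point.
print(os.environ["CONFIDENT_OTEL_URL"])
```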

1. Install Dependencies

Run the following command to install the required packages:

$ pip install -U deepeval opentelemetry-sdk opentelemetry-exporter-otlp-proto-http

If you are using AgentCore with Strands, also install:

$ pip install bedrock-agentcore strands-agents

2. Instrument AgentCore

Call instrument_agentcore once at startup, before your agent runs. It attaches to the active OpenTelemetry TracerProvider (creating one if needed) and begins forwarding spans to Confident AI automatically.

main.py
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent
from deepeval.integrations.agentcore import instrument_agentcore

# Register the Confident AI span processors before the agent is created.
instrument_agentcore()

app = BedrockAgentCoreApp()
agent = Agent(model="amazon.nova-lite-v1:0")

@app.entrypoint
def invoke(payload):
    # Fall back to a default prompt when the payload has none.
    user_message = payload.get("prompt", "Hello! How can I help you today?")
    result = agent(user_message)
    return {"result": result.message}

if __name__ == "__main__":
    response = invoke({"prompt": "Explain OpenTelemetry in one sentence."})
    print(f"Agent Response: {response['result']}")

instrument_agentcore is framework-agnostic. It works with any agent framework that AgentCore supports: Strands, LangChain, LangGraph, and CrewAI are all detected automatically through OTel GenAI semantic conventions and Traceloop attributes.

3. Run your agent

Invoke your agent by executing the script:

$ python main.py

You can directly view the traces on Confident AI by clicking on the link in the output printed in the console.

Advanced Usage

Logging threads

Threads group related traces together and are useful for chat apps, agents, or any multi-turn interaction. To log a thread, pass thread_id to instrument_agentcore; the example below also sets user_id.

main.py
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent
from deepeval.integrations.agentcore import instrument_agentcore

# Every trace produced from here on is grouped under this thread and user.
instrument_agentcore(
    thread_id="thread_1",
    user_id="user_1",
)

app = BedrockAgentCoreApp()
agent = Agent(model="amazon.nova-lite-v1:0")

@app.entrypoint
def invoke(payload):
    user_message = payload.get("prompt", "Hello! How can I help you today?")
    result = agent(user_message)
    return {"result": result.message}

If your agent framework already sets a session.id attribute on spans (Strands does this via trace_attributes={"session.id": ...}), the AgentCore integration automatically uses it as the thread_id when none is explicitly provided.
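
The precedence described above can be sketched as a small function. This is an illustration of the rule, not the integration's own code: resolve_thread_id and its arguments are hypothetical names, and the span attributes are modelled as a plain mapping.

```python
from typing import Any, Mapping, Optional


def resolve_thread_id(
    explicit_thread_id: Optional[str],
    span_attributes: Mapping[str, Any],
) -> Optional[str]:
    """Hypothetical helper: an explicit thread_id wins, else fall back to session.id."""
    if explicit_thread_id is not None:
        return explicit_thread_id
    session_id = span_attributes.get("session.id")
    return str(session_id) if session_id is not None else None


print(resolve_thread_id(None, {"session.id": "sess-7"}))        # sess-7
print(resolve_thread_id("thread_1", {"session.id": "sess-7"}))  # thread_1
print(resolve_thread_id(None, {}))                              # None
```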

Trace attributes

Other trace-level attributes can be passed to instrument_agentcore. All parameters are optional and apply to every trace produced while the instrumentation is active.

main.py
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent
from deepeval.integrations.agentcore import instrument_agentcore

# All arguments are optional and apply to every trace.
instrument_agentcore(
    name="Name of Trace",
    tags=["Tag 1", "Tag 2"],
    metadata={"Key": "Value"},
    user_id="user_1",
    thread_id="conversation-abc123",
    environment="production",
)

app = BedrockAgentCoreApp()
agent = Agent(model="amazon.nova-lite-v1:0")

@app.entrypoint
def invoke(payload):
    user_message = payload.get("prompt", "Hello! How can I help you today?")
    result = agent(user_message)
    return {"result": result.message}

api_key (str): Your Confident AI API key. Defaults to the CONFIDENT_API_KEY environment variable when omitted.

name (str): The name of the trace.

tags (List[str]): Tags are string labels that help you group related traces.

metadata (Dict): Attach any metadata to the trace.

thread_id (str): Supply the thread or conversation ID to view and evaluate conversations.

user_id (str): Supply the user ID to enable user analytics.

turn_id (str): The turn ID for multi-turn conversations.

test_case_id (str): Associate this trace with a specific test case ID.

metric_collection (str): The name of the metric collection to use for online evals at the trace level.

environment (str): The deployment environment. Accepted values: "production", "staging", "development", "testing". Defaults to "development".

Each attribute is optional and works the same way as the native tracing features on Confident AI.
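
Because environment accepts only four values, it can help to check it before instrumenting. The helper below is a minimal sketch, assuming you want to assemble the keyword arguments in one place. build_trace_settings is a hypothetical name, and whether instrument_agentcore performs its own validation is not stated here.

```python
from typing import Any, Dict, List, Optional

# Accepted values as listed in the parameter reference above.
ACCEPTED_ENVIRONMENTS = {"production", "staging", "development", "testing"}


def build_trace_settings(
    environment: str = "development",
    tags: Optional[List[str]] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: validate and assemble kwargs for instrument_agentcore."""
    if environment not in ACCEPTED_ENVIRONMENTS:
        raise ValueError(
            f"environment must be one of {sorted(ACCEPTED_ENVIRONMENTS)}"
        )
    settings: Dict[str, Any] = {"environment": environment, **extra}
    if tags:
        settings["tags"] = list(tags)
    # Drop unset values so the documented defaults apply.
    return {key: value for key, value in settings.items() if value is not None}


settings = build_trace_settings(
    environment="staging", user_id="user_1", thread_id=None
)
print(settings)
# In a real app (requires deepeval): instrument_agentcore(**settings)
```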

Logging prompts

If you are managing prompts on Confident AI and wish to log them, use next_llm_span to associate a Prompt with the next LLM span before invoking your agent.

main.py
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent
from deepeval.prompt import Prompt
from deepeval.tracing import next_llm_span
from deepeval.integrations.agentcore import instrument_agentcore

instrument_agentcore(environment="production")

app = BedrockAgentCoreApp()
agent = Agent(model="amazon.nova-lite-v1:0")

# Pull the prompt first so it can be attached to the LLM span.
prompt = Prompt(alias="<prompt-alias>")
prompt.pull(version="00.00.01")

@app.entrypoint
def invoke(payload):
    user_message = payload.get("prompt", "")
    # Associate the pulled prompt with the next LLM span the agent produces.
    with next_llm_span(prompt=prompt):
        result = agent(user_message)
    return {"result": result.message}

Be sure to pull the prompt before logging it, otherwise the prompt will not be visible on Confident AI.

Re-configuring at runtime

instrument_agentcore is idempotent: calling it again on the same TracerProvider updates the trace-level settings in place without stacking additional processors. This lets you reconfigure per-request fields (such as thread_id or user_id) by calling instrument_agentcore again before each invocation.

main.py
from deepeval.integrations.agentcore import instrument_agentcore

# First call: sets up the processors
instrument_agentcore(environment="production")

# Subsequent call: updates settings only, no new processors added
instrument_agentcore(environment="production", user_id="user_42", thread_id="conv-999")
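
One way to apply this per request is to refresh the settings inside the entrypoint itself. The sketch below takes the instrumentation function and the agent as arguments so that it runs without deepeval or AgentCore installed. make_invoke is a hypothetical helper, and the user_id and thread_id payload keys are assumptions about what your caller sends.

```python
from typing import Any, Callable, Dict, List


def make_invoke(
    instrument: Callable[..., Any],
    run_agent: Callable[[str], str],
) -> Callable[[Dict[str, Any]], Dict[str, str]]:
    """Build an entrypoint that refreshes trace settings before each run."""

    def invoke(payload: Dict[str, Any]) -> Dict[str, str]:
        # Assumption: the caller sends user_id and thread_id in the payload.
        instrument(
            environment="production",
            user_id=payload.get("user_id"),
            thread_id=payload.get("thread_id"),
        )
        return {"result": run_agent(payload.get("prompt", ""))}

    return invoke


# Stand-ins for instrument_agentcore and the agent, for illustration only.
calls: List[Dict[str, Any]] = []
invoke = make_invoke(
    instrument=lambda **settings: calls.append(settings),
    run_agent=lambda prompt: f"echo: {prompt}",
)
response = invoke({"prompt": "hi", "user_id": "user_42", "thread_id": "conv-999"})
print(response["result"])
```

In a real application you would pass instrument_agentcore as instrument and a function that calls your agent as run_agent. If the settings are process-wide, as the description above suggests, concurrent requests could overwrite one another's values, so this pattern is probably safest when requests are handled one at a time.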

Evals Usage

Online evals

Online evals run on Confident AI's servers and evaluate every incoming trace from your AgentCore agent. This approach is recommended if your agent is in production.

1. Create metric collection

Create a metric collection on Confident AI with the metrics you wish to use to evaluate your AgentCore agent.

Your metric collection must contain only metrics that evaluate the input and actual output of the component it is assigned to.

2. Run evals

Pass the metric_collection parameter to instrument_agentcore to run online evals at the trace level. For span-level evals, use update_current_span(metric_collection=...) inside your agent code.

main.py
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent
from deepeval.integrations.agentcore import instrument_agentcore

# Traces are evaluated with the metrics in this collection.
instrument_agentcore(
    metric_collection="my-trace-collection",
    environment="production",
)

app = BedrockAgentCoreApp()
agent = Agent(model="amazon.nova-lite-v1:0")

@app.entrypoint
def invoke(payload):
    user_message = payload.get("prompt", "Hello! How can I help you today?")
    result = agent(user_message)
    return {"result": result.message}

We recommend creating separate metric collections for each component (trace, agent span, LLM span, tool span), since each requires its own evaluation criteria and metrics.

All incoming traces will now be evaluated using metrics from your metric collection.

You can view evals on Confident AI by clicking on the link in the output printed in the console.