TL;DR — Top 5 Tools for Agentic Systems at Scale in 2026
Confident AI is the best platform in 2026 for alerting, monitoring, and evaluating agentic systems at scale because it pairs OTEL-native trace capture with agent-grade evals and quality-aware alerting in one workflow — at $1/GB-month with unlimited traces.
Alternatives include:
- Datadog LLM Observability — Best-in-class alerting and deepest enterprise stack integration, but thin on agent-grade eval depth.
- LangSmith — Unmatched LangGraph-native tracing and evals, but real framework lock-in and per-seat pricing.
- Helicone — Cheap, lightweight request logging, but built around single LLM calls — not multi-step agents.
- Arize AI — Mature ML monitoring foundation, but built-in agent eval depth and quality-aware alerting are shallow.
Pick Confident AI if you want monitoring, evals, and alerts on one platform — not three.
Confident AI helps you monitor, evaluate, and alert on agents in one platform
By 2026, "AI in production" usually means agents: multi-step systems with tools, memory, RAG, and branching execution. The monitoring demands look nothing like a single-call chatbot: one agent run fans out into dozens of LLM calls and tool invocations, and a confidently wrong answer in 200ms is worse than a timeout.
The tools that hold up at scale capture multi-step traces with enough fidelity to debug, score agents on tool-call correctness and task completion (not just "did the model answer"), and alert on quality, not just latency and errors. This guide ranks the five tools enterprises actually shortlist against exactly those criteria.
The Top 5 Tools at a Glance
| Tool | Category | Pricing | Open Source | Best For |
|---|---|---|---|---|
| Confident AI | All-in-one: agent evals + observability + alerts | Free; from $19.99/seat/mo; $1/GB-month | No, but fully supported self-hosting | Teams that want agent evals, monitoring, and quality-aware alerts in one platform |
| Datadog LLM Observability | Enterprise APM + LLM monitoring | Custom (usually $$$ at scale) | No | Enterprises that want LLM traces sitting inside their existing APM and software stack |
| LangSmith | LangChain-native tracing + evaluation | Free tier; from $39/seat/mo | No | LangChain/LangGraph-heavy teams that want first-party agent tracing and evals |
| Helicone | LLM gateway + request logging | Free tier; from $20/seat/mo | Yes (Apache-2.0) | Solo developers and small teams that need cheap request logs and cost tracking |
| Arize AI | Enterprise LLM observability + evaluation | Free tier (Phoenix); from $50/mo | Yes (Phoenix, ELv2) | Large engineering orgs extending ML monitoring into agent observability |
What "At Scale" Actually Demands of an Agentic Monitoring Stack
Five capabilities separate tools that hold up at agent scale from ones that don't.
Multi-Step Agent Trace Fidelity
Agent runs are trees, not rows — parent runs, tool calls, sub-agents, retries, branching paths. Tools that flatten the tree into a timeline lose the structure on-call engineers need to debug. Tools that visualize the graph turn one-hour incidents into five-minute fixes.
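To make the "trees, not rows" point concrete, here is a minimal sketch of an agent run as a span tree. The `Span` class, its fields, and the `failing_paths` helper are hypothetical names invented for this illustration; they are not any vendor's trace schema. The point is that a tree keeps the path to a failure, which a flat timeline loses.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One node in an agent run: an LLM call, tool call, or sub-agent (hypothetical schema)."""
    name: str
    kind: str                      # e.g. "agent", "llm", "tool"
    status: str = "ok"             # "ok" or "error"
    children: list["Span"] = field(default_factory=list)


def failing_paths(span: Span, prefix: tuple[str, ...] = ()) -> list[str]:
    """Return the root-to-node path of every span whose status is 'error'."""
    path = prefix + (span.name,)
    found = [" > ".join(path)] if span.status == "error" else []
    for child in span.children:
        found.extend(failing_paths(child, path))
    return found


# A toy run: the parent agent delegates to a sub-agent whose second tool call fails.
run = Span("support_agent", "agent", children=[
    Span("plan", "llm"),
    Span("refund_subagent", "agent", children=[
        Span("lookup_order", "tool"),
        Span("issue_refund", "tool", status="error"),
    ]),
    Span("summarize", "llm"),
])

print(failing_paths(run))
# ['support_agent > refund_subagent > issue_refund']
```

A flattened timeline of the same run would show five rows and one error, but not which sub-agent issued the failing call.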
Agent-Grade Evaluation Metrics
Agent quality is multi-dimensional: tool-call correctness, task completion, multi-turn fidelity, and step-by-step reasoning — not just faithfulness. Tools that ship only classic RAG metrics (or none) leave most of agent quality untested.
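As a rough illustration of what a tool-call correctness score can look like, the sketch below compares the tool calls an agent made against the calls a reviewer expected. This is a deliberately simplified stand-in, not the metric definition used by DeepEval or any platform in this guide; the function name and the dictionary layout for a call are assumptions.

```python
def tool_call_correctness(expected: list[dict], actual: list[dict]) -> float:
    """Fraction of expected tool calls found in the actual calls.

    A call matches only if both the tool name and the arguments are equal.
    Each actual call can satisfy at most one expected call.
    """
    if not expected:
        return 1.0
    remaining = list(actual)
    matched = 0
    for call in expected:
        if call in remaining:
            remaining.remove(call)
            matched += 1
    return matched / len(expected)


expected = [
    {"name": "lookup_order", "args": {"order_id": 42}},
    {"name": "issue_refund", "args": {"order_id": 42, "amount": 10}},
]
actual = [
    {"name": "lookup_order", "args": {"order_id": 42}},
    {"name": "issue_refund", "args": {"order_id": 42, "amount": 100}},  # wrong amount
]

score = tool_call_correctness(expected, actual)
print(score)  # 0.5
```

The final answer in this example might still read as fluent and plausible, which is why output-only metrics such as faithfulness would not catch the wrong refund amount.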
Quality-Aware Alerting
A confidently wrong answer in 200ms is invisible to a latency/error dashboard. Platforms that matter alert on quality — faithfulness drops, hallucination spikes, PII leakage, jailbreak patterns — into PagerDuty, Slack, and Teams.
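A quality-aware alert can be sketched as a rolling-window check on a quality score rather than on latency. The `QualityAlert` class, the threshold, and the window size below are illustrative assumptions; routing the alert to PagerDuty, Slack, or Teams is left out, and no vendor's alerting API is shown.

```python
from collections import deque


class QualityAlert:
    """Fire when the rolling mean of a quality score drops below a threshold."""

    def __init__(self, threshold: float, window: int):
        self.threshold = threshold
        self.scores: deque[float] = deque(maxlen=window)

    def observe(self, score: float) -> bool:
        """Record one score; return True if the alert should fire."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        mean = sum(self.scores) / len(self.scores)
        # Wait for a full window so a single bad trace does not page anyone.
        return window_full and mean < self.threshold


alert = QualityAlert(threshold=0.8, window=3)
faithfulness_scores = [0.9, 0.9, 0.9, 0.7, 0.5]  # made-up scores for illustration
fired = [alert.observe(s) for s in faithfulness_scores]
print(fired)  # [False, False, False, False, True]
```

Every one of these requests could have returned in 200ms with a 200 status code; only the score stream reveals the drop.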
Scale Economics and Stack Fit
Per-trace and per-seat pricing look fine at pilot and break at scale. Per-GB pricing, unlimited traces, or self-hosting survive real traffic. Stack fit matters too: enterprises with existing APM and SRE tooling get leverage from monitoring that drops into the existing stack instead of adding another console and contract.
Closed Loop With CI/CD and Datasets
Production failures should auto-feed eval datasets, surface as CI/CD regressions before they reship, and run against the same metric definitions in pre-production and live traffic. When monitoring and evals don't share data, datasets go stale the day you ship.
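The closed loop described above can be sketched in a few lines: low-scoring production traces become dataset rows, and the same metric and threshold gate the next release. Everything here (`Trace`, `curate_dataset`, `regression_failures`, the toy metric, and the toy agents) is hypothetical and stands in for whatever your platform and agent actually provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    input: str
    output: str
    score: float  # quality score assigned in production, 0.0 to 1.0


def curate_dataset(traces, threshold):
    """Turn low-scoring production traces into deduplicated regression inputs."""
    seen, rows = set(), []
    for t in traces:
        if t.score < threshold and t.input not in seen:
            seen.add(t.input)
            rows.append(t.input)
    return rows


def regression_failures(agent, metric, dataset, threshold):
    """Re-run the agent on curated inputs; return inputs still scoring below threshold."""
    return [x for x in dataset if metric(x, agent(x)) < threshold]


def toy_metric(inp: str, out: str) -> float:
    """Toy metric: does the output mention the requested action (first word of the input)?"""
    return 1.0 if inp.split()[0] in out.lower() else 0.0


THRESHOLD = 0.8  # shared by production scoring and the CI gate
traces = [
    Trace("refund order 42", "I cannot help", 0.2),
    Trace("track order 7", "Shipped", 0.95),
    Trace("refund order 42", "Unknown", 0.1),
]
dataset = curate_dataset(traces, THRESHOLD)

broken_agent = lambda x: "I cannot help"
fixed_agent = lambda x: f"Processing {x}"

print(dataset)                                                             # ['refund order 42']
print(regression_failures(broken_agent, toy_metric, dataset, THRESHOLD))  # ['refund order 42']
print(regression_failures(fixed_agent, toy_metric, dataset, THRESHOLD))   # []
```

In a CI pipeline, a test would assert that the failure list is empty and block the release otherwise. The design choice that matters is the single `THRESHOLD` and metric shared by both sides of the loop.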
How We Evaluated These Tools
We analyzed official documentation, GitHub repositories, public pricing where available, and community discussion across Hacker News, Reddit, and AI engineering communities. Vendors that publish their trace schemas, metric methodologies, and pricing transparently were rated higher than ones that gate everything behind a sales call.
For this analysis, we focused on six dimensions:
- Agent trace fidelity: how cleanly the platform captures multi-step, multi-tool, multi-agent runs
- Eval depth: breadth and quality of built-in metrics for agents, tool calls, RAG, and multi-turn behavior
- Alerting quality: ability to alert on quality signals, not just latency and errors
- Scale economics and stack fit: does pricing stay predictable as traffic grows, and does the tool drop cleanly into the existing enterprise stack
- Framework alignment: support for OTEL, OpenInference, and the major agent frameworks (LangChain/LangGraph, CrewAI, Pydantic AI, Vercel AI SDK)
- Closed loop with CI/CD: does production telemetry feed evals, datasets, and regression tests automatically
1. Confident AI
Type: All-in-one — agent evals + observability + quality-aware alerting · Pricing: Free, Starter $19.99/seat/mo, Premium $49.99/seat/mo, plus custom Team and Enterprise; observability at $1/GB-month with unlimited traces · Open Source: No, but fully supported self-hosting · Website: https://www.confident-ai.com
Confident AI is the only platform on this list that runs agent evaluation, production observability, and quality-aware alerting in one workspace — same datasets, same metrics, same traces. A failing production trace becomes a regression row, runs as an eval in CI/CD, and fires a PagerDuty/Slack/Teams alert if the pattern recurs.
On top of trace capture, Signals runs continuous anomaly detection and auto-surfaces issues nobody thought to look for — circular outputs, new topics, frustrated users, timeout clusters, prompt injection trends — so regressions don't wait for a customer ticket. Observability is OTEL-native and framework-agnostic (OpenAI, LangChain, LangGraph, Pydantic AI, CrewAI, Vercel AI SDK, OpenInference) at $1/GB-month with unlimited traces. Evaluation ships 50+ research-backed metrics across agents, RAG, multi-turn, tool-call correctness, task completion, and safety (open-source through DeepEval).

Customers include Panasonic, Toshiba, Amdocs, BCG, and CircleCI. External reviewers on Gartner Peer Insights highlight the combined evaluation, observability, and alerting workflow as a differentiator versus point tools.
Best for: Teams that want agent monitoring, evals, and quality-aware alerts in one platform — without stitching together three vendors and three workflows.
Standout Features
- All three layers in one platform: agent observability, evals, and quality-aware alerts share datasets, metrics, and traces
- 50+ research-backed metrics covering agents, tool-call correctness, task completion, multi-turn behavior, RAG, and safety (open-source through DeepEval)
- OpenTelemetry-native, framework-agnostic trace capture across OpenAI, LangChain, LangGraph, Pydantic AI, CrewAI, Vercel AI SDK, and OpenInference
- Signals: automatic anomaly detection that surfaces production issues — circular output spikes, new topics, frustrated users, timeouts, prompt injection trends — before the team has thought to look for them
- Quality-aware alerting to PagerDuty, Slack, and Teams — fires on faithfulness drops, hallucination spikes, PII leakage, and jailbreak patterns, not just latency
- Predictable scale economics: $1/GB-month with unlimited traces; no per-trace gotchas as agent fan-out grows
- Closed loop with CI/CD: pytest integration blocks releases on regressions; production traces auto-curate into eval datasets

| Pros | Cons |
|---|---|
| Only platform that runs agent observability, evals, and quality-aware alerts in one loop | Purpose-built for AI workloads; teams also monitoring general application and infrastructure still pair with an APM |
| Unlimited traces at $1/GB-month: predictable economics as agent traffic grows | Closed-source platform (though fully supported self-hosting is available) |
| Agent-grade metrics (tool calls, task completion, multi-turn) out of the box | Breadth of platform may be more than what's needed if you only need one layer |
| Framework-agnostic and OTEL-native, with no lock-in to LangChain or any single agent framework | Best fit when AI quality is treated as a first-class workload, not a single signal in a broader infrastructure dashboard |
FAQ
Q: How does Confident AI handle alerting at scale?
Alerts are quality-aware: they fire on metric thresholds (faithfulness, hallucination, PII leakage, jailbreak patterns) in addition to standard latency and error signals, and route to PagerDuty, Slack, and Teams. The same metric definitions used in pre-production evals run on production traffic, so a regression in CI/CD and a drift in production are the same signal.
Q: How does pricing scale with agent traffic?
Observability is $1/GB-month with unlimited traces — agent fan-out (parent runs, sub-agents, tool calls) doesn't multiply your bill by trace count. Evaluation is priced per seat with self-serve tiers, plus custom Team and Enterprise for cross-functional adoption.
2. Datadog LLM Observability
Type: Enterprise APM + LLM monitoring · Pricing: Custom (usually $$$ at scale) · Open Source: No · Website: https://www.datadoghq.com/product/llm-observability
Datadog's biggest argument isn't the LLM module on its own — it's that it drops into the stack the enterprise has already standardized on. APM, logs, infra metrics, SRE alerting, incident workflows, RBAC, SSO, SOC 2, HIPAA, FedRAMP — all already in place. Adding LLM traces is a checkbox, not a procurement cycle, and the alerting infrastructure (monitors, anomaly detection, composite alerts, multi-channel routing) is some of the strongest in the market.
The trade-off is depth on the AI-native quality layer. Agent-specific metrics (tool-call correctness, task completion, faithfulness) are thinner and less research-backed than those of AI-native vendors, and the closed loop between monitoring, eval datasets, and CI/CD has to be wired up by hand. Without careful sampling, pricing also punishes high-cardinality LLM trace data.

Best for: Enterprises that have already standardized on Datadog and want LLM and agent traces to live inside the same workspace as APM, logs, and infrastructure — paired with an AI-native eval platform for agent-specific quality metrics.
Standout Features
- Drops into the existing enterprise software stack — no new console, contract, or on-call rotation
- Best-in-class alerting infrastructure: monitors, anomaly detection, composite alerts, multi-channel routing
- LLM traces correlated with APM, logs, infrastructure metrics, and security signals in one workspace
- Proven enterprise tenancy at very high throughput
- Strong RBAC, SSO, SOC 2, HIPAA, and FedRAMP posture
- Mature integrations across the broader cloud-native stack
| Pros | Cons |
|---|---|
| Deepest integration with the existing enterprise software stack of any tool on this list | LLM eval depth is limited compared to AI-native platforms |
| Best-in-class alerting and SRE-grade monitoring at enterprise scale | Agent-specific metrics (tool-call correctness, task completion) are thinner than AI-native vendors |
| LLM traces correlated with APM, logs, and infra in one workspace | Pricing at LLM-trace cardinality can become a significant budget line without careful sampling |
| Mature enterprise tenancy with strong compliance posture | Closed loop between production monitoring, eval datasets, and CI/CD has to be wired up by hand |
FAQ
Q: Why pick Datadog over an AI-native platform?
For enterprises already running Datadog across APM, infrastructure, logging, and SRE alerting, the LLM module lands inside the existing workspace, contract, and access-control model — no new procurement, no new on-call rotation. That stack-fit advantage is often decisive for organizations where adopting a new vendor is a multi-quarter exercise. Most teams that pick Datadog for LLM traces still pair it with an AI-native eval platform for agent-grade quality metrics.
Q: How does Datadog pricing handle high-volume agent traces?
Pricing scales with cardinality and ingestion volume, and high-volume agent traces (deep tool-call trees, retries, sub-agent fan-out) can become a meaningful budget line. Sampling and trace retention policies are important to dial in at scale.
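One common way to control ingestion volume is to sample whole traces deterministically, so a deep tool-call tree is either kept in full or dropped in full. The sketch below is generic and is not Datadog configuration; `keep_trace`, its parameters, and the trace ID format are assumptions, and it presumes the decision is made once a run has finished so its error status is known.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float, has_error: bool = False) -> bool:
    """Decide whether to ingest a whole trace.

    Hashing the trace ID makes the decision deterministic, so every span in
    a run gets the same answer and trees are never half-sampled.
    """
    if has_error:
        return True  # always keep failing runs for debugging
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < sample_rate


# Roughly 10% of healthy traces are kept; the exact count depends on the IDs.
kept = sum(keep_trace(f"trace-{i}", 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 healthy traces")

# Failing runs bypass sampling entirely.
print(keep_trace("trace-123", 0.0, has_error=True))  # True
```

Keeping every failing run while sampling healthy ones trades some statistical purity for debuggability, which is usually the right trade for on-call work.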
3. LangSmith
Type: LangChain-native tracing + evaluation · Pricing: Free tier; Plus from $39/seat/mo; custom Enterprise · Open Source: No · Website: https://www.langchain.com/langsmith
LangSmith is LangChain's first-party observability and evaluation platform — the natural pick for LangChain/LangGraph-heavy stacks. Trace inspection captures the full LangGraph execution graph (node-by-node state, tool calls, conditional edges, human-in-the-loop checkpoints), and no other tool gives you that structural fidelity out of the box.
The trade-offs are framework lock-in and per-seat pricing. The deepest experience requires LangChain — non-LangChain stacks lose most of the value. Per-seat pricing adds friction to cross-functional adoption, and quality-aware alerting depth is lighter than alerting-first platforms.

Best for: LangChain/LangGraph-heavy teams that want tightly coupled tracing, evaluation, and prompt management for agents in one product — and that have a separate alerting platform for production SRE workflows.
Standout Features
- Deepest first-party LangChain and LangGraph integration of any platform
- LangGraph-native trace capture with node-by-node state, tool calls, and conditional edges
- Trace inspection, feedback capture, and dataset management in one workspace
- Prompt hub for versioning and reuse
- Automated and human-in-the-loop evaluators
- CI/CD integration for evaluation runs
| Pros | Cons |
|---|---|
| Deepest LangChain/LangGraph integration of any platform | Best-in-class experience effectively requires LangChain; framework lock-in is real |
| LangGraph-native agent trace capture is unmatched | Quality-aware alerting depth is lighter than alerting-first platforms |
| Clean evaluation + tracing pairing for LangChain-native teams | Per-seat pricing scales quickly with cross-functional adoption |
| Active product velocity with frequent feature releases | Cross-functional workflows are weaker than evaluation-first platforms |
FAQ
Q: Can I use LangSmith without LangChain?
Yes, via the SDK and OpenTelemetry — but you give up much of the value proposition. The platform is built around LangChain idioms, and stacks that don't use LangChain typically get a better fit from framework-agnostic platforms.
Q: How does LangSmith handle alerting?
Alerting is available but less mature than alerting-first platforms — most teams that adopt LangSmith for traces and evals pair it with a separate APM or Confident-style quality-aware alerting layer for production.
4. Helicone
Type: LLM gateway + request logging · Pricing: Free tier; Pro from $20/seat/mo; custom Enterprise · Open Source: Yes (Apache-2.0) · Website: https://www.helicone.ai
Helicone is a lightweight LLM gateway focused on request logging, cost analytics, and prompt versioning. Point your client at the proxy URL and every request, response, latency, token, and cost lands in a searchable log view. For solo developers and small teams, it's one of the lowest-friction options in the category — and the Apache-2.0 license makes self-hosting viable.
For agentic systems at scale, Helicone is a thin fit. The product is built around the single LLM request, not the multi-step agent run — tool-call trees, sub-agent fan-out, and branching paths aren't first-class. There are no built-in agent-grade eval metrics, quality-aware alerting isn't a real surface, and the proxy hop adds latency on workloads that can't take it.

Best for: Solo developers and small teams that need cheap request logs and cost tracking — and don't yet have a multi-step agent workload that needs structural trace fidelity, agent evals, or quality-aware alerting.
Standout Features
- One-line proxy integration captures requests, responses, latency, token usage, and cost
- Searchable log view with filtering by model, user, and metadata
- Prompt versioning and basic experimentation
- Cost dashboards with attribution across models and users
- Apache-2.0 licensed with self-hosting available
| Pros | Cons |
|---|---|
| Cheap, fast, and easy to set up: one-line proxy integration | Built around single-request observation, not multi-step agent runs |
| Open-source license makes self-hosting viable | No first-class agent trace structure (tool-call trees, sub-agent fan-out, branching paths) |
| Solid cost analytics and request log view for small teams | No built-in agent-grade eval metrics: faithfulness, tool-call correctness, and task completion are absent |
| Reasonable pricing for solo developers and small teams | Quality-aware alerting is not a meaningful product surface; proxy hop adds latency on some workloads |
FAQ
Q: Can Helicone trace multi-step agent runs?
Helicone captures each LLM call as a row and can group calls into sessions, but multi-step agent structure (tool-call trees, sub-agent fan-out, branching execution paths) is not a first-class concept in the UI. Teams running agentic workloads usually outgrow Helicone's data model quickly.
Q: Does Helicone include agent-grade evals?
No. Helicone focuses on request logging, cost analytics, and prompt versioning. Agent-grade evaluation metrics (faithfulness, tool-call correctness, task completion, multi-turn fidelity) are not part of the product.
5. Arize AI
Type: Enterprise LLM observability + evaluation · Pricing: Free tier (Phoenix, open-source); AX Pro from $50/mo; AX Enterprise custom · Open Source: Yes (Phoenix, ELv2) · Website: https://arize.com
Arize extends a mature ML monitoring foundation into LLM and agent observability — span-level tracing, agent workflow visualization, and a Phoenix open-source library for self-hosted tracing. Teams already on Arize for classical ML find the extension to LLM workloads a clean one-vendor consolidation.
Where Arize is narrower is in built-in LLM evaluation depth. Agent-specific metrics (tool-call correctness, task completion, multi-turn fidelity) typically require custom evaluators, quality-aware alerting is lighter than on AI-native platforms, and the engineer-first UX keeps PMs, QA, and domain experts out of the quality loop.

Best for: Large engineering organizations already standardized on Arize for ML monitoring that want to extend the same vendor into agent observability — and are comfortable building custom evaluators for agent-specific metrics.
Standout Features
- Span-level tracing with custom metadata tagging for granular agent debugging
- Visual agent workflow maps for multi-step LLM pipelines
- Phoenix open-source library for self-hosted tracing
- Real-time performance dashboards covering latency, error rates, and token consumption
- Custom evaluators for output scoring
- Enterprise-scale infrastructure with established SOC 2 and SSO posture
| Pros | Cons |
|---|---|
| Mature enterprise infrastructure handling high-throughput production environments | Built-in LLM and agent eval depth is shallower than evaluation-first platforms |
| Unified ML and LLM monitoring reduces vendor count for teams running both | Quality-aware alerting depth is lighter than AI-native platforms |
| Phoenix is open-source, giving teams flexibility over their tracing setup | Engineer-first UX limits PM/QA/domain-expert participation in the quality loop |
| Real-time telemetry gives immediate operational visibility | Advanced capabilities gated behind commercial tiers with shorter retention on free plans |
FAQ
Q: Does Arize handle agent traces natively?
Yes — Arize supports OpenInference for agent trace capture and visualizes multi-step workflows. Eval depth for agent-specific failure modes (tool-call correctness, task completion) typically requires custom evaluators.
Q: How does Phoenix differ from AX?
Phoenix is the open-source tracing library; AX is the commercial platform. Many teams adopt Phoenix first and graduate to AX when they need managed infrastructure, RBAC, and longer retention.
Full Comparison Table
Confident AI | Datadog LLM | LangSmith | Helicone | Arize AI | |
|---|---|---|---|---|---|
Multi-step agent trace fidelity Parent runs, tool calls, sub-agents, retries, branching | Limited | ||||
Agent-grade eval metrics Tool-call correctness, task completion, multi-turn fidelity | Limited | Limited | |||
Quality-aware alerting Faithfulness, hallucination, PII leakage, jailbreak patterns | Limited | Limited | Limited | ||
SRE-grade alerting infrastructure Anomaly detection, composite alerts, multi-channel routing | Limited | Limited | Limited | ||
Integration with existing enterprise stack APM, logs, infra, security, RBAC, SSO in one place | Limited | Limited | Limited | Limited | |
OpenTelemetry-native Standard OTEL ingestion without proprietary lock-in | Limited | Limited | |||
Framework-agnostic OpenAI, LangChain, LangGraph, Pydantic AI, CrewAI, Vercel AI SDK | Limited | ||||
Predictable scale economics Unlimited traces or self-hosted; no per-trace gotchas | Limited | Limited | Limited | ||
Self-hosting Run the platform inside your own infrastructure | |||||
Cross-functional workflows PMs, QA, domain experts in one workspace | Limited | Limited | Limited | Limited | |
Closed loop with CI/CD and datasets Production traces auto-curate into eval datasets | Limited | ||||
Built-in regression testing Pytest integration that blocks releases on quality regressions | Limited |
How to Choose
If you want agent monitoring, evals, and quality-aware alerts in one platform: Confident AI is the only tool on this list that runs all three as one workflow — same datasets, same metrics, same traces. Failing production traces become regression tests, fire quality-aware alerts via PagerDuty/Slack/Teams, and feed back into eval datasets automatically.
If you're an enterprise already standardized on Datadog: Datadog LLM Observability is the path of least resistance. The LLM module drops directly into the existing software stack — APM, logs, infrastructure, security, RBAC, SSO, compliance — without a new procurement cycle. Pair with an AI-native eval platform for agent-grade quality metrics, and watch high-cardinality trace volume carefully.
If your stack is LangChain or LangGraph-heavy: LangSmith is the natural pick for first-party tracing and evals. Plan to pair it with a separate alerting platform for production SRE-grade workflows, and budget for per-seat pricing as cross-functional adoption grows.
If you're a solo developer or small team that needs cheap request logs: Helicone is one of the lowest-friction options in the category — a one-line proxy that captures requests, responses, and cost. Plan to graduate to a more structural platform the moment you start running multi-step agents in production.
If you're already on Arize for ML monitoring: Extending Arize into LLM and agent workloads is a natural one-vendor consolidation. Expect to build custom evaluators for agent-specific metrics, and pair with a dedicated quality-aware alerting layer for production.
Why Confident AI is the Best Platform for Agentic Systems at Scale in 2026
Every other tool on this list is strong at one slice. Datadog leads on alerting and stack integration but is thin on agent-native evals. LangSmith is unmatched on LangGraph tracing but ties you to LangChain. Helicone is cheap but built around the single LLM call. Arize is mature infrastructure but shallow on built-in agent evals and quality-aware alerting. None run the full monitoring + evals + alerting loop on one platform.
Confident AI does. Agent evaluation, production observability, and quality-aware alerting share one workspace, one dataset store, and one set of metric definitions. A failing tool-call becomes a CI/CD regression test, lands in production observability, and fires a PagerDuty/Slack/Teams alert if it recurs. Signals adds anomaly detection on top — circular outputs, new topics, frustrated users, prompt injection trends — so emerging issues surface before anyone has thought to look. OTEL-native, framework-agnostic, $1/GB-month with unlimited traces.
The reason to pick Confident AI isn't that any one layer beats every specialist. It's that monitoring, evals, and alerts on one platform turns three workflows into one — and the time saved gluing tools together goes into shipping safer agents.
Frequently Asked Questions
Why do agentic systems need different monitoring than single-call LLM apps?
Because agent runs are trees, not rows. A single agent invocation can fan out into dozens of LLM calls, retrievals, tool invocations, and sub-agents — and most failures happen at the structural or tool-call level, not at the model-output level. Monitoring tools that flatten that structure into a single timeline lose the signal on-call engineers actually need to debug. Agent-grade evals (tool-call correctness, task completion, multi-turn fidelity) and quality-aware alerts on those signals are what separate a real agent observability stack from a repurposed single-call one.
What does "quality-aware alerting" mean in practice?
It means alerts fire on quality metrics — faithfulness drops, hallucination spikes, PII leakage, jailbreak patterns, tool-call correctness regressions — in addition to standard latency and error signals. An agent that returns a confidently wrong answer in 200ms is invisible to a classic APM monitor. Confident AI routes quality alerts to PagerDuty, Slack, and Teams using the same metric definitions that score outputs in pre-production evals, so the signal is consistent across CI/CD and production.
How do these tools handle agent fan-out at scale?
The pricing model is the tell. Per-trace pricing and per-seat pricing both punish agent fan-out and cross-functional adoption. Per-GB pricing (Confident AI) or self-hosted (Helicone open-source, Phoenix from Arize) keeps economics predictable as fan-out grows. Datadog scales with ingestion volume and cardinality, which can become significant at high agent traffic without careful sampling.
Can I use these tools alongside an existing APM like Datadog?
Yes. AI-native platforms like Confident AI are OpenTelemetry-native and can run alongside an existing APM — agent traces and quality alerts live in the AI-native platform, while infrastructure and application-level monitoring stays in the APM. This is a common deployment pattern for enterprises that already have a Datadog contract and want agent-grade evals on top.
How do these tools integrate with CI/CD for agent regression testing?
Confident AI and LangSmith both ship CI/CD integrations that run evals in deployment pipelines and block releases when regressions cross thresholds. Datadog, Helicone, and Arize are monitoring-first and ship limited or no CI/CD regression testing surface — that loop has to be built externally or paired with an AI-native eval platform.
How often should I evaluate agents in production?
Continuously. Models drift, prompts change, retrieval indexes update, and tool behavior shifts. The platforms worth picking score every production trace automatically (or a sampled subset for cost control), surface drift via dashboards, and fire alerts when quality metrics cross thresholds — instead of waiting for a quarterly eval cycle to catch a regression that landed three weeks ago.
Does Confident AI replace a runtime AI firewall?
Not directly. Confident AI focuses on agent observability, evaluation, and quality-aware alerting. Teams that need an inline prompt-injection firewall at the API layer typically still deploy a runtime guard product alongside Confident AI — but the monitoring, evals, and alerts loop lives in Confident AI.