How Amdocs’ QA team scaled AI quality for 30,000 employees with Confident AI
CASE STUDY


Company: Amdocs
Headcount: 30,000+
Location: Missouri, USA
Customer Since: July 2025
Industry: Telecommunications

"Confident AI saves us 480+ hours of manual AI evaluation every month—and gives us the data to defend every quality decision in front of engineering, product, and leadership."

Anoop Mahajan, Director of QA

THE COMPANY

Amdocs powers the digital experiences of the world's largest telecom providers

Amdocs is a global leader in software and services for communications, media, and financial services companies. With over 30,000 employees and $5 billion in annual revenue, the company serves more than 200 enterprise customers worldwide, including T-Mobile, AT&T, Comcast, Charter, USCC, and Brazil's Vivo. In almost any country, there's a strong chance the top telecom providers run on Amdocs.

AI is now embedded across everything Amdocs does. The company recently launched the Amdocs Operating System (AOS), integrating AI capabilities directly into the products shipped to customers. But one of the most critical internal use cases is a generative AI–powered knowledge assistant — a system designed to help Amdocs employees across the globe access the vast institutional knowledge within the Amdocs ecosystem, efficiently and accurately, regardless of which customer they serve, what product they work on, or what time zone they sit in.

Ensuring the quality of the answers generated by these AI systems is where Confident AI comes in.

THE BUILDUP

Multiple teams needed to agree on what "good AI" looks like — before anything ships

The QA team, led by Anoop Mahajan, Director of QA, sits between R&D and the end users. They are the last line of defense before a SaaS release goes live to 200+ enterprise customers simultaneously.

But QA doesn't operate in a vacuum. In an enterprise the size of Amdocs, shipping AI isn't a single team's decision. Every release involves stakeholders across every level of the organization — from VPs and directors who need confidence that AI products won't damage customer trust, to PhD-level data scientists who need to validate the methodology, to the frontline AI engineers actually building the application. Each has their own view of what constitutes a quality AI response, and with that comes immense friction.

When generative AI–powered products launched, evaluating whether a generated answer was actually reliable required deep product expertise that no single team — let alone a single person — could possess across the entire Amdocs portfolio.

Without that alignment, releases either stalled in disagreement or shipped without full confidence.

"It's practically impossible to hire experts across every product line to evaluate AI responses. We needed technology and objectivity—not more people—to solve this," explains Anoop.

THE PROBLEM

AI quality issues eroded confidence — and there was no objective way to catch them before deployment

Once generative AI products went live, reliability issues surfaced fast, from both a technical and a collaborative standpoint:

  • Hallucination and factual errors: The AI confidently presented fabricated details as fact — a critical reliability risk for enterprise customers.

  • No pre-deployment quality gate: With products spanning dozens of domains, a QA team of ~4 had no scalable or automated way to evaluate AI output before it shipped.

  • Model changes broke quality without warning: Every time the underlying LLM changed — version upgrades, model swaps — quality shifted unpredictably, and regressions went undetected until customers reported them.

  • No shared language for quality: QA, R&D, data science, and product each had their own subjective sense of "good enough," making release decisions dependent on gut feel rather than data.

With that approach, Anoop could never be confident that all products and edge cases were adequately covered before deployment.

THE SOLUTION

An objective, trusted quality gate before deployment that every team could agree on

Before choosing Confident AI, the team explored a wide range of options — open-source frameworks, different commercial evaluation platforms, homegrown evaluation systems built by product teams, and even outsourcing evaluation to offshore domain experts.

Ultimately, they chose Confident AI because it solved both the technical and organizational challenge: reliable pre-deployment evaluations that every team — QA, R&D, data science, and product — could trust and align around.

  • Objective, shared metrics that build trust: Rather than each team relying on its own subjective sense of quality, Confident AI provided a common set of evaluation metrics — faithfulness, correctness, relevancy — backed by data. With a golden dataset of questions and expected answers, Confident AI delivered scores that QA engineers and PhD data scientists alike could agree on.

  • Pre-deployment quality gates for AI reliability: Confident AI plugged directly into Amdocs' regression and automation suites, creating an automated checkpoint before every release. No GenAI product ships to 200+ enterprise customers without passing evaluation — catching hallucinations, regressions, and quality degradation before they reach production.

  • Flexible evaluations for a changing AI stack: Model changes, dataset updates, and shifting priorities mean KPIs need to change frequently. Confident AI supports that flexibility natively — whether the team is benchmarking a new model version, testing a mini variant, or evaluating a different product altogether.

  • Enterprise-grade support a lean team can depend on: For a QA team under constant delivery pressure, waiting days for a vendor to fix a bug isn't an option. Confident AI's responsiveness on issues, feature requests, and adaptation to Amdocs' specific needs stood out as a critical differentiator — often turning around changes in days, not weeks.
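The gating pattern described above can be sketched in a few lines of Python. This is an illustrative mock, not Amdocs' actual pipeline or the Confident AI API: the questions, metric scores, and thresholds are all hypothetical, standing in for the real evaluation results a platform would produce against a golden dataset.

```python
# Hypothetical pre-deployment quality gate over a "golden dataset":
# each entry pairs a question with per-metric evaluation scores, and
# every score must clear its threshold before a release can ship.
# All names and numbers below are illustrative assumptions.

GOLDEN_DATASET = [
    {"question": "How do I reset a subscriber's billing cycle?",
     "scores": {"faithfulness": 0.94, "correctness": 0.91, "relevancy": 0.97}},
    {"question": "Which API provisions a new 5G plan?",
     "scores": {"faithfulness": 0.88, "correctness": 0.83, "relevancy": 0.90}},
]

THRESHOLDS = {"faithfulness": 0.80, "correctness": 0.80, "relevancy": 0.85}

def gate_release(dataset, thresholds):
    """Return (passed, failures): the release passes only if every
    golden question clears every metric threshold."""
    failures = []
    for entry in dataset:
        for metric, score in entry["scores"].items():
            if score < thresholds[metric]:
                failures.append((entry["question"], metric, score))
    return (not failures, failures)

passed, failures = gate_release(GOLDEN_DATASET, THRESHOLDS)
print("ship" if passed else f"block: {failures}")
```

Wired into a regression suite, a check like this turns "does this feel good enough?" into a binary, data-backed release decision, and the recorded failures give every stakeholder the same evidence to discuss.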

"Confident AI helped us objectively answer questions from R&D team members—PhD-holding data scientists, senior engineers, seasoned product leaders. It's not 'I feel' or 'he feels.' It's based on data."

Anoop Mahajan

Director of QA

THE IMPACT

480+ hours of manual work eliminated monthly, 3X faster deployments, and stakeholder trust built on data

By adopting Confident AI, Amdocs’ QA team eliminated manual expert evaluation across one of the world’s largest telecom software portfolios—saving over $150,000 a year, cutting 480 hours of monthly manual work, accelerating releases, and earning the trust of senior R&D and PhD-level data scientists.

“One thing that really sets Confident AI apart is the speed of making changes,” says Anoop Mahajan, Director of QA. “For a team of our size and the pressure we face, I cannot waste time chasing a vendor for reliable evaluation results. Confident AI’s reliability makes all the difference.”

The scale of the challenge was enormous. Amdocs’ product portfolio spans dozens of acquisitions, and embedding generative AI into knowledge-assist tools quickly surfaced hallucinations and inconsistent responses—far beyond what any single QA expert could handle.

“Without technology-based evaluation, I’d need at least two to three additional full-time hires just to cover this,” Anoop explains. “Now, with golden questions and responses, I’m confident the coverage is complete.”

Confident AI turned evaluation from a bottleneck into a scalable, objective process. The team runs customized evaluation suites that adapt as models change—from GPT 5.1 to 5.2—switching KPIs and datasets without rebuilding pipelines.

The explanation behind every score proved critical for cross-functional trust. When faithfulness or correctness flagged an issue, the QA team could trace exactly why—connecting the prompt, KPI definition, and reasoning to create a clear, defensible story. This transparency turned potentially contentious conversations into productive, evidence-based discussions across QA, R&D, data science, and product teams.

"AI is a fast moving field, and if we come with suggestions, Confident AI's quite fast to adapt. I believe Confident AI is enterprise ready—if a large team is going to use it, I'm pretty sure they'll step up to it."

Anoop Mahajan

Director of QA

When your AI needs improvement, you need Confident AI.