
User-Facing vs. Non-User-Facing Apps

If you are evaluating an LLM application, one of the first things to figure out is whether a human sees the output directly. A voice agent taking hotel reservations and a contract extraction pipeline can run on the same model, but they fail in completely different ways and need different evals.

User-facing applications fail on experience, not only on facts. Tone, sentiment, helpfulness, role adherence, and how the conversation feels sit beside correctness. A reply can be factual and still end the relationship. Your test cases need real mess:

  • Vague or ambiguous asks
  • Frustrated or emotional language
  • Mixed intents in a single message
  • People who do not use your product vocabulary

If your suite is only crisp Q&A with clean expected answers, you will pass evals and lose users.
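A hedged sketch of what "real mess" looks like as test cases, using plain Python dicts. The field names (`input`, `graded_on`) and grading dimensions are illustrative only, not any particular eval framework's schema:

```python
# Messy test cases for a user-facing assistant. Each one grades
# experience dimensions alongside (or instead of) bare correctness.
messy_cases = [
    {"input": "this STILL isn't working and I've asked twice",
     "graded_on": ["tone", "de-escalation", "correctness"]},      # frustrated user
    {"input": "can you fix the thing from before?",
     "graded_on": ["clarifying_question", "helpfulness"]},        # vague ask
    {"input": "cancel my order, also why was I charged twice?",
     "graded_on": ["both_intents_addressed", "correctness"]},     # mixed intents
    {"input": "the money-back thingy on my page",
     "graded_on": ["intent_recognition"]},                        # no product vocabulary
]

for case in messy_cases:
    # Correctness never stands alone for a user-facing app.
    assert case["graded_on"] != ["correctness"]
```

The point of the structure is the second field: a crisp-Q&A suite would grade every case on correctness alone.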

Non-user-facing applications are judged on task completion and accuracy. Did the summary contain the required points? Did extraction match the schema? Did the job finish? A "warm" JSON buys you nothing. The pushback is usually "we might add a UI later." Fine — when a human is in the loop, rebalance the metric set. Until then, optimize for the outcome the downstream system or analyst actually uses.
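For contrast, a non-user-facing check can be a blunt pass/fail on task completion. A minimal sketch, assuming a contract extraction job; the schema and field names here are invented for illustration:

```python
# Required fields for a hypothetical contract-extraction pipeline.
REQUIRED_FIELDS = {"party_a", "party_b", "effective_date", "termination_clause"}

def extraction_passes(output: dict) -> bool:
    """Task completion, not tone: every required field present and non-empty."""
    return all(output.get(field) not in (None, "") for field in REQUIRED_FIELDS)

complete = {"party_a": "Acme", "party_b": "Globex",
            "effective_date": "2024-01-01", "termination_clause": "Section 9"}
missing  = {"party_a": "Acme", "party_b": "", "effective_date": "2024-01-01"}

assert extraction_passes(complete)
assert not extraction_passes(missing)
```

Note there is no tone or sentiment dimension anywhere; a downstream system consuming this JSON only cares that the fields are there and correct.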

Which one are you building?

This is not just a metric question — it changes what you can observe in production. User-facing apps have signals that do not exist in automation:

  • User-facing: intent, sentiment, abandonment, repeat questions — patterns that only show up when a person is on the other end. Teams that skip these signals on a patient intake assistant or a financial advisor agent learn about quality breakdowns from complaints instead of catching them early.
  • Non-user-facing: operational failure modes, weird outputs, bad inputs. There is no "user frustration" to detect. If you wire user-intent classifiers to a recruiting screener that filters resumes, you are solving a problem you do not have.

Decide which world you are in early. It drives which metrics matter, which signals to configure, what your datasets need to contain, and what "drift" means on your dashboard. The model name on the box does not tell you — the presence or absence of a human on the other end does.

TL;DR — User-facing apps fail on experience; non-user-facing apps fail on accuracy. Same model, different eval strategy.