Another important dimension is single-turn versus multi-turn. Multi-turn use cases are typically more expensive to evaluate: most of the metrics on Confident AI look at the entire conversation, so the input tokens can be orders of magnitude larger than for single-turn evals.
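To make the cost gap concrete, here is a rough back-of-the-envelope sketch. The token counts are made-up numbers for illustration, not measurements from the platform, and the two multi-turn functions are only assumptions about how a metric might read a conversation (once in full, or once per turn with all prior history).

```python
def single_turn_tokens(tokens_per_turn: int) -> int:
    """Input tokens when a metric sees a single input/output pair."""
    return tokens_per_turn


def whole_conversation_tokens(tokens_per_turn: int, num_turns: int) -> int:
    """Input tokens when a metric reads the entire conversation once."""
    return tokens_per_turn * num_turns


def per_turn_with_history_tokens(tokens_per_turn: int, num_turns: int) -> int:
    """Input tokens when a metric scores every turn and re-reads all
    earlier turns each time: t * (1 + 2 + ... + n)."""
    return tokens_per_turn * num_turns * (num_turns + 1) // 2


# Assumed values, purely illustrative.
TOKENS_PER_TURN = 200
NUM_TURNS = 20

print(single_turn_tokens(TOKENS_PER_TURN))                       # 200
print(whole_conversation_tokens(TOKENS_PER_TURN, NUM_TURNS))     # 4000
print(per_turn_with_history_tokens(TOKENS_PER_TURN, NUM_TURNS))  # 42000
```

Under these assumptions, a 20-turn conversation costs 20x the input tokens of a single turn when read once, and roughly 200x when every turn is scored against its full history.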
Multi-turn use cases are also harder to evaluate. For a single-turn use case, an input and an expected output are often all you need in a dataset. For multi-turn, you instead need scenarios and simulations to generate the conversation first, so that you actually have a conversation to evaluate.
The good news is that tracing really isn't that different for single-turn versus multi-turn. On our platform, all you have to do is provide a unique ID, and we'll group the traces that share it into what we call a thread.
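The grouping idea can be sketched in a few lines. This is not the platform's SDK: the `Trace` class, its `thread_id` field, and `group_into_threads` are hypothetical names used only to illustrate how traces that carry the same ID end up in one thread.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trace:
    thread_id: str   # hypothetical field: the unique ID you provide
    user_input: str
    output: str


def group_into_threads(traces: list[Trace]) -> dict[str, list[Trace]]:
    """Group traces by thread ID, preserving the order they arrived in."""
    threads: dict[str, list[Trace]] = defaultdict(list)
    for trace in traces:
        threads[trace.thread_id].append(trace)
    return dict(threads)


traces = [
    Trace("chat-1", "Hi", "Hello! How can I help?"),
    Trace("chat-2", "What does it cost?", "Here is our pricing page."),
    Trace("chat-1", "I need a refund", "Sure, what is the order number?"),
]

threads = group_into_threads(traces)
print({thread_id: len(items) for thread_id, items in threads.items()})
# {'chat-1': 2, 'chat-2': 1}
```

Each trace is still recorded on its own, exactly as in the single-turn case; the shared ID is the only extra piece of information needed to reassemble the conversation.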
Are inter-agent communications considered multi-turn?
A common misconception is that agents talking to other agents makes a use case multi-turn. Just because an agent is communicating with an external system, which could be another agent or a whole swarm of agents, doesn't mean the use case is multi-turn. Multi-turn, in our definition, is about the observable I/Os of your system. By observable, I mean observable to the user. This also means that multi-turn use cases are very often user-facing applications.
For example, let's say a user asks a sales agent a question, the sales agent kicks off another agent, the two talk to one another, and the user gets the final output after 30 seconds of internal interaction. This is a multi-turn use case because the system talks to the user, and the user can see the final output. What the user doesn't see is the internal communication between the agents. If we were to remove the user from the equation and make this a form-filling interaction instead, where a user fills in a form and lets the sales agent run and return something, this would be a single-turn use case: the exchange between the agents isn't something the user sees, and internal communication does not constitute an observable turn.
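As a minimal sketch of the "observable" distinction, the snippet below filters a message log down to the messages the user actually sends or receives. The `Message` class, the agent names, and `observable_messages` are all hypothetical and exist only to illustrate the idea; they are not part of any real API.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    receiver: str
    content: str


def observable_messages(messages: list[Message], user: str = "user") -> list[Message]:
    """Keep only messages the user sends or receives; agent-to-agent
    traffic is internal and does not count as an observable turn."""
    return [m for m in messages if user in (m.sender, m.receiver)]


log = [
    Message("user", "sales_agent", "Can I get a discount on the annual plan?"),
    Message("sales_agent", "pricing_agent", "Look up discount rules for annual plans."),
    Message("pricing_agent", "sales_agent", "Annual plans qualify for 15% off."),
    Message("sales_agent", "user", "Yes, the annual plan comes with 15% off."),
]

visible = observable_messages(log)
print(len(log), len(visible))  # 4 2
print([m.content for m in visible])
```

Of the four messages exchanged, only the user's question and the sales agent's final answer are observable; the two messages between the agents are dropped before anything is counted as a turn.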