We Need to Talk. In Code.
TGIF! Thank god it’s features. Here’s what we shipped this week:
Big week for the org-anized among us. Multi-turn evals go code-first, Vercel joins the family, and prompts finally get the observability they deserve.

Added
- Code-Based Multi-Turn Evals - Introducing `ConversationalTestCase` for your codebase. All the power of multi-turn evaluation, now programmable. Time to have the talk with your chatbot. In code.
- Vercel AI SDK Integration - Next.js devs, rejoice! Native integration with Vercel’s AI SDK means you can trace and evaluate your `ai` package calls with zero friction. Ship fast, eval faster.
- Transformers on Retrievers & Tools - Transformers aren’t just for AI connection outputs anymore. Reshape retriever outputs and tool calls before evaluation. Your agentic RAG pipeline called, and it wants its custom parsing back.
- Organization-Wide Metrics - Define metrics at the org level and share them across all your teams. No more “wait, which faithfulness config are we using?” Standardize once, evaluate everywhere.
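To make "multi-turn evaluation in code" concrete, here is a minimal, library-free sketch of the idea: a conversation is an ordered list of role-tagged turns, and a metric scores the whole exchange rather than a single reply. The `Turn` and `Conversation` classes and the `keyword_retention` metric are hypothetical illustrations, not the actual `ConversationalTestCase` API; check the official docs for the real constructor and metrics.

```python
from dataclasses import dataclass, field
from typing import List, Literal

# Hypothetical stand-ins for illustration only. These are NOT the real
# ConversationalTestCase API, just the shape of a multi-turn test case.
Role = Literal["user", "assistant"]


@dataclass
class Turn:
    role: Role
    content: str


@dataclass
class Conversation:
    turns: List[Turn] = field(default_factory=list)


def keyword_retention(convo: Conversation, keyword: str) -> float:
    """Toy multi-turn metric: once the user mentions `keyword`, what share
    of the later assistant replies still reference it? Returns 0.0 if there
    are no assistant replies after the first mention."""
    needle = keyword.lower()
    mentioned = False
    hits = total = 0
    for turn in convo.turns:
        text = turn.content.lower()
        if turn.role == "user" and needle in text:
            mentioned = True
        elif turn.role == "assistant" and mentioned:
            total += 1
            if needle in text:
                hits += 1
    return hits / total if total else 0.0


# Usage: the bot forgets the destination in its final reply.
convo = Conversation(turns=[
    Turn("user", "I need to change my flight to Berlin."),
    Turn("assistant", "Sure, which date works for your Berlin flight?"),
    Turn("user", "Next Friday."),
    Turn("assistant", "Done! Anything else?"),
])
print(keyword_retention(convo, "Berlin"))  # 0.5
```

The point of scoring the conversation as a unit is that failures like this one (losing context two turns later) are invisible when each reply is judged in isolation.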
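A transformer, in the sense used above, is just a function that reshapes raw pipeline output into what an evaluator expects. The sketch below assumes a hypothetical retriever payload (`matches` with `score` and `metadata.text`) and an OpenAI-style tool call whose `arguments` field is a JSON-encoded string; the function names are illustrative and not part of any product API.

```python
import json
from typing import Any, Dict, List

# Hypothetical transformers, sketched as plain functions. The raw retriever
# shape ("matches" -> "score" / "metadata.text") is an assumption; the tool
# call shape assumes an OpenAI-style {"function": {"name", "arguments"}}
# layout, where "arguments" is a JSON-encoded string.


def transform_retriever_output(raw: Dict[str, Any], min_score: float = 0.0) -> List[str]:
    """Flatten nested retriever matches into a plain list of context strings,
    dropping matches below `min_score` and empty chunks."""
    chunks: List[str] = []
    for match in raw.get("matches", []):
        if match.get("score", 0.0) < min_score:
            continue
        text = match.get("metadata", {}).get("text", "").strip()
        if text:
            chunks.append(text)
    return chunks


def transform_tool_call(raw_call: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a raw tool call to its name and decoded arguments."""
    fn = raw_call["function"]
    return {"name": fn["name"], "args": json.loads(fn.get("arguments") or "{}")}


raw_retrieval = {"matches": [
    {"score": 0.92, "metadata": {"text": "Refunds are processed within 5 days."}},
    {"score": 0.41, "metadata": {"text": "Our office is in Berlin."}},
    {"score": 0.88, "metadata": {"text": "   "}},
]}
raw_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"city\": \"Berlin\"}"},
}

print(transform_retriever_output(raw_retrieval, min_score=0.5))
# ['Refunds are processed within 5 days.']
print(transform_tool_call(raw_call))
# {'name': 'get_weather', 'args': {'city': 'Berlin'}}
```

Keeping transformers as small pure functions makes them easy to unit-test on their own before they ever touch an evaluation run.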
Changed
- Prompt Observability - Track which prompts are running in production, when they were swapped, and how performance changed. Finally, prompt feedback on your prompts.
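The three questions prompt observability answers (which prompt ran, when it was swapped, how performance moved) can be sketched over a simple run log. This is a hypothetical, self-contained illustration: `PromptRun`, `swap_points`, and `mean_score_by_version` are invented names, and the scores stand in for whatever eval metric you track.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class PromptRun:
    timestamp: datetime
    prompt_version: str  # hypothetical version label, e.g. "v1"
    score: float         # any eval score, e.g. in [0, 1]


def swap_points(runs: List[PromptRun]) -> List[Tuple[datetime, str, str]]:
    """Return (when, old_version, new_version) each time the live prompt changed."""
    ordered = sorted(runs, key=lambda r: r.timestamp)
    swaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.prompt_version != prev.prompt_version:
            swaps.append((cur.timestamp, prev.prompt_version, cur.prompt_version))
    return swaps


def mean_score_by_version(runs: List[PromptRun]) -> Dict[str, float]:
    """Average eval score per prompt version."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for r in runs:
        buckets[r.prompt_version].append(r.score)
    return {version: mean(scores) for version, scores in buckets.items()}


# Usage: log entries arrive out of order; sorting recovers the swap time.
runs = [
    PromptRun(datetime(2024, 5, 1, 9), "v1", 0.70),
    PromptRun(datetime(2024, 5, 2, 9), "v2", 0.90),
    PromptRun(datetime(2024, 5, 1, 10), "v1", 0.80),
]
print(swap_points(runs))
print(mean_score_by_version(runs))
```

Tying every score to the prompt version that produced it is what lets a regression be traced back to a specific swap instead of a vague "something changed this week."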