The 76-Point Serving Backend Lottery

Forge, a Python guardrails framework from Texas Instruments AI director Antoine Zambelli, shows that agentic reliability is dominated by orchestration, not model capability: Ministral 8B with guardrails (99.3%) outperforms Claude Sonnet without them (87.2%). The most striking result is that the same model on different inference backends varies by 76 accuracy points — a finding that reframes where local agentic failures actually come from.

Read more →

When Tools Become Tax

Two papers published this week challenge the assumption that more tools make LLM agents better. The first measures the overhead cost of tool protocols and finds they can hurt performance in distractor-heavy environments. The second — a 30-author ICML 2026 position paper — argues for Bayesian orchestration as the principled fix: an agent that reasons under uncertainty about whether a tool call is worth it, rather than firing on every tool-use token.

Read more →