A new preprint identifies a consistent pattern in large reasoning models: the first generated solution outperforms later alternatives, and continued reasoning can actively degrade accuracy. The proposed fix, called RED, improves performance by up to 19% while cutting token usage by 37–70% versus competitive baselines. It's a useful challenge to the assumption that more inference compute is always better.
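The preprint's details aren't reproduced here, so this is only a toy illustration of the headline finding, not RED's actual mechanism: if the first complete answer is usually the best one, an early-stop policy that keeps it can beat continued sampling, while also generating fewer tokens. The candidate traces and answers below are invented for the sketch.

```python
from collections import Counter

def first_answer(candidates):
    """Early-stop policy: keep the first complete answer, generate no more."""
    return candidates[0]

def majority_vote(candidates):
    """Baseline: keep sampling candidates and take the most common answer."""
    return Counter(candidates).most_common(1)[0][0]

# Synthetic traces where later "reconsiderations" drift away from a
# correct first attempt -- the degradation pattern the preprint describes.
problems = [
    (["42", "41", "41"], "42"),   # overthinking flips a correct first answer
    (["7", "7", "7"], "7"),
    (["13", "12", "12"], "13"),
]

def accuracy(pick):
    return sum(pick(cands) == gold for cands, gold in problems) / len(problems)
```

On this contrived data `accuracy(first_answer)` is 1.0 while majority voting, despite sampling three times as many answers, scores only 1/3; the real result is an empirical claim about large reasoning models, not a guarantee.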
Arcee AI released Trinity Large Thinking on April 1 — the reasoning-optimized variant of their 400B-parameter sparse mixture-of-experts (MoE) model, trained by a 30-person startup on 2,048 Nvidia B300 GPUs. It ranks #2 on PinchBench for agentic tasks at roughly 96% lower cost than the top model, under Apache 2.0. The architecture — 256 experts with 4 active per token — is worth understanding.
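A minimal sketch of the routing idea: a learned gate scores all 256 experts per token, only the top 4 run, and their outputs are combined with softmax weights — which is how a 400B-parameter model pays the compute cost of a much smaller dense one. The dimensions and random weights below are placeholders, not Trinity's actual configuration.

```python
import numpy as np

NUM_EXPERTS, TOP_K, D = 256, 4, 64   # D is a hypothetical hidden size

rng = np.random.default_rng(0)
gate_w = rng.standard_normal((D, NUM_EXPERTS)) / np.sqrt(D)      # router
experts = rng.standard_normal((NUM_EXPERTS, D, D)) / np.sqrt(D)  # one matrix per expert

def moe_forward(x):
    """Route one token vector x through its top-k experts."""
    logits = x @ gate_w                         # score all 256 experts
    top = np.argsort(logits)[-TOP_K:]           # indices of the 4 best
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                                # softmax over the chosen 4 only
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.standard_normal(D))
```

Per token, only 4 of 256 expert matrices are touched, so roughly 1/64 of the expert parameters are active on any forward pass; real implementations add load-balancing losses and batched dispatch that this sketch omits.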
Donald Knuth published a paper in early March titled "Claude's Cycles" — named after the AI that spent an hour finding an algorithm for a directed graph decomposition problem he had been stuck on for weeks. Knuth wrote the formal proof himself; Claude did the search. Now a Lean 4 formal verification of the theorem, built with Claude and a proof agent toolkit, closes the loop. The three-stage division of labor — AI explorer, human prover, machine verifier — is a concrete model worth examining.
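The verifier stage of that division of labor can be seen in miniature: Lean 4 accepts a theorem only if the proof term type-checks, regardless of whether a human or an AI produced it. The example below is a deliberately trivial, unrelated statement — Knuth's decomposition theorem itself isn't reproduced here.

```lean
-- Trivial stand-in for the "machine verifier" stage: Lean checks the
-- proof term mechanically, independent of who (or what) wrote it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```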