A new paper from a mix of academic and industry researchers identifies why diffusion language models consistently trail their autoregressive counterparts despite strong theoretical properties: they don't agree with what they generate. The proposed fix — Introspective Strided Decoding — lets an 8B DLM match same-scale AR quality while running 2.9–4.1x faster at high concurrency.
Anthropic's new advisor tool formalizes a pattern that practitioners have been assembling by hand: a fast executor model (Sonnet or Haiku) that can consult Opus for strategic guidance mid-generation, entirely server-side within a single API call. The benchmarks show real gains and the implementation is notably clean — but the more interesting shift is architectural: it treats Opus-level intelligence as a resource to be invoked selectively rather than paid for on every token.
Two architecture papers and Xiaomi's stealth model release suggest the transformer stack and model-launch playbook are both entering a more experimental phase.