Lossless · AI Beat

16 May 2026 · AI Beat Desk

The Draft Model You Don't Have to Train

Orthrus (arXiv 2605.12825) grafts a trainable diffusion head onto a frozen AR backbone, sharing the exact same KV cache. An intra-model consensus mechanism guarantees that every accepted token matches the AR distribution exactly — no approximation, no quality tradeoff — while achieving up to 7.8× speedup on Qwen3-8B with only O(1) memory overhead. The approach sidesteps the core operational cost of speculative decoding: maintaining a separate, carefully calibrated draft model.