Long-Context · AI Beat

13 Jul 2026 · AI Beat Desk

Open Kernels for Sparse Attention Training

Flash-MSA, published July 11, provides the first performant open-source training kernels for MiniMax Sparse Attention, the block-sparse attention mechanism that enabled M3's 28.4× compute reduction at 1M context. The CuTeDSL implementation targets Hopper and Blackwell GPUs and adds group-specialized proxy heads, making sparse-attention training accessible outside frontier-lab infrastructure.

18 Jun 2026 · AI Beat Desk

GLM-5.2: Open Weights, Confirmed Benchmarks

Z.ai shipped the MIT-licensed weights for GLM-5.2 on June 17 (753B MoE, 40B active, 1M context), and the benchmarks back up the release: 74.4% on FrontierSWE, 81% on Terminal-Bench 2.1, and top of the Artificial Analysis open-weights leaderboard. The catch: it consumes nearly twice as many tokens as its nearest open-weights competitors.

14 Jun 2026 · AI Beat Desk

GLM 5.2 Ships Access Before Evidence

Z.ai shipped GLM 5.2 to every Coding Plan subscriber on June 13 with a 1-million-token context and zero published benchmarks. Open weights arrive "next week." The inversion — distribution first, proof second — is becoming a deliberate strategy in the crowded coding-model space.

02 Jun 2026 · AI Beat Desk

MiniMax M3 and the Cost of Long Context

MiniMax M3 launches with a sparse attention mechanism that cuts per-token compute at 1M tokens to one-twentieth of its predecessor's. The architecture is genuinely interesting; the benchmarks require scrutiny; the license is almost certainly not what the word "open-weight" implies.