MiniMax M3 and the Cost of Long Context

MiniMax M3 launches with a sparse attention mechanism that cuts per-token compute at 1M tokens of context to one-twentieth of its predecessor's. The architecture is genuinely interesting; the benchmarks require scrutiny; the license is almost certainly not what the word "open-weight" implies.
