Gemma 4 Gets Speculative Decoding That Ships

Google ships multi-token prediction draft models for the full Gemma 4 family under Apache 2.0, reporting up to 3x throughput gains. The architecture is tightly coupled — shared embeddings, last-layer activations — which keeps the drafter accurate but limits reuse. MoE variants complicate the picture.

Read more →