Training the Compression In: Gemma 4 QAT for Mobile
Google released quantization-aware training (QAT) checkpoints for Gemma 4 in a new mobile-specific format. The format combines channel-wise quantization aligned with NPU memory layouts, 2-bit compression for the token-generation layers, and pre-calculated scaling constants. Together these bring the Gemma 4 E2B text model under 1 GB of memory.
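To make "channel-wise" and "pre-calculated scaling constants" concrete, here is a minimal sketch of per-channel asymmetric 2-bit weight quantization in pure Python. It is an illustration under stated assumptions, not Google's Gemma 4 format: each row of the weight matrix is treated as one output channel, each channel gets its own scale and zero point computed once offline, and four 2-bit codes are packed into each byte. The function names are hypothetical. In QAT the same quantize/dequantize round trip ("fake quantization") is applied in the forward pass during training so the weights learn to tolerate it; this sketch shows only the arithmetic.

```python
"""Minimal sketch of channel-wise 2-bit weight quantization (illustrative only).

Assumptions, not Google's actual Gemma 4 format: one row = one output channel,
per-channel scale/zero point computed ahead of inference, 4 codes per byte.
"""
from typing import List, Tuple

BITS = 2
QMAX = (1 << BITS) - 1  # 2-bit codes take the values 0..3


def quantize_channel(row: List[float]) -> Tuple[List[int], float, int]:
    """Quantize one channel to 2-bit codes with its own scale and zero point."""
    lo, hi = min(min(row), 0.0), max(max(row), 0.0)  # keep 0.0 exactly representable
    scale = (hi - lo) / QMAX if hi > lo else 1.0
    zero_point = round(-lo / scale)  # lies in 0..QMAX because lo <= 0 <= hi
    codes = [min(QMAX, max(0, round(w / scale) + zero_point)) for w in row]
    return codes, scale, zero_point


def dequantize_channel(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map 2-bit codes back to approximate float weights."""
    return [(q - zero_point) * scale for q in codes]


def pack_2bit(codes: List[int]) -> bytes:
    """Pack four 2-bit codes per byte, lowest bits first; the tail is zero-padded."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, q in enumerate(codes[i:i + 4]):
            byte |= q << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, n: int) -> List[int]:
    """Inverse of pack_2bit: recover the first n codes."""
    codes = [(byte >> (2 * j)) & QMAX for byte in data for j in range(4)]
    return codes[:n]


# Hypothetical toy weights: channel 0 has a wide range, channel 1 a narrow one,
# so per-channel scales let channel 1 keep much finer resolution.
weights = [
    [0.9, -0.3, 0.0, 0.6, -0.9],
    [0.02, -0.01, 0.01, 0.0, 0.03],
]
params = [quantize_channel(row) for row in weights]  # computed once, offline

for row, (codes, scale, zp) in zip(weights, params):
    packed = pack_2bit(codes)
    restored = dequantize_channel(unpack_2bit(packed, len(row)), scale, zp)
    print(len(packed), "bytes vs", 4 * len(row), "bytes as float32")
    # prints "2 bytes vs 20 bytes as float32" for each channel
```

A single per-tensor scale would be dictated by channel 0's range and would round most of channel 1 to zero; storing one scale per channel avoids that at the cost of a few extra constants, which can be precomputed and stored alongside the packed codes.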
