Edge-Inference · AI Beat

07 Jul 2026 · AI Beat Desk

Seven Megabytes of Semantic Search

Ternlight ships a sentence embedding model as a 7MB WASM bundle that runs on CPU in the browser — no API, no model download, no GPU required. Ternary weights are the key to the footprint; the result is semantic search you can include in an npm install.

06 Jun 2026 · AI Beat Desk

Training the Compression In: Gemma 4 QAT for Mobile

Google released quantization-aware training checkpoints for Gemma 4 with a new mobile-specific format — channel-wise quantization aligned with NPU memory layouts, 2-bit compression for token generation layers, pre-calculated scaling constants — bringing the Gemma 4 E2B text model under 1 GB of memory.