One Bit All the Way Down
PrismML launched Bonsai on March 31, claiming the first commercially viable true 1-bit LLMs: an 8B model that fits in 1.15 GB and runs at 131 tokens/sec on an M4 Pro. The key word is "true": every layer, including embeddings and attention, is 1-bit, not just the weights in isolation.
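
A quick sanity check on the size claim, as a rough sketch. It assumes "8B" means exactly 8e9 parameters and that 1 GB is 1e9 bytes; neither is stated above, so treat the result as approximate.

```python
# Back-of-the-envelope check of the quoted model size.
# Assumptions (not stated in the announcement): "8B" = 8e9 parameters,
# and 1 GB = 1e9 bytes.
PARAMS = 8e9
MODEL_BYTES = 1.15e9


def bits_per_param(model_bytes: float, params: float) -> float:
    """Average number of stored bits per parameter."""
    return model_bytes * 8 / params


def size_gb(params: float, bits: float) -> float:
    """Storage in GB for `params` parameters at `bits` bits each."""
    return params * bits / 8 / 1e9


effective_bits = bits_per_param(MODEL_BYTES, PARAMS)
fp16_gb = size_gb(PARAMS, 16)

print(f"effective bits per parameter: {effective_bits:.2f}")
print(f"same parameter count at 16 bits: {fp16_gb:.1f} GB")
```

Under those assumptions the quoted size works out to a little over one bit per parameter, against roughly 16 GB for the same parameter count stored at 16 bits.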
