Dense Beats Sparse, and Thinking Persists

A week after Qwen3.6-35B-A3B showed that hybrid linear attention can fit frontier-level coding into 3B active parameters, Alibaba's Qwen team shipped a second variant: a fully dense 27B model that trades the MoE efficiency gains for higher peak accuracy. It hits 77.2% on SWE-bench Verified and adds thinking preservation, a mechanism for keeping chain-of-thought traces across multi-turn agent conversations.

Qwen3.6 Fits in a Laptop and Ships a Novel Architecture

Qwen3.6-35B-A3B landed on April 16 under Apache 2.0: 35 billion total parameters, 3 billion active per token, and a hybrid architecture that alternates Gated DeltaNet linear attention with standard attention blocks. It runs on a laptop, scores 73.4% on SWE-bench Verified, and the architecture is more interesting than the benchmark numbers alone suggest.

Thirty People, Four Hundred Billion Parameters

Arcee AI released Trinity Large Thinking on April 1 — the reasoning-optimized variant of their 400B sparse MoE, trained by a 30-person startup on 2,048 Nvidia B300 GPUs. It ranks #2 on PinchBench for agentic tasks at roughly 96% lower cost than the top model, under Apache 2.0. The architecture — 256 experts with 4 active per token — is worth understanding.
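
For readers new to sparse MoE, the "256 experts with 4 active per token" figure can be made concrete with a generic top-k routing sketch. This is not Arcee's router: the random scores, the softmax over the selected experts, and the `route` helper are all illustrative assumptions. Only the 256 and 4 come from the summary above.

```python
import math
import random

NUM_EXPERTS = 256  # from the summary above
TOP_K = 4          # from the summary above


def route(scores, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Hypothetical helper: a generic top-k gate, not Trinity's actual router.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for one token (random, for illustration only).
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

chosen = route(scores)
active_fraction = TOP_K / NUM_EXPERTS  # share of experts that run per token

print(f"experts used for this token: {[i for i, _ in chosen]}")
print(f"fraction of experts active: {active_fraction:.4%}")
```

The point of the sketch is the ratio: each token touches 4 of 256 experts, so only about 1.6% of the expert blocks run per token, regardless of how the real gate computes its scores.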

One Bit All the Way Down

PrismML launched Bonsai on March 31, claiming the first commercially viable true 1-bit LLMs: an 8B model that fits in 1.15 GB and runs at 131 tokens/sec on an M4 Pro. The key word is "true" — every layer, including embeddings and attention, is 1-bit, not just the weights in isolation.
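
The size claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes "8B" means exactly 8 billion weights and that "GB" means decimal gigabytes (10^9 bytes); neither is stated in the summary, and the helper names are hypothetical.

```python
def packed_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Storage in decimal GB for n_params weights at bits_per_weight each."""
    return n_params * bits_per_weight / 8 / 1e9


def effective_bits_per_weight(size_gb: float, n_params: float) -> float:
    """Average bits per weight implied by a given file size."""
    return size_gb * 1e9 * 8 / n_params


N_PARAMS = 8e9  # assumption: "8B" taken as exactly 8 billion

ideal_1bit_gb = packed_size_gb(N_PARAMS, 1.0)             # pure 1-bit floor
fp16_gb = packed_size_gb(N_PARAMS, 16.0)                  # 16-bit baseline
implied_bits = effective_bits_per_weight(1.15, N_PARAMS)  # from the quoted 1.15 GB

print(f"ideal 1-bit size: {ideal_1bit_gb:.2f} GB")
print(f"16-bit size:      {fp16_gb:.2f} GB")
print(f"implied bits per weight at 1.15 GB: {implied_bits:.2f}")
```

Under these assumptions the quoted 1.15 GB sits just above the 1.0 GB floor for pure 1-bit storage, leaving about 0.15 bits per weight for everything else in the file.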

Microsoft's Harrier Embeds 32K Tokens at Once

Microsoft released Harrier-OSS-v1, a family of decoder-only multilingual embedding models (270M, 0.6B, 27B) with a 32,768-token context window, roughly 32 to 64 times longer than the 512 to 1,024 token ceiling most practitioners hit today. The 27B model sets the state of the art on Multilingual MTEB v2 at 74.3; all three variants are MIT licensed.

What You Get When You Only Train on Public Domain Text

Mr. Chatterbox is a 340M-parameter model trained exclusively on 28,000 Victorian-era texts from the British Library — definitively public domain, zero copyright exposure. Simon Willison's writeup documents both what it proves and what it falls short of: the corpus is large enough to train something coherent, but not large enough to be useful by Chinchilla norms.
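
To put "Chinchilla norms" in numbers, the sketch below uses the commonly cited rule of thumb of roughly 20 training tokens per parameter. That ratio is an approximation, and the corpus's actual token count is not given in the summary, so the code only computes what the budget would require.

```python
TOKENS_PER_PARAM = 20  # assumption: approximate Chinchilla rule of thumb


def chinchilla_token_budget(n_params: float) -> float:
    """Rough compute-optimal training-token count for a model of n_params."""
    return n_params * TOKENS_PER_PARAM


N_PARAMS = 340e6   # from the summary above
N_TEXTS = 28_000   # from the summary above

budget = chinchilla_token_budget(N_PARAMS)
tokens_per_text_needed = budget / N_TEXTS

print(f"token budget: {budget / 1e9:.1f}B tokens")
print(f"average tokens per text needed to reach it: {tokens_per_text_needed:,.0f}")
```

Under that rule of thumb a 340M-parameter model wants about 6.8B training tokens, which would require each of the 28,000 texts to average roughly 243,000 tokens.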
