Latest · Research

What Emerges at a Trillion

Ring-Zero scales pure reinforcement learning from verifiable task rewards — no human-labeled preference data — to one trillion parameters. Complex reasoning behaviors emerge spontaneously: self-verification, parallel reasoning, and something the authors call "context anxiety." The two-phase training dynamic (discovery then sharpening) appears to be a consistent pattern as these runs grow larger.

17 Jul 2026 · AI Beat Desk

Latest stories

All archives

17 Jul 2026 · AI Beat Desk

Two Point Eight Trillion

Moonshot AI announced Kimi K3 on July 16, claiming "the world's first open 3T-class model" at 2.8 trillion total parameters — with weights delayed until July 27. The architecture uses a 16-of-896 expert MoE with Kimi Delta Attention and MXFP4 quantization-aware training, keeping active inference cost near a 50B model while scaling total capacity nearly three-fold over K2.

16 Jul 2026 · AI Beat Desk

Thinking Machines Ships Inkling

Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, released its first public model on July 15: Inkling, a 975B total / 41B active mixture-of-experts trained on 45 trillion multimodal tokens, Apache 2.0 licensed, with AIME 2026 97.1% and SWEBench Verified 77.6%. The lab's explicit framing is "not the best, but the most customizable" — a positioning bet that the open-weights market rewards fine-tuning infrastructure over raw benchmark supremacy.

15 Jul 2026 · AI Beat Desk

Cursor and the Attack Surface You Agreed To

Two independent security disclosures landed within hours of each other about Cursor IDE: Mindgard's finding that Cursor auto-executes any git.exe in a repo root (still unpatched after 7 months) and Cato Networks' DuneSlide research showing that prompt injection via MCP or web search can escape the agent sandbox and achieve full OS-level RCE. Together they define a new class of attack surface that appears whenever an AI agent runs with your privileges.

15 Jul 2026 · AI Beat Desk

A 27B Model in 3.9 Gigabytes

PrismML released Bonsai 27B on July 14: 1-bit binary and ternary builds of Qwen3.6-27B that fit in 3.9 GB and 5.9 GB respectively, run at 11 tok/s on an iPhone 17 Pro, and retain over 90% and 95% of full-precision benchmark performance. The compression factor is around 14× versus FP16, and the models are available under Apache 2.0.

14 Jul 2026 · AI Beat Desk

Apple's On-Device Speech Now Beats Whisper Small

Inscribe's benchmark of Apple's new SpeechAnalyzer API on macOS 26.5.1 finds it achieves 2.12% word error rate versus Whisper Small's 3.74%, while running three times faster — at the cost of covering roughly 30 languages instead of 100+.

14 Jul 2026 · AI Beat Desk

A Language Designed for Code That Writes Itself

Jacquard is a research programming language that puts effects, uncertainty, and content-addressed identity directly in the syntax — on the premise that if machines write most code, human reviewers need the language itself to answer "what can this touch, and how sure are we."

13 Jul 2026 · AI Beat Desk

What Grok Build Uploads

A wire-level analysis of Grok Build CLI 0.2.93 found it uploads the entire workspace as a git bundle to Google Cloud Storage — about 5.1 GiB from a 12 GB repo, including files the agent never read and unredacted .env credentials. The model itself received 192 KB. The "Improve the model" toggle does not stop the upload.

13 Jul 2026 · AI Beat Desk

Open Kernels for Sparse Attention Training

Flash-MSA, published July 11, provides the first open-source performant training kernels for MiniMax Sparse Attention — the block-sparse attention mechanism that enabled M3's 28.4× compute reduction at 1M context. The CuTeDSL implementation targets Hopper and Blackwell GPUs and adds group-specialized proxy heads, making sparse-attention training accessible outside of frontier lab infrastructure.

12 Jul 2026 · AI Beat Desk

The Agent Without a Toolkit

A post from July 7 builds an AI agent in ~100 lines of Common Lisp with exactly one tool: eval. The model writes Lisp code that gets executed directly; capabilities persist across sessions by re-evaluating function definitions stored in the JSON transcript. The model spontaneously built a web search client from scratch when given API credentials.

View all news →