Reasoning · AI Beat

17 Jul 2026 · AI Beat Desk

What Emerges at a Trillion

Ring-Zero scales pure reinforcement learning from verifiable task rewards — no human-labeled preference data — to one trillion parameters. Complex reasoning behaviors emerge spontaneously: self-verification, parallel reasoning, and something the authors call "context anxiety." The two-phase training dynamic (discovery then sharpening) appears to be a consistent pattern as these runs grow larger.

10 Jul 2026 · AI Beat Desk

Tencent's Hy3: Apache-Licensed and Punching Above Its Weight

Tencent released Hy3 on July 6 under Apache 2.0 — a 295B MoE model with 21B active parameters that scores 90.4 on GPQA Diamond and 78.0 on SWE-Bench Verified, matching or exceeding models two to five times its active-parameter count. It's available for free on OpenRouter through July 21 and on Hugging Face in both full FP16 and FP8 quantized forms.

07 Jul 2026 · AI Beat Desk

The Workspace Inside the Model

Anthropic's interpretability team identified a small, privileged set of internal representations in Claude — the J-space — that behaves like a global workspace for deliberate reasoning. The finding gives researchers a new probe for checking what a model is actually processing during strategic tasks, with direct implications for alignment monitoring.

13 Jun 2026 · AI Beat Desk

Kimi Trims the Reasoning

Moonshot AI's Kimi K2.7-Code is a 1-trillion-parameter MoE coding model that improves on its predecessor while using 30% fewer reasoning tokens. The reasoning-token efficiency story is the interesting part: the model has been explicitly tuned to stop overthinking, and the benchmarks suggest it works.

03 Jun 2026 · AI Beat Desk

Microsoft Stops Outsourcing Intelligence

Microsoft shipped two frontier models at Build 2026 — MAI-Thinking-1 and MAI-Code-1-Flash — built entirely without OpenAI data or distillation. The technical choices are interesting; the strategic signal is clearer: Microsoft is no longer content to be a reseller.

24 May 2026 · AI Beat Desk

The Formatting Tax on Reasoning Models

DelTA identifies a structural problem in RLVR training: the gradient signal used to improve reasoning models is dominated by high-frequency formatting tokens rather than the tokens that actually distinguish good responses from bad ones. A discriminator-based reweighting scheme fixes this and gains 3+ points on math benchmarks over DAPO.

21 May 2026 · AI Beat Desk

Eighty Years, One Model, One New Idea

An internal OpenAI reasoning model disproved a conjecture in discrete geometry that had been open since 1946. It found a polynomial improvement to the best known lower bound for the planar unit distance problem — n^(1+δ) with δ = 0.014 — by importing tools from algebraic number theory that no human mathematician had previously applied to this problem. The proof was verified and endorsed by several leading mathematicians, including Fields Medalist Tim Gowers.

09 May 2026 · AI Beat Desk

RL Doesn't Teach Reasoning. It Picks a Lane.

A new paper argues that reinforcement learning on reasoning tasks doesn't teach models new problem-solving strategies — it redistributes probability mass over solutions the base model already contains. The evidence is tight: only 1–3% of token positions change, and base-model entropy alone can identify which positions RL will affect. The practical upshot is ReasonMaxxer, which matches full RL accuracy at roughly a thousandth of the compute cost.
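The entropy claim is easier to picture with a toy calculation. The sketch below is not the paper's method or ReasonMaxxer itself. It only computes Shannon entropy per token position and flags positions above a cutoff. The function names, the toy distributions, and the threshold of 1.0 nat are all assumptions made for illustration.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_entropy_positions(distributions, threshold):
    """Indices of positions whose entropy exceeds the (hypothetical) threshold."""
    return [i for i, d in enumerate(distributions) if token_entropy(d) > threshold]

# Toy base-model distributions over a 4-token vocabulary (made-up numbers).
toy_distributions = [
    [0.97, 0.01, 0.01, 0.01],  # confident: low entropy
    [0.25, 0.25, 0.25, 0.25],  # undecided: maximum entropy, ln(4)
    [0.90, 0.05, 0.03, 0.02],  # fairly confident
]

flagged = high_entropy_positions(toy_distributions, threshold=1.0)
```

In this toy case only the uniform middle position is flagged, which mirrors the teaser's point that a small share of positions carries the uncertainty RL acts on.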

04 May 2026 · AI Beat Desk

When Tools Become Tax

Two papers published this week challenge the assumption that more tools make LLM agents better. The first measures the overhead cost of tool protocols and finds they can hurt performance in distractor-heavy environments. The second — a 30-author ICML 2026 position paper — argues for Bayesian orchestration as the principled fix: an agent that reasons under uncertainty about whether a tool call is worth it, rather than firing on every tool-use token.
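One way to read "reasons under uncertainty about whether a tool call is worth it" is as an expected-value check. The sketch below is an illustration of that idea only, not the position paper's formulation. The `ToolBelief` class, the uniform Beta(1, 1) prior, and the `gain` and `cost` numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ToolBelief:
    """Beta-Bernoulli belief about how often a tool call helps (assumed model)."""
    successes: int = 0
    failures: int = 0

    def update(self, helped: bool) -> None:
        # Record one observed outcome of calling the tool.
        if helped:
            self.successes += 1
        else:
            self.failures += 1

    def posterior_mean(self) -> float:
        # Posterior mean under a uniform Beta(1, 1) prior.
        return (self.successes + 1) / (self.successes + self.failures + 2)

def should_call(belief: ToolBelief, gain: float, cost: float) -> bool:
    """Call the tool only when expected benefit exceeds its overhead."""
    return belief.posterior_mean() * gain > cost

belief = ToolBelief()
before = should_call(belief, gain=1.0, cost=0.6)  # prior mean 0.5, so 0.5 < 0.6
for _ in range(3):
    belief.update(helped=True)
after = should_call(belief, gain=1.0, cost=0.6)   # posterior mean 4/5 = 0.8
```

With no evidence the agent skips the call. After three helpful calls the estimate rises enough to justify the overhead.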

27 Apr 2026 · AI Beat Desk

The Wrong First Move

GPT-5.4 Pro solved Erdős Problem #1196, a 1968 conjecture about primitive sets, when a 23-year-old amateur fed it the problem in a single prompt. The AI's approach used von Mangoldt weights and a downward Markov chain, a framing that had existed in analytic number theory for ninety years but had never been applied to this problem. Terence Tao's explanation for why experts missed it is the most telling part of the story.

26 Apr 2026 · AI Beat Desk

The Cliff in Lambda Calculus

Victor Taelin published LamBench, 120 pure lambda calculus programming problems in a minimal custom language. The results show a hard generational cliff: GPT-5.1, Opus 4.5, and Sonnet 4.5 score exactly 0 out of 120, while the top tier — GPT-5.3 Codex and Opus 4.6 — lands at 90%. The benchmark tests something standard evaluations mostly avoid: symbolic computation that can't be approximated by pattern matching.

06 Apr 2026 · AI Beat Desk

The First Guess Is Usually Right

A new preprint identifies a consistent pattern in large reasoning models: the first generated solution outperforms later alternatives, and continued reasoning can actively degrade accuracy. The proposed fix, called RED, improves performance by up to 19% while cutting token usage by 37–70% versus competitive baselines. It's a useful challenge to the assumption that more inference compute is always better.

02 Apr 2026 · AI Beat Desk

Thirty People, Four Hundred Billion Parameters

Arcee AI released Trinity Large Thinking on April 1 — the reasoning-optimized variant of their 400B sparse MoE, trained by a 30-person startup on 2,048 Nvidia B300 GPUs. It ranks #2 on PinchBench for agentic tasks at roughly 96% lower cost than the top model, under Apache 2.0. The architecture — 256 experts with 4 active per token — is worth understanding.
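For readers new to sparse MoE, a minimal sketch of what "256 experts with 4 active per token" means in practice follows. It uses generic top-k softmax gating, which is an assumption: the teaser does not describe Trinity's actual router, and the random logits stand in for a learned router's output.

```python
import math
import random

NUM_EXPERTS = 256  # from the article
TOP_K = 4          # from the article

def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (subtract max for stability).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Stand-in router scores for a single token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)          # list of (expert_index, weight) pairs
active_fraction = TOP_K / NUM_EXPERTS   # 4 / 256 = 0.015625
```

Each token touches only 4 of the 256 expert blocks, about 1.6% of the expert pool, which is why a 400B-parameter model can run at a small fraction of the per-token compute of a dense model the same size.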

29 Mar 2026 · AI Beat Desk

Shock! Shock! — Knuth, Claude, and the Three-Way Mathematical Proof

Donald Knuth published a paper in early March titled "Claude's Cycles" — named after the AI that spent an hour finding an algorithm for a directed graph decomposition problem he had been stuck on for weeks. Knuth wrote the formal proof himself; Claude did the search. Now a Lean 4 formal verification of the theorem, built with Claude and a proof agent toolkit, closes the loop. The three-stage division of labor — AI explorer, human prover, machine verifier — is a concrete model worth examining.