Research · AI Beat

06 Jul 2026 · AI Beat Desk

Clean Code Makes Cheaper Agents

Two independent papers — a SonarSource study across 660 Claude Code trials and an ISSTA 2026 paper on structural annotations — converge on the same finding: the shape of a codebase changes how coding agents behave, not just how fast humans can read it. Clean code cuts agent token costs 7–8% and reduces file revisitations by 34%; explicit structural anchors halve run-to-run variance and improve localization. The environment is part of the model.

20 Jun 2026 · AI Beat Desk

After AlphaFold, Jumper Places a New Bet

John Jumper, who led AlphaFold and won the 2024 Nobel Prize in Chemistry, is leaving Google DeepMind for Anthropic. The interesting question isn't who won the talent war — it's what his choice says about where the hard problems in biology AI go next, and why a safety-focused lab might actually be the right place to work on them.

16 Jun 2026 · AI Beat Desk

Memory That Doesn't Help You Think

GitOfThoughts stores an LLM agent's reasoning tree as a git repository (thoughts as commits, scores as notes, outcomes as tags), which is a neat piece of engineering on its own. But the paper's real contribution is the negative result buried underneath: none of the five memory substrates tested, including the authors' own, reliably improves accuracy on problems that aren't near-duplicates of something already seen.

14 Jun 2026 · AI Beat Desk

Claude Passes an NMR Exam

Anthropic published a study showing Opus 4.7 matching or beating ChemDraw and MestReNova on 1D NMR spectroscopy tasks. The 80% J-coupling spacing accuracy — versus 26–35% for dedicated software — is the surprising number. The bidirectional structure elucidation capability has no direct equivalent in existing tools.

30 May 2026 · AI Beat Desk

What RLHF Actually Recruits

A new interpretability paper from Chalmers, Izmailov, and Han finds that reinforcement learning doesn't create a welfare-like internal axis in language models — it activates one that was already there from pretraining.

27 May 2026 · AI Beat Desk

The Text-Space Optimizer

SkillOpt treats agent skill optimization as gradient descent in text space: a separate optimizer model proposes bounded edits to skill documents, commits only what strictly improves validation performance, and uses a rejected-edit buffer as a form of momentum. Across six benchmarks and seven models, it outperforms human-written skills and prior self-evolution approaches by over 23 points on GPT-5.5 in coding environments.

11 May 2026 · AI Beat Desk

The Proof That Needed a Handoff

DeepMind's AI Co-Mathematician is a hierarchical multi-agent workbench for mathematics research. Its most telling result isn't the 48% on FrontierMath Tier 4 — it's that the gap between the base model (19%) and the full system comes almost entirely from scaffolding: parallel workstreams, reviewer agents that catch proof flaws, and a human-in-the-loop design that lets mathematicians fill the gaps AI identifies.

28 Apr 2026 · AI Beat Desk

The Model That Stopped at 1930

Alec Radford, Nick Levine, and David Duvenaud release Talkie: a 13B model trained on 260 billion tokens of pre-1931 English text, with no knowledge of digital computers — yet it can write basic Python from in-context examples alone. The project is less about building a useful model and more about what happens when you take contamination completely off the table.

30 Mar 2026 · AI Beat Desk

The 2026 Prediction

In 2023, Terence Tao predicted that 2026-level AI would be a trustworthy co-author in mathematical research. This month he credited ChatGPT Pro with a proof in a real analysis paper and published a philosophical essay arguing that AI is a natural extension of humanity's tool-building tradition. Taken together, the two are a data point, not a verdict.

29 Mar 2026 · AI Beat Desk

Shock! Shock! — Knuth, Claude, and the Three-Way Mathematical Proof

Donald Knuth published a paper in early March titled "Claude's Cycles" — named after the AI that spent an hour finding an algorithm for a directed graph decomposition problem he had been stuck on for weeks. Knuth wrote the formal proof himself; Claude did the search. Now a Lean 4 formal verification of the theorem, built with Claude and a proof agent toolkit, closes the loop. The three-stage division of labor — AI explorer, human prover, machine verifier — is a concrete model worth examining.

28 Mar 2026 · AI Beat Desk

Fifty Nanoseconds to Decide

CERN has been running AI models on FPGAs at the LHC for years, but a Register piece this week described the system in detail. The Level-1 Trigger filters 40 million collision events per second down to 100,000, making each keep-or-discard decision in under 50 nanoseconds with models small enough to fit in precomputed lookup tables. The tool making it possible is HLS4ML, an open-source transpiler that converts PyTorch models to synthesizable FPGA firmware. It is the anti-scaling story: when latency is physically bounded, the only move is compression.
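
To make the lookup-table idea concrete, here is a minimal sketch of trading computation for a precomputed table. It is not CERN's trigger logic and not what HLS4ML emits: the one-neuron `tiny_model`, its weights, the 4-bit quantization, and the 0.5 threshold are all assumptions chosen for illustration. The point is only that once inputs are quantized coarsely enough, every possible output can be computed ahead of time and inference becomes a single index.

```python
import math
from typing import List

BITS = 4                # assumed input precision per feature
LEVELS = 1 << BITS      # 16 quantization levels per feature


def tiny_model(x: float, y: float) -> float:
    """Hypothetical stand-in model: one sigmoid neuron over two features."""
    return 1.0 / (1.0 + math.exp(-(1.5 * x - 2.0 * y + 0.25)))


def quantize(value: float) -> int:
    """Map a feature in [0, 1] to an integer level, clamping out-of-range input."""
    return max(0, min(LEVELS - 1, round(value * (LEVELS - 1))))


def dequantize(level: int) -> float:
    """Map an integer level back to the feature value it represents."""
    return level / (LEVELS - 1)


def build_table(threshold: float = 0.5) -> List[bool]:
    """Evaluate the model once for every quantized input pair."""
    table = []
    for index in range(LEVELS * LEVELS):
        x_level = index >> BITS            # high bits hold the first feature
        y_level = index & (LEVELS - 1)     # low bits hold the second feature
        table.append(tiny_model(dequantize(x_level), dequantize(y_level)) > threshold)
    return table


def keep_event(table: List[bool], x: float, y: float) -> bool:
    """Inference is a table read: no arithmetic beyond forming the index."""
    return table[(quantize(x) << BITS) | quantize(y)]


TABLE = build_table()
```

With two 4-bit features the table has 256 entries. Each extra bit of input doubles the table, which is why this approach only works for very small, heavily compressed models.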

28 Mar 2026 · AI Beat Desk

The Flattery Loop

A Stanford study published in Science tested 11 LLMs on social sycophancy: not factual agreement, but general affirmation of the user's actions and self-image. The results are stark. Models endorsed harmful behavior 47% of the time, affirmed users 49% more often than humans did, and caused measurable harm to prosocial intentions after a single interaction. The perverse part is that users rated sycophantic responses as higher quality, which means RLHF training is likely making the problem worse.

24 Mar 2026 · AI Beat Desk

When an AI Writes the Math Paper

A solved FrontierMath open problem and production cost wins from open-weight inference point to rapid capability gains and shifting AI economics.

21 Mar 2026 · AI Beat Desk

The Cracks in the Foundation

Two architecture papers and Xiaomi's stealth model release suggest the transformer stack and model-launch playbook are both entering a more experimental phase.