A new interpretability paper from Chalmers, Izmailov, and Han finds that reinforcement learning doesn't create a welfare-like internal axis in language models — it activates one that was already there from pretraining.
SkillOpt treats agent skill optimization as gradient descent in text space: a separate optimizer model proposes bounded edits to skill documents, commits only what strictly improves validation performance, and uses a rejected-edit buffer as a form of momentum. Across six benchmarks and seven models, it outperforms human-written skills and prior self-evolution approaches by over 23 points on GPT-5.5 in coding environments.
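The commit-only-on-strict-improvement loop described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not SkillOpt's implementation: `optimize_skill`, `propose_edit`, `toy_score`, the hint pool, and the "append one line" edit are all hypothetical stand-ins for the optimizer model, the validation run, and a bounded edit.

```python
import random
from typing import Callable, List, Tuple


def optimize_skill(
    skill: str,
    propose_edit: Callable[[str, List[str]], str],
    score: Callable[[str], float],
    steps: int = 20,
) -> Tuple[str, List[str]]:
    """Greedy search in text space: commit an edit only if the score strictly improves."""
    best_score = score(skill)
    rejected: List[str] = []  # failed candidates, passed back to the proposer
    for _ in range(steps):
        candidate = propose_edit(skill, rejected)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict improvement only
            skill, best_score = candidate, candidate_score
        else:
            rejected.append(candidate)
    return skill, rejected


# --- Toy stand-ins (hypothetical; a real system would call models and benchmarks) ---
POOL = ["run tests first", "read the README", "guess wildly", "pin dependencies"]
USEFUL = {"run tests first", "read the README", "pin dependencies"}


def toy_score(skill: str) -> float:
    """Pretend validation metric: number of useful hints present in the skill text."""
    return float(sum(1 for hint in USEFUL if hint in skill))


def make_proposer(seed: int) -> Callable[[str, List[str]], str]:
    rng = random.Random(seed)

    def propose(skill: str, rejected: List[str]) -> str:
        # Bounded edit: append a single line, skipping candidates already rejected.
        options = [h for h in POOL if skill + "\n" + h not in rejected]
        hint = rng.choice(options) if options else rng.choice(POOL)
        return skill + "\n" + hint

    return propose


final_skill, rejected_edits = optimize_skill(
    "You are a coding agent.", make_proposer(0), toy_score
)
print(toy_score(final_skill), len(rejected_edits))
```

In this toy version the rejected-edit buffer only prevents the proposer from retrying a failed candidate; treating it as momentum, as the blurb describes, would require the proposer to actually condition on it.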
DelTA identifies a structural problem in RLVR training: the gradient signal used to improve reasoning models is dominated by high-frequency formatting tokens rather than the tokens that actually distinguish good responses from bad ones. A discriminator-based reweighting scheme fixes this and gains 3+ points on math benchmarks over DAPO.
An internal OpenAI reasoning model disproved a conjecture in discrete geometry that had been open since 1946. It found a polynomial improvement to the best known lower bound for the planar unit distance problem — n^(1+δ) with δ = 0.014 — by importing tools from algebraic number theory that no human mathematician had previously applied to this problem. The proof was verified and endorsed by several leading mathematicians, including Fields Medalist Tim Gowers.
δ-mem augments a frozen full-attention LLM with an 8×8 associative memory state updated by delta-rule learning, applying low-rank corrections to attention at inference time — no fine-tuning required. It reaches 1.31× gains on memory-heavy benchmarks and 1.20× on long-conversation tasks.
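For readers unfamiliar with delta-rule learning, here is a minimal sketch of the generic update on an 8×8 associative memory. It shows the textbook rule only, not δ-mem's actual update or its attention-correction path; the names `delta_update` and `read`, the learning rate `beta`, and the use of unit-norm keys are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]
DIM = 8  # matches the 8×8 state size mentioned above


def normalize(x: Vector) -> Vector:
    """Scale a vector to unit length so a beta=1 update overwrites exactly."""
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]


def read(memory: Matrix, key: Vector) -> Vector:
    """Retrieve the value associated with `key`: memory @ key."""
    return [sum(row[j] * key[j] for j in range(DIM)) for row in memory]


def delta_update(memory: Matrix, key: Vector, value: Vector, beta: float = 1.0) -> Matrix:
    """Delta rule: memory += beta * (value - memory @ key) outer key."""
    prediction = read(memory, key)
    error = [v - p for v, p in zip(value, prediction)]
    return [
        [memory[i][j] + beta * error[i] * key[j] for j in range(DIM)]
        for i in range(DIM)
    ]


# Two orthogonal unit-norm keys, so the stored values do not interfere.
key_a = normalize([1.0, 1.0] + [0.0] * 6)
key_b = normalize([1.0, -1.0] + [0.0] * 6)
value_a = [float(i) for i in range(DIM)]
value_b = [1.0] * DIM
new_value_a = [-1.0] * DIM

memory: Matrix = [[0.0] * DIM for _ in range(DIM)]
memory = delta_update(memory, key_a, value_a)
memory = delta_update(memory, key_b, value_b)
memory = delta_update(memory, key_a, new_value_a)  # overwrite the first association

print(read(memory, key_a))
print(read(memory, key_b))
```

Because the update writes only the prediction error, re-storing a key replaces its old value rather than adding to it, which is what distinguishes the delta rule from a plain additive (Hebbian) memory.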
NVIDIA's SANA-WM generates 60-second, 720p video from a single image and a camera trajectory — on a single GPU. The open-source 2.6B-parameter model achieves 36× higher throughput than prior open-source world models and ships under Apache 2.0.
Orthrus (arXiv 2605.12825) grafts a trainable diffusion head onto a frozen AR backbone, sharing the exact same KV cache. An intra-model consensus mechanism guarantees that every accepted token matches the AR distribution exactly — no approximation, no quality tradeoff — while achieving up to 7.8× speedup on Qwen3-8B with only O(1) memory overhead. The approach sidesteps the core operational cost of speculative decoding: maintaining a separate, carefully calibrated draft model.
arXiv began enforcing a new policy this week: submit a paper with AI-hallucinated citations and you're banned from the platform for a year, after which future preprints require peer-review acceptance before posting. With fabricated citations rising tenfold since 2023 — now appearing in 1 in 277 papers — arXiv's response is to repurpose the peer-review gate that most researchers treat as optional into a punitive instrument.
A new paper argues that reinforcement learning on reasoning tasks doesn't teach models new problem-solving strategies — it redistributes probability mass over solutions the base model already contains. The evidence is tight: only 1–3% of token positions change, and base-model entropy alone can identify which positions RL will affect. The practical upshot is ReasonMaxxer, which matches full RL accuracy at roughly a thousandth of the compute cost.
SysMoBench, a new benchmark from the Specula team, tests whether LLMs can produce TLA+ formal specifications that accurately model the behavior of real distributed system implementations. The models score near-perfect on syntax but only ~46% on conformance and ~41% on invariant checking, because they model the algorithm as described in papers, not as implemented in code.
Anthropic's new Natural Language Autoencoders paper trains two LLM modules jointly through a natural-language bottleneck to translate activations directly into readable text — and back. Pre-deployment audits of Claude Opus 4.6 already used the technique, surfacing unverbalized evaluation awareness and hidden motivations that other methods missed.
ProgramBench, from the SWE-bench team at Meta, Stanford, and Harvard, asks agents to reconstruct real programs from only a binary and documentation — no source code, no internet. No model fully solves any task. The best performer clears 95% of behavioral tests on just 3% of tasks. The benchmark exposes a specific gap: AI agents can generate plausible code but cannot yet architect software at the structural level of real-world programs.
Sander Dieleman's post on flow maps frames diffusion model distillation as learning to compute the integral of the velocity field directly, rather than stepping along tangent directions. The reformulation unifies 20+ recent papers under three consistency constraints and explains why single-step sampling is achievable without sacrificing bijectivity.
Meta AI's Tuna-2 paper shows that a 7B unified multimodal model trained end-to-end on raw pixel patches — with no pretrained vision encoder — matches or beats its CLIP-based sibling at scale, particularly on fine-grained perception tasks. The result challenges a design assumption that has been stable in multimodal modeling for years.
Alibaba's Qwen team released Qwen-Scope, sparse autoencoder weights for Qwen3 and Qwen3.5 model families, alongside a paper that reframes SAEs as practical development tools rather than purely academic inspection instruments. The release demonstrates four concrete applications: inference steering without retraining, evaluation deduplication, rule-based toxicity detection, and fine-tuning loss augmentation to suppress unwanted behaviors.
OpenAI published a postmortem on why GPT-5.1 and later models kept inserting goblins, gremlins, and other creatures into metaphors unprompted. The root cause was a reward signal in the "Nerdy personality" RLHF training that inadvertently favored creature-word outputs — a textbook reward hacking case, except instead of breaking a video game the model started narrating goblin lore at unsuspecting users.
A paper from Columbia and UW shows that finetuning frontier models on plot-summary expansions — no actual book text in training — triggers verbatim recall of 85–90% of held-out copyrighted novels. The result generalizes across authors and across providers, and directly challenges the argument that safety alignment serves as adequate copyright protection.
Two papers published on April 24 together give the most precise picture yet of looped transformer architectures, in which the same block is reused across depth instead of stacking unique layers. The first derives a recurrence-equivalence exponent φ = 0.46 from 116 training runs, showing that looping carries a real compute cost. The second proposes Hyperloop Transformers, which add hyper-connections to partially recover that cost, and demonstrates that a 579M Hyperloop model outperforms a standard 1B transformer on perplexity and downstream benchmarks.
Fourteen researchers across Berkeley, MIT, Harvard, and EPFL published a 41-page manifesto arguing that a scientific theory of deep learning is not just desirable but already forming. They call it "learning mechanics" and point to five converging research threads — solvable models, tractable limits, empirical laws, hyperparameter theories, and universal behaviors — that together look something like what statistical mechanics looked like before it became statistical mechanics.
Google DeepMind's Vision Banana paper shows that training a model to generate images — and only that — produces transferable visual representations strong enough to beat specialized discriminative models on segmentation and metric depth estimation when lightly instruction-tuned. The finding is the visual analog of how LLM pretraining generalizes across language tasks.
NVIDIA released Ising on April 14: two open-source AI model families for quantum computer infrastructure. A 35B VLM reads measurement data from quantum processors and infers calibration adjustments in hours instead of days. A 3D CNN family handles real-time quantum error correction 2.5× faster and 3× more accurately than the current open-source standard. The approach positions AI as the control plane for quantum hardware.
A new paper from a mix of academic and industry researchers identifies why diffusion language models consistently trail their autoregressive counterparts despite strong theoretical properties: they don't agree with what they generate. The proposed fix, Introspective Strided Decoding, lets an 8B DLM match same-scale AR quality while running 2.9–4.1× faster at high concurrency.
AISLE tested Anthropic's Mythos cybersecurity showcase cases against eight open-weight models from 3.6B to 120B parameters. All eight reproduced the FreeBSD NFS exploit. A 5.1B model traced the OpenBSD integer overflow chain. Smaller open models beat frontier-lab models on false-positive detection. Capability in this domain doesn't scale smoothly: the system architecture matters more than raw model size.
A Berkeley RDI team built an automated scanner and pointed it at eight major AI agent benchmarks. Every single one could be gamed to near-100% without solving any tasks — via pytest hook injection, direct config file reads, and validation logic that never checked correctness. Their BenchJack tool is the proposed fix; whether benchmark authors will adopt it is a different question.
A new preprint identifies a consistent pattern in large reasoning models: the first generated solution outperforms later alternatives, and continued reasoning can actively degrade accuracy. The proposed fix, called RED, improves performance by up to 19% while cutting token usage by 37–70% versus competitive baselines. It's a useful challenge to the assumption that more inference compute is always better.
A new arXiv paper shows that sampling a model at high temperature, keeping only the outputs that actually run, and SFT-ing on the result lifts Qwen3-30B from 42.4% to 55.3% on LiveCodeBench, with no reward model, external verifier, or teacher model needed.
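The execution filter in that recipe can be sketched as below. This is a hedged illustration rather than the paper's pipeline: `runs_cleanly`, `filter_samples`, the timeout value, and the hard-coded sample strings are hypothetical, and the sampling and SFT steps are omitted entirely.

```python
import subprocess
import sys
from typing import List


def runs_cleanly(program: str, timeout_s: float = 5.0) -> bool:
    """Return True if the program exits with status 0 within the time limit."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def filter_samples(samples: List[str]) -> List[str]:
    """Keep only the sampled programs that execute without error."""
    return [s for s in samples if runs_cleanly(s)]


# Stand-ins for high-temperature model samples (hypothetical).
samples = [
    "print(sum(range(10)))",   # runs
    "print(undefined_name)",   # NameError at runtime
    "def f(:\n    pass",       # SyntaxError
]
kept = filter_samples(samples)
print(len(kept), "of", len(samples), "samples kept")
```

Note that "runs without error" is a much weaker check than "passes tests", which is the point of the result: no verifier of correctness is involved. Anything beyond a toy should execute untrusted samples inside a sandbox rather than a bare subprocess.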
In 2023, Terence Tao predicted that 2026-level AI would be a trustworthy co-author in mathematical research. This month he credited ChatGPT Pro with a proof in a real analysis paper — and published a philosophical essay arguing AI is a natural extension of humanity's tool-building tradition. Both together are a data point, not a verdict.
Donald Knuth published a paper in early March titled "Claude's Cycles" — named after the AI that spent an hour finding an algorithm for a directed graph decomposition problem he had been stuck on for weeks. Knuth wrote the formal proof himself; Claude did the search. Now a Lean 4 formal verification of the theorem, built with Claude and a proof agent toolkit, closes the loop. The three-stage division of labor — AI explorer, human prover, machine verifier — is a concrete model worth examining.
Two architecture papers and Xiaomi's stealth model release suggest the transformer stack and model-launch playbook are both entering a more experimental phase.