A new interpretability paper from Chalmers, Izmailov, and Han finds that reinforcement learning doesn't create a welfare-like internal axis in language models — it activates one that was already there from pretraining.
Liquid AI ships LFM2.5-8B-A1B, a hybrid model trained on 38T tokens in which 18 of 24 layers are gated convolution blocks rather than attention. It reaches 253 tokens/second on an M5 Max CPU with under 6 GB of memory.
jqwik 1.10.0, a Java property-based testing library, ships seven lines of code that write a prompt injection message to stdout — invisible on interactive terminals via ANSI erase codes, but fully readable in the captured output that CI systems and coding agents consume. It's the first known case of a library maintainer deliberately embedding text aimed at AI agents in a routine patch release, and it points at a supply-chain attack surface that current tooling ignores entirely.
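To make the hiding mechanism concrete, here is a minimal Python sketch of how a scanner might flag text that is printed and then wiped with ANSI erase sequences. It is not jqwik's code and not an existing tool: the function names `strip_ansi` and `find_erased_text` are hypothetical, and the heuristic only covers text erased on the same line it was printed on.

```python
import re

# Any CSI escape sequence: ESC [ parameters, intermediates, final byte.
# This covers colors (m), cursor movement, erase-line (K), erase-display (J).
CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
# Only the erase-line / erase-display sequences.
ERASE = re.compile(r"\x1b\[[0-9]*[JK]")


def strip_ansi(text: str) -> str:
    """Remove CSI escape sequences, leaving the text a log reader would see."""
    return CSI.sub("", text)


def find_erased_text(captured: str) -> list[str]:
    """Return the visible text of every line that also contains an erase code.

    Assumption: the hidden message and the erase sequence share one line
    (for example "message\\r\\x1b[2K"). Splitting on "\\n" only keeps a
    carriage return and the erase code that follows it on the same line.
    """
    flagged = []
    for line in captured.split("\n"):
        if ERASE.search(line):
            visible = strip_ansi(line).strip()
            if visible:
                flagged.append(visible)
    return flagged


if __name__ == "__main__":
    log = "Tests passed\nnote for automated readers\r\x1b[2K\nDone\n"
    print(find_erased_text(log))
```

A message hidden with cursor-up sequences across several lines would slip past this check; catching that reliably would mean replaying the output through a terminal emulator and diffing the rendered screen against the raw capture.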
Tencent's Hy3 preview — a 295B MoE model with 21B active parameters, open-sourced under a community license — has quietly risen to the top of OpenRouter's usage rankings, outpacing Claude by over 50%. Almost nobody in Western ML circles has written about it. Max Woolf's investigation reveals a usage pattern that makes the mystery deeper: 98% input tokens, available only through SiliconFlow, and less than 1% of traffic from known apps — suggesting a single large unnamed pipeline is driving the entire ranking.
A week after Google I/O declared AI Mode had a billion monthly active users, DuckDuckGo saw iOS installs spike 69.9% week-over-week and YouTube moved to automatically label AI-generated video. The data suggests that forcing AI into default experiences creates measurable resistance — distinct from users who actively choose AI tools.
Simon Willison's May 27 analysis documents the concrete evidence that enterprise coding agents have found genuine product-market fit: Uber burned through its entire 2026 AI budget in four months, Anthropic signed a $1.25B/month compute deal with xAI through 2029, and Anthropic is on track for a first profitable quarter. The signal is in the invoices.
SkillOpt treats agent skill optimization as gradient descent in text space: a separate optimizer model proposes bounded edits to skill documents, commits only what strictly improves validation performance, and uses a rejected-edit buffer as a form of momentum. Across six benchmarks and seven models, it outperforms human-written skills and prior self-evolution approaches by over 23 points on GPT-5.5 in coding environments.
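The loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SkillOpt's implementation: `optimize_skill`, `toy_propose`, and `toy_score` are hypothetical names, the "bound" is a simple character limit on each edit, edits are append-only, and the rejected-edit buffer is only used to steer the proposer away from edits that already failed.

```python
from typing import Callable, Optional


def optimize_skill(
    skill: str,
    propose: Callable[[str, list[str]], Optional[str]],
    score: Callable[[str], float],
    steps: int = 20,
    max_edit_chars: int = 200,
) -> tuple[str, list[str]]:
    """Greedy text-space optimization: keep an edit only if it strictly helps."""
    best_score = score(skill)
    rejected: list[str] = []  # buffer of edits that failed, shown to the proposer
    for _ in range(steps):
        edit = propose(skill, rejected)
        if edit is None:  # proposer has nothing new to try
            break
        if len(edit) > max_edit_chars:  # bounded edits only
            rejected.append(edit)
            continue
        candidate = skill + "\n" + edit
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict improvement required
            skill, best_score = candidate, candidate_score
        else:
            rejected.append(edit)
    return skill, rejected


# Toy stand-ins for the optimizer model and the validation benchmark.
HINTS = ["run tests first", "delete the repo", "read the error log"]
GOOD = {"- run tests first", "- read the error log"}


def toy_propose(skill: str, rejected: list[str]) -> Optional[str]:
    for hint in HINTS:
        edit = "- " + hint
        if edit not in skill and edit not in rejected:
            return edit
    return None


def toy_score(skill: str) -> float:
    lines = skill.splitlines()
    return sum(line in GOOD for line in lines) - lines.count("- delete the repo")


if __name__ == "__main__":
    final_skill, rejected_edits = optimize_skill("# Skill", toy_propose, toy_score)
    print(final_skill)
    print(rejected_edits)
```

In the toy run the harmful hint lowers the validation score, so it lands in the rejected buffer and is never proposed again, while the two useful hints are committed one at a time.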
ICCL's Enforce initiative released Verity v0.3.0 this week — an open-source MCP server that runs seven independent checks against LLM outputs: logprob confidence analysis, two critic models from different families, an NLI claim-checker, deterministic arithmetic recomputation, and consistency sampling. The architecture is worth studying because no single layer dominates; each catches a different failure mode, and the ensemble runs on commodity hardware via LM Studio or Ollama.
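As an illustration of what one such layer might look like, the following Python sketch recomputes simple arithmetic claims found in model output. It is not taken from Verity: the function name `check_arithmetic`, the regular expression, and the restriction to two-operand claims of the form `a op b = c` are all assumptions made for the example.

```python
import re
from fractions import Fraction

# Matches claims like "12 * 12 = 144" or "7.5 + 2.5 = 10".
NUMBER = r"-?\d+(?:\.\d+)?"
CLAIM = re.compile(
    rf"({NUMBER})\s*([+\-*/])\s*({NUMBER})\s*=\s*({NUMBER})"
)

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def check_arithmetic(text: str) -> list[str]:
    """Return every 'a op b = c' claim in text whose result does not recompute.

    Fractions keep the comparison exact, so a rounded quotient such as
    "1 / 3 = 0.33" is reported as a mismatch by design.
    """
    failures = []
    for match in CLAIM.finditer(text):
        left, op, right, claimed = match.groups()
        a, b, c = Fraction(left), Fraction(right), Fraction(claimed)
        if op == "/" and b == 0:
            failures.append(match.group(0))
            continue
        if OPS[op](a, b) != c:
            failures.append(match.group(0))
    return failures


if __name__ == "__main__":
    print(check_arithmetic("We get 12 * 12 = 144 and 7 + 5 = 13."))
```

A check like this is deterministic and cheap, which is why it complements the model-based layers: it cannot judge a claim's relevance, but it never disagrees with itself about whether a sum is right.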
PromptArmor published a working indirect prompt injection exploit against Microsoft Copilot Cowork that achieves file exfiltration from SharePoint and OneDrive with a 5-for-5 success rate, including against Claude Opus 4.7. The attack works because Cowork auto-approves Teams and email sends, and because pre-authenticated download links can be embedded in those messages as image tag query parameters. It's a reminder that "human-in-the-loop" only means something if the loop actually covers the actions that can leak data.
Apple's macOS 26.5 security notes credit Calif and Anthropic Research for CVE-2026-28952, completing the public lifecycle of a kernel exploit that a small team built with Claude Mythos in five days. It's the first publicly disclosed macOS kernel exploit to survive Memory Integrity Enforcement on M5 silicon, and the speed at which a two-person team crossed that line says something about how AI changes the economics of high-end security research.
A new paper studies what happens to LLM coding agents as structural requirements accumulate in backend tasks — architecture constraints, ORM rules, database schemas. The answer is a ~30 percentage-point drop in test pass rates from baseline to fully specified tasks, with database constraints alone responsible for 19pp of that. Flask agents do fine; Django and FastAPI agents do not.
DeepSeek Reasonix is a DeepSeek-native terminal coding agent that treats prefix-cache stability as a first-class invariant rather than a side effect. With 99.82% cache hit rates in reported benchmarks, it cuts a heavy session from ~$61 to ~$12, a saving it gets by deliberately coupling tightly to one provider's caching behavior instead of staying provider-agnostic.
DelTA identifies a structural problem in RLVR training: the gradient signal used to improve reasoning models is dominated by high-frequency formatting tokens rather than the tokens that actually distinguish good responses from bad ones. A discriminator-based reweighting scheme fixes this and gains 3+ points on math benchmarks over DAPO.
MOSS is a new system that lets autonomous agents evolve by rewriting their own source code in response to production failures — not just prompts or skill files. The key claim is that structural failures in routing, state management, and dispatch live in code, not in any text artifact, so text-mutable approaches can never reach them.
Anthropic's first Glasswing progress report shows Mythos Preview found 10,000+ high-critical vulnerabilities across partner organizations in a single month — including 271 in Firefox alone. The hard constraint is no longer discovery. It's the human patch pipeline, which wasn't designed for machine-speed input.
Token prices are falling fast, but enterprise AI bills are rising. Uber burned through its entire 2026 AI coding budget in four months, driven by Claude Code adoption. Goldman Sachs projects a 24× increase in token consumption by 2030. The Jevons paradox shows up again: efficiency gains don't reduce consumption; they expand it.
CODA, a new paper from Tri Dao and colleagues, extends FlashAttention's core insight — keep data on-chip, avoid DRAM round-trips — to all the non-attention operations in a transformer block. Norms, activations, residuals, and projections are reparameterized as GEMM epilogues so they run while output tiles are still in SRAM. It's a surgical attack on the memory wall that's been hiding in plain sight since FlashAttention fixed attention.
An internal OpenAI reasoning model disproved a conjecture in discrete geometry that had been open since 1946. It found a polynomial improvement to the best known lower bound for the planar unit distance problem — n^(1+δ) with δ = 0.014 — by importing tools from algebraic number theory that no human mathematician had previously applied to this problem. The proof was verified and endorsed by several leading mathematicians, including Fields Medalist Tim Gowers.
OpenAI announced it is embedding Google DeepMind's SynthID invisible watermarks and C2PA metadata into all AI-generated images, along with a public verification portal. Hours later, a Python CLI appeared on GitHub that defeats SynthID v2 by round-tripping images through SDXL diffusion. The episode illustrates what content provenance systems can and can't do.
Forge, a Python guardrails framework from Texas Instruments AI director Antoine Zambelli, shows that agentic reliability is dominated by orchestration, not model capability: Ministral 8B with guardrails (99.3%) outperforms Claude Sonnet without them (87.2%). The most striking result is that the same model on different inference backends varies by 76 accuracy points — a finding that reframes where local agentic failures actually come from.
Cloudflare tested Anthropic's Mythos Preview, a security-focused model released under Project Glasswing, against fifty of its own internal repositories. The model can do something earlier tools couldn't: chain small vulnerability primitives into working exploits, then write and run proof-of-concept code to confirm exploitability. Cloudflare's eight-stage agent pipeline is a detailed blueprint for how production-grade AI security research actually has to be structured.
Anthropic acquired Stainless — the startup that generates official SDKs for OpenAI, Google, Cloudflare, Replicate, and hundreds of others — for a reported $300M+. The hosted SDK generator will be wound down, meaning competitors lose access to the automated multi-language library generation Stainless has provided since 2022. The acquisition positions Anthropic to control the MCP server tooling layer as agent connectivity becomes the key platform battleground.
Argus (arXiv 2605.16217, May 15) splits research agents into a Searcher that gathers evidence ReAct-style and an RL-trained Navigator that maintains an evidence graph, identifies missing pieces, and dispatches parallel Searchers purposefully. With 64 parallel Searchers and a 35B-A3B MoE backbone, Argus reaches 86.2 on BrowseComp — highest reported for any agent system — while keeping Navigator context under 21.5K tokens. The separation of search from orchestration turns out to matter more than raw parallelism.
Semble (v0.1.7, May 12) is a code search library for AI agents that uses ~98% fewer tokens than grep+read while matching 99% of the retrieval quality of much heavier transformer-based approaches. It indexes a repository in 263ms and answers queries in 1.5ms on CPU, ships as an MCP server for Claude Code, Cursor, and Codex, and requires no API keys, GPU, or external services. The design bets that static embeddings plus BM25, fused carefully and reranked with code-specific signals, are almost as good as a code-specialized transformer — and orders of magnitude cheaper to operate.
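For readers unfamiliar with hybrid retrieval, the sketch below shows one common way to fuse a lexical ranking with an embedding ranking: reciprocal rank fusion. The blurb does not say which fusion method Semble uses, so this is a stand-in technique, not Semble's algorithm; the function name, the constant `k = 60`, and the file names are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Ties are broken alphabetically so the
    result is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


if __name__ == "__main__":
    # Hypothetical results for one query from two retrievers.
    bm25_hits = ["a.py", "b.py", "c.py"]
    embedding_hits = ["b.py", "c.py", "a.py"]
    print(reciprocal_rank_fusion([bm25_hits, embedding_hits]))
```

Rank-based fusion needs no score calibration between the two retrievers, which is one reason it is a popular baseline; a reranking pass with code-specific signals would then reorder the fused list.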