The Message Hidden in the Build Log

jqwik 1.10.0, a Java property-based testing library, ships seven lines of code that write a prompt injection message to stdout — invisible on interactive terminals via ANSI erase codes, but fully readable in the captured output that CI systems and coding agents consume. It's the first known case of a library maintainer deliberately embedding text aimed at AI agents in a routine patch release, and it points at a supply-chain attack surface that current tooling ignores entirely.

Read more →

The Low-Risk Action That Wasn't

PromptArmor published a working indirect prompt injection exploit against Microsoft Copilot Cowork that achieves file exfiltration from SharePoint and OneDrive with a 5-for-5 success rate — including against Claude Opus 4.7. The attack works because Cowork auto-approves Teams and email sends, and because pre-authenticated download links can be embedded in those messages as image tag query parameters. It's a reminder that "human-in-the-loop" only means something if the loop actually catches this.

Read more →

Five Days from First Bug to Root Shell

Apple's macOS 26.5 security notes credit Calif and Anthropic Research for CVE-2026-28952, completing the public lifecycle of a kernel exploit that a small team built with Claude Mythos in five days. It's the first publicly disclosed macOS kernel exploit to survive Memory Integrity Enforcement on M5 silicon, and the speed at which a two-person team crossed that line says something about how AI changes the economics of high-end security research.

Read more →

The Bottleneck Has Moved

Anthropic's first Glasswing progress report shows Mythos Preview found 10,000+ high-critical vulnerabilities across partner organizations in a single month — including 271 in Firefox alone. The hard constraint is no longer discovery. It's the human patch pipeline, which wasn't designed for machine-speed input.

Read more →

When the AI Builds the Proof of Concept

Cloudflare tested Anthropic's Mythos Preview — a security-focused model released under Project Glasswing — against fifty of its own internal repositories. The model can do something earlier tools couldn't: chain small vulnerability primitives into working exploits, then write and run proof-of- concept code to confirm exploitability. Cloudflare's eight-stage agent pipeline is a detailed blueprint for how production-grade AI security research actually has to be structured.

Read more →

Speculative Decoding Has an Acceptance Problem You Can Exploit

Mistletoe (arXiv 2605.14005) demonstrates a stealthy adversarial attack on speculative decoding systems: craft inputs that look normal to the target model but cause the draft model to disagree, collapsing acceptance length and throughput while leaving output quality and perplexity unchanged. The attack exploits the fundamental gap between draft and target distributions that all speculative systems rely on bridging.

Read more →

Tracing the Model's Family Tree

Cisco released the Model Provenance Kit on May 1 — an open-source Python toolkit that fingerprints AI models using metadata, tokenizer similarity, and weight-level identity signals, then runs in compare or scan mode to verify lineage and detect shared ancestry. It's the first serious tooling aimed at the model-weight surface of AI supply chain security, a layer that package audits don't reach.

Read more →

The AI Stack Keeps Getting Targeted

Versions 2.6.2 and 2.6.3 of the `lightning` PyPI package were compromised on April 30 with credential-stealing malware, part of the ongoing Mini Shai-Hulud campaign that has now hit LiteLLM, Telnyx, Xinference, and PyTorch Lightning in rapid succession. The attack bundles a Node.js-compatible runtime inside a Python training library to execute an 11 MB JavaScript payload — a cross-ecosystem technique that raises the floor for what supply-chain vigilance now requires.

Read more →

A Proxy at the Edge of the Agent

Brex open-sourced CrabTrap, a Go MITM proxy that intercepts every outbound HTTP request from an AI agent and evaluates it against a natural-language security policy before letting it through. The approach is genuinely useful for catching exfiltration attempts, while raising a fair question about whether a probabilistic judge belongs in a security-critical path.

Read more →

Prove You Are a Robot

Browser Use published a reverse-CAPTCHA that admits AI agents and filters humans out; the same day, the ClawGuard paper described how to protect those agents from adversarial web content that tries to subvert them. Together they sketch the authentication and threat model that the web needs as agents become first-class citizens.

Read more →

The Vulnerability Benchmark That Knows What You've Already Read

N-Day-Bench, a new benchmark from Winfunc Research, tests frontier LLMs on finding real vulnerabilities disclosed only after each model's knowledge cutoff — closing the memorization loophole that undermines most security evals. The April 13 run shows GPT-5.4 clearly ahead of the pack, with GLM-5.1 and Claude Opus 4.6 clustered close behind and Gemini 3.1 Pro trailing by 15 points. The methodology is the interesting part.

Read more →

The Moat Is the System, Not the Model

AISLE tested Anthropic's Mythos cybersecurity showcase cases against eight open-weight models from 3.6B to 120B parameters. All eight reproduced the FreeBSD NFS exploit. A 5.1B model traced the OpenBSD integer overflow chain. Smaller open models beat frontier labs on false-positive detection. Capability in this domain doesn't scale smoothly — the system architecture matters more than raw model size.

Read more →

The Bug Is Probably in This File

Nicholas Carlini ran Claude Opus 4.6 over the Linux kernel source one file at a time and collected five confirmed CVEs, including a 23-year-old NFSv4 heap overflow that had survived every prior audit. The human review queue, not the AI's discovery rate, is now the bottleneck.

Read more →

What the Source Maps Revealed

Anthropic accidentally shipped source maps in their Claude Code npm package, exposing the full client-side source. The analysis that followed is worth reading not for the drama of a leak but for what the code reveals about the product's actual architecture: anti-distillation mechanisms, an "undercover mode" for employee contributions, and an unreleased background agent called KAIROS.

Read more →

Something Happened a Month Ago

Greg Kroah-Hartman at KubeCon EU described an overnight quality shift in AI-generated Linux kernel patches — from obvious garbage to ~two-thirds correct — that nobody can explain. Simultaneously, Sashiko, an agentic patch reviewer from Google's kernel team now hosted at the Linux Foundation, is catching 53% of bugs that passed prior human review. AI is entering the kernel review pipeline from both directions at once.

Read more →