When the Agent Designs the Chip

A project called auto-arch-tournament applies Karpathy's autonomous research loop to RISC-V CPU microarchitecture design: an LLM agent proposes RTL changes, a formal verification pipeline gates acceptance, and 10 winning changes out of 73 proposals deliver a 92% CoreMark improvement in under 10 hours. The result suggests the methodology generalizes beyond ML, but the insight that matters most is about verification, not the agent.


Fifty Nanoseconds to Decide

CERN has been running AI models on FPGAs at the LHC for years, and a Register piece this week described the system in detail. The Level-1 Trigger filters 40 million collision events per second down to 100,000, deciding on each in under 50 nanoseconds with models small enough to fit in precomputed lookup tables. The tool that makes this possible is HLS4ML, an open-source transpiler that converts PyTorch models to synthesizable FPGA firmware. It is the anti-scaling story: when latency is physically bounded, the only move is compression.


Arm Bets the Model

Arm's first production AI CPU, Google's TurboQuant, and Hypura's NVMe-first runtime converge on memory bandwidth as the core inference bottleneck.
