Fine-Tuning · AI Beat

28 Jul 2026 · AI Beat Desk

Five Hundred Dollars of RL

Fermi Sense and Ramp fine-tuned a 9B open-source model with GRPO for $500 and outperformed every frontier configuration on a catalog review task (87.3% vs 76.9%) at 40x lower inference cost. The benchmark raises a Goodhart's Law concern, but the underlying economics of task-specific RL fine-tuning are real and worth taking seriously.
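
For readers unfamiliar with GRPO, its central step is scoring each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value model. The snippet below is a minimal sketch of that group-relative advantage only, not the training recipe used in the article. The reward values are hypothetical, and the choice of population standard deviation and the `eps` term are assumptions that vary between implementations.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four completions sampled for one prompt
# (1.0 = the reviewer judged the completion correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and are reinforced; those below it get a negative one.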

03 Jul 2026 · AI Beat Desk

RL Post-Training Lives in the Middle

A new paper finds that reinforcement learning gains in transformers concentrate almost entirely in a narrow band of middle layers. Training just one layer at roughly 40–60% network depth can match or exceed full-parameter RL fine-tuning. The finding challenges the assumption that all layers participate equally in post-training and has practical implications for compute-efficient alignment.
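
To make the "40–60% depth" band concrete, the sketch below maps it to layer indices and marks a single layer as trainable. It is an illustration under stated assumptions, not the paper's procedure: the 32-layer depth is hypothetical, a layer's relative depth is taken at its midpoint, and picking the center of the band is an arbitrary choice.

```python
def middle_band_layers(num_layers, lo=0.4, hi=0.6):
    """Return 0-based layer indices whose relative depth falls in [lo, hi]."""
    return [i for i in range(num_layers) if lo <= (i + 0.5) / num_layers <= hi]

num_layers = 32  # hypothetical model depth
band = middle_band_layers(num_layers)

# Assumption for illustration: train only the layer at the center of the band.
chosen = band[len(band) // 2]

# Trainability mask: True for the single chosen layer, False for all frozen layers.
trainable = [i == chosen for i in range(num_layers)]
```

In a real training setup the mask would drive which parameters receive gradients; here it only shows how small the trainable fraction is, one layer out of 32.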

02 May 2026 · AI Beat Desk

Qwen-Scope: When Interpretability Becomes a Dev Tool

Alibaba's Qwen team released Qwen-Scope, sparse autoencoder (SAE) weights for the Qwen3 and Qwen3.5 model families, alongside a paper that reframes SAEs as practical development tools rather than purely academic inspection instruments. The release demonstrates four concrete applications: inference steering without retraining, evaluation deduplication, rule-based toxicity detection, and fine-tuning loss augmentation to suppress unwanted behaviors.
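
As a rough picture of what inference steering with an SAE can look like, one common approach adds a scaled copy of a feature's decoder direction to a hidden activation at inference time. The sketch below shows only that arithmetic. The vectors, the `steer` helper, and the strength value are hypothetical, and nothing here is taken from the Qwen-Scope release or its API.

```python
def steer(hidden, decoder_direction, strength):
    """Shift a hidden activation along one SAE feature's decoder direction."""
    return [h + strength * d for h, d in zip(hidden, decoder_direction)]

# Hypothetical 3-dimensional activation and feature direction (real ones are much wider).
hidden = [0.5, -1.0, 2.0]
direction = [1.0, 0.0, -1.0]

# Positive strength amplifies the feature; negative strength would suppress it.
steered = steer(hidden, direction, strength=2.0)
```

No weights change in this scheme, which is why it counts as steering without retraining.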