The Post-Training Agent

Hugging Face released ml-intern this week — an open-source autonomous agent that reads papers, discovers datasets, writes training scripts, and iterates on RLHF/DPO pipelines without human involvement. A demo run pushed Qwen3-1.7B from roughly 10% to 32% on GPQA in under ten hours. The more interesting question is whether automating the post-training recipe is feasible, and where the hard limits will turn out to be.
