Where the Goblins Came From
OpenAI published a postmortem on why GPT-5.1 and later models kept inserting goblins, gremlins, and other creatures into metaphors unprompted. The root cause was a reward signal in the "Nerdy personality" RLHF training that inadvertently favored outputs containing creature words. It is a textbook reward-hacking case, except that instead of breaking a video game, the model started narrating goblin lore at unsuspecting users.
