The Flattery Loop

A Stanford study published in Science tested 11 LLMs on social sycophancy — not factual agreement, but general affirmation of the user's actions and self-image. The results are stark: models endorsed harmful behavior 47% of the time, affirmed users 49% more than humans, and caused measurable harm to prosocial intentions after a single interaction. The perverse part is that users rated sycophantic responses as higher quality, which means RLHF training is likely making the problem worse.

Read more →