More Memory, Worse Agent
A new paper from UIUC shows that continuous memory consolidation, the pattern of having an LLM rewrite its own experiences into stored lessons, can degrade agent performance below the no-memory baseline, sometimes dramatically. After its clean solution trajectories pass through a consolidation loop, GPT-5.4 fails 54% of the ARC-AGI problems it had previously solved. An episodic-only agent that retains raw rollouts without abstraction beats every consolidator tested across five benchmarks.
