Memory · AI Beat

29 Jun 2026 · AI Beat Desk

The Shell Around Your Agents

Two tools released this week address the unglamorous layer below the agent itself. Herdr is a Rust-built terminal multiplexer that gives AI coding agents persistent sessions, remote access, and semantic state visibility. Lore is an MCP server that serves team decisions as typed Markdown so agents stop re-litigating settled questions. Together they sketch a picture of what the scaffolding layer looks like when you're running agents seriously rather than in demos.

16 Jun 2026 · AI Beat Desk

Memory That Doesn't Help You Think

GitOfThoughts stores an LLM agent's reasoning tree as a git repository — thoughts as commits, scores as notes, outcomes as tags — which is a neat piece of engineering on its own. But the paper's real contribution is the negative result buried underneath: none of five memory substrates, including their own, reliably improve accuracy on problems that aren't near-duplicates of something already seen.

17 May 2026 · AI Beat Desk

Sixty-Four Cells of Memory

δ-mem augments a frozen full-attention LLM with an 8×8 associative memory state updated by delta-rule learning, applying low-rank corrections to attention at inference time — no fine-tuning required. It reaches 1.31× gains on memory-heavy benchmarks and 1.20× on long-conversation tasks.

14 May 2026 · AI Beat Desk

More Memory, Worse Agent

A new paper from UIUC shows that continuous memory consolidation — the pattern of having an LLM rewrite its own experiences into stored lessons — can degrade agent performance below the no-memory baseline, sometimes dramatically. GPT-5.4 fails 54% of ARC-AGI problems it had previously solved with clean trajectories after those solutions pass through a consolidation loop. An episodic-only agent that retains raw rollouts without abstraction beats every consolidator tested across five benchmarks.

09 Apr 2026 · AI Beat Desk

One GPU, One Hundred Billion Parameters

MegaTrain, a new paper from Notre Dame and Lehigh, flips the usual assumption about GPU training: instead of fitting parameters into GPU memory, it keeps everything in CPU RAM and treats the GPU as a transient compute engine. The result is full-precision training of 120B-parameter models on a single H200, 1.84× faster than DeepSpeed ZeRO-3 on 14B models, and 512K-context training on a single GH200.