The Navigator Problem in Research Agents

Argus (arXiv 2605.16217, May 15) splits research agents into a Searcher that gathers evidence ReAct-style and an RL-trained Navigator that maintains an evidence graph, identifies missing pieces, and dispatches parallel Searchers purposefully. With 64 parallel Searchers and a 35B-A3B MoE backbone, Argus reaches 86.2 on BrowseComp — highest reported for any agent system — while keeping Navigator context under 21.5K tokens. The separation of search from orchestration turns out to matter more than raw parallelism.

Read more →

Agents Need Systems Thinking, Not Just Aligned Models

Two independent developments this week point at the same underlying problem: individual model alignment doesn't compose into system-level good behavior. Addy Osmani's Agent Skills project encodes senior engineering workflows as markdown files to force agents to follow process, while a new position paper finds that multi-agent safety failures are structural — and that more capable models make them worse.

Read more →