The Token Compression Illusion

A tool called RTK has been circulating with the pitch that it can “cut token usage, keep the same intelligence, pay 1/10 the price.” It does this by compressing terminal output — Bash command results, compiler messages, test runner output — before they reach the language model. The headline number is 60–90% token reduction.

Przemek Mroczek’s skeptical analysis published June 18 is worth reading carefully because it makes a specific technical objection rather than a general complaint about hype.

The core problem is what the savings are actually being measured against. Terminal output is not the expensive part of a coding agent session. System prompts, file reads, repository contexts, and inter-turn conversation history collectively dwarf what a cargo test result or a git diff output contribute to the token count. Compressing the smaller things and advertising it as an API bill reduction is accurate in a narrow literal sense and misleading as a practical claim.

The second issue is more serious in production terms: RTK compresses context silently. The AI agent doesn’t know anything was removed. If RTK strips a stack trace because it’s redundant output, and the agent then tries to diagnose a build failure without the full trace, the agent may hallucinate a cause or propose a fix that addresses the wrong line. The human reviewing the output is also in the dark. Silent information loss in agent workflows is exactly the class of failure that’s hard to detect and intermittent enough to be blamed on the model rather than the tooling.

Mroczek makes a practical demand: where are the task success rate benchmarks? RTK publishes graphs of token savings. If the pitch is “same intelligence,” show the SWE-bench numbers before and after compression. Without those, “same intelligence” is untestable marketing copy.

There’s also an architectural argument that RTK is solving the wrong problem at the wrong layer. The right fix is for CLI tools to expose compact output modes natively — --compact, --summary, --json — that are explicitly designed for programmatic consumption. Several tools already do this; many more are adding it. An external compression layer that parses output by text heuristics will break whenever git changes a log format or npm adds a new warning category, and it’ll break silently because it can’t distinguish “this output is cleanly compressed” from “this output doesn’t match the parser.”

The interesting meta-point is what RTK reveals about how people think about agent costs. Token cost is legible — you can see it on the invoice, you can measure it per-request. Task success rate is harder to measure, requires a benchmark harness, and varies by domain. So tools optimize for the legible thing and hope the harder thing tracks. It often doesn’t.

This isn’t an argument against thinking carefully about token efficiency in agent systems. Context management matters, long-context performance degrades on some tasks, and there are legitimate techniques for reducing noise in what gets sent to the model — structured outputs, selective file reads, smarter history truncation. The distinction is between approaches that tell the agent what was removed versus approaches that quietly make the world smaller and assume it won’t notice.