Claude 4.7's Quiet Migration Tax

Claude Opus 4.7 landed on April 16 with benchmark numbers that justify the upgrade: SWE-bench Verified up from 53.4% to 64.3%, triple the image resolution, and a new xhigh effort level for agentic work. The per-token price stayed flat—$5 input, $25 output per million tokens. None of that is surprising. What’s in the fine print is.

The Tokenizer

Anthropic’s release notes acknowledge it plainly: “This new tokenizer may use roughly 1x to 1.35x as many tokens when processing text compared to previous models (up to ~35% more, varying by content).” Independent measurements are higher in practice—claudecodecamp.com measured 1.47x on real technical documentation and 1.29–1.45x across code samples, while English prose sat closer to 1.20x and CJK barely moved. The mechanism is shorter sub-word merges: the tokenizer carves text into smaller pieces, which the model apparently needs for more literal instruction following (+5pp on IFEval strict) but which costs you more tokens for the same input.

For a context-heavy coding session, this adds up. The same CLAUDE.md file that cost N tokens in 4.6 costs 1.45N in 4.7. Prompt caches rebound as well: if you’ve built cache hit rates around existing token counts, you need to re-measure. The official advice is to give your max_tokens and compaction triggers additional headroom before migrating—good advice that implies the bump is real enough to cause silent failures if you don’t.

Temperature Is Gone

The more structural change is sampling parameter removal. Setting temperature, top_p, or top_k to any non-default value now returns a 400 error. Not a warning, not a silent clip—a hard failure.

Anthropic’s rationale is that these parameters were a poor abstraction for controlling output behavior, and that prompting is a more reliable lever. That may be true, but a lot of production code—internal tools, evals, creative generation pipelines—hardcodes temperature=0 for determinism, or uses temperature=1.0 for varied sampling. Every one of those callsites is now broken. The migration guide says: “omit these parameters entirely from requests, and use prompting to guide the model’s behavior.” That’s a real ask, not a search-and-replace.

The extended thinking budget API is also gone. thinking: {"type": "enabled", "budget_tokens": N} returns a 400. The replacement is adaptive thinking combined with the effort parameter: thinking: {"type": "adaptive"} plus an output config with your effort level. Adaptive thinking is off by default, so existing reasoning pipelines that relied on the old syntax need explicit opt-in to avoid regressing.

What This Adds Up To

In exchange for removing these controls, Anthropic gives you two new cost management tools. Task budgets let you give an agent an advisory token cap across its full loop—tool calls, reasoning, outputs—and the model uses it to pace and prioritize. The xhigh effort level sits above high and is specifically tuned for coding and long-horizon agentic work.

The pattern here is deliberate: Anthropic is replacing low-level knobs (temperature, explicit thinking budgets) with higher-level abstractions (effort, task budgets, adaptive thinking). The API becomes less configurable and more opinionated. If you were using the raw dials, you now use Anthropic’s vocabulary for the same goals. Whether that’s a good trade depends on how much your workflow relies on precise sampling control versus just wanting good outputs.

The capability gains are real. The migration is less trivial than the unchanged price tag implies. Measure your actual token counts on representative traffic before flipping the switch on anything that runs at scale.