The saving grace: prompt caching (up to 90% savings) and batch processing (50% savings) still apply. Anthropic also introduced task budgets in public beta, hard ceilings on token spend for autonomous agents. That such ceilings are needed at all tells you something about the model's tendency to think more than expected.
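Those discounts compound on the input side. A rough sketch of the arithmetic, assuming illustrative per-token rates and that the two discounts stack multiplicatively (an assumption, not a statement of actual billing rules):

```python
def effective_cost(input_tokens: int, rate_per_mtok: float,
                   cached_fraction: float = 0.0, batched: bool = False) -> float:
    """Estimate input cost under prompt caching and batch discounts.

    Cached tokens are billed at 10% of the base rate (the "up to 90%
    savings" figure); batch processing then halves whatever remains.
    Illustrative only: assumes the discounts stack.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    cost = (uncached + cached * 0.10) * rate_per_mtok / 1_000_000
    if batched:
        cost *= 0.5  # batch processing: 50% off
    return cost

# 1M input tokens at a hypothetical $15/MTok:
base = effective_cost(1_000_000, 15.0)                         # $15.00
hot_cache = effective_cost(1_000_000, 15.0, cached_fraction=0.8)  # $4.20
both = effective_cost(1_000_000, 15.0, cached_fraction=0.8, batched=True)  # $2.10
```

With an 80% cache hit rate plus batching, the same traffic costs roughly a seventh of the list price, which is why the flat-rate announcement understates how much pricing actually varies by workload.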
The Mythos Shadow
In the benchmark chart accompanying the Opus 4.7 announcement, Anthropic included Claude Mythos Preview. Mythos beats Opus 4.7 on every single benchmark. SWE-bench Pro: 77.8% (Mythos) vs 64.3% (Opus 4.7), a 13.5-point gap. That is not a small gap.
Anthropic publicly admitted — in their own launch blog — that their generally available model is significantly less capable than a model they already built but will not release to the public.
Mythos Preview is restricted to 11 companies through Project Glasswing (Apple, Google, Microsoft among them). The reason: Mythos can find and exploit critical software vulnerabilities in major operating systems and web browsers at a level that rivals skilled human security researchers. Anthropic decided the cybersecurity risk of broad release was too high.
Opus 4.7 had its cybersecurity capabilities deliberately reduced during training — Anthropic called it differential capability reduction. This is unprecedented transparency from an AI company. Most competitors would simply not mention that a better model exists. Anthropic put Mythos in the comparison chart alongside the model they are actually selling. That is either radical honesty or strategic positioning — probably both.
The implication: the frontier of AI capability is further ahead than what is publicly available. What you can use today is a deliberately constrained version of what exists. That gap will only grow as models get more powerful and safety considerations get more complex.
What This Means for Developers
If you are building on the API: The breaking changes are real. Sampling parameters (temperature, top_p, top_k) are removed. Extended thinking budgets are different. The model follows instructions more literally than 4.6. Test before you deploy. Do not just swap claude-opus-4-6 for claude-opus-4-7 and push to production. The new xhigh effort level is a sweet spot between quality and cost that did not exist before.
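Because the removed sampling parameters are a breaking change rather than a silent no-op, a defensive shim that strips them before the request goes out can make the migration visible instead of mysterious. A minimal sketch; the exact parameter names come from the article, but the rejection behavior and model id strings are assumptions, not documented API contract:

```python
# Parameters the article says were removed in Opus 4.7.
REMOVED_SAMPLING_PARAMS = {"temperature", "top_p", "top_k"}

def migrate_request(params: dict) -> dict:
    """Return a copy of an Opus 4.6 request payload adjusted for 4.7.

    Drops the removed sampling parameters and swaps the model id,
    reporting what was stripped so the change is explicit rather
    than silently absorbed.
    """
    cleaned = {k: v for k, v in params.items() if k not in REMOVED_SAMPLING_PARAMS}
    dropped = sorted(REMOVED_SAMPLING_PARAMS & params.keys())
    if dropped:
        print(f"dropped unsupported params: {dropped}")
    if cleaned.get("model") == "claude-opus-4-6":
        cleaned["model"] = "claude-opus-4-7"
    return cleaned

old = {"model": "claude-opus-4-6", "temperature": 0.7, "max_tokens": 1024}
new = migrate_request(old)
# new == {"model": "claude-opus-4-7", "max_tokens": 1024}
```

A shim like this belongs in staging, not production: the point is to surface every call site that still sets sampling parameters, then delete them at the source.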
If you use Claude Code: Opus 4.7 is already the default. You get a new /ultrareview command that simulates a senior human code reviewer, and Auto Mode is now available for Max plan users. Less back-and-forth on complex tasks — the model plans better and handles multi-step workflows with less supervision.
If you are an everyday user: You will notice better image understanding and more thorough responses on complex questions. For casual use — writing, brainstorming, simple Q&A — the difference from 4.6 is minimal.
What This Means for the AI Industry
The two-month upgrade cadence is relentless. Anthropic has delivered meaningful improvements to Opus every two months since November 2025. Fast enough to make it difficult for competitors to establish durable leads.
The capability-safety tension is becoming explicit. With Mythos, Anthropic has a model that is clearly better but too dangerous to release broadly. This is the first time a major AI company has so publicly grappled with this tradeoff. It will not be the last.
"Same price, higher cost" will become a pattern. The tokenizer change that increases token consumption while keeping per-token pricing flat is a template other companies will copy. Watch for it in every model update going forward.
The Bottom Line
Opus 4.7 is a genuine improvement over 4.6. The reliability gains alone justify the upgrade for anyone doing serious development work. The vision improvements open new use cases. The self-verification capability means less human supervision on complex tasks.
But the most important thing about this release is not the model — it is the signal. Anthropic is telling us, publicly, that they have something significantly better that they will not release because the safety implications are not resolved. They are deliberately constraining capability in specific domains while advancing it in others.
Whether you think that is responsible caution or competitive strategy, it is the template for how AI releases will work going forward. The era of releasing the most capable model to everyone is over. What you get access to is increasingly a subset of what exists.
For developers: upgrade, test your prompts, monitor your token usage, and use task budgets. The model is better. For everyone else: the most interesting part of this release is everything Anthropic chose not to release.
