Anthropic released Claude Opus 4.7 today, April 16, 2026. The third Opus upgrade in six months. The benchmarks are impressive. The improvements are real. But the most interesting story is not the model itself: it is what it reveals about where Anthropic is headed, what they are holding back, and what this means for everyone building on AI right now.
I use Claude every day across multiple projects — from software development and data analysis to content creation and business automation. I have spent the last two days deep in Anthropic's ecosystem. Here is my honest take on Opus 4.7, with no hype and no disclaimers.
What Opus 4.7 Actually Is
Opus 4.7 is a direct upgrade to Opus 4.6, which launched in February 2026. It is not a new product tier — it replaces 4.6 as the default Opus model. Same pricing: $5 per million input tokens, $25 per million output tokens. Same 1 million token context window. Same availability across AWS Bedrock, Google Vertex AI, and Microsoft Foundry.
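At those rates, per-request cost is simple arithmetic. Here is a minimal sketch in plain Python; only the prices come from the release, and the token counts in the example are illustrative:

```python
# Opus 4.7 list pricing, per the release notes above.
INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API call at Opus list pricing."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# Illustrative numbers: a 50k-token prompt with a 4k-token answer.
print(f"${request_cost(50_000, 4_000):.2f}")  # $0.35
```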
Anthropic's release cadence is now roughly one Opus upgrade every two to three months: Opus 4.1 (August 2025), 4.5 (November 2025), 4.6 (February 2026), and now 4.7 (April 2026). That is fast by any standard.
The Numbers That Matter
Coding — The Headline Improvement
SWE-bench Pro (resolving real GitHub issues): 64.3%, up from 53.4% on Opus 4.6. That is an 11-point jump. For context, GPT-5.4 scores 57.7% and Gemini 3.1 Pro hits 54.2%. Opus 4.7 leads the field on the benchmark that matters most to developers.
CursorBench: 70%, up from 58% on Opus 4.6. A 12-point jump on the benchmark that measures how the model performs in the tool developers actually use daily.
Rakuten-SWE-Bench: 3x more tasks resolved than Opus 4.6, with double-digit gains in code quality and test quality. Hex's 93-task coding benchmark: 13% improvement, including four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve at all.
Vision — The Surprise Upgrade
Maximum image resolution jumped from 1.15 megapixels to 3.75 megapixels, more than a threefold increase. Visual acuity went from 54.5% to 98.5% on internal benchmarks. The model can now read small text in screenshots, extract table data precisely from scanned documents, and map coordinates 1:1 with actual pixels for computer use. If you are building screen automation or document processing, this changes what is possible.
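If your pipeline pre-resizes screenshots, the new ceiling is worth encoding explicitly. A minimal sketch using Pillow: the 3.75-megapixel cap comes from the figures above, while the helper name and the decision to downscale oversized images client-side are my assumptions, not documented API behavior:

```python
from PIL import Image

MAX_MEGAPIXELS = 3.75  # Opus 4.7's stated maximum; Opus 4.6 was 1.15

def fit_to_model_cap(path: str) -> Image.Image:
    """Downscale an image to the model's pixel budget, preserving aspect ratio."""
    img = Image.open(path)
    megapixels = (img.width * img.height) / 1_000_000
    if megapixels <= MAX_MEGAPIXELS:
        return img  # already within budget; send at full resolution
    scale = (MAX_MEGAPIXELS / megapixels) ** 0.5
    new_size = (int(img.width * scale), int(img.height * scale))
    return img.resize(new_size, Image.Resampling.LANCZOS)
```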
Reasoning — Quietly Strong
GPQA Diamond (graduate-level reasoning): 94.2%. arXiv Reasoning with tools: 91.0%, up from 84.7% on Opus 4.6. GDPVal-AA (finance and legal knowledge work): Elo score of 1753, beating GPT-5.4 (1674) and Gemini 3.1 Pro (1314) by a wide margin. BigLaw Bench accuracy: 90.9%.
Where It Does Not Lead
Opus 4.7 is not the best at everything. GPT-5.4 still leads in agentic search (89.3% vs 79.3%) and some multilingual benchmarks. Gemini 3.1 Pro has a 2 million token context window — double Opus 4.7's million. And Anthropic's own Mythos Preview beats Opus 4.7 across the board.
The Real Story: Reliability Over Raw Intelligence
Here is what the benchmarks do not capture and what early testers consistently report: the biggest improvement is not in what Opus 4.7 can do — it is in how reliably it does it.
Opus 4.6 was smart but unreliable on complex tasks. Users on GitHub were vocal; an AMD senior director wrote in a widely shared post that Claude had regressed to the point it could not be trusted to perform complex engineering. Whether that was an actual regression or unmet expectations, the perception was real.
Opus 4.7 addresses this head-on. The model catches its own logical faults during planning. It self-verifies outputs before reporting back. It continues executing through tool failures that would have stopped Opus 4.6 dead. Multiple early testers describe it as the first version you can hand off hard work to without babysitting.
Hex quantified it: low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6. Warp reported it passed tasks that prior Claude models had failed and worked through a concurrency bug that Opus 4.6 could not crack.
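Where this shows up in practice is in agentic harnesses, where a single failed tool call used to end the run. The sketch below is the standard public tool-use loop for the Anthropic Messages API, not Anthropic's internal mechanism: instead of raising on a tool error, the harness reports the failure back as a tool_result and lets the model decide how to proceed. The model ID and the run_tool helper are placeholders:

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-opus-4-7"  # placeholder ID; check the models list for the real one

def run_tool(name: str, args: dict) -> str:
    ...  # dispatch to your actual tool implementations

def agent_loop(messages: list, tools: list) -> str:
    while True:
        response = client.messages.create(
            model=MODEL, max_tokens=4096, tools=tools, messages=messages)
        if response.stop_reason != "tool_use":
            return response.content[0].text  # model is done; return its answer
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            try:
                output = run_tool(block.name, block.input)
                results.append({"type": "tool_result",
                                "tool_use_id": block.id, "content": output})
            except Exception as err:
                # Surface the failure to the model instead of aborting the run;
                # the reliability gains reported above are about how well the
                # model recovers from exactly this situation.
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": str(err), "is_error": True})
        messages.append({"role": "user", "content": results})
```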
In any production workflow that depends on AI, reliability is worth more than marginal intelligence gains. A model that is 5% smarter but fails unpredictably costs more than a model that is consistently good.
The Pricing Trap Nobody Is Talking About
Anthropic announced that pricing remains the same as Opus 4.6. That is technically true and practically misleading.
Opus 4.7 uses a new tokenizer. The same input text can produce 1.0 to 1.35x as many tokens, depending on content type. That means your identical prompts could cost up to 35% more in token consumption, even though the price per token has not changed. On top of that, Opus 4.7 thinks more at higher effort levels, generating more output tokens on complex tasks.
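You do not have to guess at the inflation factor for your own workload. The Messages API exposes a token-counting endpoint, so you can measure the ratio directly on real prompts before migrating. A sketch; both model IDs are placeholders for the real identifiers:

```python
import anthropic

client = anthropic.Anthropic()

def tokenizer_inflation(prompt: str) -> float:
    """Ratio of Opus 4.7 token count to Opus 4.6 for the same prompt."""
    counts = {}
    # Placeholder model IDs; substitute the real ones from the models endpoint.
    for model in ("claude-opus-4-6", "claude-opus-4-7"):
        result = client.messages.count_tokens(
            model=model,
            messages=[{"role": "user", "content": prompt}])
        counts[model] = result.input_tokens
    return counts["claude-opus-4-7"] / counts["claude-opus-4-6"]

# Run this over a representative sample of production prompts; the article's
# figures put the range anywhere from 1.0x to 1.35x depending on content type.
```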
For casual users on Claude Pro ($20/month), this does not matter: you have a usage allocation, not per-token billing. For developers billed per token through the API, it matters a great deal. The same workload can quietly cost up to 35% more after the switch, and any budget calibrated against Opus 4.6 is worth re-measuring before you change the default model.
