BreakthroughMAY 29, 2026 · AI MODELS · OPUS 4.8

Claude Opus 4.8 vs Opus 4.7: What Actually Changed

Most of the spec sheet is identical — same 1M context, same $5/$25 pricing, same January 2026 cutoff. Here is what genuinely changed in Opus 4.8, and what the benchmark hype gets wrong.

By Kadin Nestler · May 29, 2026 · 8 min read

Share X LinkedIn Email

Anthropic released Claude Opus 4.8 on May 28, 2026. If you opened the spec sheet expecting a leap, you would be forgiven for being confused: the context window, the price, and the knowledge cutoff are all identical to Opus 4.7. The upgrade lives somewhere less flashy and, if you actually run agents in production, somewhere more useful — reliability.

This is the honest breakdown of what changed between Opus 4.7 and 4.8, what did not, and which of the benchmark numbers floating around are real versus third-party guesses. We build on these models at Ascero AI, so this is the comparison we ran for ourselves before flipping the model ID.

What did NOT change (and why the headlines are confusing)

A lot of the early coverage implied Opus 4.8 added a million-token context window. It did not. Opus 4.7 already had one. Per Anthropic's own model documentation, these are identical across the two releases:

Context window: 1M tokens on the Claude API, Amazon Bedrock, and Vertex AI (200k on Microsoft Foundry) — same on both 4.7 and 4.8.
Standard pricing: $5 per million input tokens, $25 per million output tokens — identical.
Knowledge cutoff: January 2026 for both. The cutoff is not a 4.8 differentiator.
Max output: 128k tokens (up to 300k via the batch beta header) — same on both.

So if someone tells you to upgrade for the bigger context window or the fresher training data, they are selling you something that was already in 4.7.

What actually changed

The real story is agentic reliability — the unglamorous stuff that determines whether a long agent run finishes clean or quietly corrupts itself three steps in. Per Anthropic, Opus 4.8 brings:

Around 4× less likely than Opus 4.7 to let flaws in code it wrote pass unremarked. This is Anthropic's own framing, and for anyone shipping agent-written code it is the headline.
Better long-context handling with fewer compactions and better recovery when a compaction does happen.
Better tool triggering — fewer skipped required tool calls, which was a genuine 4.7 pain point in agent loops.
The effort parameter now defaults to high on all surfaces.
Mid-conversation system messages are supported (a system turn after a user turn, no beta header) while preserving prompt-cache hits.
A lower prompt-cache minimum (1,024 tokens), which makes caching pay off on shorter prompts.

Fast mode — the one genuinely new lever

Opus 4.8 adds a fast mode (research preview, Claude API only): set speed to "fast" and you get up to 2.5× higher output tokens per second from the same model, at premium pricing — $10 per million input, $50 per million output, double the standard rate. Anthropic notes it is roughly three times cheaper than fast output was on previous models. The use case is latency-sensitive agent loops and anything user-facing where the wait is the product, like a voice agent.

The benchmark claims — and the one number you should not trust

Anthropic's marketing page makes several agentic claims for 4.8, most of them qualitative: 84% on Online-Mind2Web (called a meaningful jump over both Opus 4.7 and GPT-5.5); the top score on its Legal Agent Benchmark, including being the first model to break 10% on the all-pass standard; gains over prior Opus models on CursorBench at every effort level; and being the only model to complete every case end-to-end on its Super-Agent benchmark. Read those as vendor benchmarks — useful direction, not gospel.

ABOUT THAT SWE-BENCH NUMBER

Third-party aggregators report 88.6% on SWE-bench Verified for Opus 4.8 (vs 87.6% for 4.7), but a separate third-party source reports 69.2% on SWE-Bench Pro — and Anthropic's own page renders its benchmark table as an image with no machine-readable figure we could confirm. Treat any hard SWE-bench number for 4.8 as unverified. If a comparison post quotes one as fact, it is leaning on a secondary aggregator.

What it means if you build on Claude

For anyone running agents in production — us included — the 4× reduction in code-flaws-let-through and the better tool triggering matter more than a single benchmark point. Those are the failures that don't show up in a demo and do show up at 2am in a long unattended run. Fast mode is a real unlock for latency-sensitive flows. And because the price, context window, and cutoff are unchanged, migrating is about as low-risk as a model upgrade gets: swap claude-opus-4-7 for claude-opus-4-8 and watch your eval suite.

The honest bottom line: Opus 4.8 is a reliability release, not a capability leap. For a company betting on long-horizon autonomous agents, that is exactly the right kind of boring.

BUILD ON IT

We build custom AI agents for owner-operated businesses on the latest Claude models. See what we ship at /agents, or book a call at /book.

Sources

Cite this article

Ascero AI. “Claude Opus 4.8 vs Opus 4.7: What Actually Changed.” May 29, 2026. https://asceroai.com/news/claude-opus-4-8-vs-4-7-what-changed-2026

Free to reference with attribution and a link back to this page.

Did this land? Pass it on.