
Claude Haiku

Claude Haiku is Anthropic's fastest, cheapest model — built for high-volume, low-latency workloads. Definition, pricing, and when to use it.

By Kadin Nestler · May 28, 2026 · Updated May 28, 2026

What Haiku is built for

Haiku is the right tool for workloads where you need an LLM's capabilities but most queries are simple. Customer email classification. Lead qualification. Real-time voice agent turns where every 100ms of latency matters. Document tagging at scale. The model is cheap enough to serve millions of requests per day within budget and fast enough that users do not feel they are waiting for an answer.

When Haiku is the wrong choice

  • Multi-step reasoning where the model needs to plan and revise — escalate to Sonnet.
  • Long-context analysis of 50K+ tokens, which Sonnet handles better.
  • Anywhere correctness matters more than throughput, because the quality gap shows on hard prompts.
  • Software engineering — Haiku can patch small bugs but is not a primary coding model.

Pricing and access

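Approximately $0.80 per million input tokens and $4 per million output tokens as of May 2026. These figures match Claude 3.5 Haiku list pricing; Claude Haiku 4.5 lists at $1 and $5, so confirm the rate for the version you deploy. With prompt caching, input cost drops further, because cached reads are billed at a fraction of the base input rate. Available on the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI. The cost-per-task math favors Haiku when individual queries are short (under 1K input tokens) and you serve tens of thousands or more per day.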
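The cost-per-task math can be sketched in a few lines of Python. This is a rough illustration, not a billing calculator: the per-token prices are the approximate figures quoted above, while the workload (1,000 input tokens, 200 output tokens, 50,000 requests per day) and the function names are assumptions chosen for the example.

```python
# Approximate prices quoted in this entry, in USD per million tokens.
# Assumption: swap in the current rates for the model version you use.
INPUT_PRICE_PER_MTOK = 0.80
OUTPUT_PRICE_PER_MTOK = 4.00


def cost_per_request(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request, ignoring prompt caching."""
    input_cost = input_tokens * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK
    return (input_cost + output_cost) / 1_000_000


def daily_cost(requests_per_day: int, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a day's traffic at a fixed request size."""
    return requests_per_day * cost_per_request(input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical workload: short classification-style requests.
    per_request = cost_per_request(input_tokens=1_000, output_tokens=200)
    per_day = daily_cost(50_000, input_tokens=1_000, output_tokens=200)
    print(f"per request: ${per_request:.4f}")
    print(f"per day:     ${per_day:.2f}")
```

Under these assumptions a request costs a fraction of a cent and a 50,000-request day lands in the tens of dollars, which is why short, high-volume queries are where the per-task math works out.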

The model routing pattern

Many production teams in 2026 run a two-tier model router: Haiku triages every incoming request and classifies it as simple or complex. Simple requests are answered by Haiku directly; complex requests are escalated to Sonnet or Opus. This pattern can cut total LLM spend by 60-80% on real workloads with minimal quality loss when most queries are simple, because only the hard minority pays the larger model's rates.
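A minimal sketch of the two-tier router described above, assuming the classifier and both models are passed in as plain callables. Every name here (`TwoTierRouter`, `toy_classifier`, `fake_small_model`, `fake_large_model`) is hypothetical, and the stand-in functions replace real model calls so the example runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A model is modeled as "text in, text out". In production these would wrap
# real API calls; here they are injected so the routing logic stays testable.
ModelFn = Callable[[str], str]


@dataclass
class TwoTierRouter:
    classify: Callable[[str], str]  # expected to return "simple" or "complex"
    small_model: ModelFn            # the cheap tier (Haiku in the text above)
    large_model: ModelFn            # the escalation tier (Sonnet or Opus)

    def route(self, request: str) -> Tuple[str, str]:
        """Return (tier, answer) for one request."""
        label = self.classify(request)
        if label == "simple":
            return "small", self.small_model(request)
        # Anything not clearly simple escalates, so an unexpected
        # classifier label fails toward quality rather than cost.
        return "large", self.large_model(request)


# Stand-ins for illustration only. A real triage step would ask the small
# model to label the request instead of counting words.
def toy_classifier(request: str) -> str:
    return "simple" if len(request.split()) < 20 else "complex"


def fake_small_model(request: str) -> str:
    return f"[small] {request}"


def fake_large_model(request: str) -> str:
    return f"[large] {request}"


router = TwoTierRouter(toy_classifier, fake_small_model, fake_large_model)

if __name__ == "__main__":
    tier, answer = router.route("Where is my order?")
    print(tier, answer)
```

Escalating on any label other than "simple" is a deliberate choice in this sketch: a misclassified hard request costs more in answer quality than a misclassified easy one costs in tokens.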

What it means for your business

For SMB workloads with high call or message volume — receptionists, intake bots, classification — Haiku is usually the workhorse and Sonnet is the escalation tier. The cost difference compounds fast at scale.

  • Claude Opus — Claude Opus is Anthropic's most capable model, tuned for deep reasoning, long context, and agentic coding. Definition, pricing, and when to use it.
  • Claude Sonnet — Claude Sonnet is Anthropic's balanced model — strong reasoning, lower cost than Opus, faster latency. Definition, pricing, and use cases.
  • Large Language Model (LLM) — A Large Language Model is a transformer-based neural network trained on trillions of tokens to predict the next token. Definition, key models, and business use.
  • Voice AI — Voice AI is the stack that lets computers understand and speak natural conversation. Definition, components, top platforms, and SMB use cases.
  • AI Orchestration — AI orchestration is the layer that coordinates LLM calls, tools, and data into a working application. Definition, top frameworks, and how to choose.