Claude Haiku 4.5

Pricing and economics

  • API price is $1/M input and $5/M output tokens: cheaper than Sonnet 4.5 ($3/$15) but more expensive than older Haiku models and some OpenAI/Google “nano/flash” tiers.
  • Some see it as “expensive” in the current market; others argue the speed/quality trade‑off justifies it, especially versus GPT‑5’s higher output cost.
  • Debate over what matters more for coding cost: output tokens (requirements in, code out) vs input tokens (large existing codebases dominate token usage).
  • Several note that list prices alone are misleading without knowing typical input/output ratios and tool-calling behavior; a back-of-envelope sketch follows this list.
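
As an illustration of that point, here is a minimal sketch in Python; the prices are the list prices above, while the token counts and the input-heavy ratio are assumptions for a hypothetical coding-agent turn, not figures from the thread:

```python
# Effective request cost depends on the input/output ratio, not just list prices.

def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one request in dollars, given per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1e6

# Hypothetical input-heavy agentic coding turn: 50k tokens of repo context in,
# 2k tokens of code out. At this ratio the input side dominates the bill.
haiku = request_cost_usd(50_000, 2_000, in_price_per_m=1.0, out_price_per_m=5.0)
sonnet = request_cost_usd(50_000, 2_000, in_price_per_m=3.0, out_price_per_m=15.0)
print(f"Haiku 4.5:  ${haiku:.2f} per turn")   # $0.06
print(f"Sonnet 4.5: ${sonnet:.2f} per turn")  # $0.18
```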

Caching behavior and costs

  • Anthropic’s explicit, paid prompt caching (cache writes cost extra; cache reads are discounted) is contrasted with OpenAI/Google/xAI’s mostly automatic, heavily discounted caching.
  • Some prefer Anthropic’s manual breakpoints for flexibility (see the first sketch after this list); others prefer OpenAI’s “90% discount on repeated prefixes” despite its constraint that the prefix must stay stable.
  • Complaints that paying for cached tokens feels like “extortion” are met with explanations of what a warm KV cache actually costs the provider in GPU VRAM or hierarchical, SSD-backed storage; the second sketch below gives the arithmetic.
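
A minimal sketch of the explicit-breakpoint approach, assuming the anthropic Python SDK; the model id is illustrative, and the cached prefix here stands in for whatever large, stable context (system prompt, repo docs) gets reused across turns:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

STABLE_CONTEXT = open("repo_context.txt").read()  # must be identical across calls

# The cache_control breakpoint marks everything up to this block for caching:
# the write costs extra once, then repeated reads of the same prefix are discounted.
response = client.messages.create(
    model="claude-haiku-4-5",  # illustrative model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STABLE_CONTEXT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the build system."}],
)
print(response.content[0].text)
```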
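
And a rough sense of why cached tokens are not free to serve: the KV cache for a long prefix occupies real memory for as long as it is kept warm. All model dimensions below are assumptions for illustration; Anthropic does not publish Claude’s architecture:

```python
# Per-token KV-cache footprint: 2 (K and V) * layers * kv_heads * head_dim * bytes.
layers, kv_heads, head_dim, bytes_per_val = 80, 8, 128, 2  # fp16, GQA-style; assumed
per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # 327,680 bytes
print(f"{per_token / 2**20:.2f} MiB per cached token")        # ~0.31 MiB
print(f"{200_000 * per_token / 2**30:.0f} GiB for a 200k-token prefix")  # ~61 GiB
```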

Speed, quality, and coding use

  • Many report Haiku 4.5 as dramatically faster than Sonnet (often 120–220 tokens/sec, with sub-second time to first token in some tests) and close to Sonnet on small/medium coding tasks; a rough measurement harness follows this list.
  • It is praised for precise, targeted edits and efficient repo ingestion; some early users find it “good enough” to switch from Sonnet/Opus for day-to-day dev work.
  • Others see it lagging GPT‑5/Gemini Pro on harder math/logic tasks, long contexts, or complex Rust/C work; one user reports Sonnet 4.5 as clearly worse than Opus 4.1 for serious Rust.
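
A rough harness in the spirit of those speed reports, assuming the anthropic Python SDK; the model id and prompt are illustrative, and single runs are noisy:

```python
import time
import anthropic

client = anthropic.Anthropic()
t0 = time.perf_counter()
ttft = None

with client.messages.stream(
    model="claude-haiku-4-5",  # illustrative model id
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
) as stream:
    for _ in stream.text_stream:
        if ttft is None:
            ttft = time.perf_counter() - t0  # time to first token
    final = stream.get_final_message()

total = time.perf_counter() - t0
tok_s = final.usage.output_tokens / (total - ttft)
print(f"TTFT {ttft:.2f}s, {tok_s:.0f} tok/s after first token")
```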

Context window and limitations

  • The lack of broadly available 1M‑token context (currently Sonnet‑only, on limited tiers) is seen as Anthropic’s main competitive weakness versus GPT‑4.1/Grok/Gemini for large‑corpus workflows.
  • For large‑context, low‑cost use, commenters say Gemini Flash / Grok 4 Fast often win.

Use cases for small/fast models

  • Common uses: sub‑agents/tool calls in agentic coding (a routing sketch follows this list), code search/summarization, RAG pipelines, white‑label enterprise chatbots, workflow tasks (extract/convert/translate), image alt text, PDF summarization, and game/RPG adjudication where latency dominates.
  • Several ask “what do you need big models for anymore?” beyond high‑complexity coding and niche domains.
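
A toy sketch of the sub-agent routing pattern behind several of these uses: default to the fast model, escalate to a stronger one when the task looks hard. The escalation heuristic and model ids are assumptions, not from the thread:

```python
import anthropic

client = anthropic.Anthropic()
FAST, STRONG = "claude-haiku-4-5", "claude-sonnet-4-5"  # illustrative model ids

def run_task(prompt: str, context_tokens: int = 0) -> str:
    # Deliberately crude stand-in for a real signal (task type, context size,
    # prior failures); latency-dominated tasks stay on the fast model.
    hard = context_tokens > 50_000 or any(
        kw in prompt.lower() for kw in ("prove", "refactor", "architecture")
    )
    msg = client.messages.create(
        model=STRONG if hard else FAST,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

print(run_task("Write alt text for a chart of monthly revenue."))  # routes to FAST
```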

Subscription limits and UX

  • Users describe confusion and frustration over opaque Pro/Max usage limits and perceived quiet quota changes after the Sonnet 4.5 release.
  • The /usage command and web UI charts now expose limits more clearly, but some still get “printer low ink” vibes from the warning banners.

Benchmarks, safety, and misc

  • Some skepticism about Anthropic’s benchmark charts and SWE-bench prompt tweaks; concerns about Goodhart’s law and overfitting to the benchmark.
  • System-card discussion notes that Anthropic declined to publish updated “blackmail/murder” misalignment scores, citing evaluation awareness, and records mixed reactions to the “model welfare” language.
  • A long tangent on the “pelican riding a bicycle” SVG test finds Haiku 4.5 competitive and very fast (a minimal version of the test follows), while also highlighting worries about models being trained on public benchmarks.
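
For reference, the test itself is just a one-shot generation saved to disk; a minimal sketch, with the model id as an illustrative assumption:

```python
import anthropic

client = anthropic.Anthropic()
msg = client.messages.create(
    model="claude-haiku-4-5",  # illustrative model id
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": "Generate an SVG of a pelican riding a bicycle. "
                   "Return only the SVG markup.",
    }],
)

# Models sometimes wrap output in prose or code fences; a real harness strips those.
with open("pelican.svg", "w") as f:
    f.write(msg.content[0].text)
```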