Claude Haiku 4.5

Pricing and economics

  • API price is $1/M input and $5/M output tokens: cheaper than Sonnet 4.5 ($3/$15) but more expensive than older Haiku models and some OpenAI/Google “nano/flash” tiers.
  • Some see it as “expensive” in the current market; others argue the speed/quality trade‑off justifies it, especially versus GPT‑5’s higher output cost.
  • Debate over what matters more for coding cost: output tokens (requirements in, code out) vs input tokens (large existing codebases dominate token usage).
  • Several note that list prices alone are misleading without knowing typical input/output ratios and tool-calling behavior; a back-of-envelope sketch follows this list.
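
As an illustration of that point, here is a minimal sketch in Python; the prices are the list prices above, while the token counts and the input-heavy ratio are assumptions for a hypothetical coding-agent turn, not figures from the thread:

```python
# Effective request cost depends on the input/output ratio, not just list prices.

def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one request in dollars, given per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1e6

# Hypothetical input-heavy agentic coding turn: 50k tokens of repo context in,
# 2k tokens of code out. At this ratio the input side dominates the bill.
haiku = request_cost_usd(50_000, 2_000, in_price_per_m=1.0, out_price_per_m=5.0)
sonnet = request_cost_usd(50_000, 2_000, in_price_per_m=3.0, out_price_per_m=15.0)
print(f"Haiku 4.5:  ${haiku:.2f} per turn")   # $0.06
print(f"Sonnet 4.5: ${sonnet:.2f} per turn")  # $0.18
```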

Caching behavior and costs

  • Anthropic’s explicit, paid prompt caching (cache writes cost extra; cache reads are discounted) is contrasted with OpenAI/Google/xAI’s mostly automatic, heavily discounted caching.
  • Some prefer Anthropic’s manual breakpoints for flexibility (see the first sketch after this list); others prefer OpenAI’s “90% discount on repeated prefixes” despite its constraint that the prefix must stay stable.
  • Complaints that paying for cached tokens feels like “extortion” are met with explanations of what a warm KV cache actually costs the provider in GPU VRAM or hierarchical, SSD-backed storage; the second sketch below gives the arithmetic.
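
A minimal sketch of the explicit-breakpoint approach, assuming the anthropic Python SDK; the model id is illustrative, and the cached prefix here stands in for whatever large, stable context (system prompt, repo docs) gets reused across turns:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

STABLE_CONTEXT = open("repo_context.txt").read()  # must be identical across calls

# The cache_control breakpoint marks everything up to this block for caching:
# the write costs extra once, then repeated reads of the same prefix are discounted.
response = client.messages.create(
    model="claude-haiku-4-5",  # illustrative model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STABLE_CONTEXT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the build system."}],
)
print(response.content[0].text)
```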
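
And a rough sense of why cached tokens are not free to serve: the KV cache for a long prefix occupies real memory for as long as it is kept warm. All model dimensions below are assumptions for illustration; Anthropic does not publish Claude’s architecture:

```python
# Per-token KV-cache footprint: 2 (K and V) * layers * kv_heads * head_dim * bytes.
layers, kv_heads, head_dim, bytes_per_val = 80, 8, 128, 2  # fp16, GQA-style; assumed
per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # 327,680 bytes
print(f"{per_token / 2**20:.2f} MiB per cached token")        # ~0.31 MiB
print(f"{200_000 * per_token / 2**30:.0f} GiB for a 200k-token prefix")  # ~61 GiB
```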

Speed, quality, and coding use

  • Many report Haiku 4.5 as dramatically faster than Sonnet (often 120–220 tokens/sec, with sub-second time to first token in some tests) and close to Sonnet on small/medium coding tasks; a rough measurement harness follows this list.
  • It is praised for precise, targeted edits and efficient repo ingestion; some early users find it “good enough” to switch from Sonnet/Opus for day-to-day dev work.
  • Others see it lagging GPT‑5/Gemini Pro on harder math/logic tasks, long contexts, or complex Rust/C work; one user reports Sonnet 4.5 as clearly worse than Opus 4.1 for serious Rust.
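
A rough harness in the spirit of those speed reports, assuming the anthropic Python SDK; the model id and prompt are illustrative, and single runs are noisy:

```python
import time
import anthropic

client = anthropic.Anthropic()
t0 = time.perf_counter()
ttft = None

with client.messages.stream(
    model="claude-haiku-4-5",  # illustrative model id
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
) as stream:
    for _ in stream.text_stream:
        if ttft is None:
            ttft = time.perf_counter() - t0  # time to first token
    final = stream.get_final_message()

total = time.perf_counter() - t0
tok_s = final.usage.output_tokens / (total - ttft)
print(f"TTFT {ttft:.2f}s, {tok_s:.0f} tok/s after first token")
```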

Context window and limitations

  • The lack of broadly available 1M‑token context (currently Sonnet‑only, on limited tiers) is seen as Anthropic’s main competitive weakness versus GPT‑4.1/Grok/Gemini for large‑corpus workflows.
  • For large‑context, low‑cost use, commenters say Gemini Flash / Grok 4 Fast often win.

Use cases for small/fast models

  • Common uses: sub‑agents/tool calls in agentic coding (a routing sketch follows this list), code search/summarization, RAG pipelines, white‑label enterprise chatbots, workflow tasks (extract/convert/translate), image alt text, PDF summarization, and game/RPG adjudication where latency dominates.
  • Several ask “what do you need big models for anymore?” beyond high‑complexity coding and niche domains.
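
A toy sketch of the sub-agent routing pattern behind several of these uses: default to the fast model, escalate to a stronger one when the task looks hard. The escalation heuristic and model ids are assumptions, not from the thread:

```python
import anthropic

client = anthropic.Anthropic()
FAST, STRONG = "claude-haiku-4-5", "claude-sonnet-4-5"  # illustrative model ids

def run_task(prompt: str, context_tokens: int = 0) -> str:
    # Deliberately crude stand-in for a real signal (task type, context size,
    # prior failures); latency-dominated tasks stay on the fast model.
    hard = context_tokens > 50_000 or any(
        kw in prompt.lower() for kw in ("prove", "refactor", "architecture")
    )
    msg = client.messages.create(
        model=STRONG if hard else FAST,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

print(run_task("Write alt text for a chart of monthly revenue."))  # routes to FAST
```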

Subscription limits and UX

  • Users describe confusion and frustration over opaque Pro/Max usage limits and perceived quiet quota changes after the Sonnet 4.5 release.
  • The /usage command and web UI charts now expose limits more clearly, but some still get “printer low ink” vibes from the warning banners.

Benchmarks, safety, and misc

  • Some skepticism about Anthropic’s benchmark charts and SWE-bench prompt tweaks; concerns about Goodhart’s law and overfitting to the benchmark.
  • System-card discussion notes that Anthropic declined to publish updated “blackmail/murder” misalignment scores, citing evaluation awareness, and records mixed reactions to the “model welfare” language.
  • A long tangent on the “pelican riding a bicycle” SVG test finds Haiku 4.5 competitive and very fast (a minimal version of the test follows), while also highlighting worries about models being trained on public benchmarks.
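
For reference, the test itself is just a one-shot generation saved to disk; a minimal sketch, with the model id as an illustrative assumption:

```python
import anthropic

client = anthropic.Anthropic()
msg = client.messages.create(
    model="claude-haiku-4-5",  # illustrative model id
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": "Generate an SVG of a pelican riding a bicycle. "
                   "Return only the SVG markup.",
    }],
)

# Models sometimes wrap output in prose or code fences; a real harness strips those.
with open("pelican.svg", "w") as f:
    f.write(msg.content[0].text)
```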