Claude Haiku 4.5
Pricing and economics
- API price is $1/M input, $5/M output tokens, cheaper than Sonnet 4.5 but more expensive than older Haiku models and some OpenAI/Google “nano/flash” tiers.
- Some see it as “expensive” in the current market; others argue the speed/quality trade‑off justifies it, especially versus GPT‑5’s higher output cost.
- Debate over which side dominates coding cost: output tokens (short requirements in, lots of code out) vs. input tokens (large existing codebases dominate usage).
- Several note that list prices alone are misleading without typical input/output ratios and tool-calling behavior.
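The ratio point above can be made concrete with a little arithmetic. A minimal sketch, using the Haiku 4.5 prices from the thread and an illustrative cheap-input tier for comparison; the token counts and the competitor's prices are assumptions, not figures from the discussion:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A coding request where a large existing codebase dominates the tokens:
# 80k tokens of repo context in, 2k tokens of patch out.
haiku = request_cost(80_000, 2_000, in_price=1.00, out_price=5.00)   # 0.09

# The same request against a hypothetical cheap tier ($0.10/M in, $0.40/M out).
cheap = request_cost(80_000, 2_000, in_price=0.10, out_price=0.40)   # 0.0088

print(f"Haiku 4.5: ${haiku:.4f} vs. cheap tier: ${cheap:.4f}")
```

With this input-heavy profile, input price drives nearly 90% of the bill, which is why list prices alone mislead without typical input/output ratios.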
Caching behavior and costs
- Anthropic’s explicit, paid prompt caching is contrasted with OpenAI/Google/xAI’s mostly automatic, highly discounted caching.
- Some prefer Anthropic’s manual breakpoints for flexibility; others prefer OpenAI’s “90% discount on repeated prefixes” despite its constraints (must keep a stable prefix).
- Complaints that paying for cached tokens feels like “extortion” are answered with explanations about GPU/VRAM and hierarchical KV caches (including SSD-backed systems).
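For readers unfamiliar with the "manual breakpoints" being debated, here is a minimal sketch of Anthropic-style explicit caching, shown as a raw Messages API payload rather than a live call; the model id and helper are illustrative. The `cache_control` marker asks the API to cache everything up to and including that block, so later requests sharing the prefix pay the discounted cached-read rate:

```python
def build_request(system_prompt: str, codebase: str, question: str) -> dict:
    """Assemble a Messages API payload with an explicit cache breakpoint."""
    return {
        "model": "claude-haiku-4-5",  # illustrative model id
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_prompt},
            {
                "type": "text",
                "text": codebase,
                # Breakpoint: cache the large, stable codebase context.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": question}],
    }
```

The contrast with OpenAI-style caching is that there is no marker at all: the discount applies automatically, but only while the request prefix stays byte-for-byte stable.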
Speed, quality, and coding use
- Many report Haiku 4.5 as dramatically faster than Sonnet (often 120–220 tokens/sec, with sub‑second time‑to‑first‑token in some tests), and performance close to Sonnet on small/medium coding tasks.
- It is praised for precise, targeted edits and efficient repo ingestion; some early users find it “good enough” to switch from Sonnet/Opus for day‑to‑day dev.
- Others see it lagging GPT‑5/Gemini Pro on harder math/logic tasks, long contexts, or complex Rust/C work; one user calls Sonnet 4.5 clearly worse than Opus 4.1 for serious Rust.
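The throughput figures quoted above are typically measured by timing a streaming response. A rough sketch, assuming `stream` yields text chunks and approximating token counts with a whitespace split (real measurements use the provider's token counts):

```python
import time

def measure_stream(stream) -> dict:
    """Return time-to-first-token and approximate tokens/sec for a chunk stream."""
    start = time.monotonic()
    first = None
    tokens = 0
    for chunk in stream:
        now = time.monotonic()
        if first is None:
            first = now - start  # time-to-first-token (TTFT)
        tokens += len(chunk.split())  # crude token approximation
    elapsed = time.monotonic() - start
    return {
        "ttft_s": first,
        "tokens_per_s": tokens / elapsed if elapsed > 0 else 0.0,
    }
```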
Context window and limitations
- Lack of broad 1M‑token context (currently Sonnet‑only, limited tiers) is seen as Anthropic’s main competitive weakness versus GPT‑4.1/Grok/Gemini for large‑corpus workflows.
- For large‑context, low‑end use, commenters say Gemini Flash / Grok 4 Fast often win.
Use cases for small/fast models
- Common uses: sub‑agents/tool calls in agentic coding, code search/summarization, RAG pipelines, white‑label enterprise chatbots, workflow tasks (extract/convert/translate), image alt-text, PDF summarization, and game/RPG adjudication where latency dominates.
- Several ask “what do you need big models for anymore?” beyond high‑complexity coding or niche domains.
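The sub-agent pattern in that list usually reduces to simple model routing: latency-dominated, low-complexity tasks go to the small fast model, and only hard coding work escalates. A hypothetical sketch; the model ids and task taxonomy are illustrative, not from the thread:

```python
SMALL_MODEL = "claude-haiku-4-5"   # illustrative: fast, cheap tier
LARGE_MODEL = "claude-sonnet-4-5"  # illustrative: reserved for hard tasks

# Latency-dominated workflow tasks named in the discussion.
FAST_TASKS = {"extract", "convert", "translate", "summarize",
              "code_search", "alt_text", "rpg_adjudication"}

def pick_model(task: str) -> str:
    """Route a task to the cheapest model expected to handle it."""
    return SMALL_MODEL if task in FAST_TASKS else LARGE_MODEL
```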
Subscription limits and UX
- Users describe confusion and frustration over opaque Pro/Max usage limits and perceived quiet quota changes after Sonnet 4.5.
- The /usage command and web UI charts now expose limits more clearly, but some still feel “printer low ink” vibes from warning banners.
Benchmarks, safety, and misc
- Some skepticism about Anthropic’s benchmark charts and SWE‑Bench prompt tweaks; concerns about Goodhart’s law and overfitting.
- System card discussion notes Anthropic declining to publish updated “blackmail/murder” misalignment scores due to evaluation awareness, and raises mixed reactions to “model welfare” language.
- A long tangent on the “pelican riding a bicycle” SVG test finds Haiku 4.5 competitive and very fast, while also highlighting worries about models being trained on public benchmarks.