2024-06-20

Claude 3.5 Sonnet

Overall reception and model comparisons

Many subscribers to both OpenAI and Anthropic say they now use Claude (3 / 3.5 Sonnet) for most work, especially coding and writing.
Several feel GPT‑4o is a downgrade from GPT‑4, which pushed them toward Claude. Others report GPT‑4(o) is still usually better for them.
Claude is often preferred for “personality,” longer-context coherence, and more human-like writing; GPT‑4(o) is praised more for advanced math in some reports.
Gemini 1.5 Pro gets mixed reviews: it is good for very long documents and Google integration, but it reportedly makes more mistakes and is easy to jailbreak via the API.

Coding, tools, and artifacts

3.5 Sonnet is widely praised for code generation, debugging, and large-context work (Python, PySpark, Qt/QML, Rust bindings, PlantUML, ArgDown, etc.).
New “artifacts” (inline HTML/JS, React components, small apps) impress many; users like the fast iteration and the separate code pane with history.
Users integrate Claude 3.5 into VSCode, Neovim, Slack, and third‑party UIs; several say Anthropic’s prompt generator significantly improves results.
Counterexamples: some report invented packages, incomplete refactors, broken code, and “lazy” stubs, especially on multi-file edits.

Pricing, access, and rate limits

API pricing ($3 per million input tokens, $15 per million output tokens) is considered very strong value for the reported quality; some question why anyone would still use Opus except in niche cases or because of migration lag.
Free web usage currently hits message limits quickly; Pro is said to allow “at least 5x” as much, though the limit also varies with demand. Rate limiting and automatic fallback to smaller models annoy some.
Availability is expanding (e.g., Canada, Sweden, Switzerland, Belgium). Signup requires an email address and often a phone number; automatic bans and the appeal process frustrate some.
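
To make the quoted API prices concrete, here is a minimal sketch of the per-request arithmetic. The only figures taken from the thread are the two rates ($3 and $15 per million tokens); the function name and the example token counts are hypothetical and chosen purely for illustration.

```python
# Rates quoted in the thread, in USD per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: a 20,000-token prompt with a 2,000-token reply.
cost = estimate_cost_usd(20_000, 2_000)
print(f"${cost:.2f}")  # prints $0.09
```

At these rates output tokens cost five times as much as input tokens, so long generated replies dominate the bill sooner than long prompts do.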

Benchmarks, reasoning, and math

The thread cites strong results on internal “agentic coding” tasks and improvements in long‑context retrieval; external coding benchmarks place it near the top but not always #1.
Users report good performance on many math/probability problems (e.g., certain Gaussian expectations, calorie estimates) and university‑level analysis/algebra, though failures remain.
Reasoning issues persist: a flawed number‑theory proof, confusion over physics examples, inconsistent commonsense/world‑model answers, and classic weaknesses like counting letters.

Safety, guardrails, and UX

Claude is seen as more conservative and “very, very ethical,” sometimes refusing content merely due to profanity or in fantasy/DnD contexts; some value this, others find it obstructive.
UX gaps vs ChatGPT include weaker conversation sharing, no Android app yet, limited math rendering, no code execution, and less polished branching/editing in the default UI.
Many users work around this with third‑party or self‑hosted frontends that add features like custom system prompts, message editing, and multi‑model access.
