Claude 3.5 Sonnet
Overall reception and model comparisons
- Many subscribers to both OpenAI and Anthropic say they now use Claude (3 / 3.5 Sonnet) for most work, especially coding and writing.
- Several feel GPT‑4o is a downgrade from GPT‑4, which pushed them toward Claude. Others report that GPT‑4(o) usually still works better for them.
- Claude is often preferred for “personality,” longer-context coherence, and more human-like writing; some reports favor GPT‑4(o) for advanced math.
- Gemini 1.5 Pro gets mixed reviews: it is good for very long documents and Google integration, but it makes more mistakes and is easy to jailbreak via the API.
Coding, tools, and artifacts
- 3.5 Sonnet is widely praised for code generation, debugging, and large-context work (Python, PySpark, Qt/QML, Rust bindings, PlantUML, ArgDown, etc.).
- The new “artifacts” feature (inline HTML/JS, React components, small apps) impresses many, who like the fast iteration and the separate code pane with history.
- Users integrate Claude 3.5 into VSCode, Neovim, Slack, and third‑party UIs; several say Anthropic’s prompt generator significantly improves results.
- Counterexamples: some see invented packages, incomplete refactors, broken code, and “lazy” stubs, especially on multi-file edits.
Pricing, access, and rate limits
- API pricing ($3/M input, $15/M output tokens) is considered very strong for the reported quality; some question why anyone would still use Opus except for niche cases or because they have not migrated yet.
- Free web usage currently hits message limits quickly; Pro's allowance is said to be “at least 5x” the free tier's but also varies with demand. Rate limiting and automatic fallback to smaller models annoy some.
- Availability is expanding (e.g., Canada, Sweden, Switzerland, Belgium). Signup requires an email address and often a phone number; automatic bans and the appeal process frustrate some.
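To make the quoted API rates concrete, here is a minimal Python sketch that estimates the cost of a single request. It assumes cost scales linearly with token counts at exactly $3 per million input tokens and $15 per million output tokens, and it ignores any discounts, batch pricing, or later price changes. The function name `request_cost` and the example token counts are hypothetical, chosen only for illustration.

```python
# Assumed per-million-token prices for Claude 3.5 Sonnet, as quoted in the thread (USD).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one API call at the assumed linear rates."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 10,000-token prompt that produces a 1,000-token reply.
    cost = request_cost(10_000, 1_000)
    print(f"${cost:.4f}")  # prints $0.0450
```

At these rates an output token costs five times as much as an input token, so output-heavy workloads such as long code generations drive most of the bill.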
Benchmarks, reasoning, and math
- The thread cites strong results on internal “agentic coding” tasks and improvements in long‑context retrieval; external coding benchmarks place it near the top but not always #1.
- Users report good performance on many math/probability problems (e.g., certain Gaussian expectations, calorie estimates) and university‑level analysis/algebra, though failures remain.
- Reasoning issues persist: flawed number‑theory proof, confusion over physics examples, inconsistent commonsense/world‑model answers, and classic weaknesses like counting letters.
Safety, guardrails, and UX
- Claude is seen as more conservative and “very, very ethical,” sometimes refusing content merely due to profanity or in fantasy/DnD contexts; some value this, others find it obstructive.
- UX gaps vs ChatGPT include weaker conversation sharing, no Android app yet, limited math rendering, no code execution, and less polished branching/editing in the default UI.
- Many users work around these gaps with third‑party or self‑hosted frontends that add features like custom system prompts, message editing, and multi‑model access.