Claude 3.5 Sonnet

Overall reception and model comparisons

  • Many subscribers to both OpenAI and Anthropic say they now use Claude (3 / 3.5 Sonnet) for most work, especially coding and writing.
  • Several feel GPT‑4o is a downgrade from GPT‑4, which pushed them toward Claude. Others report GPT‑4(o) is still usually better for them.
  • Claude is often preferred for “personality,” longer-context coherence, and more human-like writing; some reports still favor GPT‑4(o) for advanced math.
  • Gemini 1.5 Pro gets mixed reviews: it is good for very long documents and Google integration, but it makes more mistakes and is reportedly easy to jailbreak via the API.

Coding, tools, and artifacts

  • 3.5 Sonnet is widely praised for code generation, debugging, and large-context work (Python, PySpark, Qt/QML, Rust bindings, PlantUML, ArgDown, etc.).
  • The new “artifacts” feature (inline HTML/JS, React components, small apps) impresses many; users like the fast iteration and the separate code pane with history.
  • Users integrate Claude 3.5 into VSCode, Neovim, Slack, and third‑party UIs; several say Anthropic’s prompt generator significantly improves results.
  • Counterexamples: some users report invented packages, incomplete refactors, broken code, and “lazy” stubs, especially on multi-file edits.

Pricing, access, and rate limits

  • API pricing ($3 per million input tokens, $15 per million output tokens) is considered very competitive for the reported quality; some question why anyone would still use Opus, except for niche cases or because of migration lag.
  • Free web usage currently hits message limits quickly; Pro is said to allow “at least 5x” the free usage, though the limit also varies with demand. Rate limiting and automatic fallback to smaller models annoy some.
  • Availability is expanding (e.g., Canada, Sweden, Switzerland, Belgium). Signup requires an email address and often a phone number; automatic bans and the appeal process frustrate some.
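
As a rough illustration of the API prices quoted above, the following sketch estimates the cost of a single request. Only the $3 and $15 per-million-token rates come from the thread; the function name, the constant names, and the example token counts are hypothetical, and real bills may differ.

```python
# Rates quoted in the thread, in USD per one million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted per-token rates.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Assumed example: a 20,000-token prompt with a 2,000-token reply.
example_cost = estimate_cost_usd(20_000, 2_000)
print(f"${example_cost:.2f}")  # prints "$0.09"
```

At these rates output tokens cost five times as much as input tokens, so long prompts with short replies stay cheap while long generations dominate the total.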

Benchmarks, reasoning, and math

  • The thread cites strong results on internal “agentic coding” tasks and improvements in long‑context retrieval; external coding benchmarks place 3.5 Sonnet near the top but not always #1.
  • Users report good performance on many math/probability problems (e.g., certain Gaussian expectations, calorie estimates) and university‑level analysis/algebra, though failures remain.
  • Reasoning issues persist: a flawed number‑theory proof, confusion over physics examples, inconsistent commonsense/world‑model answers, and classic weaknesses like counting letters.

Safety, guardrails, and UX

  • Claude is seen as more conservative and “very, very ethical,” sometimes refusing content merely due to profanity or in fantasy/DnD contexts; some value this, others find it obstructive.
  • UX gaps vs ChatGPT include weaker conversation sharing, no Android app yet, limited math rendering, no code execution, and less polished branching/editing in the default UI.
  • Many users work around these gaps with third‑party or self‑hosted frontends that add features like custom system prompts, message editing, and multi‑model access.