Claude 3.5 Sonnet

Overall impressions & model comparisons

  • Many commenters find Claude 3.5 Sonnet extremely strong, often preferring it over GPT‑4/4o for coding, data-heavy tasks, and “human-like” language.
  • Others report the opposite: GPT‑4o feels more capable, especially for assistant-style reasoning and calculus/physics; experiences are clearly mixed.
  • Some see Sonnet as slightly ahead of GPT‑4o on coding and extraction from long documents; Gemini is mentioned for its much larger context window.
  • Benchmarks are viewed skeptically: several note that leaderboard scores don’t match their day‑to‑day experience.

Coding ability & tools

  • Strong praise for Claude 3.5 Sonnet as a coding assistant: “junior engineer or better,” very fast at prototyping, refactors, infra planning, Dockerization, tests, docs, etc.
  • Works especially well on greenfield tasks or small to medium codebases; less reliable when the work is deeply entangled with large existing systems or depends on current idiomatic framework patterns.
  • Users describe workflows built on IDE integrations and agents (Cursor, Cody, Aider, Sweep, custom bots); semi‑autonomous PR agents are still considered mediocre, with commenters citing roughly 25% success on SWE‑bench.

Reasoning, math, and consistency

  • Some say Claude is better at careful, step‑by‑step reasoning and ambiguity handling; others show math/physics prompts where Claude fails and GPT is correct.
  • A recurring theme is Claude 3.5’s improved consistency: fewer wild swings in quality once a good prompt style is found.

UX, pricing, and limits

  • Claude Pro’s opaque usage limits frustrate users; the number of messages allowed depends on how many tokens each conversation consumes and on current capacity, which feels unpredictable.
  • OpenAI’s consumer products also have caps and dynamic throttling; both companies are criticized for lack of transparency.
  • Projects (persistent context with files/instructions) and Artifacts are seen as major productivity features; some wish for repo integration and voice interfaces.
  • Account creation has friction: the phone-number requirement and the blocking of Google Voice numbers turn some users away.
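
To illustrate why token‑dependent caps can feel unpredictable, here is a minimal sketch. It assumes that each new message resends the whole conversation as context and that usage is charged against a fixed token budget. The function name, the budget, the capacity factor, and the per‑turn token counts are all hypothetical and are not the actual limits of any provider.

```python
# Hypothetical illustration only: the budget and token counts are made-up
# numbers, not Anthropic's or OpenAI's actual limits or accounting rules.

def messages_until_cap(token_budget: int,
                       tokens_per_turn: list[int],
                       capacity_factor: float = 1.0) -> int:
    """Count how many turns fit in a token budget when every new message
    resends the whole conversation so far as context (an assumption)."""
    budget = int(token_budget * capacity_factor)  # shrinks under heavy load
    context = 0  # tokens currently in the conversation
    spent = 0    # tokens charged against the budget so far
    sent = 0     # messages that fit under the cap
    for turn in tokens_per_turn:
        context += turn
        if spent + context > budget:
            break
        spent += context
        sent += 1
    return sent


short_chat = [200] * 50   # fifty brief exchanges
long_chat = [4000] * 50   # fifty exchanges with large pasted files

print(messages_until_cap(100_000, short_chat))       # 31
print(messages_until_cap(100_000, long_chat))        # 6
print(messages_until_cap(100_000, short_chat, 0.5))  # 21
```

Under these assumptions the same nominal budget yields very different message counts depending on conversation length and load, which matches the unpredictability commenters describe.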

Safety, bans, and reliability

  • Some accounts are auto‑banned with little explanation; appeal flows exist but are slow or inconsistent.
  • Claude’s safety filters are stricter than GPT’s in some areas (e.g., code obfuscation), which some see as overreach.
  • Occasional dangerous suggestions (e.g., rm -rf on keyring data) show that safety and caution are still imperfect.
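
One guardrail a user might apply before running an AI‑suggested destructive command is to move the target aside instead of deleting it. The sketch below is an assumption about how that could look, not something proposed in the thread; the helper name move_aside and the backup layout are hypothetical.

```python
import shutil
import time
from pathlib import Path


def move_aside(target: Path, backup_root: Path) -> Path:
    """Move `target` into `backup_root` under a timestamped name
    instead of deleting it, so the change can be undone."""
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_root / f"{target.name}.{stamp}"
    if dest.exists():
        # Refuse to overwrite or nest inside an earlier backup.
        raise FileExistsError(f"backup already exists: {dest}")
    shutil.move(str(target), str(dest))
    return dest
```

If the change turns out to be wrong, the data can be restored by moving the returned path back to its original location.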

Broader impacts

  • Strong sense that modern LLMs dramatically accelerate experienced developers, especially on side projects.
  • Debate over whether this threatens software jobs or mainly raises the bar for developers who can direct and verify AI‑generated code.