Claude 3.5 Sonnet
Overall impressions & model comparisons
- Many commenters find Claude 3.5 Sonnet extremely strong, often preferring it over GPT‑4/4o for coding, data-heavy tasks, and “human-like” language.
- Others report the opposite: GPT‑4o feels more capable, especially for assistant-style reasoning and calculus/physics; experiences are clearly mixed.
- Some see Sonnet as slightly ahead of GPT‑4o on coding and extraction from long documents; Gemini is mentioned for much larger context windows.
- Benchmarks are viewed skeptically: several note that leaderboard scores don’t match their day‑to‑day experience.
Coding ability & tools
- Strong praise for Claude 3.5 Sonnet as a coding assistant: “junior engineer or better,” very fast at prototyping, refactors, infra planning, Dockerization, tests, docs, etc.
- Works especially well on greenfield tasks or small to medium codebases; less reliable when deeply entangled with large existing systems or modern idiomatic framework patterns.
- Users mention workflows with IDE integrations and agents (Cursor, Cody, Aider, Sweep, custom bots) and note that semi‑autonomous PR agents are still mediocre (~25% success on SWE‑bench).
Reasoning, math, and consistency
- Some say Claude is better at careful, step‑by‑step reasoning and ambiguity handling; others show math/physics prompts where Claude fails and GPT is correct.
- A recurring theme is Claude 3.5’s improved consistency: fewer wild swings in quality once a good prompt style is found.
UX, pricing, and limits
- Claude Pro’s opaque usage limits frustrate users; message caps depend on both token usage and current capacity, which makes them feel unpredictable.
- OpenAI’s consumer products also have caps and dynamic throttling; both sides are criticized for lack of transparency.
- Projects (persistent context with files/instructions) and Artifacts are seen as major productivity features; some wish for repo integration and voice interfaces.
- Account creation friction: phone-number requirement and blocking of Google Voice numbers turn some users away.
Safety, bans, and reliability
- Some accounts are auto‑banned with little explanation; appeal flows exist but are slow or inconsistent.
- Claude’s safety filters are stricter than GPT’s in some areas (e.g., code obfuscation), which some see as overreach.
- Occasional dangerous suggestions (e.g., rm -rf on keyring data) show that safety and caution are still imperfect.
Broader impacts
- Strong sense that modern LLMs dramatically accelerate experienced developers, especially on side projects.
- Debate over whether this threatens software jobs or mainly raises the bar for developers who can direct and verify AI‑generated code.