Claude 3.7 Sonnet and Claude Code

Feature convergence & reasoning trend

  • Commenters note rapid copycatting: DeepSeek popularized visible “thinking,” xAI and now Anthropic follow with similar visual/reasoning modes.
  • Debate over whether reasoning is just a “meta-prompt bolt‑on” or requires RL and architectural changes; the thread’s rough consensus: serious reasoning needs RL and dedicated training, not just prompting.
  • Some see current releases as evolutionary (small steps since o1/R1), others argue going from GPT‑2‑level chat to IMO medals and agentic coding in <10 years is a massive shift.

Coding focus & Claude Code

  • Broad agreement that coding has been Claude’s comparative strength; many already preferred Sonnet 3.5 over GPT‑4o for real‑world codebases.
  • Claude Code (CLI agent) is seen as a smart way to be editor‑agnostic and “bring the model to the terminal,” though some would prefer IDE‑native plugins.
  • Early users report very strong capabilities (multi‑hour refactors, big speedups, complex scaffolding) but also rough edges: patch errors, bash commands hanging, incomplete long outputs, and no persistent history between accounts.
  • Anthropic staff say Claude Code intentionally exposes raw tool errors and model quirks; it currently relies on agentic search (grep‑style tools) rather than vector RAG for code.
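The agentic-search approach mentioned above can be sketched as a plain tool that scans files with regexes and returns line-numbered hits for the model to read, rather than querying a vector index. This is an illustrative sketch only; the tool name, signature, and defaults here are hypothetical, not Claude Code’s actual implementation.

```python
import re
from pathlib import Path

def grep_tool(pattern: str, root: str = ".", glob: str = "*.py",
              max_hits: int = 20) -> list[str]:
    """Hypothetical grep-style tool an agent could call instead of vector RAG:
    scan files under `root` matching `glob` and return up to `max_hits`
    matching lines as "path:lineno: text" strings."""
    hits = []
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob(glob)):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # skip unreadable files rather than aborting the search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits
```

The appeal over embedding-based retrieval is that results are exact and need no index to build or go stale; the trade-off is that the model must guess good search terms, which is where the agentic loop (search, read, refine) comes in.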

Model behavior & UX preferences

  • Many like Claude’s coding skill but dislike its eagerness to emit code when only high‑level discussion is wanted; users lean heavily on custom instructions and “architect first” workflows to mitigate this.
  • Some report better results with minimal context than with heavy project contexts; suspicion that long context can hurt answer quality.
  • 3.7 is perceived by some as “smarter but more aggressive,” occasionally ignoring instructions, looping, or overcomplicating solutions.

Costs, limits & billing concerns

  • Pricing is a major theme: Claude 3.7 and Claude Code can burn through dollars quickly. Several users report spending ~$1 within minutes, or $5–10 per developer per day, with intensive sessions reaching “$100/hour” as Anthropic’s own blog notes.
  • Cache reads help a lot in Claude Code, but people still worry about unpredictable bills and want per‑key spend caps, flat‑rate “Ultimate” tiers, or more generous Pro limits.
  • Persistent frustration with tight web‑UI rate limits; heavy users routinely hit caps mid‑debug and fall back to other models.
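The billing arithmetic behind these complaints is easy to sketch. A minimal cost estimator, assuming illustrative per-million-token rates (the input/output rates below match Sonnet’s widely cited list price around the time of the thread, and cache reads billed at a steep discount; verify current pricing before relying on any of these numbers):

```python
def session_cost_usd(input_tokens: int, output_tokens: int,
                     cached_input_tokens: int = 0,
                     in_rate: float = 3.00, out_rate: float = 15.00,
                     cache_read_rate: float = 0.30) -> float:
    """Rough session cost in USD. All rates are per million tokens and
    are assumptions for illustration, not authoritative pricing."""
    per_m = 1_000_000
    return (input_tokens * in_rate
            + output_tokens * out_rate
            + cached_input_tokens * cache_read_rate) / per_m
```

The example makes the cache-read point concrete: an agent that re-sends a large codebase context on every turn pays the full input rate each time, while cache hits cost an order of magnitude less, so long agentic sessions without caching are exactly where “$100/hour” becomes plausible.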

Comparisons with other models & benchmarks

  • Reports are mixed:
    • Some claim Grok 3 and o1/o3‑mini beat earlier Claude models on complex algorithms; others say they’ve never seen o1 solve something Claude 3.5 couldn’t.
    • New Aider benchmarks put 3.7 Sonnet (no thinking) at the top among non‑reasoning coders, and 3.7‑thinking at SOTA with a large thinking budget—though DeepSeek‑R1+Claude mixtures are very competitive on cost.
  • Several note benchmarks rarely reflect their “vibes”: Claude often “feels right” in large codebases even when charts put it behind.

Open vs closed, privacy & hosting

  • Skepticism toward closed APIs: no way to prove inputs aren’t used for training; some insist only open‑weights or self‑hosted setups are truly trustworthy.
  • Others point to contractual guarantees, use via Bedrock/Vertex, and argue they’re sufficient for most businesses.
  • Discussion on Meta and open‑weights models undercutting economics; expectation that general‑purpose LLMs will commoditize and inference prices trend toward raw compute.

Capabilities, creativity & humor

  • Multiple users are impressed by 3.7’s SVG generation and UI design quality, and by complex math/physics/engineering derivations on first try.
  • A side project (“HN Wrapped”) that uses Claude to roast Hacker News profiles is widely praised as genuinely funny—some see this as evidence of a step‑change in LLM humor and “feel” compared to prior models.

Economic & career anxieties

  • Long subthread on whether AI will erode software jobs: some foresee massive disruption and advise becoming “T‑shaped” (broad stack + deep niche) and using AI as a force multiplier; others think edge‑case complexity, legacy systems, and real‑world ambiguity will keep good engineers in demand.
  • Students express pessimism about picking CS just as AI coding tools accelerate; responses range from “learn to code anyway, you must be able to evaluate AI output” to suggestions to pivot toward products, domain expertise, or starting niche businesses.