Claude Sonnet 4 now supports 1M tokens of context
Initial Reactions & Availability
- Many are excited about 1M context, especially for large codebases, long documents, and multi-hour “agentic” sessions.
- Others note it’s API-only (and initially only on higher tiers / specific providers); web UI and non-Max Claude Code users don’t get it yet.
- Some users report enabling 1M in Claude Code via undocumented headers/env vars; others see staggered rollout and confusion about what’s actually live.
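To make the “undocumented headers” reports concrete, here is a minimal sketch of what opting into long context at the raw API level might look like. The beta header value and model ID below are assumptions drawn from thread reports, not official documentation; treat them as placeholders.

```python
import json
import os
import urllib.request

def build_request(prompt: str) -> urllib.request.Request:
    """Sketch of a Messages API call that opts into a long-context beta
    via an anthropic-beta header. Header value and model ID are assumed."""
    body = json.dumps({
        "model": "claude-sonnet-4-20250514",  # assumed model ID
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=body,
        headers={
            "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
            "anthropic-version": "2023-06-01",
            # The long-context opt-in reported in the thread (assumed value):
            "anthropic-beta": "context-1m-2025-08-07",
            "content-type": "application/json",
        },
    )
```

If the rollout is staggered, a request like this may still be rejected server-side even with the header set, which matches the confusion people report about what is actually live.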
Impact on Coding Workflows
- Big theme: long context helps most at init time (load large repo, specs, prior discussion) but can hurt if you just “dump everything” and let the agent wander.
- Multiple workflows are shared:
  - Spec-first: write feature/requirements docs, then a plan, then implement in small stages, resetting context between stages.
  - Using project files like CLAUDE.md, status.md, map.md, and .plan to track decisions, progress, and give the model a compact, durable “cursor” into the codebase.
  - Frequent commits and using tools (git worktrees, MCP servers, repomaps, Serena, etc.) so the model searches instead of loading entire files.
- Some prefer manual chat + editor over full agents; others lean heavily on Claude Code / Cursor.
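The “repomap” idea above — give the model a compact outline so it searches instead of loading entire files — can be sketched in a few lines. This is an illustration of the concept, not the output format of any specific tool (aider, Serena, etc.):

```python
import ast
from pathlib import Path

def repomap(root: str) -> dict[str, list[str]]:
    """Toy repomap: summarize each Python file in a repo by its top-level
    class and function names, so an agent can decide what to open rather
    than ingesting whole files into context."""
    outline: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(), filename=str(path))
        outline[str(path.relative_to(root))] = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
    return outline
```

A few kilobytes of outline like this can stand in for megabytes of source, which is exactly the trade the workflow bullets above are making.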
Context Rot, Retrieval & Limits
- Several link to “context rot” research: performance often degrades as context grows, and needle-in-a-haystack benchmarks are not representative of real reasoning over long inputs.
- Reports that models (including past Gemini long-context versions) can technically accept huge inputs but start “forgetting” or ignoring earlier parts after tens of thousands of tokens.
- Strong sentiment that abstractions + retrieval (RAG, language servers, outlines, repomaps) matter more than raw context size.
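The retrieval-over-raw-context argument reduces to: score candidate chunks against the task and keep only the best few. Real systems use embeddings or language servers; the word-overlap scorer below is a deliberately toy stand-in to show the shape of the idea:

```python
def top_k_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Toy retrieval: rank chunks by word overlap with the query and keep
    the top k, instead of putting every chunk into the prompt. Illustrative
    only; production retrieval would use embeddings or an index."""
    query_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

Even this crude version makes the point: the model sees k relevant chunks at full attention quality rather than everything at degraded quality.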
Claude vs Competitors & Pricing
- Gemini 2.5 Pro is widely praised for long-context code and document understanding, and is cheaper per token, but availability and QoS are pain points.
- Claude is preferred by many for safety, consistency, prose quality, and Claude Code’s workflow; others find Gemini or GPT‑5 superior on their stacks.
- Anthropic’s 1M pricing is seen as steep but defensible for high-value use; caching discounts matter. Some fear surprise bills if agents routinely sit in the “expensive band”.
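The “expensive band” worry comes from tiered pricing where a request over the long-context threshold is billed at a premium rate. A back-of-envelope calculator makes the jump visible; the threshold and per-million-token rates below are illustrative assumptions, not quoted prices, and caching discounts are ignored:

```python
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough cost in dollars for one request under an assumed tiered scheme:
    requests whose input exceeds 200K tokens bill the whole request at a
    higher rate (the "expensive band"). Rates are illustrative assumptions."""
    if input_tokens > 200_000:
        in_rate, out_rate = 6.00, 22.50   # $/M tokens above threshold (assumed)
    else:
        in_rate, out_rate = 3.00, 15.00   # $/M tokens below threshold (assumed)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

Under these assumed numbers, an agent that habitually carries 300K-token contexts pays double the per-token input rate of one that stays under the threshold, which is why context hygiene shows up as a billing concern and not just a quality one.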
Productivity & “Agentic AI” Debate
- Experiences are polarized: some claim 2–3×+ productivity on web/full‑stack work; others say agentic tools are net negative, citing thrash, hallucinations, and review overhead.
- Nuanced consensus:
- Best gains come on new tech, boilerplate-heavy work, or for juniors.
- Senior devs in complex, bespoke systems often see smaller or negative returns.
- Technique (planning, tight scopes, context hygiene) matters at least as much as model choice.