Claude Sonnet 4 now supports 1M tokens of context
Initial Reactions & Availability
- Many are excited about 1M context, especially for large codebases, long documents, and multi-hour “agentic” sessions.
- Others note it’s API-only (and initially only on higher tiers / specific providers); web UI and non-Max Claude Code users don’t get it yet.
- Some users report enabling 1M in Claude Code via undocumented headers/env vars; others see staggered rollout and confusion about what’s actually live.
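To make the “undocumented headers” reports concrete, here is a minimal sketch of what opting into long context at the raw API level might look like. The beta header value and model ID below are assumptions drawn from thread reports, not official documentation; treat them as placeholders.

```python
import json
import os
import urllib.request

def build_request(prompt: str) -> urllib.request.Request:
    """Sketch of a Messages API call that opts into a long-context beta
    via an anthropic-beta header. Header value and model ID are assumed."""
    body = json.dumps({
        "model": "claude-sonnet-4-20250514",  # assumed model ID
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=body,
        headers={
            "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
            "anthropic-version": "2023-06-01",
            # The long-context opt-in reported in the thread (assumed value):
            "anthropic-beta": "context-1m-2025-08-07",
            "content-type": "application/json",
        },
    )
```

If the rollout is staggered, a request like this may still be rejected server-side even with the header set, which matches the confusion people report about what is actually live.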
Impact on Coding Workflows
- Big theme: long context helps most at init time (load large repo, specs, prior discussion) but can hurt if you just “dump everything” and let the agent wander.
- Multiple workflows are shared:
  - Spec-first: write feature/requirements docs, then a plan, then implement in small stages, resetting context between stages.
  - Using project files like CLAUDE.md, status.md, map.md, and .plan to track decisions, progress, and give the model a compact, durable “cursor” into the codebase.
  - Frequent commits and using tools (git worktrees, MCP servers, repomaps, Serena, etc.) so the model searches instead of loading entire files.
- Some prefer manual chat + editor over full agents; others lean heavily on Claude Code / Cursor.
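The “repomap” idea above — give the model a compact outline so it searches instead of loading entire files — can be sketched in a few lines. This is an illustration of the concept, not the output format of any specific tool (aider, Serena, etc.):

```python
import ast
from pathlib import Path

def repomap(root: str) -> dict[str, list[str]]:
    """Toy repomap: summarize each Python file in a repo by its top-level
    class and function names, so an agent can decide what to open rather
    than ingesting whole files into context."""
    outline: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(), filename=str(path))
        outline[str(path.relative_to(root))] = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
    return outline
```

A few kilobytes of outline like this can stand in for megabytes of source, which is exactly the trade the workflow bullets above are making.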
Context Rot, Retrieval & Limits
- Several link to “context rot” research: performance often degrades as context grows, and needle-in-a-haystack benchmarks are not representative of real reasoning over long inputs.
- Reports that models (including past Gemini long-context versions) can technically accept huge inputs but start “forgetting” or ignoring earlier parts after tens of thousands of tokens.
- Strong sentiment that abstractions + retrieval (RAG, language servers, outlines, repomaps) matter more than raw context size.
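The retrieval-over-raw-context argument reduces to: score candidate chunks against the task and keep only the best few. Real systems use embeddings or language servers; the word-overlap scorer below is a deliberately toy stand-in to show the shape of the idea:

```python
def top_k_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Toy retrieval: rank chunks by word overlap with the query and keep
    the top k, instead of putting every chunk into the prompt. Illustrative
    only; production retrieval would use embeddings or an index."""
    query_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

Even this crude version makes the point: the model sees k relevant chunks at full attention quality rather than everything at degraded quality.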
Claude vs Competitors & Pricing
- Gemini 2.5 Pro is widely praised for long-context code and document understanding, and is cheaper per token, but availability and QoS are pain points.
- Claude is preferred by many for safety, consistency, prose quality, and Claude Code’s workflow; others find Gemini or GPT‑5 superior on their stacks.
- Anthropic’s 1M pricing is seen as steep but defensible for high-value use; caching discounts matter. Some fear surprise bills if agents routinely sit in the “expensive band”.
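The “expensive band” worry comes from tiered pricing where a request over the long-context threshold is billed at a premium rate. A back-of-envelope calculator makes the jump visible; the threshold and per-million-token rates below are illustrative assumptions, not quoted prices, and caching discounts are ignored:

```python
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough cost in dollars for one request under an assumed tiered scheme:
    requests whose input exceeds 200K tokens bill the whole request at a
    higher rate (the "expensive band"). Rates are illustrative assumptions."""
    if input_tokens > 200_000:
        in_rate, out_rate = 6.00, 22.50   # $/M tokens above threshold (assumed)
    else:
        in_rate, out_rate = 3.00, 15.00   # $/M tokens below threshold (assumed)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

Under these assumed numbers, an agent that habitually carries 300K-token contexts pays double the per-token input rate of one that stays under the threshold, which is why context hygiene shows up as a billing concern and not just a quality one.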
Productivity & “Agentic AI” Debate
- Experiences are polarized: some claim 2–3×+ productivity on web/full‑stack work; others say agentic tools are net negative, citing thrash, hallucinations, and review overhead.
- Nuanced consensus:
- Best gains come on new tech, boilerplate-heavy work, or for juniors.
- Senior devs in complex, bespoke systems often see smaller or negative returns.
- Technique (planning, tight scopes, context hygiene) matters at least as much as model choice.