Speed up responses with fast mode

Pricing & Value Perception

  • Fast mode is widely seen as extremely expensive: ~$30/MTok input and ~$150/MTok output, about 6× the standard Opus API price for roughly 2.5× the speed.
  • Multiple users report burning through $10–$100 of credit in anywhere from minutes to a couple of hours under typical “serious dev” usage; some estimate their normal $200/month subscription would be exhausted in a day at fast-mode rates.
  • The docs cause confusion: fast mode is “available” on Pro/Max/Team/Enterprise plans, but usage is not included in those plans and is billed only against extra-usage credit.
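The pricing complaints above reduce to simple arithmetic. A minimal sketch, using the ~$30/$150 per MTok fast rates reported in the thread and standard rates of $5/$25 per MTok (not stated explicitly, but implied by the ~6× figure); the session size is purely illustrative:

```python
# Toy cost comparison: fast mode vs. standard pricing.
FAST = {"input": 30.0, "output": 150.0}    # $/MTok, figures from the thread
STANDARD = {"input": 5.0, "output": 25.0}  # $/MTok, implied by the ~6x ratio

def session_cost(price, input_mtok, output_mtok):
    """Dollar cost of a session measured in millions of tokens."""
    return price["input"] * input_mtok + price["output"] * output_mtok

# Hypothetical heavy dev session: 3 MTok of context read, 0.5 MTok generated.
fast = session_cost(FAST, 3, 0.5)      # 90.0 + 75.0  = $165.00
std = session_cost(STANDARD, 3, 0.5)   # 15.0 + 12.5  = $27.50
print(f"fast: ${fast:.2f}, standard: ${std:.2f}, ratio: {fast / std:.1f}x")
```

At these rates a single heavy session lands in the $100+ range, which matches the “credit gone in hours” reports.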

Speed & Developer Experience

  • Supporters argue that latency is a real bottleneck: waiting 1–2+ minutes per step forces context switching, increases mental load, and breaks “single-threaded” deep work.
  • Fast mode is seen as especially attractive for short, serial, blocking tasks (e.g., small merges, UI iteration, planning phases) where humans must wait for the agent.
  • Others say their bottleneck is reading, understanding, and validating AI-generated code, so faster output doesn’t help much.
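The latency argument can be made concrete with a back-of-envelope calculation. A sketch assuming the ~2-minute step and ~2.5× speedup figures from the thread; the number of blocking steps per day is an illustrative assumption:

```python
# Back-of-envelope: wall-clock saved when a human waits on every agent step.
steps_per_day = 100    # illustrative count of short, serial, blocking steps
normal_step_s = 120    # ~2-minute step, figure from the thread
speedup = 2.5          # fast-mode speedup, figure from the thread

# Time saved = total waiting time * (1 - 1/speedup).
saved_s = steps_per_day * normal_step_s * (1 - 1 / speedup)
print(f"{saved_s / 3600:.1f} hours of waiting saved per day")  # 2.0 hours
```

Whether that is worth 6× the price depends on whether waiting time, or code review time, is the actual bottleneck, which is exactly the disagreement above.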

Implementation & Infrastructure Speculation

  • Many assume this is primarily about prioritization/queue-skipping and retuning serving infrastructure: fewer concurrent users per GPU, smaller batches, higher per-user tokens/sec at lower overall throughput.
  • Alternatives raised: newer hardware (GB200/Blackwell, TPUs), speculative decoding, keeping KV cache in GPU memory; debate over how much each could contribute.
  • Some emphasize that large-scale serving always trades off throughput vs. per-request latency; “premium” speed simply chooses a different point on that curve.
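The throughput-vs-latency trade-off can be illustrated with a toy serving model. Everything here is an assumption for illustration, not measured data: aggregate tokens/sec is modeled as saturating with batch size, so per-user speed falls as more requests share the GPU:

```python
# Toy model of batched LLM serving: aggregate throughput saturates with
# batch size, so per-user tokens/sec drops as the batch grows.
# All numbers are hypothetical, chosen only to show the shape of the curve.

def aggregate_tps(batch_size, peak=10_000, half_point=16):
    """Saturating aggregate throughput (tok/s) for a hypothetical GPU."""
    return peak * batch_size / (batch_size + half_point)

def per_user_tps(batch_size):
    """Tokens/sec each request sees when the batch is shared evenly."""
    return aggregate_tps(batch_size) / batch_size

for b in (1, 4, 32, 128):
    print(f"batch={b:4d}  aggregate={aggregate_tps(b):7.0f} tok/s  "
          f"per-user={per_user_tps(b):6.1f} tok/s")
```

Small batches maximize per-user speed at the cost of aggregate throughput; large batches do the opposite. A “premium” tier can simply be served from smaller batches, i.e., a different point on the same curve.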

Business Model & “Enshittification” Concerns

  • Strong worry that introducing a paid “fast lane” creates incentives to degrade the free/standard lane over time, analogized to airline “speedy boarding” or food-delivery premium tiers.
  • Others call this conspiratorial, arguing there’s no evidence of intentional slowdowns and intense competition would punish obvious degradation.

Desire for Slow/Cheap Modes & Alternatives

  • Many request a cheaper slow mode or easier integration of batch processing/spot-style pricing, especially for overnight/background agents.
  • Comparisons drawn: OpenAI’s priority tier and Batch API, Gemini 3 Pro (better speed/price, weaker coding), and fast open-model serving (Groq/Cerebras, large local GPUs) as eventual substitutes.