Composer: Building a fast frontier model with RL

Model performance & comparisons

  • Many commenters want explicit head‑to‑head numbers vs Sonnet 4.5 and GPT‑5, not the “Best Frontier” aggregate chart.
  • From the post and comments: Composer trails the top frontier models in raw capability; its pitch is near‑frontier quality at roughly 4x the generation speed.
  • Some users say Composer feels “quite good” or even better than GPT‑5 Codex for certain tasks; others find it clearly below Sonnet 4.5 or GPT‑5‑high and quickly switch back.

Speed vs intelligence tradeoff

  • The thread repeatedly splits developers into two camps:
    • Those who want autonomous, longer‑running agents: prioritize raw intelligence and planning (often prefer Claude / GPT‑5).
    • Those who prefer tight, interactive collaboration: prioritize latency and iteration speed (more open to Composer).
  • Several users say model speed is not their bottleneck; “wrestling it to get the right output” is. Others argue “good enough but a lot faster” is ideal, as you can correct a fast model more often.

User experiences & reliability

  • Strong praise for Cursor’s overall UX, especially compared with Copilot, Claude Code, Gemini CLI, Cline, etc.
  • Counter‑reports of major reliability issues (requests hanging, failed commands, crashing on Cursor 2.0), especially on Windows and in some networks; some say Claude Code feels “night and day” more reliable.
  • Cursor staff claim recent, substantial performance improvements and urge people to retry.

Tab completion & workflows

  • Cursor’s tab completion is widely praised as best‑in‑class and a key differentiator; some users switched back from other editors just for this.
  • A minority find multi‑line suggestions distracting or overly aggressive, preferring more conservative behavior like IntelliJ’s.
  • There’s debate between “tab‑driven, human‑in‑control” workflows vs running agents (e.g., Claude Code) almost autonomously in the background.

Model training, data & transparency

  • Users ask whether Composer is trained on Cursor user data; answers in the thread are conflicting and non‑authoritative.
  • An ML researcher from Cursor emphasizes RL post‑training for agentic behavior but avoids naming the base model or fully detailing training data (a generic illustrative sketch of the idea follows this list).
  • One external commenter claims Composer and another tool are RL‑tuned on GLM‑4.5/4.6; this is not confirmed by Cursor.
  • Many criticize opaque benchmarking: internal “Cursor Bench” is not public, results are aggregated across competitor models, and axis labels/metrics are sparse.
  • Others argue internal user signals (accept/reject, task success) matter more than public benchmarks, though some still want open or third‑party evaluations.
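
As an illustration of what “RL post‑training for agentic behavior” can mean in general, the sketch below runs REINFORCE over tool‑call trajectories: a policy samples a sequence of tool choices and is nudged toward sequences that score well under a reward. The tool names, reference trajectory, dense reward, and tabular softmax policy are all invented for this example; nothing here describes Cursor's actual base model, data, or training recipe.

```python
# Hypothetical toy sketch: REINFORCE over tool-call trajectories.
# Everything here (tools, target trajectory, reward, tabular policy)
# is invented for illustration and does not describe Cursor's setup.
import numpy as np

TOOLS = ["read_file", "edit_file", "run_tests", "stop"]
TARGET = [0, 1, 2, 3]            # reference "fix the bug" trajectory
MAX_STEPS = len(TARGET)

rng = np.random.default_rng(0)
logits = np.zeros((MAX_STEPS, len(TOOLS)))   # one softmax policy per step


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def rollout():
    """Sample one episode; reward = fraction of steps matching TARGET."""
    actions = [int(rng.choice(len(TOOLS), p=softmax(logits[t])))
               for t in range(MAX_STEPS)]
    reward = sum(a == t for a, t in zip(actions, TARGET)) / MAX_STEPS
    return actions, reward


baseline, lr = 0.0, 0.5
for _ in range(3000):
    actions, reward = rollout()
    baseline += 0.01 * (reward - baseline)     # running-average baseline
    advantage = reward - baseline
    for t, a in enumerate(actions):            # REINFORCE update per step
        p = softmax(logits[t])
        grad = -p
        grad[a] += 1.0                         # d log pi(a|t) / d logits[t]
        logits[t] += lr * advantage * grad

print("learned trajectory:", [TOOLS[int(np.argmax(row))] for row in logits])
```

In a real agentic setup the policy would be the language model itself and the reward would come from signals such as tests passing or edits being accepted, but the update has the same shape.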

Pricing, billing & positioning

  • Composer is priced inside Cursor similarly to GPT‑5 and Gemini 2.5 Pro, which raises the question of why one would choose it over “Auto” or named frontier models.
  • Several commenters complain about Cursor's confusing, frequently changing billing and want clearer, more prominent pricing.
  • Overall sentiment: enthusiasm about Cursor’s product velocity and Composer’s speed, tempered by skepticism over transparency, reliability, and value relative to leading frontier models.