Composer: Building a fast frontier model with RL
Model performance & comparisons
- Many commenters want explicit head‑to‑head numbers vs Sonnet 4.5 and GPT‑5, not the “Best Frontier” aggregate chart.
- From the post and comments: Composer trails the top frontier models in raw capability, but the pitch is roughly 4x faster generation at quality similar to those models.
- Some users say Composer feels “quite good” or even better than GPT‑5 Codex for certain tasks; others find it clearly below Sonnet 4.5 or GPT‑5‑high and quickly switch back.
Speed vs intelligence tradeoff
- The thread repeatedly splits developers into two camps:
  - Those who want autonomous, longer‑running agents prioritize raw intelligence and planning (and often prefer Claude / GPT‑5).
  - Those who prefer tight, interactive collaboration prioritize latency and iteration speed (and are more open to Composer).
- Several users say model speed is not their bottleneck; “wrestling it to get the right output” is. Others argue “good enough but a lot faster” is ideal, as you can correct a fast model more often.
User experiences & reliability
- Strong praise for Cursor’s overall UX, especially compared with Copilot, Claude Code, Gemini CLI, Cline, etc.
- Counter‑reports of major reliability issues (hanging requests, failed commands, crashes in Cursor 2.0), especially on Windows and on some networks; some say Claude Code feels “night and day” more reliable.
- Cursor staff claim recent, substantial performance improvements and urge people to retry.
Tab completion & workflows
- Cursor’s tab completion is widely praised as best‑in‑class and a key differentiator; some users switched back from other editors just for this.
- A minority find multi‑line suggestions distracting or overly aggressive, preferring more conservative behavior like IntelliJ’s.
- There’s debate between “tab‑driven, human‑in‑control” workflows vs running agents (e.g., Claude Code) almost autonomously in the background.
Model training, data & transparency
- Users ask whether Composer is trained on Cursor user data; answers in the thread are conflicting and non‑authoritative.
- An ML researcher from Cursor emphasizes RL post‑training for agentic behavior but avoids naming the base model or fully detailing training data.
- One external commenter claims Composer and another tool are RL‑tuned on GLM‑4.5/4.6; this is not confirmed by Cursor.
- Many criticize opaque benchmarking: internal “Cursor Bench” is not public, results are aggregated across competitor models, and axis labels/metrics are sparse.
- Others argue internal user signals (accept/reject, task success) matter more than public benchmarks, though some still want open or third‑party evaluations.
Pricing, billing & positioning
- Composer is priced inside Cursor similarly to GPT‑5 and Gemini 2.5 Pro, raising the question of why users would pick it over “Auto” or the named frontier models.
- Several complain about confusing and frequently changing Cursor billing and want clearer, prominent pricing.
- Overall sentiment: enthusiasm about Cursor’s product velocity and Composer’s speed, tempered by skepticism over transparency, reliability, and value relative to leading frontier models.