Composer: Building a fast frontier model with RL
Model performance & comparisons
- Many commenters want explicit head‑to‑head numbers vs Sonnet 4.5 and GPT‑5, not the “Best Frontier” aggregate chart.
- From the post and comments: Composer trails the top frontier models in raw capability, but the pitch is roughly 4x faster generation at quality similar to those models.
- Some users say Composer feels “quite good” or even better than GPT‑5 Codex for certain tasks; others find it clearly below Sonnet 4.5 or GPT‑5‑high and quickly switch back.
Speed vs intelligence tradeoff
- The thread repeatedly splits developers into two camps:
  - Those who want autonomous, longer‑running agents prioritize raw intelligence and planning (and often prefer Claude / GPT‑5).
  - Those who prefer tight, interactive collaboration prioritize latency and iteration speed (and are more open to Composer).
- Several users say model speed is not their bottleneck; “wrestling it to get the right output” is. Others argue “good enough but a lot faster” is ideal, as you can correct a fast model more often.
User experiences & reliability
- Strong praise for Cursor’s overall UX, especially compared with Copilot, Claude Code, Gemini CLI, Cline, etc.
- Counter‑reports of major reliability issues (hanging requests, failed commands, crashes in Cursor 2.0), especially on Windows and on some networks; some say Claude Code feels “night and day” more reliable.
- Cursor staff claim recent, substantial performance improvements and urge people to retry.
Tab completion & workflows
- Cursor’s tab completion is widely praised as best‑in‑class and a key differentiator; some users switched back from other editors just for this.
- A minority find multi‑line suggestions distracting or overly aggressive, preferring more conservative behavior like IntelliJ’s.
- There’s debate between “tab‑driven, human‑in‑control” workflows vs running agents (e.g., Claude Code) almost autonomously in the background.
Model training, data & transparency
- Users ask whether Composer is trained on Cursor user data; answers in the thread are conflicting and non‑authoritative.
- An ML researcher from Cursor emphasizes RL post‑training for agentic behavior but avoids naming the base model or fully detailing training data.
- One external commenter claims Composer and another tool are RL‑tuned on GLM‑4.5/4.6; this is not confirmed by Cursor.
- Many criticize opaque benchmarking: internal “Cursor Bench” is not public, results are aggregated across competitor models, and axis labels/metrics are sparse.
- Others argue internal user signals (accept/reject, task success) matter more than public benchmarks, though some still want open or third‑party evaluations.
Pricing, billing & positioning
- Composer is priced inside Cursor similarly to GPT‑5 and Gemini 2.5 Pro, raising the question of why users would pick it over “Auto” or the named frontier models.
- Several complain about confusing and frequently changing Cursor billing and want clearer, prominent pricing.
- Overall sentiment: enthusiasm about Cursor’s product velocity and Composer’s speed, tempered by skepticism over transparency, reliability, and value relative to leading frontier models.