News

Cursor launches Composer 2, its own frontier coding model at 50¢ per million tokens

Mar 19, 2026

Key Points

  • Cursor launches Composer 2, a proprietary coding model priced at 50¢ per million input tokens, roughly one-tenth the cost of Anthropic's Opus.
  • Composer 2 performs on par with GPT-5.4 and ahead of Claude Opus on Cursor's internal CursorBench, reflecting the company's structural advantage: training data drawn from its integrated code editor.
  • Cursor's recent shift to a "max mode" tier has frustrated developers, whose monthly credits now run out in one to two days, driving customer switching despite the model's cost advantage.

Summary

Cursor has launched Composer 2, a proprietary coding model priced at 50¢ per million input tokens and $2.50 per million output tokens, roughly one-tenth the cost of Anthropic's Opus. The model is trained with continued pre-training and reinforcement learning, aimed at improving quality while lowering inference cost.
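
To make the pricing arithmetic concrete, here is a minimal sketch of per-request cost at the article's Composer 2 rates. The token counts and the Opus-class rates are illustrative placeholders consistent with the "roughly one-tenth" claim, not published pricing.

```python
# Illustrative cost comparison at the article's Composer 2 rates.
# The Opus-class rates and token counts below are placeholder
# assumptions, not published pricing.

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in dollars; rates are dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Composer 2 rates from the article: $0.50/M input, $2.50/M output.
composer = request_cost(40_000, 2_000, 0.50, 2.50)

# Hypothetical Opus-class rates at roughly ten times Composer 2's.
opus = request_cost(40_000, 2_000, 5.00, 25.00)

print(f"Composer 2: ${composer:.4f} per request")  # $0.0250
print(f"Opus-class: ${opus:.4f} per request")      # $0.2500
```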

On Cursor's internal CursorBench, Composer 2 performs on par with GPT-5.4 and above Claude Opus. The benchmark is narrowly designed for coding rather than general reasoning, which explains why Composer 2 can excel where general-purpose models may not. The best models on CursorBench sit around 63–65% accuracy, leaving substantial headroom before the benchmark saturates.

The cost advantage is material for organizations managing monthly token budgets, and it arrives as Cursor faces mounting pressure from OpenAI and Anthropic. According to reports from enterprise customers, Cursor recently moved most models behind a "max mode" tier, causing developers to exhaust monthly credits in one to two days rather than spreading usage across the month. The shift has driven frustration and customer switching.
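
The "one to two days" complaint is easy to reproduce as a burn-rate estimate. The sketch below uses hypothetical figures; the budget, per-request cost, and request volume are assumptions, not Cursor's actual plan terms.

```python
# Hypothetical burn-rate estimate: how long a monthly credit budget
# lasts under heavy agentic use. All figures below are assumptions,
# not Cursor's actual plan terms.

MONTHLY_BUDGET_USD = 20.00     # assumed plan allowance
COST_PER_REQUEST_USD = 0.25    # assumed "max mode" request cost
REQUESTS_PER_DAY = 60          # assumed heavy agentic usage

daily_spend = COST_PER_REQUEST_USD * REQUESTS_PER_DAY
days_until_exhausted = MONTHLY_BUDGET_USD / daily_spend
print(f"Budget lasts about {days_until_exhausted:.1f} days")  # ~1.3 days
```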

Cursor's structural advantage runs through data. The company has operated a code editor with integrated agentic coding longer than OpenAI (with Codex) or Anthropic (with Claude Code), giving it access to real coding workflows and execution patterns for model training. That raw material—actual developer behavior at scale—is difficult for competitors to replicate quickly.

Small, task-specific models can outperform general ones on particular benchmarks, and at lower cost, at least in the near term. But saturation matters. If a task, such as classifying simple if-statements, can be solved at near-100% accuracy, further improvements from frontier models deliver diminishing returns. Coding is not yet at that ceiling, leaving room for both specialized and general models to compete.
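
The saturation argument can be stated numerically. In this small sketch, the 64% figure reflects the article's CursorBench range; the 99% "saturated task" figure is illustrative.

```python
# Headroom left on a benchmark: the fraction of tasks still failed.
# The 0.64 figure reflects the article's 63-65% CursorBench range;
# the 0.99 saturated-task figure is an illustrative assumption.

def headroom(accuracy: float) -> float:
    return 1.0 - accuracy

print(f"CursorBench at 64%: {headroom(0.64):.0%} of tasks still failed")  # 36%
print(f"Saturated task at 99%: {headroom(0.99):.0%} left to gain")        # 1%
```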