Interview

Harjot Gill of CodeRabbit: GPT-5 nearly doubles code review performance, conversion to paid customers expected to jump

Aug 7, 2025 with Harjot Gill

Key Points

CodeRabbit's code review platform scores nearly twice as high with GPT-5 as with prior models on its hardest internal test cases, solving problems no earlier model could tackle.
The company will not raise prices despite the performance uplift. Customers stay at the same monthly rate, since every competitor can access GPT-5 on identical terms.
CodeRabbit's free-to-paid conversion rate doubled to roughly 30% when o1 preview launched, and Gill expects GPT-5 to drive another significant conversion jump, though production-scale results may diverge from lab benchmarks.

Summary

Harjot Gill of AI code review platform CodeRabbit reports that GPT-5 scores nearly twice as high as GPT-4o, Claude Sonnet, and Claude Opus on the company's internal golden dataset of its most difficult pull request reviews, problems no prior model had been able to solve. CodeRabbit's product sits in a narrow category of genuinely reasoning-heavy AI applications, tasked with identifying race conditions, security vulnerabilities, and other complex code issues across developer pull requests.

The performance uplift will not translate into a price increase. Gill states explicitly that there is no upsell plan: customers receive materially better output at the same monthly price point. That dynamic reflects a broader competitive reality: every rival can access GPT-5 on the same terms.

The more consequential business signal is conversion. When o1 preview launched, CodeRabbit's free-to-paid conversion rate doubled, reaching approximately 30%. Gill expects GPT-5 to drive another significant jump in paid customer conversion and a reduction in churn, though he cautions that lab benchmarks do not always survive contact with production-scale usage. False positive rates and hallucination frequency at scale remain under active observation.

The GPT-4 release cycle offers a cautionary reference point. Gill describes GPT-4 as a "Windows Vista moment" for CodeRabbit, a release where internal evals suggested parity but real-world performance regressed, causing a dip in conversion metrics. The o1 preview reversal that followed restored momentum and reset expectations for what reasoning model upgrades can do to a business built on inference quality.
