Harjot Gill of CodeRabbit: GPT-5 nearly doubles code review performance, conversion to paid customers expected to jump
Aug 7, 2025 with Harjot Gill
Key Points
- CodeRabbit's code review platform scores nearly twice as high with GPT-5 as with prior models on its hardest internal test cases, solving problems no earlier model could tackle.
- The company will not raise prices despite the performance uplift. Customers stay at the same monthly rate, since every competitor can access GPT-5 on identical terms.
- CodeRabbit's free-to-paid conversion rate doubled to roughly 30% when o1 preview launched, and Gill expects GPT-5 to drive another significant conversion jump, though production-scale results may diverge from lab benchmarks.
Summary
Harjot Gill, representing AI code review platform CodeRabbit, reports that GPT-5 scores nearly twice as high as GPT-4o, Claude Sonnet, and Claude Opus on the company's internal golden dataset of the most difficult pull request reviews — problems no prior model had been able to solve. CodeRabbit's product sits in a narrow category of genuinely reasoning-heavy AI applications, tasked with identifying race conditions, security vulnerabilities, and other complex code issues across developer pull requests.
The performance uplift will not translate into a price increase. Gill states explicitly there is no upsell plan, with customers receiving materially better output at the same monthly price point. That dynamic reflects a broader competitive reality: every rival can access GPT-5 on the same terms.
The more consequential business signal is conversion. When o1 preview launched, CodeRabbit's free-to-paid conversion rate doubled, reaching approximately 30%. Gill expects GPT-5 to drive another significant jump in paid customer conversion and a reduction in churn, though he cautions that lab benchmarks do not always survive contact with production-scale usage. False positive rates and hallucination frequency at scale remain under active observation.
The GPT-4 release cycle offers a cautionary reference point. Gill describes GPT-4 as a "Windows Vista moment" for CodeRabbit: internal evals suggested parity, but real-world performance regressed, causing a dip in conversion metrics. The o1 preview launch that followed reversed the decline, restoring momentum and resetting expectations for what reasoning model upgrades can do for a business built on inference quality.