Riley Tomasek of Charlie Labs: 98% of merged code written by AI agent in August
Aug 7, 2025 with Riley Tomasek
Key Points
- Charlie Labs reports that 98% of code merged into its own codebase in August came from Charlie, its TypeScript-focused AI agent, counting only full pull requests reviewed and approved through GitHub.
- Early access to GPT-5 shows material performance gains over o3 across Charlie's eval suite, which tests GitHub-native workflows including issue-to-PR generation and code quality scoring.
- Tomasek argues capability, not token cost, is the binding constraint for Charlie Labs, shifting hiring demand toward systems-level thinking and longer planning horizons rather than raw coding ability.
Summary
Riley Tomasek, founder of Charlie Labs, reports that 98% of code merged into the company's own codebase in August 2025 was written by Charlie, its TypeScript-focused AI coding agent. That figure counts only full pull requests reviewed and approved through GitHub, not autocomplete suggestions or inline IDE assistance, a distinction Tomasek emphasizes as meaningful.
Charlie Labs has had access to GPT-5 for two to three weeks as an early adopter. Internal evals show GPT-5 performing materially better than o3, which itself outperformed all prior models. The company's eval suite is calibrated toward hard-to-very-hard tasks, with easy benchmarks abandoned as no longer useful at current capability levels. Tasks mirror a GitHub-native workflow: an issue triggers a comment prompting Charlie to open a PR, which is then scored on correctness against a known solution, code quality, testability, and documentation quality.
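The scoring step of that workflow can be sketched as a weighted rubric. This is a minimal illustration under stated assumptions, not Charlie Labs' actual harness: the type names, the 0-to-1 scale, treating correctness as the fraction of reference-solution tests a PR passes, and the weights are all hypothetical, since the source names the four criteria but not how they are measured or combined.

```typescript
// Hypothetical sketch of scoring an agent-opened PR against a known solution.
// All names, weights, and the 0..1 scale are illustrative assumptions.

type Dimension = "correctness" | "codeQuality" | "testability" | "documentation";

type Scores = Record<Dimension, number>; // each value in [0, 1]

// Assumed weights; the source does not say how the criteria are combined.
const WEIGHTS: Scores = {
  correctness: 0.5,
  codeQuality: 0.2,
  testability: 0.15,
  documentation: 0.15,
};

interface EvalTask {
  issueId: string;            // the issue whose comment prompts the agent to open a PR
  referenceTestCount: number; // tests derived from the known solution (assumed)
}

interface PrOutcome {
  referenceTestsPassed: number; // how many reference tests the PR's code passes
  // Rubric grades for the subjective criteria, e.g. from a human or model grader (assumed).
  codeQuality: number;
  testability: number;
  documentation: number;
}

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Computes per-criterion scores and their weighted total for one PR.
function scorePr(task: EvalTask, pr: PrOutcome): { scores: Scores; total: number } {
  const scores: Scores = {
    correctness:
      task.referenceTestCount === 0
        ? 0
        : clamp01(pr.referenceTestsPassed / task.referenceTestCount),
    codeQuality: clamp01(pr.codeQuality),
    testability: clamp01(pr.testability),
    documentation: clamp01(pr.documentation),
  };
  let total = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    total += WEIGHTS[dim] * scores[dim];
  }
  return { scores, total };
}

// Usage: one hypothetical task and the PR the agent opened in response.
const task: EvalTask = { issueId: "example-issue", referenceTestCount: 4 };
const pr: PrOutcome = { referenceTestsPassed: 3, codeQuality: 0.8, testability: 0.6, documentation: 1 };
const result = scorePr(task, pr);
console.log(result.total.toFixed(3));
```

Weighting correctness most heavily reflects the description above, where correctness against a known solution is listed first, but any real suite could weight or gate the criteria differently, for example by failing a PR outright if reference tests break.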
Charlie operates as a workspace-integrated agent rather than a standalone tool. It joins a team's GitHub, Slack, and Linear environments and participates like a human contributor, including reviewing PRs from human engineers. The product targets modern web stacks, primarily Next.js and TypeScript back-end applications, and is priced per PR or per commit, a value-based model benchmarked against the cost of human engineering output.
On pricing, Tomasek is direct: token cost reduction is not the binding constraint for Charlie Labs. The limiting factor is capability, not economics. That position insulates the business somewhat from inference price wars but concentrates risk on model performance trajectory.
On hiring, Tomasek says the skills rising in demand are systems-level thinking and longer planning horizons, not raw coding ability. Engineers who adapt to AI-assisted workflows remain viable; those optimizing for hand-written code proficiency are, in his assessment, on the wrong path.