Commentary

The age of orchestration: Gastown and multi-agent coding are the next step-change in developer productivity

Feb 4, 2026

Key Points

  • Steve Yegge released Gastown on New Year's Day as an open-source orchestration layer for managing multiple AI agents, with tens of thousands already using it despite the 225,000-line codebase never being reviewed by its creator.
  • Orchestration solves a critical scaling problem: multiple parallel agent instances process work faster than a single LLM, enabling tasks like categorizing millions of receipts that would take impractical time sequentially.
  • Early users report jumping from 15,000 to 1 million lines of code per day, suggesting multi-agent systems could extend autonomous LLM task completion from hours to weeks, potentially requiring new benchmark categories by 2026.

Summary

Steve Yegge released Gastown on New Year's Day as an orchestration layer for managing multiple AI agents in a single development environment. The project spans 225,000 lines of Go and is entirely vibe-coded, with Yegge having never reviewed the codebase himself. Despite warnings that users need 10+ agents running to avoid chaos, tens of thousands of people are already using it. Some maintain dozens of accounts across AI labs to circumvent rate limits, with monthly bills routinely hitting thousands of dollars.

Gastown uses a Mad Max-themed taxonomy to manage complexity. Users act as overseers of a town containing multiple projects called rigs. A mayor agent serves as chief of staff and spins up ephemeral agents called polecats that handle individual tasks. A witness agent helps unstick stalled workers. A deacon patrols for cleanup. Dogs handle maintenance. Crew members are longer-lived agents tied to specific projects and retain context across sessions, unlike polecats, which execute one task and disappear.
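The role hierarchy above can be sketched as plain Go types. This is a minimal illustration of the taxonomy only, not Gastown's actual API; every type and method name here is invented for the example.

```go
package main

import "fmt"

// Hypothetical model of Gastown's taxonomy. These names and shapes
// are assumptions for illustration, not Gastown's real data model.

// Agent is any worker in the town. Polecats are ephemeral; crew
// members persist and retain context across sessions.
type Agent struct {
	Name      string
	Ephemeral bool
}

// Rig is a single project, staffed by long-lived crew agents.
type Rig struct {
	Name string
	Crew []Agent
}

// Town is the top-level unit a human overseer manages. The mayor
// acts as chief of staff and spawns polecats for one-off tasks.
type Town struct {
	Mayor Agent
	Rigs  []Rig
}

// SpawnPolecat creates a throwaway agent for exactly one task.
// It carries no context and disappears when the task is done.
func (t *Town) SpawnPolecat(task string) Agent {
	return Agent{Name: "polecat:" + task, Ephemeral: true}
}

func main() {
	town := Town{Mayor: Agent{Name: "mayor"}}
	pc := town.SpawnPolecat("fix-tests")
	fmt.Println(pc.Name, pc.Ephemeral) // polecat:fix-tests true
}
```

The key distinction the sketch encodes is lifetime: crew members live inside a rig and keep context, while polecats are constructed per task and never stored anywhere.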

The system represents a natural evolution in the developer experience stack. Before, developers wrote code in text files and executed it in terminals. IDEs brought code completion and REPLs. Next, LLM chat windows arrived in IDEs, asking permission before running tools. As model quality improved, developers trusted agents more and the IDE melted away into direct agent interaction. Yegge notes that code-writing skill atrophies while code-reading skill improves. Developers now prompt agents and review outputs rather than writing from scratch.

The most popular current workflow remains a single-agent CLI such as Claude Code, Codex, or Gemini CLI, though adoption of web and desktop frontends is still early. Gastown jumps several steps ahead into aggressive multi-agent orchestration.

Orchestration matters because model capability keeps rising while cost keeps falling. A single LLM instance cannot spawn multiple copies of itself for parallelized work. If a company needs to categorize millions of historical receipts, one LLM will finish eventually but slowly. Fifty instances running in parallel under an orchestrator finish the same job in a fraction of the time. This is the gap Gastown addresses.
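The receipts example is a classic fan-out worker pool, and the pattern can be sketched in a few lines of Go. Here `categorize` is a local stub standing in for a real LLM call, and the worker count, channel layout, and function names are all assumptions made for illustration:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// categorize is a stub for one agent handling one receipt. In a real
// orchestrator this would be a network call to an LLM instance.
func categorize(receipt string) string {
	if strings.Contains(receipt, "taxi") {
		return "travel"
	}
	return "other"
}

// orchestrate fans receipts out to `workers` parallel agents and
// tallies the categories they return.
func orchestrate(receipts []string, workers int) map[string]int {
	jobs := make(chan string)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				results <- categorize(r)
			}
		}()
	}

	// Feed the backlog, then signal workers there is no more work.
	go func() {
		for _, r := range receipts {
			jobs <- r
		}
		close(jobs)
	}()

	// Close results once every worker has drained its jobs.
	go func() {
		wg.Wait()
		close(results)
	}()

	counts := map[string]int{}
	for c := range results {
		counts[c]++
	}
	return counts
}

func main() {
	receipts := []string{"taxi 12.50", "coffee 4.00", "taxi 30.00"}
	fmt.Println(orchestrate(receipts, 50)) // map[other:1 travel:2]
}
```

Fifty workers over three receipts is overkill, but the same loop scales to millions of receipts unchanged: throughput grows with the worker count while the tallying stays sequential and simple.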

Many developers are already building custom orchestrators themselves. Gastown's value is packaging this experience as open-source software. AI labs will likely need to deliver polished orchestration experiences soon. Startups will carve out niches such as Gastown for legal workflows. Over time, multi-agent complexity will be abstracted away for end users just as mixture-of-experts model internals are invisible to chat app users.

Early use cases are modest. Users are reimplementing open-source libraries in Rust and generating high volumes of code per day. One user joked that after switching to Gastown, they jumped from 15,000 lines per day to 1 million lines per day, enough to almost finish a Minesweeper clone. The underlying constraint remains genuine. Users still need real ideas about what to build and genuine skill at managing agent behavior when systems veer off course.

Gastown offers a glimpse of the next capability step-change in developer productivity. If METR's time-horizon benchmark, which measures how long a task an LLM can complete autonomously, used to cap at minutes and then hours, a properly configured Gastown setup could theoretically run weeks of engineering work unattended. Benchmarks built around single models such as Claude Opus, Codex, and Gemini cannot be retrofitted to measure orchestration, so a new benchmark category may be needed.

Rough edges are being sanded down as the project matures. Orchestration may become the next hot keyword in AI developer tooling in 2026, with all major labs and a thousand startups pursuing variants.