Interview

Mike Krieger on building Claude Sonnet 4.5: Instagram lessons, agentic UX, and focus as a superpower

Sep 29, 2025 with Mike Krieger

Key Points

  • Anthropic's Claude Sonnet 4.5 outperforms the company's prior flagship model at one-fifth the cost, with tuning focused on agentic balance: Claude 3.7 Sonnet was too aggressive at code refactoring, Claude 4 too passive, and Sonnet 4.5 targets steerability as the explicit design goal.
  • Sustained task coherence emerged as Anthropic's core agentic metric; Sonnet 4.5 built a fully functional replica of claude.ai including login and message-sending, while some enterprise customers ran 30-hour autonomous tasks during testing.
  • Krieger applies Instagram's execution playbook directly: performance optimization through upfront computation, and output quality over autonomy, ensuring Claude-generated PowerPoints and spreadsheets are genuinely good on first pass rather than merely functional.

Summary

Mike Krieger, co-founder of Instagram and now a product leader at Anthropic, used the September 29 launch of Claude Sonnet 4.5 to articulate how the company is thinking about model behavior, agentic capability, and long-term product focus — while drawing direct lines between his Instagram playbook and what Anthropic is building now.

The Model Itself

Sonnet 4.5 is positioned as Anthropic's most capable generally available model, exceeding Claude Opus in performance at one-fifth the cost — a combination Krieger frames as critical for scaling to enterprise demand. A central tuning challenge was calibrating eagerness: Claude 3.7 Sonnet was too aggressive, refactoring entire codebases when asked to change a button color; Claude 4 overcorrected toward passivity, triggering internal monitoring flags when users repeatedly prompted it to "keep going." Sonnet 4.5 is the attempt to find the right balance, with steerability as the explicit design goal.

Krieger also flags humor as a measurable, if informal, quality signal. Sonnet 4.5 has been deployed in Anthropic's internal Slack channels, including a general social channel, where employees stayed up late engaging with it over the launch weekend. The model makes back-references, responds to in-jokes, and delivers contextual roasts — behavior Krieger describes as a qualitative leap over prior versions. He acknowledges a formal "water cooler bench" eval does not yet exist.

Agentic Duration as the Core Metric

Krieger is direct that sustained coherence, meaning how long a model can work without losing context or requiring human correction, is among the most important metrics Anthropic tracks. He illustrates the improvement with a time-lapse exercise: each Claude version was asked to build a replica of claude.ai. Versions 1 through 3 produced nothing functional. Claude 3.5 rendered a screen but had no login. Claude 4 allowed login but message-sending failed. Sonnet 4.5 not only built a working product but replicated Anthropic's Artifacts feature within its prototype. Anthropic engineers now routinely run three or four Claude Code terminal sessions concurrently, a workflow that is only viable if the model maintains coherence without constant interruption.

Krieger notes that some enterprise customers ran 30-hour autonomous tasks during testing. Even if most use cases never approach that duration, the capacity for it builds the trust necessary for users to delegate progressively longer work.

Instagram Lessons Applied

Krieger draws two explicit parallels to Instagram. The first is performance optimization: Instagram pre-uploaded photos to servers the moment a user began selecting a filter in 2009, so that hitting "post" was near-instantaneous. Boris, the Anthropic engineer who initiated Claude Code, also came from Instagram and brought similar performance thinking to the product. The second parallel is output quality over autonomy. Instagram's core insight was making mediocre phone cameras produce photos worth sharing. Krieger is applying the same logic to Claude's document and presentation output — pushing the team to ensure that PowerPoints and spreadsheets generated by the model are genuinely good on first pass, not just technically complete.
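The pre-upload idea in the first parallel can be illustrated with a minimal Python sketch. It is not Instagram's actual code: the `upload` function, the `PostComposer` class, and the event names are hypothetical stand-ins, and the slow network call is simulated with a short sleep. The point is only that the expensive work starts when the user reaches the filter step, so the final "post" action has little left to wait for.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional
import time


def upload(photo: bytes) -> str:
    """Hypothetical slow upload; returns a made-up server-side photo ID."""
    time.sleep(0.2)  # simulated network latency
    return f"photo-{len(photo)}"


class PostComposer:
    """Starts the upload early and only attaches metadata at post time."""

    def __init__(self, executor: ThreadPoolExecutor) -> None:
        self._executor = executor
        self._pending: Optional[Future] = None

    def on_filter_screen_opened(self, photo: bytes) -> None:
        # Kick off the upload in the background while the user picks a filter.
        self._pending = self._executor.submit(upload, photo)

    def on_post_tapped(self, filter_name: str, caption: str) -> dict:
        if self._pending is None:
            raise RuntimeError("no photo selected")
        # Usually already finished, so this returns almost immediately.
        photo_id = self._pending.result()
        return {"photo_id": photo_id, "filter": filter_name, "caption": caption}
```

In this sketch the filter choice travels with the post as metadata, which is one possible design; the source does not say how Instagram handled that detail.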

He also draws a retention lesson from Artifact, the AI news product he built after Instagram. Artifact's retention was strong among users who stayed, but activation was the failure point — the product only shone once it knew the user well, and that upfront investment was too much to ask of newcomers. His proposed fix for Claude is to let the model itself have the onboarding conversation: proactively asking users to connect Google Drive, iOS Calendar, or Maps rather than assuming they have already done so.

New Products Launching Alongside 4.5

Anthropic is simultaneously launching the Claude Agent SDK, the same infrastructure Claude Code runs on, as a public tool. Krieger notes that Anthropic rebuilt its advanced research feature on top of this SDK, replacing thousands of lines of custom scaffolding with essentially a prompt and a set of tools. He frames this as a practical expression of the "bitter lesson" in AI: let the model do as much as possible rather than over-specifying the pipeline.
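To make "a prompt and a set of tools" concrete, here is a generic agent loop in Python. It does not use or reproduce the Claude Agent SDK's API; `run_agent`, the action tuples, and the scripted stand-in model are all assumptions made for illustration. The sketch only shows the shape of the idea: the surrounding code stays small, and the model decides which tool to call and when to stop.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
# Hypothetical action format: ("tool", name, argument) or ("final", answer).
Action = Tuple[str, ...]
Model = Callable[[List[str]], Action]


def run_agent(model: Model, prompt: str, tools: Dict[str, Tool],
              max_steps: int = 10) -> str:
    """Loop until the model returns a final answer or the step budget runs out."""
    transcript = [prompt]
    for _ in range(max_steps):
        action = model(transcript)
        if action[0] == "final":
            return action[1]
        _, name, arg = action
        # Run the requested tool and feed its result back to the model.
        transcript.append(f"{name}({arg}) -> {tools[name](arg)}")
    raise RuntimeError("step budget exhausted")


def scripted_model(transcript: List[str]) -> Action:
    """Stand-in for a real model: one search call, then a final answer."""
    if len(transcript) == 1:
        return ("tool", "search", "sonnet 4.5")
    return ("final", f"Done after {len(transcript) - 1} tool call(s).")


tools = {"search": lambda query: f"results for {query}"}
answer = run_agent(scripted_model, "Research the launch", tools)
```

Swapping `scripted_model` for a call to a real model is where all the capability would come from, which is the sense in which the pipeline itself is left deliberately thin.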

A second launch, called "Imagine with Claude," allows users to describe software and have Claude render a fully interactive simulation — not by writing backend code separately, but by acting as the backend in real time. A demonstrated use case: asking Claude to simulate Steve Jobs' desktop the week before the 2007 iPhone launch, then clicking into a Keynote file that Claude generates on the fly. Krieger is careful not to call this mass-market ready, but positions it as a signal of where fully generative, dynamically rendered UI is heading.

The Claude mobile app also recently gained agentic multi-step capability, allowing users to chain actions across calendar, maps, local search, and messaging within a single instruction. Message sending still requires manual confirmation, but Krieger sees this as the entry point for consumer AI adoption — contingent on deeper OS-level partnerships with device makers, particularly given iOS restrictions.

Focus as Competitive Strategy

Krieger is explicit that Anthropic's deliberate narrowness — prioritizing models that think longer, act agentically, write and execute code, and maintain state — has allowed it to move faster than larger labs. Image generation is not on the core roadmap; Anthropic is leaning on partnerships and MCP integrations for that capability. He draws a direct comparison to Instagram's refusal to build an iPad app or a web product for years, arguing that coordination costs across platforms dilute execution quality. The Instagram iPad app shipped only recently.

On LM Arena as of September 29, Anthropic's share of "best model" votes jumped from 1.8% to over 6% following the Sonnet 4.5 launch, though Google remains the leader on that benchmark.

SaaS and the Agent-to-Agent Future

On the question of enterprise software disruption, Krieger's view is that SaaS providers risk becoming pure data repositories as agents replace direct user interfaces. His counter-argument is agent-to-agent collaboration. He cites a demo in which a Claude agent and a Salesforce-based agent exchanged multiple back-and-forth messages, negotiating the tone of a customer communication by referencing prior message history, and produced a send-ready output with minimal human involvement. His read is that SaaS platforms that build genuine reasoning value on top of their data, rather than just serving as storage layers, will remain relevant participants in agentic workflows rather than being routed around.