Anthropic: Claude Wrote 80% of Merged Code

As of May 2026, more than 80 percent of the code merged into Anthropic’s production codebase was written by Claude, not by human engineers. The company disclosed the milestone in a June 4, 2026 report titled “When AI builds itself,” marking a dramatic shift from the low single digits before Claude Code launched in research preview in February 2025. Engineers in the second quarter of 2026 are shipping an average of eight times as much code per day as they did in 2024.

From Assistant to Author: How Claude Code Changed Anthropic’s Workflow

Before Claude Code launched in research preview in February 2025, the share of Claude-authored merged code was in the low single digits. The 80 percent figure represents a fundamental transformation in how Anthropic builds software, with the AI moving from suggesting code snippets to authoring entire production systems.

Leadership estimates the total share, including scripts and experimental code, at more than 90 percent. One employee is quoted as saying, “it’s now been ~5 months since I last wrote any code myself.” The rapid acceleration in AI-authored code has shifted the primary engineering role from writing software to directing the AI, reviewing generated changes, and deciding which projects are worth building.

Engineers remain inside the loop. They choose the work, review generated changes, and decide what is merged. Human oversight remains the bottleneck in how fast new AI systems can be developed, with engineers responsible for setting goals, making architectural decisions, and maintaining production risk management.

Performance Gains on Complex Engineering Tasks

On highly complex, open-ended engineering problems where clear specifications are initially absent, Claude’s success rate climbed to 76 percent in May 2026, a 50-percentage-point increase over six months. This steep improvement curve explains why routine implementation work is increasingly shifting to AI systems.
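To make the size of that jump concrete, the short sketch below works out the earlier success rate the figures imply and the relative improvement. It assumes the "50-point" gain means percentage points and is exactly 50; the function names are illustrative and do not come from the report.

```python
def earlier_rate(current_pct: float, gain_points: float) -> float:
    """Rate implied before a gain expressed in percentage points."""
    return current_pct - gain_points


def relative_multiple(current_pct: float, earlier_pct: float) -> float:
    """How many times larger the current rate is than the earlier one."""
    if earlier_pct <= 0:
        raise ValueError("earlier rate must be positive")
    return current_pct / earlier_pct


current = 76.0  # success rate reported for May 2026
gain = 50.0     # assumption: percentage points, taken as exactly 50

before = earlier_rate(current, gain)
multiple = relative_multiple(current, before)

print(f"implied earlier rate: {before:.0f} percent")
print(f"relative improvement: {multiple:.2f}x")
```

Under those assumptions the earlier rate would be about 26 percent, so the success rate roughly tripled rather than merely rising by half.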

In isolated optimization benchmarks, where models are tasked with accelerating AI model training code, Anthropic’s internal Mythos Preview model achieved a 52x speedup. For comparison, a skilled human developer typically requires four to eight hours of manual refactoring to achieve a mere 4x speedup on the exact same codebase.
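As a rough illustration of what those two speedups mean in practice, the sketch below applies each one to a hypothetical 10-hour training run. The 600-minute baseline is an assumption chosen for the example, not a figure from the report; only the 52x and 4x multipliers come from the text.

```python
def runtime_after(baseline_minutes: float, speedup: float) -> float:
    """Runtime once a job has been accelerated by the given factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_minutes / speedup


MODEL_SPEEDUP = 52.0      # reported for Mythos Preview
HUMAN_SPEEDUP = 4.0       # reported for a skilled human developer
BASELINE_MINUTES = 600.0  # hypothetical 10-hour run, for illustration only

# How much larger the model's speedup is than the human one.
advantage = MODEL_SPEEDUP / HUMAN_SPEEDUP

human_runtime = runtime_after(BASELINE_MINUTES, HUMAN_SPEEDUP)
model_runtime = runtime_after(BASELINE_MINUTES, MODEL_SPEEDUP)

print(f"model speedup is {advantage:.0f}x the human speedup")
print(f"human-optimized run: {human_runtime:.0f} minutes")
print(f"model-optimized run: {model_runtime:.1f} minutes")
```

On this hypothetical workload, the human-optimized run would take two and a half hours and the model-optimized run under twelve minutes.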

In another example, Claude delivered more than 800 fixes in April 2026 that cut a class of API errors by a factor of 1,000. A human would have needed four years for that work, according to the engineer in charge. The example demonstrates Claude’s ability to handle massive, context-heavy technical debt that would be cognitively overwhelming for human developers.

Code Quality and Review Processes

On code quality, the report states, “Claude-written code was somewhat worse than human-written code at Anthropic in late 2025, is roughly at parity today, and we expect it to be strictly better within the year.” The company has implemented automated review systems to catch defects before code reaches production.

An automated Claude reviewer would have caught about a third of the bugs behind past incidents on claude.ai before they hit production, according to a retrospective analysis. Despite the high success rates, Anthropic acknowledges that code volume and reliability must be judged separately, with validation becoming the primary constraint as AI-written changes enter production repositories.

Productivity Metrics and Caveats

Anthropic cautions that the eightfold increase “is almost certainly an overstatement of the true productivity gain.” In a March 2026 internal survey of 130 employees, the median estimate pegged the output boost from Mythos Preview at 4x. Anthropic notes that lines of code are an imperfect metric, as increased volume does not automatically translate to higher quality or better outcomes.
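One rough way to read the gap between the two figures is to divide the surveyed output boost by the code-volume multiple. This is only a back-of-the-envelope sketch: the two numbers measure different things against different baselines, so the ratio is an illustration of why volume overstates productivity, not a result from the report.

```python
code_volume_multiple = 8.0      # code shipped per day, Q2 2026 versus 2024
surveyed_output_multiple = 4.0  # median survey estimate of the output boost

# Assumption: treat both multiples as if they shared one baseline.
output_per_unit_of_code = surveyed_output_multiple / code_volume_multiple

print(f"output per unit of code: {output_per_unit_of_code:.2f}x the old level")
```

Read this way, each unit of code would carry about half as much perceived output as before, which is consistent with the caution that lines of code are an imperfect metric.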

On research judgment calls, Claude Mythos Preview suggested the better next step in 64 percent of cases, up from 51 percent for Claude Opus 4.5 six months earlier. The improvement suggests AI systems are increasingly capable of making the kinds of judgment calls that research work depends on.

Where Humans Still Win

The critical bottleneck, according to Anthropic, is what the company calls “research taste”: the ability to pick the right problems and spot dead ends early. “The comparative advantage of humans as of right now is still in seeing the bigger picture and thinking beyond the confines of the immediate task,” one employee is quoted as saying.

The gap between executing experiments and choosing which experiments matter remains the line separating current AI systems from ones that could fully design their own successors. While Claude can write code, run experiments, test hypotheses, and analyze results, humans retain the edge in strategic decision-making and architectural vision.

Implications for Recursive Self-Improvement

In “When AI builds itself,” Anthropic also argued that the world needs a verifiable, multi-country option to slow frontier AI before recursive self-improvement stops being theoretical.

Researchers Marina Favaro and Jack Clark wrote that a worldwide frontier slowdown “would likely be a good thing,” but only if US and Chinese labs, and others near the frontier, stop together under rules outsiders can verify. Anthropic did not commit to unilaterally halting development.

Key Facts

As of May 2026, more than 80 percent of the code merged into Anthropic’s production codebase was written by Claude.
Before Claude Code launched in research preview in February 2025, the share of Claude-authored merged code was in the low single digits.
Engineers in Q2 2026 are shipping an average of eight times as much code per day as they did in 2024.
Claude’s success rate on highly complex, open-ended engineering problems climbed to 76 percent in May 2026, a 50-point increase in six months.
Anthropic’s internal Mythos Preview model achieved a 52x speedup on AI model training code optimization benchmarks.
Claude delivered more than 800 fixes in April 2026 that reduced a class of API errors by a factor of 1,000.