AI Wrote the Code. Now It's Checking Its Own Work.
Anthropic's Code Review for Claude Code uses parallel AI agents to catch logic bugs in pull requests automatically, before human reviewers get involved.

Agentic coding tools have reshuffled how software gets built. Engineers who once spent hours on a single feature are shipping pull requests in minutes. The output is real and the speed is real, but something else is also real: every one of those PRs still needs a human to verify it. That workload hasn't shrunk. It's grown.
Anthropic launched Code Review inside Claude Code on March 9, 2026, taking a direct shot at that problem. It's a multi-agent system that audits pull requests automatically before a human reviewer ever opens them. Teams and Enterprise users can access it now through the Claude Code web interface, with admins able to enable it per repository.
PR Volume Is Outpacing Human Review
Anyone using Claude Code, Cursor, Codex, or similar tools over the past year has felt this. One engineer, one prompt, one PR. It takes almost no time to generate something that looks reasonable and passes a surface read. The reviewer on the other end is the one who has to actually trace the logic, check edge cases, and figure out whether anything quietly breaks.
Cat Wu, head of product for Claude Code at Anthropic, described it this way: the code isn't necessarily bad, there's just too much of it. Review has become the bottleneck that AI-assisted coding created, and Code Review is built to absorb a chunk of that work before it reaches the human queue.
How It Works
When a pull request is opened on an enabled repository, Code Review spins up a team of agents that run in parallel, each focused on a different class of potential errors. They finish, drop a comment with their findings, and suggest fixes for anything they caught. They do not approve PRs. Approval stays with the engineer.
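Anthropic has not published the internal architecture, but the fan-out described above can be sketched conceptually. Everything here is an assumption for illustration: the focus areas, the `run_reviewer` stand-in, and the report shape are all hypothetical, not the product's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical focus areas -- the real agent roster is not public.
FOCUS_AREAS = ["logic", "edge cases", "error handling", "data flow"]

def run_reviewer(focus: str, diff: str) -> list[dict]:
    """Stand-in for one reviewer agent: scan the diff for issues
    in its own class of errors only (toy heuristic, not a real model)."""
    findings = []
    if focus == "logic" and "off_by_one" in diff:
        findings.append({
            "focus": focus,
            "note": "possible off-by-one in loop bound",
            "suggested_fix": "use range(len(items)) instead of range(len(items) + 1)",
        })
    return findings

def review_pull_request(diff: str) -> dict:
    # Fan out one agent per focus area; they run in parallel and
    # each reports findings independently.
    with ThreadPoolExecutor(max_workers=len(FOCUS_AREAS)) as pool:
        results = pool.map(lambda f: run_reviewer(f, diff), FOCUS_AREAS)
    findings = [item for sub in results for item in sub]
    # The system comments with findings and suggested fixes,
    # but approval always stays with a human engineer.
    return {"comment": findings, "approved": False}

report = review_pull_request("for i in range(len(items) + 1):  # off_by_one")
```

The key design point survives even in this toy version: agents only ever produce findings, and the `approved` field is hard-wired to `False`.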
The focus is deliberately narrow: logic errors, not style. Anthropic found that automated reviews loaded with formatting nitpicks and variable naming opinions erode developer trust fast. When the only things getting flagged are real bugs, engineers actually pay attention. A high false positive rate turns any review tool into noise.
The agents also don't limit themselves to the changed files. They can traverse the full codebase, which matters when a change in one file has downstream effects on other files that interact with it in ways the diff alone wouldn't reveal. That's a different category of review than most static analysis tools offer.
What Anthropic's Own Numbers Show
Anthropic ran an internal version of this before releasing it publicly. Before Code Review, 16 percent of pull requests got substantive review comments. After enabling it, that number hit 54 percent. On large PRs with more than 1,000 lines changed, the system catches bugs in 84 percent of cases, averaging 7.5 issues per review.
Developers marked fewer than 1 percent of comments as incorrect. Wu said engineers at Anthropic now expect to see Code Review feedback on their PRs and notice when it's missing.
Speed and Cost
Reviews average around 20 minutes, scaling with complexity. Simple PRs get a lighter pass. Large, complex ones bring in more agents and take longer. Anthropic is not trying to compete on speed here. The explicit tradeoff is thoroughness over turnaround time.
Cost is billed by token usage. A typical review runs $15 to $25. Admins get a dashboard tracking volume, acceptance rates, and total spend, with the option to set monthly caps.
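Per-token billing with a monthly cap is easy to reason about with a back-of-the-envelope model. The token rates and review sizes below are assumptions chosen only so the result lands in the stated $15 to $25 range; Anthropic has not published a per-review pricing formula.

```python
# Assumed USD per million tokens -- illustrative rates, not published pricing.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

def review_cost(input_tokens: int, output_tokens: int) -> float:
    """Token-metered cost of a single review under the assumed rates."""
    return (input_tokens / 1e6) * PRICE_PER_MTOK["input"] + \
           (output_tokens / 1e6) * PRICE_PER_MTOK["output"]

def within_monthly_cap(spend_so_far: float, next_review: float, cap: float) -> bool:
    """Admin-style check: would running one more review exceed the monthly cap?"""
    return spend_so_far + next_review <= cap

# A hypothetical large, complex PR: 4M input tokens of context, 500K output.
cost = review_cost(input_tokens=4_000_000, output_tokens=500_000)
# 4.0 * 3.00 + 0.5 * 15.00 = 19.50, inside the typical $15-25 range
```

Under this model a cap works as a simple pre-flight check: a team that budgets $500 a month can absorb roughly 25 such reviews before the dashboard cuts them off.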
Where It Fits Alongside Claude Code Security
A few weeks before Code Review launched, Anthropic released Claude Code Security, which runs a continuous security sweep across an entire codebase. The two tools do different things. Code Review is scoped to logic errors in individual pull requests. Claude Code Security is a dedicated, ongoing security scan at the project level. If Code Review spots something security-related it will flag it, but it is not trying to replace the deeper scan.
Running It Locally
Right now Code Review only triggers when a PR is opened. There's significant demand to run it locally, inside a developer's workflow before anything gets pushed. Wu described that demand as a strong signal of product fit: engineers are actively seeking the tool out, not just tolerating it as automated overhead. A local version looks less like a question of if than of when.
The dynamic worth watching is this: AI tools flooded the PR queue, and now AI tools are being deployed to manage it. If the quality holds, this starts to look less like a workaround and more like a structural shift in how review gets done.