ESSAY · 3 min · 2026.04.05

What code review actually catches

Hand-drawn iceberg in blue ink. The small tip above the waterline is labeled 'WHAT WAS ACTUALLY REVIEWED' and '"LGTM!"'. The huge mass below the waterline is labeled 'WHAT ACTUALLY CAUSES INCIDENTS'.

90% of code review comments are about naming conventions and bracket placement. Meanwhile, the bugs that actually hit production sail through with a thumbs-up emoji.

Same setup, same blind spots

I’ve had a version of this conversation with a lot of engineering leaders. They describe the same setup: thorough reviews, clear standards, required approvals. And yet the bugs that cause real incidents are almost never the ones traditional code review was going to catch. It isn’t a culture problem or a skill problem — code review, as practiced, just isn’t optimized for those kinds of problems.

One leader put it bluntly:

“A lot of the time when you’re doing peer reviews, you’re like half checked out. You’re barely paying attention. Stuff slips through the net.”

Reviews happen between meetings and tasks, while you’re juggling context from five different parts of the codebase. So what gets attention is whatever you can evaluate quickly without rebuilding the entire mental model of the change: naming, formatting, small correctness issues you can spot at a glance.

Where the real bugs live

The bugs that matter live elsewhere. They show up in edge cases that span multiple systems, subtle state transitions, assumptions that only break under real traffic, and interactions with code far outside the diff. Catching those requires deep, uninterrupted context, and that’s exactly what human reviewers rarely have.

So developers spend their time on things machines are already good at, and the things that actually matter don’t get the attention they need.

A better split

The split that works is fairly obvious in retrospect.

AI handles:

Correctness checks
Conventions and consistency
Cross-file and cross-system analysis

Humans handle:

Architecture
Intent
Whether the change should exist at all

Code review doesn’t go away; it aligns with what experienced developers are actually good at.

Encode the standards instead

Even that split is a bit behind the curve. The most effective teams aren’t reviewing for conventions or correctness at all. They define both upfront, in writing, before any PR opens: what “good” looks like, and what should never make it to production. Enforcement gets delegated to machines, and code review becomes a place to evaluate intent rather than debate standards.

Instead of hoping reviewers catch issues, you write your expectations directly into the system. Every migration must be reversible. Client-facing features ship behind a flag. New endpoints get rate limiting by default. Those checks run automatically on every PR, with no reviewer left wondering whose turn it is.
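
To make that concrete, here is a minimal sketch of the first rule, "every migration must be reversible," written as a check that could run on each PR. It is an illustration under stated assumptions, not how any particular tool implements it: it assumes migrations are Python files under a `migrations/` directory that follow the common `upgrade()` / `downgrade()` convention, and the names `Violation`, `check_migration_reversible`, and `check_pr` are made up for this example.

```python
import ast
from dataclasses import dataclass


@dataclass
class Violation:
    """One failed rule on one file (hypothetical structure for this sketch)."""
    path: str
    rule: str
    message: str


def _is_trivial(func: ast.FunctionDef) -> bool:
    """True when the body holds only `pass`, `...`, or a docstring."""
    for stmt in func.body:
        if isinstance(stmt, ast.Pass):
            continue
        if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
            continue  # docstring or bare `...`
        return False
    return True


def check_migration_reversible(path: str, source: str) -> list[Violation]:
    """Flag a migration that has no downgrade(), or an empty one."""
    tree = ast.parse(source)  # parses only; the migration is never executed
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    down = funcs.get("downgrade")
    if down is None:
        return [Violation(path, "reversible-migration", "no downgrade() defined")]
    if _is_trivial(down):
        return [Violation(path, "reversible-migration", "downgrade() is empty")]
    return []


def check_pr(changed_files: dict[str, str]) -> list[Violation]:
    """Apply the rule to every changed file under migrations/ (assumed layout)."""
    violations: list[Violation] = []
    for path, source in changed_files.items():
        if path.startswith("migrations/") and path.endswith(".py"):
            violations.extend(check_migration_reversible(path, source))
    return violations
```

In CI, a job would pass the PR's changed files (path mapped to contents) to `check_pr` and fail the build if the returned list is non-empty. The flag and rate-limiting rules would follow the same shape: one small function per rule, each returning its own violations.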

If you want to see what that looks like:

Docs: docs.macroscope.com/check-run-agents
Example: youtu.be/9TsxHfKjRqg

The role of code review doesn’t disappear. It gets narrower, and more important. Humans stop acting as linters and start acting as architects.
