[ASCII art: a doodle saying "LGTM", plus the author's attempt at drawing cheese. h/t John Frick]
I liked reviewing code that humans wrote. You saw how people approached problems. It levelled you up.
Reviewing agent-generated code just doesn't feel the same. Not from a utilitarian standpoint, but from a bandwidth/cognitive one.
A human coder spent their life units learning and making something with aspirations of grandeur, and, importantly, with false starts, red herrings, rabbit holes, and dead ends.
You owe that effort your attention.
An agent's code hits you like a freight train at 100 tok/sec with zero mortal constraints. The output: disposable in a way human work never was (cue Altman's fast fashion era of software).
And that's a problem. If we can't bring the same rigor to AI code, and AI code is now most of what's being produced, then we become the bottleneck. So you either end up rubber-stamping or drowning.
The middle path, I argue, is to 1) frontload most of the cognitive work into planning, and 2) use an adversarial spec-to-code reviewer to know what the agent actually cooked, without subjecting yourself to every single line of synthetic code.
Let the agent grill you
The default plan mode in agent harnesses is often shallow; it'll get the shape right but miss the details. An `/interview` skill forces the agent to flip roles: instead of proposing, it asks Socratic-style, non-obvious questions about tradeoffs, edge cases, and assumptions you haven't articulated or even considered yet.
Here's one I routinely use:
---
name: interview
argument-hint: [instructions]
description: Interview user to produce a detailed implementation spec
allowed-tools: AskUserQuestion, Write
---
Interview me using the AskUserQuestion tool to build a complete implementation spec. <instructions>$ARGUMENTS</instructions>
## Interview approach
- Start with the **goal and core constraint** — what must be true for this to succeed?
- Probe layers: user-facing behavior → data model → edge cases → failure modes → tradeoffs accepted
- Ask questions that **expose hidden assumptions** — don't ask what I've already stated or what's obvious from context
- When I give a vague answer, push for concrete examples or acceptance criteria
- Surface contradictions between my stated goals and implied constraints
- Ask about what I'm explicitly **not** building (scope boundaries)
- One focused question at a time
## Completion criteria
Stop interviewing when you can answer these without guessing:
1. What does the happy path look like end-to-end?
2. What are the key failure/edge cases and how should they behave?
3. What are the hard constraints vs. preferences?
4. What's out of scope?
## Output
Write the spec to `.claude/specs/<slug>.md` where slug is derived from the project/feature name.
If any decisions were made during the interview that involve tradeoffs or ambiguity, write those separately to `.claude/specs/<slug>-decisions.md` with rationale.
Bear with the process. It feels slow, but that's deliberate. What you get at the end is a crisp, laser-focused spec with zero bloat. It also saves you a ton of "wait, that's not what I meant" later.
The difference can be "add middleware for auth" vs. a spec that pins down which routes are protected, whether tokens are validated against an issuer, and what happens on expiry or a missing token. You get the idea.
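To make that concrete, here's a sketch of what `/interview` might leave behind for that feature. Everything in it is hypothetical: the route names, criteria, and recorded decision are invented for illustration.

```markdown
<!-- .claude/specs/auth-middleware.md (hypothetical) -->
# Auth middleware

## Goal
Reject unauthenticated requests to protected API routes with a 401 before they reach any handler.

## Scope
- Protected: everything under `/api/*` except `/api/health` and `/api/login`
- Out of scope: role-based authorization, token refresh

## Acceptance criteria
1. A valid, unexpired token from the configured issuer passes through unchanged.
2. A missing or malformed `Authorization` header returns 401 with a JSON error body.
3. An expired token returns 401, with no automatic refresh.
4. `/api/health` and `/api/login` stay reachable without a token.

<!-- .claude/specs/auth-middleware-decisions.md (hypothetical) -->
- 401 (not 403) for expired tokens: the client should re-authenticate, not be told it lacks permission.
```

Each numbered criterion is exactly what `/review` will later check against the diff.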
In spec we trust
Every class of knowledge work that's automated by agents will need verifiers. They can be rule-based symbolic systems, or LLM-based evals, or likely both.
For coding, verification simply means asking: did the agent build what the spec says?
This shifts the job from code review to checking that the machine fulfilled the contract. I use `/review` for this:
---
name: review
argument-hint: [spec-file-path]
description: Verify agent output against a spec
allowed-tools: Read, Bash, Write
---
You are a spec verifier. You have NO context about why decisions were made — only the spec and the code. If a decision isn't justified by the spec or decisions log, flag it.
## Inputs — read fully before reviewing
1. The spec: `$ARGUMENTS`
2. The git diff: `git diff main`
3. Decisions log (if exists): `.claude/specs/<slug>-decisions.md`
## Verify
For each acceptance criterion in the spec, evaluate and record in a table:
| Criterion | Verdict | Evidence | Test |
|-----------|---------|----------|------|
| Quote verbatim | PASS / FAIL / UNTESTED | File path + line, or what's missing | Test name that covers this, or NONE |
Then flag in a separate table:
| Issue | Type | Location |
|-------|------|----------|
| Description | silent decision / impl-detail test / scope drift | File path + line |
## Output
Write to `.claude/reviews/<slug>-r<N>.md` (next available N).
Print a summary line: `X/Y PASS | Z FAIL | W UNTESTED | T criteria without tests`
If any FAIL or UNTESTED, end with a numbered list of standalone fix instructions — one per row, no context assumed, ready to be handed directly to a coding agent.
End with: **ACCEPT** | **REJECT** (with fix list) | **UNCLEAR** (needs author input)
Do NOT fix code. Do NOT suggest refactors.
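Here's roughly what one of those review files could look like, continuing the hypothetical auth spec from earlier. The file paths, line numbers, and test names are all made up:

```markdown
<!-- .claude/reviews/auth-middleware-r1.md (hypothetical) -->
| Criterion | Verdict | Evidence | Test |
|-----------|---------|----------|------|
| "A valid, unexpired token from the configured issuer passes through unchanged" | PASS | `middleware/auth.ts:27` | `auth.valid_token_passes` |
| "A missing or malformed Authorization header returns 401 with a JSON error body" | UNTESTED | Handled at `middleware/auth.ts:35`, but no test covers it | NONE |
| "An expired token returns 401, with no automatic refresh" | PASS | `middleware/auth.ts:42` | `auth.expired_token_401` |
| "/api/health and /api/login stay reachable without a token" | FAIL | `/api/login` is caught by the protected-route matcher | NONE |

| Issue | Type | Location |
|-------|------|----------|
| 15-minute token cache not mentioned in the spec or decisions log | silent decision | `middleware/auth.ts:18` |

2/4 PASS | 1 FAIL | 1 UNTESTED | 2 criteria without tests

1. Exclude `/api/login` from the protected-route matcher and add a test proving it is reachable without a token.
2. Add a test for the missing/malformed `Authorization` header case returning 401 with a JSON error body.

**REJECT** (with fix list)
```

Each fix instruction stands alone, so it can be pasted straight back to the coding agent.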
So the loop looks like:
1. `/interview` grills you to produce a detailed spec
2. Agents write code + tests based on the spec
3. `/review` verifies the agent's work against the spec, and
4. You feed back the `/review` output to (2) to fix the issues and repeat
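If it helps to picture the moving parts, a pass or two through the loop leaves behind something like this. I'm assuming the two commands live as project slash commands under `.claude/commands/`; adjust the paths if you keep them as skills. The `auth-middleware` slug is the hypothetical example from above.

```
.claude/
├── commands/
│   ├── interview.md                  # the /interview command above
│   └── review.md                     # the /review command above
├── specs/
│   ├── auth-middleware.md            # spec produced by /interview
│   └── auth-middleware-decisions.md  # tradeoffs + rationale
└── reviews/
    ├── auth-middleware-r1.md         # first /review pass
    └── auth-middleware-r2.md         # after feeding the fixes back
```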
Specs can be buggy too, but those are bugs you can reason about and catch during `/interview`, instead of bugs buried under 6ft of code.
Code is merely a translated, high-density representation of the spec. By the time the agent writes code, you should already know what it's going to produce. Review then simply becomes confirmation instead of discovery. And well, you get fewer wasted brain cycles.
What's undeniable, though, is that codeslop is here. This is how we fight its entropy.