10x Coding Agents

On the Ralph Wiggum phenomenon - it works in the same way that a game of darts does.

Like many folks over the holidays, I tried pushing coding agents to their limits (thanks to @anthropic for the 2x usage limits).

Here are some observations from the trenches:

Monolithic But Not Completely

Arguably the biggest failure mode for coding agents is scattered context across many files and directories. To fix this:

Maintain a single human/agent readable AGENTS.md at the root, since LLMs are increasingly good at handling long context. It holds your coding philosophy, project details, design rules, tech stack, and architectural decisions. Codex, OpenCode, or Cursor will pick it up by default.
Claude Code, however, doesn’t reference AGENTS.md automatically, so keep this reference line inside CLAUDE.md to bridge both: @AGENTS.md
Run /init on CI whenever you merge to main to avoid a stale AGENTS.md. Here is a gist to set up the workflow via GitHub Actions
Once you’ve seen a good software pattern in the wild, add it to AGENTS.md - this file should serve as an evolving, living system doc rather than just a list of what the codebase is about
Personal trio: Opus 4.5 for planning, Haiku 4.5 for subagents and parallel exploration, and Sonnet 4.5 for execution
Use /install-github-app for Claude to automatically generate code reviews on your PRs
Add SKILL.md for repetitive expertise: testing patterns, migration workflows, code review checklists etc. Keep each < 500 lines and bundle scripts that execute rather than load into context
Add MCP servers for API docs (the only MCP use-case that's stuck with me so far)
Use @.cursor/rules but only if you must

Here's a sample repo structure:


  repo/
  ├── .cursor/
  │   └── rules/
  │       └── ...
  ├── .claude/
  │   ├── hooks.json     # Pre/post tool automation
  │   └── skills/        # Domain-specific expertise
  │       ├── testing/SKILL.md
  │       └── deploying/SKILL.md
  ├── AGENTS.md          # Single agent instruction file (<3k tokens)
  ├── CLAUDE.md          # One line: @AGENTS.md
  ├── TODO.md            # Ephemeral scratchpad for agents
  ├── docs/              # Deep research, specs, PRDs
  ├── src/               # Max 3 levels deep
  └── ...
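
The limits annotated in this layout can be checked mechanically. Below is a minimal sketch, not part of any agent tooling: `lint_repo` and `estimate_tokens` are hypothetical names, the 4-characters-per-token estimate is a rough assumption, and "3 levels deep" is read as directories nested at most three levels below src/.

```python
from pathlib import Path

MAX_SKILL_LINES = 500      # "Keep each < 500 lines"
MAX_AGENTS_TOKENS = 3000   # "<3k tokens" budget for AGENTS.md
MAX_SRC_DEPTH = 3          # "Max 3 levels deep" under src/


def estimate_tokens(text: str) -> int:
    """Very rough heuristic (assumption): about 4 characters per token."""
    return len(text) // 4


def lint_repo(root: Path) -> list[str]:
    """Return human-readable problems; an empty list means the layout passes."""
    problems = []

    agents = root / "AGENTS.md"
    if not agents.is_file():
        problems.append("AGENTS.md is missing")
    elif estimate_tokens(agents.read_text(encoding="utf-8")) >= MAX_AGENTS_TOKENS:
        problems.append("AGENTS.md exceeds the ~3k token budget")

    # CLAUDE.md should bridge to AGENTS.md with the @AGENTS.md reference line.
    claude = root / "CLAUDE.md"
    if not claude.is_file() or "@AGENTS.md" not in claude.read_text(encoding="utf-8"):
        problems.append("CLAUDE.md does not reference @AGENTS.md")

    for skill in sorted(root.glob(".claude/skills/*/SKILL.md")):
        n_lines = len(skill.read_text(encoding="utf-8").splitlines())
        if n_lines >= MAX_SKILL_LINES:
            problems.append(f"{skill.relative_to(root).as_posix()} has {n_lines} lines")

    src = root / "src"
    if src.is_dir():
        for path in sorted(src.rglob("*")):
            depth = len(path.relative_to(src).parts)
            if path.is_dir() and depth > MAX_SRC_DEPTH:
                problems.append(
                    f"{path.relative_to(root).as_posix()} is nested {depth} levels deep"
                )

    return problems
```

Running `lint_repo(Path("."))` from the repo root in CI or a pre-commit step and failing on a non-empty result is one way to keep the structure from drifting.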

One TODO to Rule Them All

Keep a single TODO.md at the root of the repo for all tasks that agents have done/are doing:

Treat it not just as a list but as a blocking state machine for the agent: it MUST NOT touch code until it has updated TODO.md ¹
If tasks have zero overlap, spin up multiple subagents in parallel
If agents disconnect or get interrupted, you can always resume using TODO.md as the reference
Always make it log the “why” behind each decision. Engineering is all about tradeoffs, and documenting why A and not B gives the agent better in-context learning and stops it from retrying failed approaches. You also get "memory" for free
Note: Ultimately, treat git commits as the canonical state - TODO.md is meant to be more verbose than commit messages and to represent the current state as much as what's already been done

Add this near the top of your AGENTS.md:


## Task Management

The `TODO.md` file at the root is the master task list for this project. Create one if it doesn't exist.

### Structure

```
# TODO

## To Do

- [ ] `feat` Task description (why: reasoning behind approach) @assignee
  - Sub-task implementation details
- [ ] `bug` Bug description with reproduction steps (why: root cause analysis)

## In Progress

- (!) [>] `feat` High priority task currently being worked on (why: chose X over Y due to Z tradeoffs)

## Completed

### YYYY-MM-DD
- [x] `feat` New feature or functionality added (why: maintains consistency with existing patterns)
- [x] `bug` Issue or defect fixed (why: approach B worked better than A for edge cases)
- [x] `hotfix` Urgent production fix applied
- [x] `refactor` Code restructuring without behavior change (why: reduces duplication, improves testability)
- [x] `chore` Maintenance, tooling, dependencies, config
- [x] `docs` Documentation updates only
- [x] `perf` Performance optimization (why: O(n²) → O(n) to fix timeouts on large datasets)
```

### States

| Syntax | Meaning | Usage |
|--------|---------|-------|
| `[ ]` | Todo | Not started |
| `[>]` | In Progress | Actively working |
| `[x]` | Done | Completed, move to Completed |
| `(!)` | High priority | Prefix before state marker |

### Task Tags

| Tag | Description | Example |
|-----|-------------|---------|
| `feat` | New feature or functionality | Add OAuth login |
| `bug` | Bug fix | Fix null pointer in validator |
| `hotfix` | Urgent production fix | Patch security vulnerability |
| `refactor` | Code restructuring | Extract helper functions |
| `chore` | Maintenance, tooling, deps | Update dependencies |
| `docs` | Documentation only | Update API docs |
| `perf` | Performance optimization | Reduce DB query time |

### Workflow

**MANDATORY**: You MUST NOT use Write/Edit/NotebookEdit tools until TODO.md is updated. This is a BLOCKING requirement with NO exceptions for code changes.

**Required sequence for ALL code changes:**
1. Read TODO.md to check current state
2. Add task with appropriate tag to "To Do" section
3. Move task to "In Progress" section and mark `[>]` when you start writing code
4. Write/Edit code files
5. Mark task `[x]` done when changes complete
6. Move completed task to "Completed" section under date heading

**Rules:**
- Check TODO.md "To Do" and "In Progress" sections before starting work
- Log ALL code changes as tasks with appropriate tags (even small one-line fixes)
- Keep exactly ONE task as `[>]` in "In Progress" section at a time
- Mark `[x]` done immediately when complete (no batching)
- Move completed tasks to "Completed" section under date heading
- Add sub-tasks indented 2-4 spaces, assignees with `@username`
- Keep entries terse and actionable
- Tags are required for all tasks (feat, bug, hotfix, refactor, chore, docs, perf)

**Exception**: Pure documentation (AGENTS.md, README.md, docs/*.md) or TODO.md edits themselves don't need logging.
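
Outside the AGENTS.md snippet itself, the same rules can be enforced by a script instead of relying on the agent to follow them. This is a minimal sketch under assumptions: `scan_todo` and `may_edit_code` are hypothetical helpers, and the regular expression only covers the task syntax shown in the Structure and States sections above.

```python
import re

VALID_TAGS = {"feat", "bug", "hotfix", "refactor", "chore", "docs", "perf"}

# Matches task lines such as "- (!) [>] `feat` Description".
# Group 1 is the state marker, group 2 the tag (None when missing).
TASK_RE = re.compile(r"^\s*- (?:\(!\) )?\[( |>|x)\] (?:`(\w+)`)?")


def scan_todo(text: str) -> tuple[list[str], int]:
    """Return (problems, number of tasks currently marked [>])."""
    problems = []
    in_progress = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = TASK_RE.match(line)
        if match is None:
            continue  # headings, sub-task notes, blank lines
        state, tag = match.groups()
        if tag not in VALID_TAGS:
            problems.append(f"line {lineno}: missing or unknown tag")
        if state == ">":
            in_progress += 1
    if in_progress > 1:
        problems.append(f"{in_progress} tasks are marked [>], expected at most one")
    return problems, in_progress


def may_edit_code(text: str) -> bool:
    """Hypothetical gate: allow code edits only when TODO.md is valid
    and exactly one task is in progress."""
    problems, in_progress = scan_todo(text)
    return not problems and in_progress == 1
```

A pre-tool hook or CI step could read TODO.md, call `may_edit_code`, and refuse to proceed when it returns False. How that wiring looks depends on the agent harness and is left out here.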

Managing Context Overflow

Agents are not amazing with deeply nested file trees. Prefer flat structures, max 3 levels of depth ²
Manually garbage collect by using compaction (on TODO.md for example) at the end of every sprint to prevent hitting context window limits
Restrain agents from generating 10+ files at once. Scope tasks properly: the more code they generate, the larger the review surface, and human review is the bottleneck
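
One mechanical reading of compaction for TODO.md is to archive older "Completed" date sections and keep only the most recent ones in the file. The sketch below assumes the `### YYYY-MM-DD` headings from the template; `compact_todo` and the `keep_latest` parameter are hypothetical, and an LLM-written summary could replace the plain archive step.

```python
import re

DATE_HEADING = re.compile(r"^### (\d{4}-\d{2}-\d{2})\s*$")
ANY_HEADING = re.compile(r"^#{1,3} ")


def compact_todo(text: str, keep_latest: int = 2) -> tuple[str, str]:
    """Split TODO.md text into (kept, archived).

    Each date section is the "### YYYY-MM-DD" heading plus the lines up to
    the next heading. The `keep_latest` most recent sections stay in place;
    older ones are moved to the archive, oldest first.
    """
    lines = text.splitlines()
    sections = []  # (date, start index, end index)
    i = 0
    while i < len(lines):
        match = DATE_HEADING.match(lines[i])
        if match is None:
            i += 1
            continue
        end = i + 1
        while end < len(lines) and not ANY_HEADING.match(lines[end]):
            end += 1
        sections.append((match.group(1), i, end))
        i = end

    # ISO dates sort chronologically as plain strings.
    ordered = sorted(sections)
    old = ordered[: max(len(ordered) - keep_latest, 0)]

    dropped = set()
    archived = []
    for _, start, end in old:
        archived.extend(lines[start:end])
        dropped.update(range(start, end))

    kept = [line for idx, line in enumerate(lines) if idx not in dropped]
    archived_text = "\n".join(archived) + "\n" if archived else ""
    return "\n".join(kept) + "\n", archived_text
```

The archived text could be appended to a file under docs/ (the exact location is a free choice) so the history stays in version control while the agent's working file stays short.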

Agents increasingly need to traverse git history for a longitudinal understanding of the code: how things evolved, not just a snapshot of where it is now. For example, enforce git log --oneline -20 to surface recent temporal context, or git show <commit-hash> to see the full diff for a specific commit. They should treat your undos, rewinds and checkpoints as negative reinforcement signals (learning what not to do) on the fly. For now, we log everything! Agent harnesses are becoming more robust on long-horizon tasks, with better error recovery, so refrain from saturating AGENTS.md with too many guardrails.
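
The two git commands mentioned above can be wrapped in a small helper for a harness or script to call. This is a sketch under assumptions: the function names are hypothetical, git must be on PATH, and `--no-decorate` is added so that branch and tag names never appear in the one-line output.

```python
import subprocess


def parse_oneline(output: str) -> list[tuple[str, str]]:
    """Turn `git log --oneline` output into (short hash, subject) pairs."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit_hash, _, subject = line.partition(" ")
        commits.append((commit_hash, subject))
    return commits


def recent_commits(repo: str = ".", limit: int = 20) -> list[tuple[str, str]]:
    """Run `git log --oneline -<limit>` in `repo` and parse the result."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--oneline", "--no-decorate", f"-{limit}"],
        capture_output=True, text=True, check=True,
    )
    return parse_oneline(result.stdout)


def full_diff(commit_hash: str, repo: str = ".") -> str:
    """Run `git show <commit-hash>` and return the message plus full diff."""
    result = subprocess.run(
        ["git", "-C", repo, "show", commit_hash],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Filtering the parsed subjects for ones that start with "Revert" is one cheap way to surface undone work as a signal of what not to retry.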

Now go build something great.

Footnotes

¹ Linear is fine but I prefer code to be the single source of (versioned) truth. Caveat: this works well for solo/small teams but may not scale for larger teams
² I apply this principle generally including personal knowledge management e.g. Notion