On the Ralph Wiggum phenomenon - it works in the same way that a game of darts does.
Like many folks, I spent the holidays pushing coding agents to their limits (thanks to @anthropic for the 2x usage limits).
Here are some observations from the trenches:
Monolithic But Not Completely
Arguably the biggest failure mode for coding agents is scattered context across many files and directories. To fix this:
- Maintain a single human/agent readable `AGENTS.md` at the root, as LLMs are increasingly able to handle long context. This contains your coding philosophy, project details, design rules, tech stack, and architectural decisions. Codex, OpenCode or Cursor will pick it up by default
- For Claude Code however, which doesn't reference `AGENTS.md` automatically, keep this reference line inside `CLAUDE.md` to bridge both: `@AGENTS.md`
- Run `/init` on CI whenever you merge to `main` to avoid a stale `AGENTS.md`. Here is a gist to set up the workflow via GitHub Actions
- Once you've seen a good software pattern in the wild, add it to `AGENTS.md` - this file should serve more as an evolving, living system doc than just a list of what the codebase is about
- Personal trio: Opus 4.5 for planning, Haiku 4.5 for subagents / parallel exploration, and Sonnet 4.5 for execution
- Use `/install-github-app` for Claude to automatically generate code reviews on your PRs
- Add `SKILL.md` for repetitive expertise: testing patterns, migration workflows, code review checklists etc. Keep each < 500 lines and bundle scripts that execute rather than load into context
- Add MCP servers for API docs (the only MCP use-case that's stuck with me so far)
- Use `@.cursor/rules` but only if you must
Here's a sample repo structure:
repo/
├── .cursor/
│   ├── rules/
│   └── ...
├── .claude/
│   ├── hooks.json              # Pre/post tool automation
│   └── skills/                 # Domain-specific expertise
│       ├── testing/SKILL.md
│       └── deploying/SKILL.md
├── AGENTS.md                   # Single agent instruction file (<3k tokens)
├── CLAUDE.md                   # One line: @AGENTS.md
├── TODO.md                     # Ephemeral scratchpad for agents
├── docs/                       # Deep research, specs, PRDs
├── src/                        # Max 3 levels deep
└── ...
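A layout like this is easy to let drift, so it can help to check it mechanically. Here is a minimal sketch, not a definitive implementation: `check_layout` is a hypothetical helper name, and counting depth as the number of directories below `src/` is my own reading of "max 3 levels deep".

```python
from pathlib import Path

MAX_SKILL_LINES = 500  # from the "< 500 lines" guideline for SKILL.md
MAX_SRC_DEPTH = 3      # from the "Max 3 levels deep" note on src/


def check_layout(root: Path) -> list[str]:
    """Return human-readable problems; an empty list means the layout passes."""
    problems = []

    if not (root / "AGENTS.md").is_file():
        problems.append("AGENTS.md is missing at the repo root")

    claude = root / "CLAUDE.md"
    if not claude.is_file() or "@AGENTS.md" not in claude.read_text(encoding="utf-8"):
        problems.append("CLAUDE.md should contain the line @AGENTS.md")

    # Each skill file should stay short enough to load cheaply.
    for skill in sorted((root / ".claude" / "skills").glob("*/SKILL.md")):
        n_lines = len(skill.read_text(encoding="utf-8").splitlines())
        if n_lines >= MAX_SKILL_LINES:
            problems.append(
                f"{skill.relative_to(root)} has {n_lines} lines (limit {MAX_SKILL_LINES})"
            )

    # Assumption: depth counts directories below src/, so src/a/b/c is depth 3.
    src = root / "src"
    for directory in sorted(p for p in src.rglob("*") if p.is_dir()):
        depth = len(directory.relative_to(src).parts)
        if depth > MAX_SRC_DEPTH:
            problems.append(f"{directory.relative_to(root)} is nested {depth} levels deep")

    return problems
```

Calling `check_layout(Path("."))` from the repo root and failing the build on a non-empty result is one way to use it; the token budget for `AGENTS.md` is left out because it depends on the tokenizer.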
One TODO to Rule Them All
Keep a single TODO.md at the root of the repo for all tasks that agents have done/are doing:
- Treat it not just as a list, but as a blocking state machine for the agent, i.e. it MUST NOT touch code until it has updated `TODO.md` first [1]
- If tasks have zero overlap, spin up multiple subagents in parallel
- If agents disconnect or get interrupted, you can always resume using `TODO.md` as the reference
- Always make it log the "why" behind each decision. Engineering is all about tradeoffs, and documenting why A and not B will provide better in-context learning and prevent retrying failed approaches. And you get "memory" for free
- Note: Ultimately treat git commits as the canonical state - `TODO.md` is meant to be more verbose than commit messages and to represent the current state as much as what's already been done
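To make the "resume from `TODO.md`" idea concrete, here is a rough sketch of how a script could find where to pick up after an interruption. It assumes the task-line format from the template in this post (`- [ ]`, `- [>]`, `- [x]`, optional `(!)` prefix, backticked tag). `Task`, `parse_tasks` and `resume_point` are hypothetical names, and preferring high-priority items among pending tasks is my own assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches top-level task lines such as "- (!) [>] `feat` Add OAuth login".
TASK_LINE = re.compile(
    r"^- (?P<prio>\(!\) )?\[(?P<state>[ >x])\] `(?P<tag>\w+)` (?P<text>.+)$"
)


@dataclass
class Task:
    state: str           # " " todo, ">" in progress, "x" done
    tag: str             # feat, bug, hotfix, ...
    text: str            # description, including any "(why: ...)" note
    high_priority: bool  # True when the line carries the (!) prefix


def parse_tasks(todo_text: str) -> list[Task]:
    tasks = []
    for line in todo_text.splitlines():
        match = TASK_LINE.match(line)  # indented sub-task lines do not match
        if match:
            tasks.append(
                Task(match["state"], match["tag"], match["text"], bool(match["prio"]))
            )
    return tasks


def resume_point(todo_text: str) -> Optional[Task]:
    """Pick the task to resume: the in-progress one, else the next pending one."""
    tasks = parse_tasks(todo_text)
    in_progress = [t for t in tasks if t.state == ">"]
    if in_progress:
        return in_progress[0]
    pending = [t for t in tasks if t.state == " "]
    # Assumption: high-priority items go first; the sort is stable, so file
    # order breaks ties.
    pending.sort(key=lambda t: not t.high_priority)
    return pending[0] if pending else None
```

Because the "why" lives in the task text, whatever `resume_point` returns already carries the reasoning the interrupted agent had written down.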
Add this near the top of your AGENTS.md:
## Task Management
The `TODO.md` file at the root is the master task list for this project. Create one if it doesn't exist.
### Structure
```
# TODO
## To Do
- [ ] `feat` Task description (why: reasoning behind approach) @assignee
  - Sub-task implementation details
- [ ] `bug` Bug description with reproduction steps (why: root cause analysis)
## In Progress
- (!) [>] `feat` High priority task currently being worked on (why: chose X over Y due to Z tradeoffs)
## Completed
### YYYY-MM-DD
- [x] `feat` New feature or functionality added (why: maintains consistency with existing patterns)
- [x] `bug` Issue or defect fixed (why: approach B worked better than A for edge cases)
- [x] `hotfix` Urgent production fix applied
- [x] `refactor` Code restructuring without behavior change (why: reduces duplication, improves testability)
- [x] `chore` Maintenance, tooling, dependencies, config
- [x] `docs` Documentation updates only
- [x] `perf` Performance optimization (why: O(n²) → O(n) to fix timeouts on large datasets)
```
### States
| Syntax | Meaning | Usage |
|--------|---------|-------|
| `[ ]` | Todo | Not started |
| `[>]` | In Progress | Actively working |
| `[x]` | Done | Completed, move to Completed |
| `(!)` | High priority | Prefix before state marker |
### Task Tags
| Tag | Description | Example |
|-----|-------------|---------|
| `feat` | New feature or functionality | Add OAuth login |
| `bug` | Bug fix | Fix null pointer in validator |
| `hotfix` | Urgent production fix | Patch security vulnerability |
| `refactor` | Code restructuring | Extract helper functions |
| `chore` | Maintenance, tooling, deps | Update dependencies |
| `docs` | Documentation only | Update API docs |
| `perf` | Performance optimization | Reduce DB query time |
### Workflow
**MANDATORY**: You MUST NOT use Write/Edit/NotebookEdit tools until TODO.md is updated. This is a BLOCKING requirement with NO exceptions for code changes.
**Required sequence for ALL code changes:**
1. Read TODO.md to check current state
2. Add task with appropriate tag to "To Do" section
3. Move task to "In Progress" section and mark `[>]` when you start writing code
4. Write/Edit code files
5. Mark task `[x]` done when changes complete
6. Move completed task to "Completed" section under date heading
**Rules:**
- Check TODO.md "To Do" and "In Progress" sections before starting work
- Log ALL code changes as tasks with appropriate tags (even small one-line fixes)
- Keep exactly ONE task as `[>]` in "In Progress" section at a time
- Mark `[x]` done immediately when complete (no batching)
- Move completed tasks to "Completed" section under date heading
- Add sub-tasks indented 2-4 spaces, assignees with `@username`
- Keep entries terse and actionable
- Tags are required for all tasks (feat, bug, hotfix, refactor, chore, docs, perf)
**Exception**: Pure documentation (AGENTS.md, README.md, docs/*.md) or TODO.md edits themselves don't need logging.
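Prompt-level rules like these can be backed by a small check in whatever pre-tool hook your harness offers. The sketch below only shows the decision logic under a few assumptions: paths are repo-relative, the tool names are the ones listed in the workflow above, and `is_exempt` and `check_tool_call` are hypothetical names. How it gets wired into a specific harness is deliberately left out.

```python
import re
from pathlib import PurePosixPath

# Tool names taken from the workflow text; other harnesses use other names.
WRITE_TOOLS = {"Write", "Edit", "NotebookEdit"}

# A top-level task line marked in progress, with or without the (!) prefix.
IN_PROGRESS = re.compile(r"^- (?:\(!\) )?\[>\] ", re.MULTILINE)


def is_exempt(path: str) -> bool:
    """Docs and TODO.md itself are exempt, per the Exception rule."""
    p = PurePosixPath(path)
    if str(p) in {"TODO.md", "AGENTS.md", "README.md"}:
        return True
    # docs/*.md: markdown files directly inside docs/
    return len(p.parts) == 2 and p.parts[0] == "docs" and p.suffix == ".md"


def check_tool_call(tool_name: str, target_path: str, todo_text: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call."""
    if tool_name not in WRITE_TOOLS:
        return True, "not a write tool"
    if is_exempt(target_path):
        return True, "exempt file"
    active = len(IN_PROGRESS.findall(todo_text))
    if active == 1:
        return True, "exactly one task is in progress"
    if active == 0:
        return False, "blocked: mark a task [>] in TODO.md first"
    return False, f"blocked: {active} tasks are marked [>], keep exactly one"
```

The reason string is meant to be fed back to the agent, so a blocked call tells it what to fix instead of failing silently.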
Managing Context Overflow
- Agents are not amazing with deeply nested file trees. Prefer flat structures, max 3 levels of depth [2]
- Manually garbage collect by using compaction (on `TODO.md` for example) at the end of every sprint to prevent hitting context window limits
- Restrain agents from generating 10+ files at once. Scope tasks properly: the more code they generate, the larger the review surface. Human review is the bottleneck
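Compaction of `TODO.md` can be as simple as trimming old entries. Here is one possible sketch, assuming the template from this post: `## Completed` is the last section and its entries sit under `### YYYY-MM-DD` headings. `compact_completed` and `keep_days` are hypothetical names. Dropped entries are not lost, since git history stays the canonical record.

```python
import re

DATE_HEADING = re.compile(r"^### (\d{4}-\d{2}-\d{2})\s*$")


def compact_completed(todo_text: str, keep_days: int = 2) -> str:
    """Keep only the newest `keep_days` date sections under "## Completed"."""
    lines = todo_text.splitlines()
    try:
        start = lines.index("## Completed")
    except ValueError:
        return todo_text  # no Completed section, nothing to compact

    head, body = lines[: start + 1], lines[start + 1 :]

    # Group the Completed body by date heading; lines before the first
    # heading are kept as-is.
    preamble: list[str] = []
    sections: dict[str, list[str]] = {}
    current = None
    for line in body:
        match = DATE_HEADING.match(line)
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        elif current is None:
            preamble.append(line)
        else:
            sections[current].append(line)

    # ISO dates sort correctly as plain strings; newest first.
    newest = sorted(sections, reverse=True)[:keep_days]

    out = head + preamble
    for date in newest:
        out.append(f"### {date}")
        out.extend(sections[date])
    return "\n".join(out) + "\n"
```

Running this at the end of a sprint and committing the result keeps the file small while the To Do and In Progress sections stay untouched.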
Agents increasingly need to traverse git history for a longitudinal understanding of the code: how things evolved, not just a snapshot of where it is now. For example, enforce `git log --oneline -20` to understand recent temporal context, or `git show <commit-hash>` to see the full diff for a specific commit. They should treat your undos, rewinds and checkpoints as negative reinforcement signals (learning what not to do) on the fly. For now, we log everything! Agent harnesses are becoming more robust on long-horizon tasks with better error recovery, so refrain from saturating `AGENTS.md` with too many guardrails.
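If you would rather hand the agent this history up front than hope it runs the commands, a small wrapper around the two git commands does the job. This is a sketch under assumptions: `run_git`, `recent_history` and `history_context` are hypothetical names, and the `runner` parameter exists only so the git call can be swapped out (for tests, or for a different working directory).

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_git(args: Sequence[str]) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout


def recent_history(limit: int = 20, runner: Runner = run_git) -> list[tuple[str, str]]:
    """Return (short_hash, subject) pairs from `git log --oneline -<limit>`."""
    output = runner(["log", "--oneline", f"-{limit}"])
    commits = []
    for line in output.splitlines():
        short_hash, _, subject = line.partition(" ")
        commits.append((short_hash, subject))
    return commits


def history_context(limit: int = 20, diff_for: int = 1, runner: Runner = run_git) -> str:
    """Build a text block: recent commit subjects plus full diffs for the newest few."""
    commits = recent_history(limit, runner)
    parts = ["Recent commits:"] + [f"  {h} {s}" for h, s in commits]
    for short_hash, _ in commits[:diff_for]:
        parts.append(f"\nFull diff for {short_hash}:")
        parts.append(runner(["show", short_hash]))
    return "\n".join(parts)
```

Keeping `diff_for` small matters: full diffs are the expensive part, and the one-line log is usually enough for temporal context.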
Now go build something great.