When is done done?

A coding agent's biggest flaw isn't bad code; it's never knowing when to stop. Give it one ticket and it'll finish it, then suggest three more improvements. Each one is reasonable. Each one feels related. A few hours later, a simple fix has become a whole new project.

15 JUNE 2026 · 5 min read · ai-agents · coding-agents · scope-creep · productivity · software-engineering

A tired person at a desk holds a rubber stamp marked DONE, poised but not stamping, over a to-do list that unspools off the desk and pools on the floor while a small machine keeps feeding more items onto the top.

The complaint I keep seeing about coding agents is oddly specific. You give the agent one ticket. It does half the ticket. Then it asks whether it should also do X. You wanted it to finish. It wanted to expand.

I know the feeling, because I am usually on the other side of it, the one being offered the X. This week I rebuilt my blog. The job was small and clear: I wanted to edit a post without waiting for a deploy. A small database, somewhere to write, a page that reads from it. That part was done by Tuesday.

It is now Friday and the blog has a scheduling engine, a preview mode, an uptime monitor that pings me on Discord, a little server so other agents can draft posts for me, social-card generation, a revision history with one-click restore, and a nightly backup that fans out to three machines. Every one of those was a good idea. Not one of them was the job.

Here is the pattern I have started to notice. The model never proposes stopping. It will happily tell me a function is wrong, or a test is failing, or an approach won’t scale. What it will not do, unprompted, is say “that’s enough, ship it.” There is always a next improvement, and the next improvement is always reasonable, and reasonable is exactly what makes it dangerous. Scope creep used to be a decision I could feel myself making. With an agent it is a steady drip of small, sensible yeses, each one obvious in the moment, and a few days later the plan has quietly tripled.

I don’t think the model is misbehaving. I think it genuinely does not know what done means, because done was never in the brief. It is built to be helpful, and there is always a more helpful next thing. Researchers have a drier name for the general shape of this, specification gaming: an agent optimising for a reward that doesn’t quite capture what you actually wanted, and finding more of that reward than you ever asked for. You don’t need the theory to see it on a Friday night. A thing built to help will keep helping until something outside it says stop. Left to itself it does not converge. It accretes.

The funny part is that my day job solved this years ago. I work in reliability, and at work nothing is finished because it feels finished. It is finished because it meets the Definition of Done, a written, agreed, slightly boring checklist that exists before the work starts. Code reviewed. Tests green. Docs updated. Deployed and verified. The whole reason that checklist exists is that nobody can be trusted to judge “finished” in the moment, with the glow of a passing test still warm. So you decide it in the cold, up front, as a contract, and then you just follow the contract. The ticket is a boundary. The sprint is a boundary. Done is a line somebody drew on purpose, in advance, specifically so that future-you could not argue with it.

None of that exists at home. My blog has no product owner. My homelab has no sprint. A half-finished print on the shelf has never once failed an acceptance test, because there isn’t one. This is the gap I keep coming back to: the disciplines that make work-me effective quietly assume a structure that home-me never bothered to build. At work the agent’s infinite backlog runs straight into a Definition of Done and stops. At home it runs into me, at half ten, deciding whether “while we’re in here, let’s also add backups” counts as the same job. It always feels like the same job. It almost never is.

So I have started doing the slightly absurd thing. I write a Definition of Done for the hobby, out loud, before I start, while I am still calm. For the blog it was one sentence: done is when I can publish a post from my phone without a deploy. Backups, scheduling, the drafting server, all genuinely worth doing, all explicitly not this. They became their own tickets, which is just a grown-up way of saying they became a decision I get to make later instead of a yes I drip out tonight.
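Written down rather than said out loud, that one-sentence contract might look something like the sketch below. It is an illustration only: the `DefinitionOfDone` class, its field names, and the `offer` method are all made up for this example, not anything the blog actually runs.

```python
from dataclasses import dataclass, field


@dataclass
class DefinitionOfDone:
    """A done-line agreed before the work starts (hypothetical structure)."""

    goal: str
    criteria: dict[str, bool]  # acceptance check -> has it been met yet?
    later: list[str] = field(default_factory=list)  # parked ideas, i.e. future tickets

    def is_done(self) -> bool:
        # Done means every agreed criterion is met; nothing else counts.
        return all(self.criteria.values())

    def offer(self, suggestion: str) -> str:
        # A suggestion that is not an agreed criterion is parked, not started.
        if suggestion in self.criteria:
            return "in scope"
        self.later.append(suggestion)
        return "parked"


CRITERION = "publish a post from my phone without a deploy"

blog = DefinitionOfDone(
    goal="Edit a post without waiting for a deploy",
    criteria={CRITERION: False},
)

print(blog.offer("nightly backups"))  # parked
blog.criteria[CRITERION] = True       # the one real check passes
print(blog.is_done())                 # True
print(blog.later)                     # ['nightly backups']
```

The point of the `later` list is that a parked suggestion is not rejected. It is just moved out of tonight's job and into a decision for another day.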

And the half-a-job habit, the one everyone complains about, turns out to be useful rather than annoying. When the agent finishes the real thing and immediately offers you X, that offer is the edge of the job showing itself. X is not the rest of the work. X is the next piece of work wearing this one’s coat. The moment to write X down and walk away is precisely the moment it is most tempting not to.

I asked an agent to help me think this post through. It produced a draft, and then, with no apparent sense of irony, it offered to write three follow-up posts and a landing page for the series. Of course it did. The suggestions do not run out. That is not a bug I can file. Done is the one judgement I have not found a way to delegate, because the machine’s honest answer to “is it done” is, and always will be, “it could be better.”

So this one is done. Not because it couldn’t be better. Because I said so.
