Containment before cleverness

The useful part of running agents at home is not the intelligence. It is the containment: narrow jobs, short tool lists, visible audit trails, rate limits, and a real kill switch when something starts behaving oddly.


A hand-drawn schematic of a small group of household agents inside separate boxed lanes, each with a visible stop switch, rate limiter, and audit log.


Every home agent I keep gets three things before I trust it: a narrow job, a rate limit, and a kill switch.

That sounds obvious.

Most agent chatter still starts with what the model can do. I care more about what happens at 16:42 on a Thursday when the input is messy, the prompt is slightly wrong, and I cannot be bothered to babysit it. The useful question is not whether the model can squeeze out one more clever trick. It is whether the thing fails in a boring way.

I now run a small OpenClaw fleet at home. OpenClaw is the runtime the fleet sits on, not something I wrote. The interesting work, for me, has stopped being about getting one dramatic demo to land. It is about deciding which agents are worth keeping around after the novelty wears off.

That changes the build rules.

One job per agent

The first control is scope.

An agent with one narrow job is usually more useful than an agent with five vague ones. The narrow one is easier to prompt, easier to test, easier to notice when it drifts, and much easier to switch off without collateral damage.

If an agent helps with household admin, gym logging, or camera-club bits, I do not need it to feel magical. I need it to do one bounded job without turning every adjacent system into part of the blast radius.

This is the bit that most resembles normal reliability work. The problem is not capability in the abstract. The problem is ownership. When an agent can read too much, write too much, or improvise too widely, you stop having a tool and start having a source of low-grade operational drag.

One job per agent sounds like a design preference. It is really an incident-prevention habit.

Short tool lists are a kindness

The second control is friction in the right place.

If a model has ten ways to act, it will occasionally choose the wrong one with confidence. If it has three, the odds improve. More importantly, the failure becomes legible. I can usually work out why it did the wrong thing, because the available moves are obvious.

That matters more than the prompt.

A lot of agent discussion still assumes the prompt is the main control surface. It is part of it, but not the part I trust most. I would rather give an ordinary prompt to an agent with a short tool list than a brilliant prompt to an agent with broad, fuzzy permissions.

The prompt tells the model what I want. The tool boundary tells it what damage is possible.

Those are not equivalent. When things go wrong, the second one wins.
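A minimal sketch of what a tool boundary can look like, assuming nothing about OpenClaw itself: `ToolBox`, `ToolNotAllowed`, and the two gym tools are hypothetical names invented for illustration. The idea is only that the agent reaches the outside world through an explicit allowlist, so an unexpected tool request fails loudly instead of doing something.

```python
from typing import Callable, Dict


class ToolNotAllowed(Exception):
    """Raised when an agent asks for a tool outside its allowlist."""


class ToolBox:
    """Hypothetical wrapper: the agent can only call tools registered here."""

    def __init__(self, agent_name: str, tools: Dict[str, Callable[..., object]]):
        self.agent_name = agent_name
        # Copy the mapping so later changes to the caller's dict
        # cannot quietly widen the boundary.
        self._tools = dict(tools)

    def call(self, tool_name: str, **kwargs):
        if tool_name not in self._tools:
            raise ToolNotAllowed(f"{self.agent_name} may not call {tool_name!r}")
        return self._tools[tool_name](**kwargs)


# Illustrative tools for a gym-logging agent (assumed, not from any real API).
def read_gym_log(day: str) -> str:
    return f"(entries for {day})"


def append_gym_entry(day: str, text: str) -> str:
    return f"logged on {day}: {text}"


gym_agent = ToolBox(
    "gym-logger",
    {"read_gym_log": read_gym_log, "append_gym_entry": append_gym_entry},
)
```

With this shape, `gym_agent.call("append_gym_entry", day="2024-05-02", text="5x5 squats")` runs the tool, while `gym_agent.call("send_email", to="someone")` raises `ToolNotAllowed` before anything happens. The prompt never has to be trusted to prevent the second call.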

Rate limits beat optimism

The third control is tempo.

An error that fires once is a mistake. An error that loops is a chore. Home systems have less slack than work systems because there is no team absorbing the noise. If an agent starts doing the wrong thing repeatedly, the clean-up lands on exactly one person.

So I rate-limit anything that can write, send, or fan out. Not because the agents are malicious, and not because the models are bad, but because small errors compound faster than they look.

A slow wrong system is survivable. A fast wrong system is a weekend.

This is one of those boring rules that feels excessive until the first time it saves you. Then it becomes non-negotiable.
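One way to sketch that tempo control is a sliding-window limiter placed in front of anything that writes, sends, or fans out. This is an illustration under assumptions, not the fleet's actual mechanism: `RateLimiter` and its parameters are hypothetical, and the injectable `clock` exists only so the behaviour can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Allow at most max_calls actions in any window of per_seconds."""

    def __init__(
        self,
        max_calls: int,
        per_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self._clock = clock
        self._stamps: Deque[float] = deque()  # times of recently allowed actions

    def allow(self) -> bool:
        now = self._clock()
        # Forget actions that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.per_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            return False  # over budget: the caller should skip or queue the action
        self._stamps.append(now)
        return True
```

A caller would check `limiter.allow()` before each write and drop or defer the action when it returns `False`. A looping error then costs a handful of bad writes per window rather than hundreds.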

A visible audit trail changes the relationship

The fourth control, and the one I would pick if I had to name the one that matters most in practice, is a visible audit trail.

I have switched off a few agents that were perfectly clever and deeply annoying to live with. The pattern was always the same: they could do the task, but they made it too hard to tell what they had done, why they had done it, or what needed undoing.

The agents that stay live are the ones with obvious failure modes and obvious receipts.

That can be as simple as a log line, a row in a table, or a message that links directly to the thing it claims to have changed. The point is not enterprise observability theatre. The point is being able to glance at the artefact and decide whether the agent is helping or freelancing.
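As a sketch of the "log line" end of that range, the following appends one JSON object per action to a plain file and reads the last few back. The function names, field names, and the JSON Lines layout are all assumptions chosen for illustration; a table row or a linked message would serve the same purpose.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_action(log_path: Path, agent: str, action: str, target: str, reason: str) -> dict:
    """Append one receipt: who did what, to which thing, and why."""
    entry = {
        "at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "agent": agent,
        "action": action,
        "target": target,  # ideally something a human can open directly
        "reason": reason,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def recent_actions(log_path: Path, limit: int = 10) -> list:
    """Return the last `limit` receipts, oldest first, for a quick glance."""
    if not log_path.exists():
        return []
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines[-limit:]]
```

The `target` field is the part doing the real work here: a receipt that points at the changed thing can be checked in seconds, and one that only says "updated records" cannot.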

At work, a lot of this is distributed across dashboards, traces, review tools, and other people’s attention. At home, the operator and the user are the same person. If the audit trail is hidden, I will not check it. If I do not check it, the system will eventually teach itself bad manners.

So the good home-agent pattern is not “make it autonomous”. It is “make it inspectable”.

A hand-drawn sketch of one agent trying to spill into neighbouring tasks but being stopped by clear boundaries and switches.

The kill switch is part of the product

The final control is the least glamorous and the most reassuring: a way to stop writes when something starts smelling wrong.

Not a future plan for how to disable it. Not a note to myself about where the config probably lives. An actual switch.

I have much more patience for an experimental agent if I know I can freeze it immediately. The presence of a kill switch changes how willing I am to keep iterating, because it turns weird behaviour from a looming clean-up exercise into a small decision.

That sounds operational because it is.
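A sketch of an actual switch, assuming the simplest mechanism available: a flag file whose presence pauses writes. `KillSwitch`, `WritesPaused`, and `guarded_write` are hypothetical names for illustration, not part of any runtime's API.

```python
from pathlib import Path


class WritesPaused(Exception):
    """Raised when a write is attempted while the switch is on."""


class KillSwitch:
    """Writes are paused for as long as the flag file exists."""

    def __init__(self, flag_path: Path):
        self.flag_path = flag_path

    def pause(self, note: str = "") -> None:
        # The note is a reminder to future me about why it was frozen.
        self.flag_path.write_text(note, encoding="utf-8")

    def resume(self) -> None:
        self.flag_path.unlink(missing_ok=True)  # needs Python 3.8+

    def is_paused(self) -> bool:
        return self.flag_path.exists()


def guarded_write(switch: KillSwitch, write_fn, *args, **kwargs):
    """Run write_fn only if the switch is off; reads are not affected."""
    if switch.is_paused():
        raise WritesPaused(f"writes paused by {switch.flag_path}")
    return write_fn(*args, **kwargs)
```

Because the flag is just a file, freezing an agent takes one command from any shell, and the check runs on every write rather than once at start-up. That design choice matters more than the specific mechanism: a switch that needs a restart to take effect is closer to a plan than a switch.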

Reliability work changes how you build these things. You stop asking “can this agent do more?” and start asking “what is the smallest blast radius that still makes this worth keeping?”

The second question has worked better for me.

What stays running

I do not keep agents because they are clever. I keep them because the controls are boring and the outcomes are legible.

One job per agent.

A short tool list.

Approvals for anything public.

A rate limit on anything that can write, send, or fan out.

A visible audit trail.

A flag that can pause writes when something starts smelling wrong.

That is the useful bit of home agents for me. Not the intelligence. The containment.

If you have kept agents running for more than a week, you probably already know which one saved you first. It was not the fancy prompt. It was the boundary.
