About

Site Reliability Tech Lead by day. At home, I run a small fleet of services, sensors, and agents that try to make family life less manual. This site is the working notebook.

Dan Robinson
Site Reliability Tech Lead
Stafford · UK

I lead Site Reliability work at a large UK tech company. The job is what you'd expect: SLOs, on-call, platforms that have to behave at scale, and the long unglamorous tail of toil that you only beat with discipline and good tooling.

At home I do a quieter version of the same thing. There's a homelab in the corner and a Home Assistant install that runs lights, climate, energy, presence and a stack of small automations. I also run a small fleet of agents on OpenClaw, an agent runtime I adopted rather than wrote (the upstream is at openclaw.ai). They cover a family assistant, a coach for my son who has Type 1 diabetes, agents for my 3D printing hobby, and a handful of dev-facing helpers I lean on every day.

None of this is a portfolio. I'm not selling anything. The reason I write it down is selfish: writing forces me to admit what I actually understand, what I'm guessing at, and what I should walk away from. If any of it is useful to you, brilliant.

A few principles, since they shape almost everything here:

Vertical slices over phases. A telemetry pipeline isn't done until I'm using the dashboard.
Kill it if the cost shows up. I've rolled back things I was halfway proud of. Sunk cost is the enemy.
Fewer, fatter tickets. A long tail of small follow-ups is a smell, not a workflow.
Specifics beat abstractions. Numbers, screenshots, configs (sanitised), real failure modes.

If you want the LinkedIn version, it's here. If you want the work, that's what the rest of the site is for.