Running an agent team in production
The boring infrastructure problems you only meet after the demo is over.
We get asked about the model choice. We have been asked exactly once about the things that actually keep agents online.
The shape of the problem
A production agent is not a chat.completions call. It is a long-running loop with:
- Tool calls that mutate external systems (EMR write, calendar invite, SMS send).
- Voice I/O with sub-second latency targets and back-pressure when the network jitters.
- Failure modes that are partial — the model answered correctly, the SMS gateway didn't.
- Auditability — every decision needs to be replayable for a compliance review three months later.
You do not solve any of those with a better prompt.
What the stack actually looks like
The agent loop runs as a Python worker on LiveKit Cloud — same realtime mesh the voice session uses, so there is one network hop, not two. Each turn writes a step row to a structured log: model input, tool calls, tool returns, decision, latency. That log is the debugger, the audit trail, and the dataset for the next iteration of the system prompt.
The site you're reading this on runs in Next 16 (App Router), R3F for the chip animation, GSAP + Lenis for the scroll narrative, and Tailwind 4 with a custom chrome primitives layer. None of which matters to the agent. They matter because they are the surface that brings the agent into the room.
The expensive habits we have committed to
- No silent retries. If a tool call fails twice, the agent says so out loud and asks the caller for a different path.
- No prompt rewrites in production. Prompts live in the repo. PRs only.
- Hand off, don't escalate. If the agent isn't confident in the next turn, the human takes over on the same line — no transfer, no new ticket.
The model is a commodity. The discipline is not.
