Delegating a task to an AI agent is easy. Delegating accountability is not. When an agent can call APIs, write files, or send messages, it becomes a user with elevated privileges and no memory of your risk model. I design agentic systems with guardrails baked in, not bolted on.
Scope the permissions, not just the prompt
A prompt that says "do not delete anything" is not a security control, because the model can ignore or misread it. I give each agent a minimal capability set: an allowlist of tools, read-only access by default, and explicit approval gates for destructive or irreversible actions. If the agent does not need write access to production, it does not get it.
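To make this concrete, here is a minimal Python sketch of a capability set enforced in code rather than in the prompt. The names `Tool`, `CapabilitySet`, and `PermissionDenied` are hypothetical and chosen only for illustration; a real system would likely also enforce these limits at the credential or infrastructure level.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class PermissionDenied(Exception):
    """Raised when a call falls outside the agent's capability set."""


@dataclass(frozen=True)
class Tool:
    name: str
    func: Callable[..., object]
    writes: bool = False       # True if the tool mutates state
    destructive: bool = False  # True if the effect is irreversible


@dataclass
class CapabilitySet:
    """Hypothetical per-agent allowlist; read-only unless stated otherwise."""
    tools: Dict[str, Tool] = field(default_factory=dict)
    allow_writes: bool = False

    def call(self, name: str, *args, approved: bool = False, **kwargs):
        tool = self.tools.get(name)
        if tool is None:
            raise PermissionDenied(f"tool not on allowlist: {name}")
        if tool.writes and not self.allow_writes:
            raise PermissionDenied(f"agent is read-only: {name}")
        if tool.destructive and not approved:
            raise PermissionDenied(f"approval required: {name}")
        return tool.func(*args, **kwargs)
```

The check runs on every call, so the agent cannot talk its way past it. A tool that is not registered simply does not exist from the agent's point of view.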
Log the reasoning, not just the action
Every decision gets logged: the prompt, the retrieved context, the plan, the tool calls, and the final output. This makes audits possible and debugging sane. When something goes wrong, you can replay the chain instead of guessing which prompt variant was active.
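One simple way to capture this is an append-only log with one JSON record per decision. The sketch below is an assumption about how such a log could look, not a prescribed format: `DecisionRecord`, `append_record`, and `load_records` are hypothetical names, and the fields mirror the items listed above.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class DecisionRecord:
    prompt: str
    retrieved_context: List[str]
    plan: List[str]
    tool_calls: List[Dict[str, Any]]
    final_output: str
    # Assumed extras so records can be correlated and ordered during an audit.
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)


def append_record(path: str, record: DecisionRecord) -> None:
    """Append one decision as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record), sort_keys=True) + "\n")


def load_records(path: str) -> List[DecisionRecord]:
    """Read the log back in order, for replay or audit."""
    with open(path, encoding="utf-8") as fh:
        return [DecisionRecord(**json.loads(line)) for line in fh if line.strip()]
```

Because each record carries the exact prompt and context that were used, replaying a run means reading the lines back in order rather than reconstructing state from memory.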
Human-in-the-loop for high-stakes operations
I never let an agent deploy to production, send customer-facing communication, or modify billing without human approval. The loop is short and context-rich: the agent proposes, a human confirms, the agent executes. This keeps most of the speed of automation while preserving accountability.
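The propose, confirm, execute loop can be sketched in a few lines of Python. Everything here is illustrative: `Proposal`, `HIGH_STAKES`, and `run_with_approval` are hypothetical names, and the `approve` callback stands in for whatever review channel a team actually uses (a chat prompt, a ticket, a dashboard button).

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass(frozen=True)
class Proposal:
    action: str    # machine-readable action name
    summary: str   # the context a human needs to decide quickly
    payload: dict  # the exact arguments that would be executed


# Assumed action names for the three categories named above.
HIGH_STAKES = frozenset(
    {"deploy_production", "send_customer_message", "modify_billing"}
)


def run_with_approval(
    proposal: Proposal,
    execute: Callable[[Proposal], Any],
    approve: Callable[[Proposal], bool],
) -> Tuple[bool, Optional[Any]]:
    """Return (executed, result); high-stakes actions run only if approved."""
    if proposal.action in HIGH_STAKES and not approve(proposal):
        return False, None  # rejected: nothing was executed
    return True, execute(proposal)
```

The human sees the same payload that will be executed, so the approval covers the actual action and not a paraphrase of it.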
Kill switches and circuit breakers
Agents should fail safely. I add rate limits, budget caps, timeout boundaries, and circuit breakers. If an agent starts looping, exceeds a cost threshold, or attempts a disallowed action, it stops and escalates. The default state after a failure is halted, not "try harder."
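As a rough illustration, the limits above can live in one small guard object that the agent loop consults before every step. This is a sketch under assumptions: `Guard` and `Halted` are hypothetical names, the default thresholds are placeholders, and escalation is left to whoever catches the exception.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


class Halted(Exception):
    """Raised when the guard trips; the caller should escalate, not retry."""


@dataclass
class Guard:
    max_steps: int = 20         # placeholder limit against runaway loops
    max_cost_usd: float = 5.0   # placeholder budget cap
    max_seconds: float = 120.0  # placeholder wall-clock timeout
    steps: int = 0
    cost_usd: float = 0.0
    halted_reason: Optional[str] = None
    started_at: float = field(default_factory=time.monotonic)

    def trip(self, reason: str) -> None:
        """Halt immediately, e.g. when a disallowed action is attempted."""
        self.halted_reason = reason
        raise Halted(reason)

    def check(self, step_cost_usd: float = 0.0) -> None:
        """Call before each agent step; raises Halted once any limit is hit."""
        if self.halted_reason is not None:
            # Stay halted: there is deliberately no automatic reset.
            raise Halted(self.halted_reason)
        self.steps += 1
        self.cost_usd += step_cost_usd
        if self.steps > self.max_steps:
            self.trip("step limit exceeded")
        if self.cost_usd > self.max_cost_usd:
            self.trip("budget cap exceeded")
        if time.monotonic() - self.started_at > self.max_seconds:
            self.trip("timeout exceeded")
```

Once tripped, the guard keeps raising on every later call, which is the code-level version of defaulting to halted rather than "try harder."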
Agentic software is powerful, but power without boundaries becomes a liability. If you are building agents that touch real systems, let me design the guardrails.