Autonomous Task Guardrails
An agent without guardrails is a liability. As AI systems take more autonomous actions — sending emails, writing files, calling APIs, spending budget — the PM needs to define not just what the agent does, but the hard limits on what it can do and the mechanisms that stop it before it causes irreversible harm. This skill defines the full guardrail stack: scope limits, action limits, resource limits, circuit breakers, and the human override that must always exist.
---
Context
Why guardrails are a PM responsibility, not just an engineering one:
Guardrails are product decisions with product consequences. A guardrail set too tight means the agent asks for permission constantly and users turn it off. A guardrail set too loose means the agent takes actions the user didn't sanction and trust collapses. The PM must own this balance.
The four guardrail layers:

| Layer | What it limits | Examples |
|---|---|---|
| Scope guardrails | What the agent is allowed to work on | Only files in folder X; only emails to approved recipients |
| Action guardrails | What types of actions the agent can take | Read but not delete; create drafts but not send |
| Resource guardrails | How much the agent can consume | Max N API calls; max $X spend; max N minutes |
| Circuit breakers | Automatic stops triggered by anomalous behaviour | Stops if error rate > X%; stops after N retries |
---
Step 1 — Define the autonomous task context
Ask:
Step 2 — Define scope guardrails
Data scope, user scope, and task scope boundaries.
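A data-scope boundary such as "only files in folder X" or "only emails to approved recipients" can be sketched as a pair of checks run before every tool call. This is a minimal illustration, not a prescribed implementation: the folder `/workspace/project-x`, the recipient list, and the function names are all hypothetical, and the path check is purely lexical (it does not follow symlinks).

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical scope policy; both values are illustrative assumptions.
ALLOWED_ROOT = PurePosixPath("/workspace/project-x")
APPROVED_RECIPIENTS = {"ops@example.com", "pm@example.com"}


def path_in_scope(candidate: str, root: PurePosixPath = ALLOWED_ROOT) -> bool:
    """Return True only if the normalized path is the root or lies beneath it."""
    # normpath collapses ".." segments so "root/../secrets" cannot slip through.
    normalized = PurePosixPath(posixpath.normpath(candidate))
    return normalized == root or root in normalized.parents


def recipient_in_scope(address: str) -> bool:
    """Return True only for recipients on the approved list (case-insensitive)."""
    return address.strip().lower() in APPROVED_RECIPIENTS
```

User scope and task scope would follow the same shape: an explicit allow-list, checked before the action runs, with anything outside the list rejected.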
Step 3 — Define action guardrails
Permitted actions (no confirmation), restricted actions (pause and confirm), and forbidden actions (never allowed).
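The three action tiers above map naturally onto a single lookup that every tool call passes through. The sketch below assumes hypothetical action names (`read_file`, `send_email`, and so on) and makes one design assumption not stated in the text: an action that appears in no tier is denied, so the check fails closed.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"      # permitted: run without confirmation
    CONFIRM = "confirm"  # restricted: pause and ask the user
    DENY = "deny"        # forbidden: never run


# Hypothetical action names, chosen only for illustration.
PERMITTED = {"read_file", "create_draft"}
RESTRICTED = {"send_email", "write_file"}
FORBIDDEN = {"delete_file", "make_payment"}


def check_action(action: str) -> Decision:
    """Classify an action into one of the three guardrail tiers."""
    if action in FORBIDDEN:
        return Decision.DENY
    if action in PERMITTED:
        return Decision.ALLOW
    if action in RESTRICTED:
        return Decision.CONFIRM
    # Assumption: unknown actions are denied rather than allowed.
    return Decision.DENY
```

Checking the forbidden set first means a misconfiguration that lists an action in two tiers resolves to the stricter one.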
Step 4 — Define resource guardrails
Step limits, time limits, cost limits, API call limits, and storage limits.
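One way to enforce these limits is a budget object that the agent must charge before each step. The following is a sketch under stated assumptions: `ResourceBudget`, `BudgetExceeded`, and the field names are hypothetical, and only step, cost, and time limits are shown. API call and storage limits would be additional counters checked the same way.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when the next step would cross a resource limit."""


@dataclass
class ResourceBudget:
    max_steps: int
    max_cost_usd: float
    max_seconds: float
    steps: int = 0
    cost_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, cost_usd: float = 0.0) -> None:
        """Record one step and its cost, or raise if any limit would be exceeded."""
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.cost_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost limit reached")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded("time limit reached")
        # Counters are updated only after every check has passed.
        self.steps += 1
        self.cost_usd += cost_usd
```

Checking before spending, rather than after, means the agent stops at the limit instead of one step past it.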
Step 5 — Define circuit breakers
Triggers for error rate, repetition detection, confidence drop, scope drift, and user inactivity, plus a global emergency stop.
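Two of these triggers, error rate and repetition detection, plus the emergency stop, can be sketched as a small state machine that the agent loop consults after every action. The class name, thresholds, and window size below are hypothetical defaults chosen for illustration. Confidence drop, scope drift, and user inactivity are not modeled here.

```python
from collections import deque
from typing import Optional


class CircuitBreaker:
    def __init__(self, max_error_rate: float = 0.5, window: int = 10,
                 max_repeats: int = 3) -> None:
        self.max_error_rate = max_error_rate
        self.max_repeats = max_repeats
        self.outcomes = deque(maxlen=window)             # recent success flags
        self.recent_actions = deque(maxlen=max_repeats)  # recent action names
        self.tripped_reason: Optional[str] = None

    @property
    def is_tripped(self) -> bool:
        return self.tripped_reason is not None

    def emergency_stop(self) -> None:
        """Global stop: trips the breaker regardless of recorded history."""
        self.tripped_reason = "emergency stop"

    def record(self, action: str, ok: bool) -> None:
        """Record one action outcome and trip the breaker if a trigger fires."""
        if self.is_tripped:
            return
        self.outcomes.append(ok)
        self.recent_actions.append(action)
        # Error rate is judged only once the window is full, to avoid
        # tripping on the very first failure.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        error_rate = self.outcomes.count(False) / len(self.outcomes)
        if window_full and error_rate > self.max_error_rate:
            self.tripped_reason = "error rate"
        elif (len(self.recent_actions) == self.max_repeats
              and len(set(self.recent_actions)) == 1):
            self.tripped_reason = "repetition"
```

The agent loop would check `is_tripped` before each step and hand control back to the user when it is set. Once tripped, the breaker stays tripped until a human resets it by creating a new instance.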
Step 6 — Define the audit trail
Every autonomous action logged with timestamp, tool, input, output, status, reasoning, and user confirmation status.
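The fields listed above translate directly into one log record per action. The sketch below is one possible shape, not a required schema: `AuditRecord`, `make_record`, and `to_log_line` are hypothetical names, `tool_input` and `tool_output` stand for the "input" and "output" fields, and JSON lines is an assumed storage format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str                  # ISO 8601, UTC
    tool: str
    tool_input: str
    tool_output: str
    status: str                     # e.g. "success", "error", "blocked"
    reasoning: str
    user_confirmed: Optional[bool]  # None when no confirmation was required


def make_record(tool: str, tool_input: str, tool_output: str, status: str,
                reasoning: str, user_confirmed: Optional[bool] = None) -> AuditRecord:
    """Build an immutable record stamped with the current UTC time."""
    return AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        tool=tool,
        tool_input=tool_input,
        tool_output=tool_output,
        status=status,
        reasoning=reasoning,
        user_confirmed=user_confirmed,
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialize a record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Freezing the dataclass keeps a record from being altered after the action it describes has run.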