Autonomous Task Guardrails

An agent without guardrails is a liability. As AI systems take more autonomous actions (sending emails, writing files, calling APIs, spending budget), the PM needs to define not only what the agent does but also the hard limits on what it can do and the mechanisms that stop it before it causes irreversible harm. This skill defines the full guardrail stack: scope limits, action limits, resource limits, circuit breakers, and the human override that must always exist.

---

Context

Why guardrails are a PM responsibility, not just an engineering one:

Guardrails are product decisions with product consequences. A guardrail set too tight means the agent asks for permission constantly and users turn it off. A guardrail set too loose means the agent takes actions the user didn't sanction and trust collapses. The PM must own this balance.

The four guardrail layers:

Layer	What it limits	Examples
Scope guardrails	What the agent is allowed to work on	Only files in folder X; only emails to approved recipients
Action guardrails	What types of actions the agent can take	Read but not delete; create drafts but not send
Resource guardrails	How much the agent can consume	Max N API calls; max $X spend; max N minutes
Circuit breakers	Automatic stops triggered by anomalous behaviour	Stops if error rate > X%; stops after N retries

---

Step 1 — Define the autonomous task context

Ask:

What task is the agent executing autonomously?

What tools does the agent have?

What is the expected task duration?

What is the expected cost per run?

What is the highest-risk action the agent could take?

What data does the agent have access to?

Step 2 — Define scope guardrails

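Set boundaries for data scope, user scope, and task scope: which data the agent may touch, which users it may act for or contact, and which tasks it may take on.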
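
As a minimal sketch of how a data-scope boundary might be enforced, the Python below checks a file path against one allowed folder and an email recipient against an approved list (the two examples from the layer table). The names ScopePolicy, allowed_root, and approved_recipients are hypothetical and do not come from any particular agent framework.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class ScopePolicy:
    """Hypothetical scope policy: one allowed folder, one recipient allowlist."""
    allowed_root: Path
    approved_recipients: frozenset = field(default_factory=frozenset)

    def allows_path(self, candidate: str) -> bool:
        # Resolve ".." segments and symlinks so "reports/../../secret" cannot escape.
        root = self.allowed_root.resolve()
        target = (root / candidate).resolve()
        return target == root or root in target.parents

    def allows_recipient(self, address: str) -> bool:
        # Normalize case and whitespace; the allowlist is assumed to be stored lowercase.
        return address.strip().lower() in self.approved_recipients
```

Resolving the path before comparing is the design choice that matters here: a plain string-prefix check would accept a path that starts inside the folder and then climbs out of it.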

Step 3 — Define action guardrails

Sort every action into one of three tiers: permitted (runs without confirmation), restricted (pauses and asks the user to confirm), or forbidden (never allowed).
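
One way to make the three tiers concrete is a lookup table that the agent runtime consults before every action. The sketch below is illustrative only: the action names and their tier assignments are assumptions loosely based on the layer table, and decide is a hypothetical helper.

```python
from enum import Enum


class Tier(Enum):
    PERMITTED = "permitted"    # run without confirmation
    RESTRICTED = "restricted"  # pause and ask the user first
    FORBIDDEN = "forbidden"    # never run


# Hypothetical action names and tiers; the real mapping is a product decision.
ACTION_TIERS = {
    "read_file": Tier.PERMITTED,
    "create_draft": Tier.PERMITTED,
    "send_email": Tier.RESTRICTED,
    "delete_file": Tier.FORBIDDEN,
}


def decide(action: str, user_confirmed: bool = False) -> bool:
    """Return True if the action may run now."""
    # Unknown actions default to FORBIDDEN so new tools stay blocked until classified.
    tier = ACTION_TIERS.get(action, Tier.FORBIDDEN)
    if tier is Tier.PERMITTED:
        return True
    if tier is Tier.RESTRICTED:
        return user_confirmed
    return False
```

Defaulting unclassified actions to forbidden means that adding a new tool cannot silently widen what the agent is allowed to do.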

Step 4 — Define resource guardrails

Set a numeric cap for each resource: steps, time, cost, API calls, and storage.
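
A resource guardrail can be sketched as a budget object that is charged once per agent step and raises as soon as any cap is crossed. Everything below is an assumption for illustration: the class and field names are hypothetical, the default numbers are placeholders, and storage limits are left out for brevity.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a run crosses any resource limit."""


@dataclass
class ResourceBudget:
    # Placeholder limits; the real numbers are a product decision.
    max_steps: int = 50
    max_seconds: float = 600.0
    max_cost_usd: float = 5.00
    max_api_calls: int = 200
    steps: int = 0
    cost_usd: float = 0.0
    api_calls: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, cost_usd: float = 0.0, api_calls: int = 0) -> None:
        """Record one agent step, then raise if any limit is now exceeded."""
        self.steps += 1
        self.cost_usd += cost_usd
        self.api_calls += api_calls
        elapsed = time.monotonic() - self.started_at
        checks = [
            ("steps", self.steps, self.max_steps),
            ("seconds", elapsed, self.max_seconds),
            ("cost_usd", self.cost_usd, self.max_cost_usd),
            ("api_calls", self.api_calls, self.max_api_calls),
        ]
        for name, used, limit in checks:
            if used > limit:
                raise BudgetExceeded(f"{name} limit exceeded: {used} > {limit}")
```

The exception message names the limit that was hit, so the reason for a stop can be shown to the user or recorded without extra lookup.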

Step 5 — Define circuit breakers

Define a trigger for each of the following: error rate, repetition detection, confidence drop, scope drift, and user inactivity. Also define a global emergency stop.
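
The sketch below shows one possible shape for a circuit breaker, covering only two of the triggers above (error rate and repetition) plus the emergency stop. It is a simplified illustration under stated assumptions: CircuitBreaker and its thresholds are hypothetical, the error rate is measured over a fixed-size window of recent actions, and confidence drop, scope drift, and user inactivity are omitted.

```python
from collections import deque


class CircuitBreaker:
    """Hypothetical breaker covering error rate, repetition, and emergency stop."""

    def __init__(self, max_error_rate=0.5, window=10, max_repeats=3):
        self.max_error_rate = max_error_rate
        self.max_repeats = max_repeats
        self.outcomes = deque(maxlen=window)             # True = success, False = error
        self.recent_actions = deque(maxlen=max_repeats)  # last N action names
        self.tripped_reason = None

    def emergency_stop(self) -> None:
        # Global stop; in a real product this would be wired to a user-facing control.
        self.tripped_reason = "emergency_stop"

    def record(self, action: str, ok: bool) -> None:
        """Log one action outcome and trip the breaker if a trigger fires."""
        if self.tripped_reason:
            return
        self.outcomes.append(ok)
        self.recent_actions.append(action)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        error_rate = self.outcomes.count(False) / len(self.outcomes)
        if window_full and error_rate > self.max_error_rate:
            self.tripped_reason = "error_rate"
        elif (len(self.recent_actions) == self.max_repeats
              and len(set(self.recent_actions)) == 1):
            self.tripped_reason = "repetition"

    def allow(self) -> bool:
        """Return True while no trigger has fired."""
        return self.tripped_reason is None

    def reset(self) -> None:
        # Reset is an explicit call (a human decision in this sketch), never automatic.
        self.outcomes.clear()
        self.recent_actions.clear()
        self.tripped_reason = None
```

The agent loop would call allow() before each action and record() after it. Storing the reason for the trip gives the user an explanation for the stop and gives the reset mechanism something concrete to acknowledge.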

Step 6 — Define the audit trail

Log every autonomous action with its timestamp, tool, input, output, status, reasoning, and user confirmation status. For reversible actions, also record what is needed to undo them.
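
A log entry with the seven fields named above could look like the following sketch. The AuditEntry class, its field names, the example status values, and the JSON-lines format are all assumptions chosen for illustration, not a prescribed schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditEntry:
    tool: str
    tool_input: str
    tool_output: str
    status: str                     # e.g. "success", "error", "blocked" (assumed values)
    reasoning: str
    user_confirmed: Optional[bool]  # None when no confirmation was required
    timestamp: str = ""

    def __post_init__(self):
        # Stamp the entry in UTC at creation time unless a timestamp was supplied.
        if not self.timestamp:
            self.timestamp = datetime.now(timezone.utc).isoformat()


def to_log_line(entry: AuditEntry) -> str:
    """Serialize one entry as a single JSON line for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)
```

One JSON object per line keeps the log easy to append to and to filter later, for example by tool or by status.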

Quality check before delivering

All four guardrail layers are covered

Every forbidden action has a specific definition

Resource limits have numeric values

Each circuit breaker defines its exact trigger condition and reset mechanism

Global emergency stop is specified, including a concrete UI requirement

Audit trail includes undo capability for reversible actions

Suggested next step: Test the circuit breakers before launch. Deliberately trigger each one in staging. The circuit breakers that are never tested are the ones that fail when you need them.