
AI Agent Design

An AI agent is not a chatbot with more features. An agent perceives its environment, plans a sequence of actions, executes them using tools, and loops until the goal is achieved — all without a human confirming each step. This changes the PM's job: instead of defining what the AI says, you're defining what the AI does.

Context

The agent loop:

User goal → Perceive → Plan → Act → Observe → Loop → Stop
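The loop above can be sketched in a few lines of Python. Every name here (perceive, plan, act, is_done) is an illustrative placeholder, not a real framework API; the point is the shape of the loop and the hard step budget.

```python
# Minimal sketch of the agent loop. All callables and the max_steps
# cutoff are illustrative assumptions, not a specific agent framework.

def run_agent(goal, perceive, plan, act, is_done, max_steps=10):
    """Loop: perceive -> plan -> act -> observe, until done or budget spent."""
    history = []  # observations and actions so far
    for step in range(max_steps):
        state = perceive(history)
        action = plan(goal, state)
        observation = act(action)
        history.append((action, observation))
        if is_done(goal, observation):
            return {"status": "done", "steps": step + 1, "history": history}
    # Budget exhausted: stop and report partial progress instead of looping forever.
    return {"status": "paused", "steps": max_steps, "history": history}
```

Note the explicit stop condition: the loop either reaches the goal or hits the step budget and reports back, so "loop forever" is impossible by construction.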

The PM's design surface:

  • Goal definition: what counts as "done", precisely
  • Tool set: what actions the agent is allowed to take
  • Autonomy level: when the agent acts alone vs. asks the human
  • Memory: what the agent remembers within and across sessions
  • Failure handling: what the agent does when it gets stuck or makes an error
  • Guardrails: what the agent must never do, regardless of instructions
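One way to keep these six decisions in one place is a single spec object the team fills in before any prompt is written. The field names below are assumptions for the sketch, not a standard schema.

```python
from dataclasses import dataclass, field

# Illustrative spec object covering the six design decisions above.
# Field names are assumptions for this sketch, not a standard schema.

@dataclass
class AgentSpec:
    goal_done_when: str                              # observable end state
    tools: list = field(default_factory=list)        # allowed actions
    autonomy: dict = field(default_factory=dict)     # action -> "auto" | "confirm" | "escalate"
    session_memory: list = field(default_factory=list)
    cross_session_memory: list = field(default_factory=list)
    max_steps: int = 20                              # failure handling: hard step budget
    guardrails: list = field(default_factory=list)   # absolute "never" rules
```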

Step 1 — Define the agent goal

The goal definition is the most important design decision. Vague goals produce unpredictable agent behaviour.

GOAL SPECIFICATION TEMPLATE:

Agent name: [Name]

User intent: [What the user wants to accomplish]

Goal definition:

DONE when: [Specific, observable end state]

PARTIAL when: [The goal is partially achieved — what the agent should do next]

FAILED when: [The agent cannot make progress — what should happen]
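The three states above become useful when the loop can branch on them explicitly. A toy evaluator, assuming a made-up "collect N items" goal purely for illustration:

```python
from enum import Enum

# DONE / PARTIAL / FAILED as explicit states the agent loop can branch on.
# The goal here ("collect 3 items") is a toy assumption for the sketch.

class GoalState(Enum):
    DONE = "done"
    PARTIAL = "partial"
    FAILED = "failed"

def evaluate_goal(items_collected, target=3, can_make_progress=True):
    if items_collected >= target:
        return GoalState.DONE          # specific, observable end state
    if not can_make_progress:
        return GoalState.FAILED        # cannot make progress: surface to user
    return GoalState.PARTIAL           # keep going, or report what remains
```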

Step 2 — Define the tool set

Every tool the agent has is an action it can take autonomously.

TOOL REGISTRY: [Agent name]

Tool: [Tool name]

Action: [What it does]

Use when: [When the agent should use it]

Inputs: [Required inputs]

Outputs: [What it produces]

Failure modes: [What can go wrong]

Confirmation required: [Yes / No / Only for sensitive actions]

Tool Risk Classification:
  • Reversible, low-risk: read-only actions → Agent can use without confirmation
  • Reversible, medium-risk: write actions that can be undone → Confirm intent once
  • Irreversible, high-risk: actions that cannot be undone → Always pause and show user
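The registry template and the risk tiers combine naturally: each tool entry carries a risk class, and the confirmation rule is derived from it rather than decided ad hoc per tool. The tool names and policy strings below are illustrative.

```python
from dataclasses import dataclass

# Tool registry entry with the risk classification above baked in.
# Tool names and policy labels are illustrative assumptions.

RISK_POLICY = {
    "low": "no_confirmation",         # reversible, read-only
    "medium": "confirm_intent_once",  # reversible write
    "high": "confirm_every_use",      # irreversible
}

@dataclass
class Tool:
    name: str
    action: str
    risk: str  # "low" | "medium" | "high"

    @property
    def confirmation_rule(self):
        return RISK_POLICY[self.risk]

search = Tool("search_files", "read-only lookup in the workspace", "low")
send_mail = Tool("send_email", "sends an external communication", "high")
```

Deriving the rule from the tier means a new tool cannot be registered without first answering the risk question.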
Step 3 — Define the autonomy level

AUTONOMY DESIGN:

FULLY AUTONOMOUS: [List actions — read-only, reversible, low-stakes]

PAUSE-AND-CONFIRM: [List actions — irreversible, external communication]

STOP-AND-ESCALATE: [List situations — ambiguous goal, conflicting info, high-risk]
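The three tiers reduce to a routing function checked before every action. The action lists below are illustrative; a real agent would load them from the tool registry.

```python
# Route each proposed action to an autonomy tier before executing it.
# Action names are illustrative assumptions, not a real tool set.

FULLY_AUTONOMOUS = {"read_file", "search"}
PAUSE_AND_CONFIRM = {"send_email", "delete_file"}

def route(action, goal_is_clear=True):
    if not goal_is_clear:
        return "stop_and_escalate"   # ambiguous goal: hand back to the human
    if action in FULLY_AUTONOMOUS:
        return "act"
    if action in PAUSE_AND_CONFIRM:
        return "confirm"
    return "stop_and_escalate"       # unknown action: never default to acting
```

The design choice worth copying is the last line: an action that matches no list escalates, so the safe tier is the default rather than the exception.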

Step 4 — Define the memory model

MEMORY MODEL:

IN-SESSION MEMORY: Goal, progress, actions taken, errors encountered

CROSS-SESSION MEMORY: User preferences, prior task history, learned patterns

MEMORY RISKS: Stale memory, over-personalisation, privacy
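A two-tier sketch of this model, with a staleness cutoff addressing the first risk above. The class shape and the 30-day default are assumptions for illustration.

```python
import time

# Two-tier memory sketch: in-session scratchpad plus a cross-session store
# with a staleness cutoff. The 30-day default is an illustrative assumption.

class AgentMemory:
    def __init__(self, max_age_days=30):
        self.session = []            # wiped when the session ends
        self.cross_session = {}      # key -> (value, stored_at)
        self.max_age = max_age_days * 86400

    def remember(self, key, value):
        self.cross_session[key] = (value, time.time())

    def recall(self, key, now=None):
        now = time.time() if now is None else now
        if key not in self.cross_session:
            return None
        value, stored_at = self.cross_session[key]
        if now - stored_at > self.max_age:
            del self.cross_session[key]  # stale: drop rather than act on it
            return None
        return value
```

Dropping stale entries on read means the agent never personalises on a preference the user may have abandoned months ago.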

Step 5 — Design the failure handling

FAILURE HANDLING:

TYPE 1 — Tool failure: Retry up to N times, then surface to user

TYPE 2 — Goal ambiguity: Pause and ask a specific clarifying question

TYPE 3 — Loop detection: Stop after N repeated actions, report progress

TYPE 4 — Scope violation: Hard stop, explain, ask for confirmation

MAX STEPS LIMIT: [N] steps before pausing and reporting
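Types 1 and 3 are the mechanical ones, so they can be sketched directly; the retry count and loop window below are illustrative constants, not recommendations.

```python
# Failure-handling sketch: retry tool failures (TYPE 1) and detect
# repeated actions (TYPE 3). The constants are illustrative assumptions.

MAX_RETRIES = 2
LOOP_WINDOW = 3   # stop if the same action repeats this many times in a row

def call_tool_with_retry(tool, *args, retries=MAX_RETRIES):
    last_error = None
    for attempt in range(retries + 1):
        try:
            return {"ok": True, "result": tool(*args)}
        except Exception as e:           # TYPE 1: tool failure
            last_error = e
    return {"ok": False, "error": str(last_error)}  # retries spent: surface to user

def is_looping(action_log, window=LOOP_WINDOW):
    # TYPE 3: the same action N times in a row suggests the agent is stuck.
    if len(action_log) < window:
        return False
    return len(set(action_log[-window:])) == 1
```

Types 2 and 4 are product decisions rather than code: they define the wording of the clarifying question and the hard-stop message.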

Step 6 — Define the guardrails

AGENT GUARDRAILS:

The agent will NEVER:

[ ] Take an irreversible action without showing the user what it's about to do

[ ] Send communications without explicit per-send confirmation

[ ] Access files outside the defined workspace

[ ] Spend money without explicit authorisation per transaction

[ ] Retain personal data beyond the defined session without consent

[ ] Escalate its own permissions

[ ] Override a user's explicit "stop" or "cancel" instruction
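Guardrails being absolute means they sit as a hard filter in front of every action, not as a suggestion in the prompt. The rule predicates below are illustrative; the point is that a veto here cannot be overridden by the plan.

```python
# Guardrails as a hard pre-execution filter, checked before every action.
# The action fields and rules are illustrative assumptions; what matters
# is that a veto here is absolute and outside the model's control.

GUARDRAILS = [
    lambda a: not (a.get("irreversible") and not a.get("user_confirmed")),
    lambda a: not (a.get("spends_money") and not a.get("authorised")),
    lambda a: not a.get("escalates_permissions", False),
]

def allowed(action):
    return all(rule(action) for rule in GUARDRAILS)
```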

Quality check before delivering

[ ] Goal is defined with DONE / PARTIAL / FAILED states
[ ] Every tool has a risk classification and confirmation rule
[ ] Irreversible actions all require explicit user confirmation
[ ] Max steps limit is defined
[ ] Guardrails are absolute — not "the agent tries to avoid"
[ ] Failure handling covers all four failure types

Suggested next step: Walk through the agent loop manually with five real user scenarios — including one adversarial one. The gaps in your failure handling and guardrails will show up immediately.