AI Agent Design
An AI agent is not a chatbot with more features. An agent perceives its environment, plans a sequence of actions, executes them using tools, and loops until the goal is achieved — all without a human confirming each step. This changes the PM's job: instead of defining what the AI says, you're defining what the AI does.
Context
The agent loop:
User goal → Perceive → Plan → Act → Observe → Loop → Stop
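The loop above can be sketched in Python with a toy goal (count up to a target number); the helper structure is illustrative, not a real agent framework:

```python
# Minimal sketch of the perceive-plan-act loop. The toy goal is to
# count up to a target; real agents swap in tools and an LLM planner.

def run_agent(target, max_steps=10):
    """Loop: perceive state, plan an action, act, check the goal."""
    state = 0
    for step in range(1, max_steps + 1):
        observation = state                  # Perceive
        action = "increment"                 # Plan (trivial here)
        state = observation + 1              # Act
        if state >= target:                  # Observe: goal reached?
            return {"status": "done", "steps": step}
    return {"status": "paused", "steps": max_steps}  # Stop, don't spin

print(run_agent(3))   # {'status': 'done', 'steps': 3}
print(run_agent(99))  # {'status': 'paused', 'steps': 10}
```

The step budget matters as much as the loop itself: an agent that cannot finish should stop and report, not spin.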
The PM's design surface:
| Design decision | What you're specifying |
|---|---|
| Goal definition | What counts as "done" — precisely |
| Tool set | What actions the agent is allowed to take |
| Autonomy level | When the agent acts alone vs. asks the human |
| Memory | What the agent remembers within and across sessions |
| Failure handling | What the agent does when it gets stuck or makes an error |
| Guardrails | What the agent must never do, regardless of instructions |
Step 1 — Define the agent goal
The goal definition is the most important design decision. Vague goals produce unpredictable agent behaviour.
GOAL SPECIFICATION TEMPLATE:
Agent name: [Name]
User intent: [What the user wants to accomplish]
Goal definition:
DONE when: [Specific, observable end state]
PARTIAL when: [The goal is partially achieved — what the agent should do next]
FAILED when: [The agent cannot make progress — what should happen]
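One way to make DONE / PARTIAL / FAILED testable is to express each as a predicate over observable state. The example domain (ticket triage) and its field names are hypothetical:

```python
# Sketch: goal states as predicates over observable state, so the agent
# (and its tests) can check "done" mechanically rather than by vibes.

def goal_status(state):
    """Classify progress on a hypothetical 'triage all tickets' goal."""
    total, triaged, errors = state["total"], state["triaged"], state["errors"]
    if triaged == total:
        return "DONE"        # specific, observable end state
    if errors >= 3:
        return "FAILED"      # cannot make progress; surface to the user
    return "PARTIAL"         # resume from where the agent left off

print(goal_status({"total": 5, "triaged": 5, "errors": 0}))  # DONE
print(goal_status({"total": 5, "triaged": 2, "errors": 0}))  # PARTIAL
```

If you cannot write the DONE predicate, the goal is too vague to hand to an agent.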
Step 2 — Define the tool set
Every tool the agent has is an action it can take autonomously.
TOOL REGISTRY: [Agent name]
Tool: [Tool name]
Action: [What it does]
Use when: [When the agent should use it]
Inputs: [Required inputs]
Outputs: [What it produces]
Failure modes: [What can go wrong]
Confirmation required: [Yes / No / Only for sensitive actions]
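The registry template can live as plain data that the runtime consults before every call. The tool names and risk tiers below are illustrative, not a fixed schema:

```python
# Sketch of a tool registry as plain data. The registry, not the model's
# in-context judgement, decides which actions are gated.

TOOLS = {
    "search_tickets": {
        "action": "Query the ticket database",
        "inputs": ["query"],
        "outputs": "list of matching tickets",
        "risk": "low",            # read-only
        "confirm": False,
    },
    "send_email": {
        "action": "Email an external recipient",
        "inputs": ["to", "subject", "body"],
        "outputs": "delivery receipt",
        "risk": "high",           # irreversible, external communication
        "confirm": True,          # per-send confirmation
    },
}

def needs_confirmation(tool_name):
    """Look up the confirmation policy for a tool."""
    return TOOLS[tool_name]["confirm"]

print(needs_confirmation("send_email"))  # True
```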
Tool Risk Classification:
LOW: Read-only actions — safe to run autonomously
MEDIUM: Reversible writes — autonomous, but logged
HIGH: Irreversible, external-facing, or financial actions — confirmation required
Step 3 — Define the autonomy level
AUTONOMY DESIGN:
FULLY AUTONOMOUS: [List actions — read-only, reversible, low-stakes]
PAUSE-AND-CONFIRM: [List actions — irreversible, external communication]
STOP-AND-ESCALATE: [List situations — ambiguous goal, conflicting info, high-risk]
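The three tiers can be enforced as an explicit policy table checked before every action; the action names are illustrative:

```python
# Sketch: autonomy as a policy table, consulted before each action.
# Tiers mirror the design above; actions are hypothetical examples.

AUTONOMY = {
    "read_file": "autonomous",     # read-only, reversible
    "draft_reply": "autonomous",   # low-stakes, easily undone
    "send_reply": "confirm",       # external communication
    "delete_record": "confirm",    # irreversible
}

def gate(action):
    """Return how the agent may proceed with a proposed action."""
    # Unknown actions escalate by default: fail closed, not open.
    return AUTONOMY.get(action, "escalate")

print(gate("read_file"))      # autonomous
print(gate("wire_transfer"))  # escalate
```

The default matters most: anything not explicitly classified should escalate, not run.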
Step 4 — Define the memory model
MEMORY MODEL:
IN-SESSION MEMORY: Goal, progress, actions taken, errors encountered
CROSS-SESSION MEMORY: User preferences, prior task history, learned patterns
MEMORY RISKS: Stale memory, over-personalisation, privacy
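Separating the two memory scopes in code makes the privacy boundary concrete: session state can be discarded wholesale while durable preferences survive. A minimal sketch, with hypothetical field names:

```python
# Sketch: in-session vs. cross-session memory as separate containers,
# so ending a session drops everything except consented preferences.

class AgentMemory:
    def __init__(self, preferences=None):
        self.preferences = preferences or {}            # cross-session
        self.session = {"actions": [], "errors": []}    # in-session only

    def record_action(self, action):
        self.session["actions"].append(action)

    def end_session(self):
        """Drop session state; keep only what the user consented to."""
        self.session = {"actions": [], "errors": []}

mem = AgentMemory({"tone": "formal"})
mem.record_action("searched tickets")
mem.end_session()
print(mem.preferences)          # {'tone': 'formal'}
print(mem.session["actions"])   # []
```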
Step 5 — Design failure handling
FAILURE HANDLING:
TYPE 1 — Tool failure: Retry up to N times, then surface to user
TYPE 2 — Goal ambiguity: Pause and ask a specific clarifying question
TYPE 3 — Loop detection: Stop after N repeated actions, report progress
TYPE 4 — Scope violation: Hard stop, explain, ask for confirmation
MAX STEPS LIMIT: [N] steps before pausing and reporting
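Two of these failure types are mechanical enough to sketch directly; the retry counts and the flaky tool are illustrative:

```python
# Sketch: TYPE 1 (retry-with-limit) and TYPE 3 (loop detection).

def call_with_retry(tool, max_retries=3):
    """TYPE 1: retry a flaky tool, then surface the failure."""
    for attempt in range(1, max_retries + 1):
        try:
            return {"status": "ok", "result": tool(), "attempts": attempt}
        except RuntimeError as err:
            last_error = err
    return {"status": "failed", "error": str(last_error),
            "attempts": max_retries}

def is_looping(recent_actions, n=3):
    """TYPE 3: the same action n times in a row means the agent is stuck."""
    return len(recent_actions) >= n and len(set(recent_actions[-n:])) == 1

print(is_looping(["search", "search", "search"]))  # True
print(is_looping(["search", "read", "search"]))    # False
```

Both checks report progress instead of hiding it: the user should see what was attempted, not just a silent timeout.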
Step 6 — Define the guardrails
AGENT GUARDRAILS:
The agent will NEVER:
[ ] Take an irreversible action without showing the user what it's about to do
[ ] Send communications without explicit per-send confirmation
[ ] Access files outside the defined workspace
[ ] Spend money without explicit authorisation per transaction
[ ] Retain personal data beyond the defined session without consent
[ ] Escalate its own permissions
[ ] Override a user's explicit "stop" or "cancel" instruction
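Guardrails only hold if they run as a hard pre-execution filter that no instruction can override. A minimal sketch, with hypothetical action fields:

```python
# Sketch: guardrails as a hard filter before every tool call. These
# checks sit outside the model, so prompting cannot bypass them.

FORBIDDEN = {
    "escalate_permissions",    # never, regardless of instructions
}

def guardrail_check(action):
    """Return (allowed, reason) for a proposed action."""
    if action["name"] in FORBIDDEN:
        return False, "hard-blocked action"
    if action.get("irreversible") and not action.get("user_previewed"):
        return False, "irreversible action without preview"
    if action.get("spends_money") and not action.get("authorised"):
        return False, "spend without per-transaction authorisation"
    return True, "ok"

print(guardrail_check({"name": "escalate_permissions"}))
# (False, 'hard-blocked action')
```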