
AI Agent Design

An AI agent is not a chatbot with more features. An agent perceives its environment, plans a sequence of actions, executes them using tools, and loops until the goal is achieved — all without a human confirming each step. This changes the PM's job: instead of defining what the AI says, you're defining what the AI does.

Context

The agent loop:

User goal → Perceive → Plan → Act → Observe → Loop → Stop
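The loop above can be sketched in a few lines of Python. Every name here (perceive, plan, act, is_done) is an illustrative placeholder, not a real framework API; the point is the shape of the loop and the hard step budget.

```python
# Minimal sketch of the agent loop. All callables and the max_steps
# cutoff are illustrative assumptions, not a specific agent framework.

def run_agent(goal, perceive, plan, act, is_done, max_steps=10):
    """Loop: perceive -> plan -> act -> observe, until done or budget spent."""
    history = []  # observations and actions so far
    for step in range(max_steps):
        state = perceive(history)
        action = plan(goal, state)
        observation = act(action)
        history.append((action, observation))
        if is_done(goal, observation):
            return {"status": "done", "steps": step + 1, "history": history}
    # Budget exhausted: stop and report partial progress instead of looping forever.
    return {"status": "paused", "steps": max_steps, "history": history}
```

Note the explicit stop condition: the loop either reaches the goal or hits the step budget and reports back, so "loop forever" is impossible by construction.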

The PM's design surface:

  • Goal definition: what counts as "done", precisely
  • Tool set: what actions the agent is allowed to take
  • Autonomy level: when the agent acts alone vs. asks the human
  • Memory: what the agent remembers within and across sessions
  • Failure handling: what the agent does when it gets stuck or makes an error
  • Guardrails: what the agent must never do, regardless of instructions
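One way to keep these six decisions in one place is a single spec object the team fills in before any prompt is written. The field names below are assumptions for the sketch, not a standard schema.

```python
from dataclasses import dataclass, field

# Illustrative spec object covering the six design decisions above.
# Field names are assumptions for this sketch, not a standard schema.

@dataclass
class AgentSpec:
    goal_done_when: str                              # observable end state
    tools: list = field(default_factory=list)        # allowed actions
    autonomy: dict = field(default_factory=dict)     # action -> "auto" | "confirm" | "escalate"
    session_memory: list = field(default_factory=list)
    cross_session_memory: list = field(default_factory=list)
    max_steps: int = 20                              # failure handling: hard step budget
    guardrails: list = field(default_factory=list)   # absolute "never" rules
```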

Step 1 — Define the agent goal

The goal definition is the most important design decision. Vague goals produce unpredictable agent behaviour.

GOAL SPECIFICATION TEMPLATE:

Agent name: [Name]

User intent: [What the user wants to accomplish]

Goal definition:

DONE when: [Specific, observable end state]

PARTIAL when: [The goal is partially achieved — what the agent should do next]

FAILED when: [The agent cannot make progress — what should happen]
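The three states above become useful when the loop can branch on them explicitly. A toy evaluator, assuming a made-up "collect N items" goal purely for illustration:

```python
from enum import Enum

# DONE / PARTIAL / FAILED as explicit states the agent loop can branch on.
# The goal here ("collect 3 items") is a toy assumption for the sketch.

class GoalState(Enum):
    DONE = "done"
    PARTIAL = "partial"
    FAILED = "failed"

def evaluate_goal(items_collected, target=3, can_make_progress=True):
    if items_collected >= target:
        return GoalState.DONE          # specific, observable end state
    if not can_make_progress:
        return GoalState.FAILED        # cannot make progress: surface to user
    return GoalState.PARTIAL           # keep going, or report what remains
```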

Step 2 — Define the tool set

Every tool the agent has is an action it can take autonomously.

TOOL REGISTRY: [Agent name]

Tool: [Tool name]

Action: [What it does]

Use when: [When the agent should use it]

Inputs: [Required inputs]

Outputs: [What it produces]

Failure modes: [What can go wrong]

Confirmation required: [Yes / No / Only for sensitive actions]

Tool Risk Classification:
  • Reversible, low-risk: read-only actions → Agent can use without confirmation
  • Reversible, medium-risk: write actions that can be undone → Confirm intent once
  • Irreversible, high-risk: actions that cannot be undone → Always pause and show user
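The registry template and the risk tiers combine naturally: each tool entry carries a risk class, and the confirmation rule is derived from it rather than decided ad hoc per tool. The tool names and policy strings below are illustrative.

```python
from dataclasses import dataclass

# Tool registry entry with the risk classification above baked in.
# Tool names and policy labels are illustrative assumptions.

RISK_POLICY = {
    "low": "no_confirmation",         # reversible, read-only
    "medium": "confirm_intent_once",  # reversible write
    "high": "confirm_every_use",      # irreversible
}

@dataclass
class Tool:
    name: str
    action: str
    risk: str  # "low" | "medium" | "high"

    @property
    def confirmation_rule(self):
        return RISK_POLICY[self.risk]

search = Tool("search_files", "read-only lookup in the workspace", "low")
send_mail = Tool("send_email", "sends an external communication", "high")
```

Deriving the rule from the tier means a new tool cannot be registered without first answering the risk question.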
Step 3 — Define the autonomy level

AUTONOMY DESIGN:

FULLY AUTONOMOUS: [List actions — read-only, reversible, low-stakes]

PAUSE-AND-CONFIRM: [List actions — irreversible, external communication]

STOP-AND-ESCALATE: [List situations — ambiguous goal, conflicting info, high-risk]
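The three tiers reduce to a routing function checked before every action. The action lists below are illustrative; a real agent would load them from the tool registry.

```python
# Route each proposed action to an autonomy tier before executing it.
# Action names are illustrative assumptions, not a real tool set.

FULLY_AUTONOMOUS = {"read_file", "search"}
PAUSE_AND_CONFIRM = {"send_email", "delete_file"}

def route(action, goal_is_clear=True):
    if not goal_is_clear:
        return "stop_and_escalate"   # ambiguous goal: hand back to the human
    if action in FULLY_AUTONOMOUS:
        return "act"
    if action in PAUSE_AND_CONFIRM:
        return "confirm"
    return "stop_and_escalate"       # unknown action: never default to acting
```

The design choice worth copying is the last line: an action that matches no list escalates, so the safe tier is the default rather than the exception.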

Step 4 — Define the memory model

MEMORY MODEL:

IN-SESSION MEMORY: Goal, progress, actions taken, errors encountered

CROSS-SESSION MEMORY: User preferences, prior task history, learned patterns

MEMORY RISKS: Stale memory, over-personalisation, privacy
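A two-tier sketch of this model, with a staleness cutoff addressing the first risk above. The class shape and the 30-day default are assumptions for illustration.

```python
import time

# Two-tier memory sketch: in-session scratchpad plus a cross-session store
# with a staleness cutoff. The 30-day default is an illustrative assumption.

class AgentMemory:
    def __init__(self, max_age_days=30):
        self.session = []            # wiped when the session ends
        self.cross_session = {}      # key -> (value, stored_at)
        self.max_age = max_age_days * 86400

    def remember(self, key, value):
        self.cross_session[key] = (value, time.time())

    def recall(self, key, now=None):
        now = time.time() if now is None else now
        if key not in self.cross_session:
            return None
        value, stored_at = self.cross_session[key]
        if now - stored_at > self.max_age:
            del self.cross_session[key]  # stale: drop rather than act on it
            return None
        return value
```

Dropping stale entries on read means the agent never personalises on a preference the user may have abandoned months ago.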

Step 5 — Design the failure handling

FAILURE HANDLING:

TYPE 1 — Tool failure: Retry up to N times, then surface to user

TYPE 2 — Goal ambiguity: Pause and ask a specific clarifying question

TYPE 3 — Loop detection: Stop after N repeated actions, report progress

TYPE 4 — Scope violation: Hard stop, explain, ask for confirmation

MAX STEPS LIMIT: [N] steps before pausing and reporting
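Types 1 and 3 are the mechanical ones, so they can be sketched directly; the retry count and loop window below are illustrative constants, not recommendations.

```python
# Failure-handling sketch: retry tool failures (TYPE 1) and detect
# repeated actions (TYPE 3). The constants are illustrative assumptions.

MAX_RETRIES = 2
LOOP_WINDOW = 3   # stop if the same action repeats this many times in a row

def call_tool_with_retry(tool, *args, retries=MAX_RETRIES):
    last_error = None
    for attempt in range(retries + 1):
        try:
            return {"ok": True, "result": tool(*args)}
        except Exception as e:           # TYPE 1: tool failure
            last_error = e
    return {"ok": False, "error": str(last_error)}  # retries spent: surface to user

def is_looping(action_log, window=LOOP_WINDOW):
    # TYPE 3: the same action N times in a row suggests the agent is stuck.
    if len(action_log) < window:
        return False
    return len(set(action_log[-window:])) == 1
```

Types 2 and 4 are product decisions rather than code: they define the wording of the clarifying question and the hard-stop message.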

Step 6 — Define the guardrails

AGENT GUARDRAILS:

The agent will NEVER:

[ ] Take an irreversible action without showing the user what it's about to do

[ ] Send communications without explicit per-send confirmation

[ ] Access files outside the defined workspace

[ ] Spend money without explicit authorisation per transaction

[ ] Retain personal data beyond the defined session without consent

[ ] Escalate its own permissions

[ ] Override a user's explicit "stop" or "cancel" instruction
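Guardrails being absolute means they sit as a hard filter in front of every action, not as a suggestion in the prompt. The rule predicates below are illustrative; the point is that a veto here cannot be overridden by the plan.

```python
# Guardrails as a hard pre-execution filter, checked before every action.
# The action fields and rules are illustrative assumptions; what matters
# is that a veto here is absolute and outside the model's control.

GUARDRAILS = [
    lambda a: not (a.get("irreversible") and not a.get("user_confirmed")),
    lambda a: not (a.get("spends_money") and not a.get("authorised")),
    lambda a: not a.get("escalates_permissions", False),
]

def allowed(action):
    return all(rule(action) for rule in GUARDRAILS)
```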

Quality check before delivering

[ ] Goal is defined with DONE / PARTIAL / FAILED states
[ ] Every tool has a risk classification and confirmation rule
[ ] Irreversible actions all require explicit user confirmation
[ ] Max steps limit is defined
[ ] Guardrails are absolute — not "the agent tries to avoid"
[ ] Failure handling covers all four failure types

Suggested next step: Walk through the agent loop manually with five real user scenarios — including one adversarial one. The gaps in your failure handling and guardrails will show up immediately.