Multi-Agent System Requirements

A multi-agent system multiplies the power of a single agent, and it multiplies the risks as well. When agents hand off work to each other, errors compound, context can be lost, and a failure in one agent can corrupt the downstream chain without anyone noticing. The PM's job in a multi-agent system is to define the contracts between agents as precisely as the agents themselves are defined.

---

Context

Why use multiple agents (know the real reasons, not the hype):

Good reason	Explanation
Specialisation	Different tasks require different expertise, tools, or prompting strategies
Parallelism	Independent subtasks can run simultaneously to reduce total time
Context separation	Keeping each agent's context focused improves output quality
Scale	One orchestrator directing many workers handles volume a single agent can't

When NOT to use multiple agents:

The task can be completed by one agent in one context window

The overhead of orchestration and handoff would exceed the benefit

The task requires tight reasoning continuity that handoffs would interrupt

---

Step 1 — Define the system goal and task decomposition

State the user-facing goal, justify the choice of a multi-agent design, and list the subtasks, marking which must run in sequence and which can run in parallel.
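One way to make the decomposition concrete is to write the subtasks down as a dependency map and derive the parallel batches from it. The sketch below uses Python's standard `graphlib` module; the subtask names and the `parallel_batches` helper are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks for an illustrative "research report" goal.
# Each subtask maps to the set of subtasks it depends on.
DEPENDENCIES = {
    "collect_sources": set(),
    "summarise_sources": {"collect_sources"},
    "extract_statistics": {"collect_sources"},
    "draft_report": {"summarise_sources", "extract_statistics"},
}


def parallel_batches(dependencies):
    """Group subtasks into ordered batches; tasks in one batch are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(DEPENDENCIES)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

Batches run one after another, while the subtasks inside a batch can run simultaneously. A decomposition that yields only single-task batches is purely sequential, which may be a sign that one agent would do.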

Step 2 — Design the orchestrator

Define the orchestrator's responsibilities (parse the goal, assign subtasks, track progress, detect invalid results, re-route failures, assemble the output), the tools it may use, and what the orchestrator does NOT do.

Step 3 — Define each subagent

For each subagent: name, role, assigned subtasks, tools, input contract, output contract (task_id, status, result schema, confidence, errors, notes), and scope limits.
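The output contract is easier to review when it is written as a typed structure. The following is a minimal sketch, assuming Python dataclasses; the field names follow the contract listed above, while the allowed status values and the 0-to-1 confidence range are assumptions that a real specification would need to state.

```python
from dataclasses import dataclass, field
from typing import Any

# Assumed status vocabulary; the real contract should define its own.
ALLOWED_STATUSES = {"success", "partial", "failed"}


@dataclass
class SubagentResult:
    task_id: str
    status: str
    result: dict[str, Any]
    confidence: float  # assumed to be a score between 0 and 1
    errors: list[str] = field(default_factory=list)
    notes: str = ""

    def __post_init__(self):
        # Reject results that break the contract at construction time.
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if self.status == "failed" and not self.errors:
            raise ValueError("a failed result must list at least one error")
```

Rejecting malformed results at construction time means a contract violation surfaces at the subagent boundary rather than several steps downstream.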

Step 4 — Design the handoff protocol

Define the handoff payload structure, the context-passing rules (pass only what the subagent needs, make each payload self-contained, have the orchestrator extract the relevant fields), and the checks used to validate results.
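The handoff rules can be sketched as two small functions: one that builds a self-contained payload from only the needed fields, and one that checks a returned result before it is passed on. The function names, the dictionary keys, and the 0.6 confidence threshold are illustrative assumptions, not part of any particular framework.

```python
def build_handoff(task_id, instruction, full_context, needed_fields):
    """Build a self-contained payload holding only the fields the subagent needs."""
    missing = [name for name in needed_fields if name not in full_context]
    if missing:
        raise KeyError(f"context is missing required fields: {missing}")
    return {
        "task_id": task_id,
        "instruction": instruction,
        "context": {name: full_context[name] for name in needed_fields},
    }


def validate_result(payload, result, required_keys, min_confidence=0.6):
    """Return a list of problems; an empty list means the result may go downstream."""
    problems = []
    if result.get("task_id") != payload["task_id"]:
        problems.append("task_id does not match the handoff")
    if result.get("status") != "success":
        problems.append(f"status is {result.get('status')!r}")
    body = result.get("result") or {}
    absent = [key for key in required_keys if key not in body]
    if absent:
        problems.append(f"result is missing keys: {absent}")
    if result.get("confidence", 0.0) < min_confidence:  # assumed threshold
        problems.append("confidence below threshold")
    return problems


payload = build_handoff(
    "t1", "Summarise the document",
    {"doc": "quarterly figures", "user_email": "a@example.com"},
    needed_fields=["doc"],
)
```

Here `user_email` never reaches the subagent because it is not listed in `needed_fields`, which is the "pass only what's needed" rule in practice.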

Step 5 — Define multi-agent failure modes

Four failure types:

Subagent failure — retry with backoff, fallback strategy, then report to user

Cascading failure — orchestrator validates every result before passing downstream

Orchestrator failure — persist task state to log, reconstruct on restart

Goal drift — goal alignment check step before returning final output
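The subagent-failure path (retry with backoff, then a fallback, then a report to the user) might look like the sketch below. `run_with_recovery` and its parameters are hypothetical names; the `sleep` argument is injectable only so the behaviour can be tested without waiting.

```python
import time


def run_with_recovery(primary, fallback, max_attempts=3, base_delay=0.5,
                      sleep=time.sleep):
    """Try primary with exponential backoff, then fallback, then report failure."""
    errors = []
    for attempt in range(max_attempts):
        try:
            return {"status": "success", "result": primary(), "errors": errors}
        except Exception as exc:
            errors.append(f"attempt {attempt + 1}: {exc}")
            if attempt < max_attempts - 1:
                # Delay doubles each time: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
    try:
        return {"status": "fallback", "result": fallback(), "errors": errors}
    except Exception as exc:
        errors.append(f"fallback: {exc}")
    # Nothing worked: hand the collected errors back so the user can be told.
    return {"status": "failed", "result": None, "errors": errors}
```

The returned `errors` list keeps every failed attempt, so the report to the user can say what was tried rather than only that something went wrong.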

System-level guardrails: max total steps, max spend, max time, irreversible actions require user confirmation.
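These guardrails can be enforced in one place that every step passes through. The following is a rough sketch under assumed names (`RunBudget`, `GuardrailExceeded`, `run_action`) and assumed units (US dollars, seconds); the actual limits are a product decision.

```python
import time
from dataclasses import dataclass, field


class GuardrailExceeded(RuntimeError):
    """Raised when a system-level limit is crossed."""


@dataclass
class RunBudget:
    max_steps: int
    max_spend_usd: float
    max_seconds: float
    steps: int = 0
    spend_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, cost_usd):
        """Record one step and its cost; raise if any limit is now exceeded."""
        self.steps += 1
        self.spend_usd += cost_usd
        if self.steps > self.max_steps:
            raise GuardrailExceeded("max total steps exceeded")
        if self.spend_usd > self.max_spend_usd:
            raise GuardrailExceeded("max spend exceeded")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise GuardrailExceeded("max time exceeded")


def run_action(action, irreversible, confirm):
    """Run action; an irreversible action runs only if confirm() returns True."""
    if irreversible and not confirm():
        return "skipped: user declined"
    return action()
```

In this sketch the orchestrator would call `budget.charge(cost)` once per agent step and route every side-effecting call through `run_action`, so no subagent can bypass the limits.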

Quality check before delivering

Task decomposition shows sequential vs. parallel subtasks

Every subagent has explicit input AND output contracts

Result validation is defined

Cascading failure is addressed

System-level guardrails include a max step limit

All irreversible actions require user confirmation

Suggested next step: Before building the full system, prototype the orchestrator with stub subagents that return synthetic results. This validates the handoff contracts and failure handling without the cost of building all subagents first.
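A prototype of this kind can be very small. The sketch below is one possible shape, with hypothetical names throughout: `stub_ok` returns a well-formed synthetic result, `stub_broken` returns a malformed one, and `orchestrate` runs a sequential plan and stops at the first invalid result.

```python
def stub_ok(payload):
    """Stub subagent that returns a well-formed synthetic result."""
    return {
        "task_id": payload["task_id"],
        "status": "success",
        "result": {"text": f"synthetic output for {payload['task_id']}"},
        "confidence": 0.9,
        "errors": [],
    }


def stub_broken(payload):
    """Stub subagent that violates the contract: the 'result' field is missing."""
    return {"task_id": payload["task_id"], "status": "success", "confidence": 0.9}


def orchestrate(plan, subagents):
    """Run subtasks in order; stop at the first invalid result."""
    completed = {}
    for task_id in plan:
        # Each subagent receives the results completed so far as context.
        payload = {"task_id": task_id, "context": dict(completed)}
        result = subagents[task_id](payload)
        if result.get("status") != "success" or "result" not in result:
            return {"status": "failed", "failed_task": task_id,
                    "completed": completed}
        completed[task_id] = result["result"]
    return {"status": "success", "completed": completed}


happy = orchestrate(["research", "draft"],
                    {"research": stub_ok, "draft": stub_ok})
broken = orchestrate(["research", "draft"],
                     {"research": stub_broken, "draft": stub_ok})
```

Swapping `stub_broken` into different positions in the plan shows whether a malformed result is caught before it reaches the next subagent, which is the cascading-failure check from Step 5.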