Multi-Agent System Requirements

A multi-agent system multiplies the power of a single agent — and multiplies its risks. When agents hand off work to each other, errors compound, context can be lost, and a failure in one agent can corrupt the downstream chain without anyone noticing. The PM's job in a multi-agent system is to define the contracts between agents as precisely as the agents themselves.

---

Context

Why use multiple agents (know the real reasons, not the hype):
Good reason | Explanation
Specialisation | Different tasks require different expertise, tools, or prompting strategies
Parallelism | Independent subtasks can run simultaneously to reduce total time
Context separation | Keeping each agent's context focused improves output quality
Scale | One orchestrator directing many workers handles volume a single agent can't
When NOT to use multiple agents:
  • The task can be completed by one agent in one context window
  • The overhead of orchestration and handoff would exceed the benefit
  • The task requires tight reasoning continuity that handoffs would interrupt
---

    Step 1 — Define the system goal and task decomposition

    Map the user-facing goal, justify the choice of a multi-agent design, and list the subtasks, marking which must run sequentially (because one needs another's output) and which can run in parallel.
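A minimal sketch of recording the decomposition so the sequential and parallel dependencies are explicit. It assumes each subtask simply lists the IDs of the subtasks it must wait for; the function `plan_stages` and the subtask names are hypothetical. Grouping subtasks into stages (a topological layering) shows which ones can run simultaneously: every subtask in a stage has all its dependencies satisfied by earlier stages.

```python
from typing import Dict, List, Set


def plan_stages(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group subtasks into stages; subtasks within one stage can run in parallel.

    deps maps each subtask ID to the IDs it must wait for.
    Raises ValueError on unknown dependencies or dependency cycles.
    """
    for task, needs in deps.items():
        for need in needs:
            if need not in deps:
                raise ValueError(f"{task} depends on unknown subtask {need}")

    remaining = {task: set(needs) for task, needs in deps.items()}
    done: Set[str] = set()
    stages: List[List[str]] = []
    while remaining:
        # Ready = every dependency was completed in an earlier stage.
        ready = sorted(t for t, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError(f"cycle among subtasks: {sorted(remaining)}")
        stages.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return stages


# Hypothetical decomposition of a "competitive research report" goal.
deps = {
    "search_web": [],
    "search_internal_docs": [],
    "summarise_sources": ["search_web", "search_internal_docs"],
    "draft_report": ["summarise_sources"],
}
for i, stage in enumerate(plan_stages(deps), start=1):
    print(f"stage {i}: {stage}")
```

This prints three stages: the two searches together in stage 1, then the summary, then the draft. A stage with a single subtask is a purely sequential step, which is a quick signal that part of the goal may not benefit from multiple agents.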

    Step 2 — Design the orchestrator

    Define the orchestrator's responsibilities (parse the goal, assign subtasks, track progress, detect invalid results, re-route failures, assemble the output), the tools it can use, and what it does NOT do.
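The responsibilities above can be sketched as a small class. This is an illustration under stated assumptions, not a reference design: subagents are assumed to be plain callables that take a payload dict and return a result dict with `task_id`, `status`, and `result` keys, and the class name, routing table, and fallback mapping are all hypothetical. Note what it does NOT do: it never performs subtask work itself.

```python
from typing import Any, Callable, Dict, List, Optional

Subagent = Callable[[Dict[str, Any]], Dict[str, Any]]


class Orchestrator:
    """Assigns subtasks, tracks progress, re-routes failures, assembles output.

    Deliberately does NOT do subtask work: every subtask goes to a subagent.
    """

    def __init__(self, routes: Dict[str, Subagent],
                 fallbacks: Optional[Dict[str, Subagent]] = None) -> None:
        self.routes = routes                # subtask kind -> primary subagent
        self.fallbacks = fallbacks or {}    # subtask kind -> fallback subagent
        self.progress: Dict[str, str] = {}  # task_id -> "pending" | "success" | "failed"

    def _is_valid(self, result: Dict[str, Any], task_id: str) -> bool:
        # Detect invalid results before anything downstream sees them.
        return (result.get("task_id") == task_id
                and result.get("status") == "success"
                and "result" in result)

    def run_subtask(self, task_id: str, kind: str,
                    payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        self.progress[task_id] = "pending"
        # Try the primary subagent, then re-route to the fallback if one exists.
        for agent in (self.routes[kind], self.fallbacks.get(kind)):
            if agent is None:
                continue
            result = agent({"task_id": task_id, **payload})
            if self._is_valid(result, task_id):
                self.progress[task_id] = "success"
                return result
        self.progress[task_id] = "failed"
        return None

    def assemble(self, results: List[Optional[Dict[str, Any]]]) -> Dict[str, Any]:
        # Report which subtasks did not succeed alongside the assembled parts.
        failed = [t for t, s in self.progress.items() if s != "success"]
        return {"complete": not failed, "failed": failed,
                "parts": [r["result"] for r in results if r is not None]}


# Hypothetical subagents: the primary always fails, the fallback succeeds.
def flaky_search(payload):
    return {"task_id": payload["task_id"], "status": "failed", "errors": ["timeout"]}


def backup_search(payload):
    return {"task_id": payload["task_id"], "status": "success",
            "result": f"notes on {payload['query']}"}


orch = Orchestrator(routes={"search": flaky_search}, fallbacks={"search": backup_search})
res = orch.run_subtask("t1", "search", {"query": "pricing"})
print(orch.progress)  # {'t1': 'success'}
print(orch.assemble([res]))
```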

    Step 3 — Define each subagent

    For each subagent, specify its name, role, assigned subtasks, tools, input contract, output contract (task_id, status, result schema, confidence, errors, notes), and scope limits.
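One way to make the input and output contracts concrete is to write them as typed records that reject violations when constructed. A minimal sketch, assuming Python dataclasses: the output fields follow the list above, while the status values (`success`, `partial`, `failed`), the 0 to 1 confidence range, and the input fields are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Status(Enum):
    # Assumed status vocabulary; use whatever your spec defines.
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILED = "failed"


@dataclass(frozen=True)
class SubagentInput:
    task_id: str
    instruction: str             # what this subagent must do
    context: Dict[str, Any]      # only the fields it needs
    allowed_tools: List[str]     # scope limit: no other tools may be called


@dataclass(frozen=True)
class SubagentOutput:
    task_id: str
    status: Status
    result: Dict[str, Any]       # must match this subagent's result schema
    confidence: float            # self-reported, 0.0 to 1.0
    errors: List[str] = field(default_factory=list)
    notes: str = ""

    def __post_init__(self) -> None:
        # Reject contract violations at construction time.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
        if self.status is Status.FAILED and not self.errors:
            raise ValueError("a failed result must explain itself in errors")


inp = SubagentInput(task_id="t1", instruction="Summarise the three sources",
                    context={"sources": ["a", "b", "c"]}, allowed_tools=["read_url"])
out = SubagentOutput(task_id="t1", status=Status.SUCCESS,
                     result={"summary": "ok"}, confidence=0.8)
```

Making the records frozen means a downstream agent cannot quietly edit an upstream result; any change has to go back through the orchestrator.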

    Step 4 — Design the handoff protocol

    Define the handoff payload structure, the context-passing rules (pass only what the receiving agent needs, make each payload self-contained, and have the orchestrator extract the relevant fields from upstream results), and the checks used to validate each result.
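The context-passing rules and result validation can be sketched as two orchestrator-side functions. This assumes results are plain dicts shaped like the output contract in Step 3; the function names, field names, and the 0.6 confidence threshold are hypothetical.

```python
from typing import Any, Dict, List


def build_handoff(task_id: str, instruction: str, upstream: Dict[str, Any],
                  needed_fields: List[str]) -> Dict[str, Any]:
    """Copy only the fields the next agent needs into a self-contained payload.

    The receiver never has to look back at the upstream agent's full output.
    """
    missing = [f for f in needed_fields if f not in upstream]
    if missing:
        raise ValueError(f"upstream result lacks required fields: {missing}")
    return {"task_id": task_id, "instruction": instruction,
            "context": {f: upstream[f] for f in needed_fields}}


def validate_result(result: Dict[str, Any], expected_task_id: str,
                    required_fields: List[str],
                    min_confidence: float = 0.6) -> List[str]:
    """Return a list of problems; an empty list means the result may go downstream."""
    problems: List[str] = []
    if result.get("task_id") != expected_task_id:
        problems.append("task_id mismatch")
    if result.get("status") != "success":
        problems.append(f"status is {result.get('status')!r}")
    body = result.get("result")
    if not isinstance(body, dict):
        problems.append("result is not an object")
    else:
        problems += [f"missing field {f!r}" for f in required_fields if f not in body]
    if result.get("confidence", 0.0) < min_confidence:
        problems.append("confidence below threshold")
    return problems


upstream = {"summary": "Three competitors cut prices in Q2.",
            "sources": ["a", "b"], "raw_html": "<html>...</html>"}
payload = build_handoff("t2", "Draft the pricing section", upstream, ["summary", "sources"])
reply = {"task_id": "t2", "status": "success",
         "result": {"section": "Draft text"}, "confidence": 0.9}
problems = validate_result(reply, "t2", required_fields=["section"])
```

Returning a list of problems rather than a boolean gives the orchestrator something concrete to log, retry with, or report to the user.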

    Step 5 — Define multi-agent failure modes

    Four failure types:

  • Subagent failure: retry with backoff, then fall back to an alternative strategy, then report the failure to the user
  • Cascading failure: the orchestrator validates every result before passing it downstream
  • Orchestrator failure: persist task state to a log so it can be reconstructed on restart
  • Goal drift: add a goal-alignment check before returning the final output

    System-level guardrails: a maximum total step count, a maximum spend, a maximum run time, and user confirmation before any irreversible action.
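The subagent-failure path and the system-level guardrails can be sketched together, since every retry should count against the same budget. A minimal sketch under assumptions: subagents are zero-argument callables returning a dict with a `status` key, spend is tracked in arbitrary units, and `Budget`, `call_with_recovery`, `require_confirmation`, and `GuardrailExceeded` are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


class GuardrailExceeded(Exception):
    """Raised when a system-level limit is hit; the run must stop and report."""


@dataclass
class Budget:
    max_steps: int
    max_spend: float        # currency units are an assumption
    max_seconds: float
    steps: int = 0
    spend: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def charge(self, cost: float) -> None:
        # Count every subagent call against the system-wide limits.
        self.steps += 1
        self.spend += cost
        if self.steps > self.max_steps:
            raise GuardrailExceeded("max total steps exceeded")
        if self.spend > self.max_spend:
            raise GuardrailExceeded("max spend exceeded")
        if time.monotonic() - self.started > self.max_seconds:
            raise GuardrailExceeded("max time exceeded")


def call_with_recovery(agent: Callable[[], Dict[str, Any]],
                       fallback: Optional[Callable[[], Dict[str, Any]]],
                       budget: Budget, cost_per_call: float,
                       retries: int = 3, base_delay: float = 0.5) -> Dict[str, Any]:
    """Retry with exponential backoff, then try the fallback, then report failure."""
    for attempt in range(retries):
        budget.charge(cost_per_call)
        result = agent()
        if result.get("status") == "success":
            return result
        if attempt < retries - 1:
            time.sleep(base_delay * 2 ** attempt)  # delays double: base, 2x base, ...
    if fallback is not None:
        budget.charge(cost_per_call)
        result = fallback()
        if result.get("status") == "success":
            return result
    # Nothing worked: surface the failure to the user instead of passing it on.
    return {"status": "failed", "errors": ["subagent and fallback both failed"]}


def require_confirmation(action: str, irreversible: bool,
                         confirm: Callable[[str], bool]) -> bool:
    """Irreversible actions proceed only if the user explicitly confirms."""
    return (not irreversible) or confirm(action)


attempts = []


def flaky_agent():
    attempts.append(1)
    return {"status": "failed", "errors": ["timeout"]}


def cached_fallback():
    return {"status": "success", "result": "answer from cache"}


budget = Budget(max_steps=10, max_spend=1.00, max_seconds=60)
out = call_with_recovery(flaky_agent, cached_fallback, budget,
                         cost_per_call=0.05, base_delay=0.01)
```

Charging the budget before each call means a retry loop can never run past the step or spend limit; `GuardrailExceeded` propagates to the orchestrator, which should stop and report rather than catch and continue.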

    Quality check before delivering

  • Task decomposition shows sequential vs. parallel subtasks
  • Every subagent has explicit input AND output contracts
  • Result validation is defined
  • Cascading failure is addressed
  • System-level guardrails include a max step limit
  • All irreversible actions require user confirmation

    Suggested next step: Before building the full system, prototype the orchestrator with stub subagents that return synthetic results. This validates the handoff contracts and failure handling without the cost of building all subagents first.
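A stub subagent only needs to honour the output contract. A minimal sketch, assuming the dict-shaped contract from Step 3 and a hypothetical `make_stub` helper; the `failure_rate` parameter is an assumption added so the prototype can also exercise retries and fallbacks.

```python
import random
from typing import Any, Callable, Dict

Stub = Callable[[Dict[str, Any]], Dict[str, Any]]


def make_stub(name: str, failure_rate: float = 0.0, seed: int = 0) -> Stub:
    """Build a stub subagent that returns synthetic, contract-shaped results."""
    rng = random.Random(seed)  # seeded so prototype runs are reproducible

    def stub(payload: Dict[str, Any]) -> Dict[str, Any]:
        if rng.random() < failure_rate:
            # Injected failure: still a valid contract, with errors filled in.
            return {"task_id": payload["task_id"], "status": "failed",
                    "result": {}, "confidence": 0.0,
                    "errors": [f"{name}: injected failure"], "notes": "stub"}
        return {"task_id": payload["task_id"], "status": "success",
                "result": {"echo": payload.get("instruction", ""), "agent": name},
                "confidence": 0.9, "errors": [], "notes": "synthetic result"}

    return stub


researcher = make_stub("researcher")
writer = make_stub("writer", failure_rate=1.0)  # always fails, to test failure handling
ok = researcher({"task_id": "t1", "instruction": "List the main competitors"})
bad = writer({"task_id": "t2", "instruction": "Draft the report"})
```

Swapping a stub for a real subagent later should require no orchestrator changes; if it does, the contract was underspecified.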