AI Safety Requirements
AI safety is the set of controls that prevent an AI system from causing unintended harm — to users, third parties, or broader society. For AI PMs, safety is not a philosophical discussion. It's a requirements question: what specific controls must be in place before this feature ships, and how do we verify they're working? This skill identifies the safety requirements specific to your feature and turns them into implementable product specifications.
---
Context
AI safety risk categories (know which apply to your feature):

| Risk category | Description | Example |
|---|---|---|
| Direct harm | The AI causes direct physical, psychological, or financial harm to a user | Medical AI gives dangerous advice; financial AI recommends a harmful action |
| Facilitated harm | The AI enables a user to harm others | AI helps generate targeted harassment; AI provides weapons instructions |
| Runaway behaviour | The AI takes unintended autonomous actions with real-world consequences | Agent deletes files the user didn't intend to delete; agent sends emails without permission |
| Manipulation | The AI influences user beliefs or behaviour in ways that undermine their autonomy | AI optimises for engagement in ways that exploit psychological vulnerabilities |
| Systemic harm | Aggregate AI behaviour causes harm at scale even if individual outputs seem benign | AI recommendation system that systematically amplifies polarising content |
Safety controls should be designed for the worst realistic user, not the average user.
---
Step 1 — Identify the safety risk profile
Ask:
- Which of the five risk categories above apply to this feature, and through what mechanism?
- Who is the worst realistic user, and what is the worst-case outcome for them or for third parties?
- Could vulnerable populations (minors, people in crisis, financially at-risk users) plausibly encounter this feature?
Step 2 — Apply the safety control stack
Control 1 — Output content filters
Define prohibited content categories and implement multi-layer filtering: system prompt instruction, output validation, and user reporting.
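The output-validation layer can be sketched as follows. This is a minimal illustration of the shape of a second-layer filter, assuming hypothetical prohibited categories and keyword patterns; production systems use trained classifiers, not keyword lists.

```python
import re

# Hypothetical prohibited-content categories with illustrative patterns.
PROHIBITED = {
    "weapons_instructions": [r"\bhow to build a bomb\b"],
    "targeted_harassment": [r"\bwrite insults about\b"],
}

def validate_output(text: str) -> tuple[bool, list[str]]:
    """Layer 2 of the filter stack: scan the model's output before it
    reaches the user. Returns (allowed, violated_categories)."""
    violations = [
        category
        for category, patterns in PROHIBITED.items()
        if any(re.search(p, text, re.IGNORECASE) for p in patterns)
    ]
    return (not violations, violations)
```

The system-prompt instruction (layer 1) reduces violations at the source; this validator catches what slips through; user reporting (layer 3) catches what the validator misses.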
Control 2 — Vulnerable user protections
Define which vulnerable populations might use this feature and apply appropriate protections including crisis detection for consumer-facing features.
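A per-turn routing decision might look like the sketch below. The crisis patterns and field names are assumptions for illustration; real crisis detection combines classifiers with human review, not keyword matching alone.

```python
import re

# Illustrative crisis-signal patterns (assumption, not a vetted list).
CRISIS_PATTERNS = [r"\bwant to (?:die|end it)\b", r"\bkill myself\b"]

def protections_for(message: str, *, is_minor: bool = False) -> dict:
    """Decide which vulnerable-user protections apply to this turn."""
    crisis = any(re.search(p, message, re.IGNORECASE) for p in CRISIS_PATTERNS)
    return {
        "show_crisis_resources": crisis,    # surface hotline info, de-escalate
        "suppress_normal_reply": crisis,    # don't let the model improvise here
        "strict_content_filter": is_minor,  # tighter thresholds for minors
    }
```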
Control 3 — Agentic safety controls
For agentic features: irreversibility controls, blast radius limitation, intent verification, and override/abort mechanisms.
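The four agentic controls can be combined into a single gate in front of every tool call. This is a sketch under assumed names (`Action`, `ActionGate`, the blast-radius measure); the point is the control flow, not the API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A hypothetical agent action, annotated for safety gating."""
    name: str
    irreversible: bool   # e.g. delete, send, pay
    affected_items: int  # rough blast-radius measure

class ActionGate:
    """Gates agent actions: blast-radius cap, intent verification for
    irreversible steps, and a user-triggered abort switch."""

    def __init__(self, blast_radius_limit: int = 10):
        self.blast_radius_limit = blast_radius_limit
        self.aborted = False

    def abort(self) -> None:
        self.aborted = True  # user override: halt all further actions

    def check(self, action: Action, user_confirmed: bool = False) -> str:
        if self.aborted:
            return "blocked_aborted"
        if action.affected_items > self.blast_radius_limit:
            return "blocked_blast_radius"
        if action.irreversible and not user_confirmed:
            return "needs_confirmation"  # intent verification step
        return "allowed"
```

Note that the blast-radius check runs even when the user has confirmed: confirmation verifies intent, it does not expand the agent's authority.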
Control 4 — Anti-manipulation controls
The AI must not use urgency tactics, fabricate social proof, form parasocial relationships, or exploit psychological vulnerabilities.
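These prohibitions can be made testable in output review. The patterns below are illustrative assumptions, not a complete taxonomy; a real review pipeline would use classifiers and human evals alongside rules like these.

```python
import re

# Illustrative red-flag patterns per prohibited tactic (assumptions).
MANIPULATION_CHECKS = {
    "false_urgency": [r"\bact now\b", r"\bonly \d+ left\b"],
    "fabricated_social_proof": [r"\bthousands of users already\b"],
    "parasocial_framing": [r"\bi'?m your (?:best )?friend\b"],
}

def flag_manipulation(text: str) -> list[str]:
    """Return the manipulation tactics an output appears to use."""
    return sorted(
        tactic
        for tactic, patterns in MANIPULATION_CHECKS.items()
        if any(re.search(p, text, re.IGNORECASE) for p in patterns)
    )
```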
Control 5 — Systemic safety monitoring
Monthly aggregate behaviour review, misuse pattern detection, feedback loop audit, and third-party impact assessment.
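The misuse-pattern detection piece of the monthly review can be as simple as per-category flag rates over output logs. A minimal sketch, assuming `(category, flagged)` log pairs and an arbitrary 2% review threshold:

```python
from collections import Counter

def misuse_flag_rates(events, threshold=0.02):
    """Return categories whose flag rate meets the review threshold.
    `events` is an iterable of (category, flagged) pairs from output logs;
    the 2% default threshold is an assumption, not a standard."""
    total, flagged = Counter(), Counter()
    for category, was_flagged in events:
        total[category] += 1
        if was_flagged:
            flagged[category] += 1
    return {
        category: flagged[category] / count
        for category, count in total.items()
        if flagged[category] / count >= threshold
    }
```

Rates matter more than raw counts here: systemic harm shows up as a category drifting upward month over month, not as any single flagged output.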
Step 3 — Output the AI safety requirements document
Include: safety risk profile, all applicable safety controls, pre-launch verification checklist, and residual risks accepted.
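The document's structure can be captured as a checkable artifact, so "pre-launch verification" is a gate rather than a prose promise. Field names below are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class SafetyControl:
    """One implementable control with its pre-launch verification status."""
    name: str
    verification: str  # how we'll test it works
    verified: bool = False

@dataclass
class SafetyRequirementsDoc:
    """Skeleton of the output document; shape is a sketch, not a template."""
    feature: str
    risk_categories: list[str]
    controls: list[SafetyControl] = field(default_factory=list)
    residual_risks: list[str] = field(default_factory=list)

    def ready_to_ship(self) -> bool:
        # Every listed control must pass verification before launch.
        return bool(self.controls) and all(c.verified for c in self.controls)
```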