
AI Safety Requirements

AI safety is the set of controls that prevent an AI system from causing unintended harm — to users, third parties, or broader society. For AI PMs, safety is not a philosophical discussion. It's a requirement: what specific controls must be in place before this feature ships, and how do we verify they're working? This skill identifies the safety requirements specific to your feature and turns them into implementable product specifications.

---

Context

AI safety risk categories (know which apply to your feature):
| Risk category | Description | Example |
| --- | --- | --- |
| Direct harm | The AI causes direct physical, psychological, or financial harm to a user | Medical AI gives dangerous advice; financial AI recommends a harmful action |
| Facilitated harm | The AI enables a user to harm others | AI helps generate targeted harassment; AI provides weapons instructions |
| Runaway behaviour | The AI takes unintended autonomous actions with real-world consequences | Agent deletes files the user didn't intend to delete; agent sends emails without permission |
| Manipulation | The AI influences user beliefs or behaviour in ways that undermine their autonomy | AI optimises for engagement in ways that exploit psychological vulnerabilities |
| Systemic harm | Aggregate AI behaviour causes harm at scale even if individual outputs seem benign | AI recommendation system that systematically amplifies polarising content |

The safety design principle:

Safety controls should be designed for the worst realistic user, not the average user.

---

Step 1 — Identify the safety risk profile

Ask:

  • What could this AI feature do that would directly harm a user?
  • What could this AI feature do that would enable a user to harm others?
  • If the AI behaves autonomously — what actions could it take without the user noticing?
  • What is the worst single output this feature could produce?
  • What is the worst aggregate behaviour at scale?
  • Who are the most vulnerable users?
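The answers above can be captured as a lightweight risk profile record that travels with the feature spec. A minimal sketch in Python; the class name and field names are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass, field

# Illustrative risk-profile record; field names are assumptions, not a standard.
@dataclass
class SafetyRiskProfile:
    feature: str
    direct_harms: list[str] = field(default_factory=list)        # harm to the user
    facilitated_harms: list[str] = field(default_factory=list)   # user harming others
    autonomous_actions: list[str] = field(default_factory=list)  # actions without the user noticing
    worst_single_output: str = ""
    worst_aggregate_behaviour: str = ""
    vulnerable_users: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A profile is only reviewable once both worst cases are written down.
        return bool(self.worst_single_output and self.worst_aggregate_behaviour)

profile = SafetyRiskProfile(
    feature="symptom-checker chatbot",
    direct_harms=["dangerous medical advice"],
    worst_single_output="tells a user a serious symptom is harmless",
)
print(profile.is_complete())  # False until worst_aggregate_behaviour is filled in
```

Making `is_complete` depend on both worst cases enforces the quality check later in this skill: a profile that names only the worst single output is not yet reviewable.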

---

Step 2 — Apply the safety control stack

Control 1 — Output content filters

Define the prohibited content categories for your feature's domain and implement multi-layer filtering: a system prompt instruction, automated output validation, and a user reporting channel.
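One way to sketch the output-validation layer (the middle of the three layers just described). The categories and phrase lists below are placeholders, and a production system would typically use a trained classifier rather than keyword matching:

```python
# Minimal output-validation layer: one check per prohibited category.
# Keyword matching is a placeholder; production systems typically use classifiers.
PROHIBITED = {
    "weapons_instructions": ["build a bomb", "synthesise nerve agent"],
    "targeted_harassment": ["doxx", "harassment campaign"],
}

def validate_output(text: str) -> tuple[bool, list[str]]:
    """Return (allowed, violated_categories) for a candidate model output."""
    lowered = text.lower()
    violations = [cat for cat, phrases in PROHIBITED.items()
                  if any(p in lowered for p in phrases)]
    return (not violations, violations)

allowed, cats = validate_output("Here is how to plan a harassment campaign...")
# A blocked output is replaced with a refusal and logged, feeding the
# user-reporting layer with examples the filter missed or caught.
```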

Control 2 — Vulnerable user protections

Identify which vulnerable populations might use this feature and apply appropriate protections, including crisis detection for any consumer-facing feature.
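A minimal sketch of a crisis-detection routing gate. The signal phrases and response text are placeholders; real deployments use trained classifiers and locale-specific crisis resources rather than a hard-coded phrase table:

```python
# Illustrative crisis-detection gate; phrases and response text are placeholders.
CRISIS_SIGNALS = ["want to end my life", "hurt myself", "no reason to live"]

CRISIS_RESPONSE = (
    "It sounds like you're going through something serious. "
    "You can reach trained support through your local crisis hotline."
)

def route_message(user_message: str, normal_handler) -> str:
    """Route a user message to the crisis path or the normal model response."""
    lowered = user_message.lower()
    if any(signal in lowered for signal in CRISIS_SIGNALS):
        # The crisis path overrides the normal model response entirely.
        return CRISIS_RESPONSE
    return normal_handler(user_message)
```

The key design choice is that the gate sits in front of the model: a crisis message never reaches the normal generation path, so a bad model output cannot make the situation worse.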

Control 3 — Agentic safety controls

For agentic features, require irreversibility controls, blast radius limitation, intent verification, and override/abort mechanisms.
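The first three of these controls can be combined into a single action gate that every agent step passes through. The action names, limit, and return values below are illustrative assumptions:

```python
# Illustrative action gate for an agent. Irreversible actions require explicit
# user confirmation; a per-task action cap limits the blast radius.
IRREVERSIBLE = {"delete_file", "send_email", "make_payment"}
MAX_ACTIONS_PER_TASK = 20  # assumed blast-radius limit

def gate_action(action: str, actions_taken: int, user_confirmed: bool) -> str:
    """Return "proceed", "ask_user", or "abort" for the agent's next action."""
    if actions_taken >= MAX_ACTIONS_PER_TASK:
        return "abort"      # blast-radius limit exceeded; stop the whole task
    if action in IRREVERSIBLE and not user_confirmed:
        return "ask_user"   # intent verification before any irreversible step
    return "proceed"
```

The override/abort mechanism is the user-facing counterpart: the same "abort" state must also be reachable from a visible control the user can hit at any time.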

Control 4 — Anti-manipulation controls

The AI must not use urgency tactics, fabricate social proof, form parasocial relationships, or exploit psychological vulnerabilities.
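These prohibitions can be partially verified before launch by scanning sampled outputs for manipulation patterns. The pattern table below is a placeholder for illustration, not an exhaustive taxonomy:

```python
# Illustrative pre-launch audit: scan sampled outputs for manipulation patterns.
MANIPULATION_PATTERNS = {
    "false_urgency": ["act now", "only minutes left"],
    "fabricated_social_proof": ["everyone else is doing"],
    "parasocial_framing": ["i'll be lonely without you"],
}

def flag_manipulation(output: str) -> list[str]:
    """Return the manipulation categories a sampled output matches."""
    lowered = output.lower()
    return [name for name, phrases in MANIPULATION_PATTERNS.items()
            if any(p in lowered for p in phrases)]
```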

Control 5 — Systemic safety monitoring

Run a monthly aggregate behaviour review, misuse pattern detection, a feedback loop audit, and a third-party impact assessment.
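A sketch of how the monthly aggregate review can flag drift: compare each flagged-category rate against the previous month, so harm that only shows up at scale still surfaces even when no single output is bad. The threshold and category names are assumptions:

```python
# Illustrative month-over-month drift check for the aggregate behaviour review.
def category_rates(flagged_counts: dict[str, int], total_outputs: int) -> dict[str, float]:
    """Convert raw flagged-output counts into per-category rates."""
    return {cat: n / total_outputs for cat, n in flagged_counts.items()}

def drifted(current: dict[str, float], previous: dict[str, float],
            threshold: float = 2.0) -> list[str]:
    """Categories whose rate grew more than `threshold`x month over month."""
    return [cat for cat, rate in current.items()
            if previous.get(cat, 0) > 0 and rate / previous[cat] > threshold]
```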

---

Step 3 — Output the AI safety requirements document

Include: the safety risk profile, all applicable safety controls, a pre-launch verification checklist, and the residual risks accepted.

---

Quality check before delivering

  • Risk profile identifies the worst single output AND the worst aggregate behaviour
  • Content filter categories are specific to this feature's domain
  • Crisis detection is included for any consumer-facing feature
  • Agentic safety controls are included if the feature has autonomous behaviour
  • Residual risks are honest — no safety plan eliminates all risk
  • Pre-launch verification is actionable

Suggested next step: Test the crisis detection pathway before any other safety control. It has the highest stakes and the hardest edge cases.