
Hallucination Mitigation

Hallucination is not a bug to be fixed — it's a property of probabilistic language models to be managed. Every AI feature carries some hallucination risk. The PM's job is to assess that risk before shipping, design safeguards proportional to the stakes, and build monitoring so hallucinations in production are caught before users are harmed.

---

Context

Types of hallucination:
| Type | Description | Example |
| --- | --- | --- |
| Factual fabrication | Invents facts not in the input or training data | Cites a study that doesn't exist |
| Source confabulation | Misattributes real information to the wrong source | Correct fact, wrong citation |
| Input hallucination | Invents details not present in the provided input | Summarises a point the document didn't make |
| Temporal hallucination | Presents outdated information as current | States a feature exists that was deprecated |
| Instruction hallucination | Ignores a constraint and behaves as if it wasn't given | Produces output outside the defined scope |
The stakes framework:
| Stakes level | Example | Required mitigation |
| --- | --- | --- |
| Critical | Medical, legal, financial advice | Human review of every output; refusal rather than uncertainty |
| High | Customer-facing factual claims | Citation required; confidence threshold enforced |
| Medium | Internal summaries, draft content | Spot-check audit; user flagging mechanism |
| Low | Brainstorming, ideation support | Disclaimer; user awareness |
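The stakes framework can be encoded as a simple config lookup, so the minimum mitigation set for a feature is decided by data rather than ad hoc judgment. This is an illustrative sketch: the level names and mitigation labels mirror the table above, but the exact strings are assumptions, not a fixed schema.

```python
# Stakes level -> minimum required mitigations, mirroring the table above.
# Labels are illustrative, not a prescribed schema.
STAKES_MITIGATIONS = {
    "critical": ["human_review_every_output", "refuse_on_uncertainty"],
    "high": ["citation_required", "confidence_threshold"],
    "medium": ["spot_check_audit", "user_flagging"],
    "low": ["disclaimer", "user_awareness"],
}

def required_mitigations(stakes_level: str) -> list[str]:
    """Return the minimum mitigation set for a given stakes level."""
    try:
        return STAKES_MITIGATIONS[stakes_level.lower()]
    except KeyError:
        raise ValueError(f"Unknown stakes level: {stakes_level!r}")
```

Making the mapping explicit also gives engineering a single place to enforce it, e.g. blocking a launch checklist item when a Critical feature lacks human review.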

---

Step 1 — Assess the hallucination risk profile

Ask six questions:

  • What does the feature do?
  • What is the source of truth?
  • What is the most dangerous hallucination type for this feature?
  • What is the stakes level?
  • Who is the audience?
  • Is there a human review step?
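The answers can be captured as a small structured record, so every feature's risk profile is written down in the same shape and can be reviewed side by side. A minimal sketch, with hypothetical field names and example values:

```python
from dataclasses import dataclass

@dataclass
class HallucinationRiskProfile:
    feature: str               # what the feature does
    source_of_truth: str       # where correct answers come from
    worst_hallucination: str   # most dangerous hallucination type
    stakes_level: str          # critical / high / medium / low
    audience: str              # who sees the output
    human_review: bool         # is there a human review step?

# Hypothetical example profile for illustration only.
profile = HallucinationRiskProfile(
    feature="Contract clause summariser",
    source_of_truth="Uploaded contract text",
    worst_hallucination="input hallucination",
    stakes_level="high",
    audience="external customers",
    human_review=False,
)
```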

Step 2 — Apply the mitigations

Six mitigation techniques:

  • Ground the model — provide source of truth and instruct to stay within it
  • Require citations — every factual claim must cite a specific source
  • Enforce confidence flagging — model flags its own uncertainty
  • Temperature control — 0–0.2 for factual tasks, 0.5–0.8 for creative
  • Output validation — second model call checks for hallucination (Critical/High only)
  • User-facing disclosure — specific, stakes-appropriate disclaimer text
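The output-validation technique can be sketched as a second model call that checks the first model's answer against the source text before it is shown to the user. This is a sketch under stated assumptions: `call_model` is a placeholder for whatever LLM client you use, and the validator prompt wording is illustrative.

```python
# Second-pass validation: a checker model compares the answer to the
# source and returns PASS/FAIL before the answer is released.
# `call_model` is a placeholder for your LLM client; prompt is illustrative.
VALIDATOR_PROMPT = (
    "You are a fact checker. Reply exactly 'PASS' if every claim in "
    "ANSWER is supported by SOURCE, otherwise reply exactly 'FAIL'.\n\n"
    "SOURCE:\n{source}\n\nANSWER:\n{answer}"
)

def validate_output(source: str, answer: str, call_model) -> bool:
    """Return True only if the validator model passes the answer."""
    verdict = call_model(VALIDATOR_PROMPT.format(source=source, answer=answer))
    return verdict.strip().upper() == "PASS"
```

Injecting `call_model` as a parameter keeps the gate testable with a stub, and makes it easy to use a cheaper model for validation than for generation.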
---

Step 3 — Build the hallucination monitoring plan

Detection methods include user flagging, automated validation, periodic human audits, and automated fact-checking. For each, define an alert threshold and a response protocol with an SLA.
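The alert-threshold check can be sketched as a comparison of the flagged-hallucination rate over a monitoring window against a per-stakes threshold. The threshold values below are illustrative assumptions, not recommendations.

```python
# Flagged-hallucination rate vs. per-stakes alert threshold.
# Threshold values are illustrative assumptions only.
ALERT_THRESHOLDS = {"critical": 0.001, "high": 0.01, "medium": 0.05, "low": 0.10}

def should_alert(flagged: int, total: int, stakes_level: str) -> bool:
    """Trigger the response protocol if the flagged rate exceeds threshold."""
    if total == 0:
        return False  # no outputs in the window, nothing to alert on
    return flagged / total > ALERT_THRESHOLDS[stakes_level]
```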

---

Quality check before delivering

  • Risk profile includes a stakes level
  • Mitigations are matched to the stakes level
  • Prompt additions are specific and copyable
  • Output validation is included for Critical and High stakes
  • User-facing disclosure text is written and specific
  • Monitoring plan has a defined owner, threshold, and SLA
  • Residual risk is honestly stated

Suggested next step: Share the prompt changes and engineering requirements with your engineering lead before the next sprint. Hallucination mitigations added after launch cost 3–5x more than those designed in from the start.