
Hallucination Mitigation

Hallucination is not a bug to be fixed — it's a property of probabilistic language models to be managed. Every AI feature carries some hallucination risk. The PM's job is to assess that risk before shipping, design safeguards proportional to the stakes, and build monitoring so hallucinations in production are caught before users are harmed.

---

Context

Types of hallucination:
| Type | Description | Example |
| --- | --- | --- |
| Factual fabrication | Invents facts not in the input or training data | Cites a study that doesn't exist |
| Source confabulation | Misattributes real information to the wrong source | Correct fact, wrong citation |
| Input hallucination | Invents details not present in the provided input | Summarises a point the document didn't make |
| Temporal hallucination | Presents outdated information as current | States a feature exists that was deprecated |
| Instruction hallucination | Ignores a constraint and behaves as if it wasn't given | Produces output outside the defined scope |
The stakes framework:
| Stakes level | Example | Required mitigation |
| --- | --- | --- |
| Critical | Medical, legal, financial advice | Human review of every output; refusal rather than uncertainty |
| High | Customer-facing factual claims | Citation required; confidence threshold enforced |
| Medium | Internal summaries, draft content | Spot-check audit; user flagging mechanism |
| Low | Brainstorming, ideation support | Disclaimer; user awareness |
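The stakes framework can be encoded as a simple config lookup, so the minimum mitigation set for a feature is decided by data rather than ad hoc judgment. This is an illustrative sketch: the level names and mitigation labels mirror the table above, but the exact strings are assumptions, not a fixed schema.

```python
# Stakes level -> minimum required mitigations, mirroring the table above.
# Labels are illustrative, not a prescribed schema.
STAKES_MITIGATIONS = {
    "critical": ["human_review_every_output", "refuse_on_uncertainty"],
    "high": ["citation_required", "confidence_threshold"],
    "medium": ["spot_check_audit", "user_flagging"],
    "low": ["disclaimer", "user_awareness"],
}

def required_mitigations(stakes_level: str) -> list[str]:
    """Return the minimum mitigation set for a given stakes level."""
    try:
        return STAKES_MITIGATIONS[stakes_level.lower()]
    except KeyError:
        raise ValueError(f"Unknown stakes level: {stakes_level!r}")
```

Making the mapping explicit also gives engineering a single place to enforce it, e.g. blocking a launch checklist item when a Critical feature lacks human review.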

---

Step 1 — Assess the hallucination risk profile

Ask six questions:

  • What does the feature do?
  • What is the source of truth?
  • What is the most dangerous hallucination type for this feature?
  • What is the stakes level?
  • Who is the audience?
  • Is there a human review step?
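The answers can be captured as a small structured record, so every feature's risk profile is written down in the same shape and can be reviewed side by side. A minimal sketch, with hypothetical field names and example values:

```python
from dataclasses import dataclass

@dataclass
class HallucinationRiskProfile:
    feature: str               # what the feature does
    source_of_truth: str       # where correct answers come from
    worst_hallucination: str   # most dangerous hallucination type
    stakes_level: str          # critical / high / medium / low
    audience: str              # who sees the output
    human_review: bool         # is there a human review step?

# Hypothetical example profile for illustration only.
profile = HallucinationRiskProfile(
    feature="Contract clause summariser",
    source_of_truth="Uploaded contract text",
    worst_hallucination="input hallucination",
    stakes_level="high",
    audience="external customers",
    human_review=False,
)
```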

Step 2 — Apply the mitigations

Six mitigation techniques:

  • Ground the model — provide source of truth and instruct to stay within it
  • Require citations — every factual claim must cite a specific source
  • Enforce confidence flagging — model flags its own uncertainty
  • Temperature control — 0–0.2 for factual tasks, 0.5–0.8 for creative
  • Output validation — second model call checks for hallucination (Critical/High only)
  • User-facing disclosure — specific, stakes-appropriate disclaimer text
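The output-validation technique can be sketched as a second model call that checks the first model's answer against the source text before it is shown to the user. This is a sketch under stated assumptions: `call_model` is a placeholder for whatever LLM client you use, and the validator prompt wording is illustrative.

```python
# Second-pass validation: a checker model compares the answer to the
# source and returns PASS/FAIL before the answer is released.
# `call_model` is a placeholder for your LLM client; prompt is illustrative.
VALIDATOR_PROMPT = (
    "You are a fact checker. Reply exactly 'PASS' if every claim in "
    "ANSWER is supported by SOURCE, otherwise reply exactly 'FAIL'.\n\n"
    "SOURCE:\n{source}\n\nANSWER:\n{answer}"
)

def validate_output(source: str, answer: str, call_model) -> bool:
    """Return True only if the validator model passes the answer."""
    verdict = call_model(VALIDATOR_PROMPT.format(source=source, answer=answer))
    return verdict.strip().upper() == "PASS"
```

Injecting `call_model` as a parameter keeps the gate testable with a stub, and makes it easy to use a cheaper model for validation than for generation.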
---

Step 3 — Build the hallucination monitoring plan

Detection methods include user flagging, automated validation, periodic human audits, and automated fact-checking. For each, define an alert threshold and a response protocol with an SLA.
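The alert-threshold check can be sketched as a comparison of the flagged-hallucination rate over a monitoring window against a per-stakes threshold. The threshold values below are illustrative assumptions, not recommendations.

```python
# Flagged-hallucination rate vs. per-stakes alert threshold.
# Threshold values are illustrative assumptions only.
ALERT_THRESHOLDS = {"critical": 0.001, "high": 0.01, "medium": 0.05, "low": 0.10}

def should_alert(flagged: int, total: int, stakes_level: str) -> bool:
    """Trigger the response protocol if the flagged rate exceeds threshold."""
    if total == 0:
        return False  # no outputs in the window, nothing to alert on
    return flagged / total > ALERT_THRESHOLDS[stakes_level]
```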

---

Quality check before delivering

  • Risk profile includes a stakes level
  • Mitigations are matched to the stakes level
  • Prompt additions are specific and copyable
  • Output validation is included for Critical and High stakes
  • User-facing disclosure text is written and specific
  • Monitoring plan has a defined owner, threshold, and SLA
  • Residual risk is honestly stated

Suggested next step: Share the prompt changes and engineering requirements with your engineering lead before the next sprint. Hallucination mitigations added after launch cost 3–5x more than those designed in from the start.