
AI Product-Market Fit Diagnosis

AI product-market fit is harder to assess than traditional PMF because AI features can feel impressive and still not be useful. Users will try an AI feature once if it's new and shiny — and not come back if it doesn't reliably solve a real problem.

Context

The three failure modes that kill AI PMF:

  • Impressive but not useful: users demo it but don't use it in real work. Root cause: no real job to be done, or it's easier to do manually.
  • Useful but not trustworthy: users spend more time verifying than the AI saved. Root cause: quality too inconsistent; failure too costly.
  • Trustworthy but not habitual: users like it but don't build it into their workflow. Root cause: no activation moment; no daily use case.

Step 1 — Run the AI PMF diagnostic

ADOPTION METRICS:
  • % tried at least once: [N]%
  • % used in week 2: [N]%
  • % weekly after 4 weeks: [N]%

QUALITY METRICS:
  • Output acceptance rate: [N]%
  • Re-query rate: [N]%
  • User satisfaction: [N]%

BEHAVIOUR SHIFT METRICS:
  • Completing task faster with AI? [Yes/No/Unknown]
  • Completing task more often? [Yes/No/Unknown]
  • Time-to-action after AI output: [N seconds avg]
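The adoption and quality metrics above can be computed straight from usage logs. A minimal sketch, assuming each logged event records a user id, the week of use, whether the output was accepted, and whether the user immediately re-queried (all field names are hypothetical):

```python
from collections import defaultdict

def pmf_diagnostics(events):
    """Compute the Step 1 adoption/quality metrics from raw usage events.

    `events` is assumed to be a list of dicts like:
      {"user": "u1", "week": 1, "accepted": True, "requeried": False}
    where "week" counts weeks since the user first tried the feature.
    """
    weeks_by_user = defaultdict(set)
    for e in events:
        weeks_by_user[e["user"]].add(e["week"])

    tried = len(weeks_by_user)  # every logged user tried at least once
    week2 = sum(1 for weeks in weeks_by_user.values() if 2 in weeks)
    week4 = sum(1 for weeks in weeks_by_user.values() if 4 in weeks)

    accepted = sum(e["accepted"] for e in events)
    requeried = sum(e["requeried"] for e in events)

    return {
        "tried": tried,
        "pct_week2": 100 * week2 / tried,
        "pct_week4": 100 * week4 / tried,
        "acceptance_rate": 100 * accepted / len(events),
        "requery_rate": 100 * requeried / len(events),
    }
```

User satisfaction and the behaviour-shift metrics usually come from surveys and task timings rather than event logs, so they are left out of the sketch.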
Step 2 — Apply the AI PMF rubric (score 1–3 each)

  • Real job to be done: How frequent and important is the task?
  • Output quality: Output acceptance rate (<50% / 50–75% / >75%)
  • Trust and reliability: Do users verify every output or act directly?
  • Habit formation: Usage retention at week 4 (<20% / 20–50% / >50%)
  • Switching cost: How much would users lose by switching?

Total score out of 15:
  • 13–15: Strong AI PMF — optimise and scale
  • 9–12: Directional — fix the lowest-scoring dimension
  • 6–8: Pre-PMF — deep qualitative research needed
  • 5 (the minimum possible): fundamental rethink needed
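The rubric and its bands translate directly into a scoring helper. A sketch (dimension keys are hypothetical; note that with five dimensions scored 1–3 the minimum total is 5, which is the fundamental-rethink case):

```python
DIMENSIONS = ("jtbd", "quality", "trust", "habit", "switching_cost")

def score_rubric(scores):
    """scores: dict mapping each rubric dimension to an integer 1-3.

    Returns (total, verdict, weakest dimension) — the weakest dimension
    is where intervention should start.
    """
    assert set(scores) == set(DIMENSIONS)
    assert all(1 <= scores[d] <= 3 for d in DIMENSIONS)

    total = sum(scores.values())
    if total >= 13:
        verdict = "Strong AI PMF — optimise and scale"
    elif total >= 9:
        verdict = "Directional — fix the lowest-scoring dimension"
    elif total >= 6:
        verdict = "Pre-PMF — deep qualitative research needed"
    else:  # total == 5, the minimum possible
        verdict = "Fundamental rethink needed"

    weakest = min(DIMENSIONS, key=lambda d: scores[d])
    return total, verdict, weakest
```

Returning the weakest dimension alongside the verdict bakes in the rule that intervention priority is driven by the lowest rubric score.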
Step 3 — Diagnose PMF blockers

For each dimension scoring 1 or 2:

  • Low JTBD: Run 5 user interviews. If "just exploring" → no real JTBD
  • Low quality: Pull 50 rejected outputs, categorise failure patterns
  • Low trust: Is it random (model inconsistency) or systematic (wrong task)?
  • Low habit: Map the activation moment. If none exists, design one
  • Low switching cost: Add personalisation, integrate with existing tools
Step 4 — Run the Sean Ellis test for AI

Survey users who've used the feature ≥3 times:

  • "How disappointed would you be if you could no longer use this?"
  • PMF threshold: ≥40% "Very disappointed"
  • If below threshold but one segment is above: that's your beachhead
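Scoring the survey, including the per-segment beachhead check, can be sketched as follows (the response coding and segment labels are assumptions, not a fixed schema):

```python
def sean_ellis(responses, threshold=40.0):
    """responses: list of (segment, answer) pairs, where answer is one of
    "very", "somewhat", "not" (disappointed if the feature went away).

    Returns the overall % "very disappointed" and the segments that clear
    the threshold on their own — the candidate beachheads.
    """
    def pct_very(answers):
        return 100 * sum(a == "very" for a in answers) / len(answers)

    overall = pct_very([a for _, a in responses])

    by_segment = {}
    for seg, a in responses:
        by_segment.setdefault(seg, []).append(a)

    beachheads = {
        seg: pct_very(answers)
        for seg, answers in by_segment.items()
        if pct_very(answers) >= threshold
    }
    return overall, beachheads
```

Even when the overall number misses the 40% bar, a single segment clearing it on its own is the signal to narrow focus rather than abandon the feature.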
Quality check before delivering

  • Diagnostic includes actual metrics — not estimates alone
  • PMF rubric scored honestly — no dimension inflated
  • Blockers are specific — not "improve quality"
  • Sean Ellis threshold (40%) stated
  • Intervention priority driven by lowest rubric score
  • Review date set — PMF is dynamic

Suggested next step: Find your "very disappointed" users. Interview 5 this week and ask: "Walk me through exactly how you use this feature."