Confidence Scoring
An AI that always sounds certain is a liability. Users who can't tell the difference between a confident accurate output and a confident hallucination will eventually trust the AI when they shouldn't — and the consequences scale with the stakes. Confidence scoring is the PM mechanism for giving users the information they need to decide when to act on AI output and when to verify it. This skill designs the scoring system, the UX, and the logic that drives both.
---
Context
What confidence scoring is not: it is not simply exposing the model's raw probability scores. Softmax probabilities are poorly calibrated and meaningless to non-technical users. Confidence scoring is a product design decision, not a pass-through of model internals.
The three signals that drive confidence:
| Signal | How it works | Best for |
|---|---|---|
| Retrieval confidence | How relevant were the retrieved source documents? (RAG features) | Q&A, document search, knowledge bases |
| Self-assessed uncertainty | Prompt the model to flag its own uncertainty | General-purpose AI, free-text generation |
| Downstream validation | A second model call checks the output for errors | High-stakes factual features |
---
Step 1 — Define the confidence scoring context
Ask: What are the stakes if a user acts on a wrong output? Can users verify the output themselves, and how easily? Is the feature retrieval-backed (RAG) or free-text generation? The answers determine which signal applies and how aggressive the confidence UX needs to be.
Step 2 — Choose the confidence scoring approach
Approach A — Retrieval-based confidence (for RAG features)
Score relevance using cosine similarity thresholds: High ≥ 0.85, Medium 0.70–0.84, Low < 0.70.
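The threshold mapping above can be sketched as a small function. Note that the specific cutoffs (0.85 / 0.70) are illustrative and embedding-model-specific; they should be tuned empirically for the product's own corpus.

```python
def retrieval_confidence(similarity: float) -> str:
    """Map a cosine-similarity score for the best retrieved document
    to a confidence band, using the thresholds above:
    High >= 0.85, Medium 0.70-0.84, Low < 0.70."""
    if similarity >= 0.85:
        return "HIGH"
    if similarity >= 0.70:
        return "MEDIUM"
    return "LOW"
```

In practice this would be fed the top result's similarity from the retriever; some teams instead aggregate over the top-k results, which is a design choice to test during calibration.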
Approach B — Self-assessed uncertainty (for general features)
Prompt the model to evaluate its own confidence as HIGH/MEDIUM/LOW with reasoning.
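A minimal sketch of the self-assessment pattern: append an instruction asking the model to label its own confidence, then parse the label out of the reply. The prompt wording and the `CONFIDENCE:` convention here are assumptions, not a prescribed template; the key design decision is defaulting to LOW when the label is missing, so absent signals fail safe.

```python
import re

# Assumed instruction appended to the task prompt (illustrative wording).
SELF_ASSESSMENT_SUFFIX = (
    "\n\nAfter your answer, on a new line write 'CONFIDENCE: HIGH', "
    "'CONFIDENCE: MEDIUM', or 'CONFIDENCE: LOW', with a one-sentence reason."
)

def parse_self_assessment(model_output: str) -> tuple[str, str]:
    """Split a model reply into (answer, confidence label).
    Defaults to LOW when the model omits the label."""
    match = re.search(r"CONFIDENCE:\s*(HIGH|MEDIUM|LOW)",
                      model_output, re.IGNORECASE)
    if not match:
        return model_output.strip(), "LOW"
    answer = model_output[: match.start()].strip()
    return answer, match.group(1).upper()
```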
Approach C — Downstream validation scoring (for Critical/High stakes)
A second model call checks for factual errors. Adds latency and cost.
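The validation pass can be sketched as a second call that grades the first answer against its sources. `call_model` is a placeholder for whatever LLM client the product uses, and the PASS/FAIL protocol is an assumption; the deliberate choice shown is that any verdict other than an explicit PASS downgrades confidence.

```python
def validate_output(answer: str, sources: list[str], call_model) -> str:
    """Downstream validation sketch: a second model call fact-checks
    the answer. Any non-PASS verdict fails safe to LOW confidence."""
    prompt = (
        "You are a fact-checker. Given the sources and the answer, "
        "reply with exactly PASS or FAIL.\n\n"
        "Sources:\n" + "\n".join(sources) +
        "\n\nAnswer:\n" + answer
    )
    verdict = call_model(prompt).strip().upper()
    return "HIGH" if verdict == "PASS" else "LOW"
```

Because this doubles model calls per request, it is usually gated to the Critical/High-stakes features named above rather than applied everywhere.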
Step 3 — Design the confidence UX
Four patterns: colour-coded signal, inline disclaimer, source citation display, or refusal threshold.
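The four patterns can be combined in one confidence-to-UX mapping. The field names and copy below are illustrative, not a real design spec; the structural point is that LOW confidence triggers the refusal threshold rather than a weaker badge.

```python
def render_confidence(confidence: str, citations: list[str]) -> dict:
    """Map a confidence band to UX decisions: refusal threshold for LOW,
    colour-coded badge plus citations otherwise, and an inline
    disclaimer for MEDIUM."""
    if confidence == "LOW":
        # Refusal threshold: below the bar, don't show the answer at all.
        return {
            "show_answer": False,
            "message": ("I'm not confident enough to answer this. "
                        "Please check the original sources."),
        }
    ui = {
        "show_answer": True,
        "badge_colour": "green" if confidence == "HIGH" else "amber",
        "citations": citations,  # source citation display
    }
    if confidence == "MEDIUM":
        ui["disclaimer"] = "This answer may be incomplete — verify the sources."
    return ui
```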
UX anti-patterns to avoid: showing raw percentage scores, using "I think" phrasing without a systematic signal, and burying confidence information in a settings page where users never see it.
Step 4 — Define the calibration process
Collect at least 100 representative outputs, have humans rate each one for accuracy, and build a calibration table by confidence band. A well-calibrated system: HIGH ≥ 90% accurate, MEDIUM 60–89%, LOW < 60%. Recalibrate whenever the model or prompt changes, and when the input distribution shifts.
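The calibration check above can be sketched as a small aggregation over human-rated outputs: compute accuracy per confidence band and flag bands that miss the stated targets (HIGH ≥ 90%, MEDIUM 60–89%, LOW < 60%).

```python
from collections import defaultdict

def calibration_table(rated_outputs):
    """Build a calibration table from (confidence_label, was_accurate)
    pairs and flag whether each band meets its target range."""
    counts = defaultdict(lambda: [0, 0])  # label -> [accurate, total]
    for label, accurate in rated_outputs:
        counts[label][1] += 1
        if accurate:
            counts[label][0] += 1
    table = {}
    for label, (hits, total) in counts.items():
        rate = hits / total
        if label == "HIGH":
            ok = rate >= 0.90
        elif label == "MEDIUM":
            ok = 0.60 <= rate < 0.90
        else:  # LOW
            ok = rate < 0.60
        table[label] = {"accuracy": rate, "calibrated": ok}
    return table
```

Run this on each new batch of rated outputs; a band falling out of its range is the trigger to adjust thresholds or prompts.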