Semantic Search Design

Keyword search fails when users don't know the exact words. Semantic search closes this gap — it finds content based on meaning, not lexical overlap. This skill designs the semantic search system: the query pipeline, the indexing strategy, the ranking logic, and the quality metrics.

---

Context

Keyword vs. semantic vs. hybrid search:
| Type | How it works | Best for |
| --- | --- | --- |
| Keyword (BM25) | Finds documents containing the query terms | Known-item search, exact terminology |
| Semantic | Finds documents by meaning similarity | Exploratory search, natural language queries |
| Hybrid | Combines both scores | Most production search systems — best of both |

For most products, hybrid is the right architecture.

---

Step 1 — Define the search context

Assess: content being searched, user type, query style, good result definition, current search approach, and primary failure mode.

Step 2 — Design the query pipeline

Five steps: Query understanding (spell correction, expansion, intent classification) → Query embedding → Retrieval (semantic top-K + keyword BM25 in parallel) → Result fusion (Reciprocal Rank Fusion or weighted score) → Optional re-ranking (cross-encoder for top 20 candidates).
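The fusion step above can be sketched with Reciprocal Rank Fusion, which combines the two ranked lists using only ranks, so the BM25 and cosine-similarity scores never need to be calibrated against each other. This is a minimal sketch; `k=60` is the conventional RRF smoothing constant, not a value prescribed by this document.

```python
def rrf_fuse(keyword_ranked, semantic_ranked, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).

    Each input is a list of document ids, best first. Documents appearing
    in both lists accumulate score from both, so agreement is rewarded.
    """
    scores = {}
    for ranked in (keyword_ranked, semantic_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked second in both lists will typically beat one ranked first in only one list, which is usually the behaviour you want from a hybrid retriever before the optional cross-encoder re-ranking pass.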

Step 3 — Design the indexing strategy

Define: what gets embedded (combine title + summary), chunking strategy for long documents, metadata fields for filtering, index freshness requirements, and re-index triggers.
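For the chunking strategy, a common baseline is a sliding word window with overlap, where each chunk carries its parent document id so results can link back to the source. A hedged sketch — the window and overlap sizes here are illustrative defaults, not recommendations from this document, and a production system would usually chunk on sentence or section boundaries instead of raw word counts:

```python
def chunk_document(doc_id, text, chunk_size=200, overlap=40):
    """Split text into overlapping word windows for embedding.

    Overlap prevents a relevant passage from being cut in half at a
    chunk boundary. Each chunk keeps doc_id as filterable metadata.
    """
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        window = words[start:start + chunk_size]
        chunks.append({
            "doc_id": doc_id,
            "chunk_index": len(chunks),
            "text": " ".join(window),
        })
        if start + chunk_size >= len(words):
            break
        start += chunk_size - overlap
    return chunks
```

Whatever chunker you choose, its parameters become part of the index definition: changing them is a re-index trigger, just like changing the embedding model.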

Step 4 — Define the search UX requirements

Query input, result display (cards with highlighted snippets), zero results handling, low confidence handling, and search analytics instrumentation.
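The zero-results and low-confidence branches can be made explicit with a small presentation gate. This is an assumed design, not one specified above: `min_score` stands in for a cosine-similarity cutoff that would have to be tuned per corpus.

```python
def present_results(results, min_score=0.35):
    """Choose a presentation mode for a scored result list.

    results: list of {"score": float, ...}, best first.
    min_score: assumed similarity cutoff below which we stop
    pretending the results are confident matches.
    """
    confident = [r for r in results if r["score"] >= min_score]
    if confident:
        return {"mode": "results", "items": confident}
    if results:
        # Weak matches only: show a few, but flag them as tentative
        return {"mode": "low_confidence", "items": results[:3]}
    # Nothing retrieved at all: trigger the zero-results experience
    return {"mode": "zero_results", "items": []}
```

The returned `mode` is what the analytics instrumentation should log per query — zero-results rate and low-confidence rate fall straight out of it.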

Step 5 — Define search quality metrics

Retrieval quality: MRR > 0.7, NDCG@10 > 0.75 on labelled query set. Usage quality: zero-results rate < 15%, re-query rate < 25%, click-through rate > 40%, mean click position < 2.5. Latency: p95 < 150ms.
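The two retrieval metrics above are cheap to compute once the labelled query set exists. A sketch of both, using standard definitions (binary relevance per rank for MRR, graded labels for NDCG):

```python
import math

def mrr(relevance_lists):
    """Mean reciprocal rank over queries.

    relevance_lists: one list per query of 0/1 flags in rank order;
    each query contributes 1/rank of its first relevant result.
    """
    total = 0.0
    for rels in relevance_lists:
        for i, rel in enumerate(rels, start=1):
            if rel:
                total += 1.0 / i
                break
    return total / len(relevance_lists)

def ndcg_at_k(rels, k=10):
    """NDCG@k for one query, given graded relevance labels in rank order."""
    dcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:k], start=1))
    ideal = sorted(rels, reverse=True)
    idcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg else 0.0
```

Run these over the labelled set on every ranking change; the MRR > 0.7 and NDCG@10 > 0.75 targets then become a regression gate rather than a one-off measurement.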

Quality check before delivering

- Hybrid search is recommended — not semantic-only
- Zero-results handling is specified
- Labelled eval set creation is part of the plan
- Latency budget is split across pipeline stages
- Re-indexing trigger on embedding model change is explicit
Suggested next step: Build the labelled evaluation set before writing any code. Collect 50–100 real queries and manually judge relevance. Build the ruler before you start measuring.