
Prompting Strategy Selector

Choosing the wrong prompting strategy wastes engineering time and produces worse outputs. The right choice depends on what you're asking the model to do, what data you have, and what quality and cost constraints you're working within.

---

Context

The four main strategies:
| Strategy | What it is | When it works |
| --- | --- | --- |
| Zero-shot | Give the model a task with no examples | Simple, well-defined tasks within the model's training |
| Few-shot | Give the model 2–5 examples of the desired input/output | Tasks where format, tone, or domain-specific output matters |
| Chain-of-thought (CoT) | Instruct the model to reason step by step | Complex reasoning, multi-step decisions |
| RAG | Retrieve relevant documents and include them in context | Tasks requiring up-to-date, specific, or proprietary knowledge |
Two additional approaches sit alongside these:
  • Fine-tuning: consider it only when zero-/few-shot plus RAG consistently underperform and you have a large labelled dataset.
  • System prompt engineering: always applies. It is the baseline every strategy builds on, not an alternative to them.

---

Step 1 — Understand the task requirements

Before choosing, assess six things: the task type, the data the model needs, how consistent the output must be, the expected volume, your cost tolerance, and whether labelled examples are available.

Step 2 — Run the decision tree

Work through the questions in order; take the first strategy whose question you answer "yes", and fall through to the next question on "no":

  • Q1: Does the task require specific documents or proprietary knowledge? If yes → RAG
  • Q2: Does it need a highly specific output format? If yes → Few-shot
  • Q3: Does it require multi-step reasoning? If yes → Chain-of-thought
  • Q4: Is it simple, well defined, and within the model's training knowledge? If yes → Zero-shot

Step 3 — Deep-dive on each strategy
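The decision tree above can be sketched as a small function. This is a minimal illustration, and the three boolean parameters are hypothetical stand-ins for your own Step 1 assessment, not an API from any library:

```python
def choose_strategy(needs_proprietary_knowledge: bool,
                    strict_output_format: bool,
                    multi_step_reasoning: bool) -> str:
    """Walk the decision tree in order; the first 'yes' wins."""
    if needs_proprietary_knowledge:   # Q1
        return "RAG"
    if strict_output_format:          # Q2
        return "few-shot"
    if multi_step_reasoning:          # Q3
        return "chain-of-thought"
    return "zero-shot"                # Q4: the default for simple tasks
```

Note the ordering matters: a task that needs proprietary knowledge *and* strict formatting still routes to RAG first, and formatting is then handled inside the RAG prompt.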

Zero-shot: Best for simple tasks. Failure signal: inconsistent quality → move to few-shot.

Few-shot: Provide 3–5 diverse examples as input/output pairs, placed BEFORE the task. Failure signal: the model follows some examples but not others → add diversity to the examples or consider fine-tuning.

Chain-of-thought: Add an instruction such as "Think through this step by step", and separate the reasoning from the final output. Don't use it for simple classification or extraction tasks (it adds latency). Failure signal: correct reasoning but wrong conclusion → add checkpoints.

RAG: Retrieve 3–5 chunks per query with a similarity threshold, and define a fallback for when no relevant chunks are found. Failure signal: the model ignores the retrieved context → strengthen the grounding instruction.
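The few-shot layout described above (input/output pairs placed before the task) can be sketched as a prompt builder. The `Input:`/`Output:` labels are an illustrative convention, not a required format:

```python
def build_few_shot_prompt(examples, task_input):
    """Assemble a few-shot prompt: 3-5 example pairs BEFORE the real task.

    examples: list of (input_text, output_text) pairs
    task_input: the new input the model should complete
    """
    parts = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    # The real task comes last, with a trailing "Output:" for the model
    # to complete in the same format as the examples.
    parts.append(f"Input: {task_input}\nOutput:")
    return "\n\n".join(parts)
```

Keeping the final line an open `Output:` nudges the model to continue the pattern rather than comment on it.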

Step 4 — Compare strategies for the specific task

Run a comparison across: quality, consistency, setup time, ongoing cost, latency, knowledge freshness, and hallucination risk.
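One way to keep the comparison honest is to start from an empty scorecard over exactly these dimensions and fill it in per task. The 1–5 scoring convention and the single filled-in cell below are illustrative placeholders, not benchmark results:

```python
# The seven comparison dimensions from Step 4.
DIMENSIONS = ["quality", "consistency", "setup_time", "ongoing_cost",
              "latency", "knowledge_freshness", "hallucination_risk"]

def blank_comparison(strategies):
    """Return an empty scorecard (fill each cell 1-5 for the specific task)."""
    return {s: {d: None for d in DIMENSIONS} for s in strategies}

card = blank_comparison(["zero-shot", "few-shot", "CoT", "RAG"])
card["RAG"]["knowledge_freshness"] = 5  # example entry, not a benchmark
```

An unfilled cell is a visible reminder that the comparison is incomplete, which supports the "filled in for the specific task" quality check below.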

Quality check before delivering

  • Decision tree was followed — not a generic recommendation
  • Comparison table is filled in for the specific task
  • Recommendation includes a starter prompt to test
  • "When to reconsider" criteria are specific
  • Fine-tuning is only recommended if 100+ labelled examples exist

Suggested next step: Build and test the starter prompt with 5–10 real inputs before briefing engineering. The strategy is a hypothesis until it's tested.
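That next step can be scripted as a small smoke-test harness. This is a minimal sketch: the model is passed in as a plain `prompt -> output` callable so it stays client-agnostic, and `judge` is a hypothetical pass/fail check you define per task:

```python
def smoke_test(build_prompt, real_inputs, judge, model):
    """Run the starter prompt over 5-10 real inputs and record pass/fail.

    build_prompt: maps a raw input to the full prompt string
    judge: maps a model output to True (pass) / False (fail)
    model: any callable taking a prompt string, returning an output string
    """
    results = []
    for item in real_inputs:
        output = model(build_prompt(item))
        results.append((item, output, judge(output)))
    passed = sum(1 for _, _, ok in results if ok)
    print(f"{passed}/{len(results)} inputs passed")
    return results
```

If fewer inputs pass than you expected, revisit the decision tree before briefing engineering: the failure signals in Step 3 point to the next strategy to try.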