
Prompting Strategy Selector

Choosing the wrong prompting strategy wastes engineering time and produces worse outputs. The right choice depends on what you're asking the model to do, what data you have, and what quality and cost constraints you're working within.

---

Context

The four main strategies:
| Strategy | What it is | When it works |
| --- | --- | --- |
| Zero-shot | Give the model a task with no examples | Simple, well-defined tasks within the model's training |
| Few-shot | Give the model 2–5 examples of the desired input/output | Tasks where format, tone, or domain-specific output matters |
| Chain-of-thought (CoT) | Instruct the model to reason step by step | Complex reasoning, multi-step decisions |
| RAG | Retrieve relevant documents and include them in context | Tasks requiring up-to-date, specific, or proprietary knowledge |
Two additional approaches sit alongside these:
  • Fine-tuning: consider it only when zero-/few-shot plus RAG consistently underperform and you have a large labelled dataset.
  • System prompt engineering: always applies. It is the baseline every strategy builds on, not an alternative to them.

---

Step 1 — Understand the task requirements

Before choosing, assess six things: the task type, the data the model needs, how consistent the output must be, the expected volume, your cost tolerance, and whether labelled examples are available.

Step 2 — Run the decision tree

Work through the questions in order; take the first strategy whose question you answer "yes", and fall through to the next question on "no":

  • Q1: Does the task require specific documents or proprietary knowledge? If yes → RAG
  • Q2: Does it need a highly specific output format? If yes → Few-shot
  • Q3: Does it require multi-step reasoning? If yes → Chain-of-thought
  • Q4: Is it simple, well defined, and within the model's training knowledge? If yes → Zero-shot

Step 3 — Deep-dive on each strategy
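The decision tree above can be sketched as a small function. This is a minimal illustration, and the three boolean parameters are hypothetical stand-ins for your own Step 1 assessment, not an API from any library:

```python
def choose_strategy(needs_proprietary_knowledge: bool,
                    strict_output_format: bool,
                    multi_step_reasoning: bool) -> str:
    """Walk the decision tree in order; the first 'yes' wins."""
    if needs_proprietary_knowledge:   # Q1
        return "RAG"
    if strict_output_format:          # Q2
        return "few-shot"
    if multi_step_reasoning:          # Q3
        return "chain-of-thought"
    return "zero-shot"                # Q4: the default for simple tasks
```

Note the ordering matters: a task that needs proprietary knowledge *and* strict formatting still routes to RAG first, and formatting is then handled inside the RAG prompt.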

Zero-shot: Best for simple tasks. Failure signal: inconsistent quality → move to few-shot.

Few-shot: Provide 3–5 diverse examples as input/output pairs, placed BEFORE the task. Failure signal: the model follows some examples but not others → add diversity to the examples or consider fine-tuning.

Chain-of-thought: Add an instruction such as "Think through this step by step", and separate the reasoning from the final output. Don't use it for simple classification or extraction tasks (it adds latency). Failure signal: correct reasoning but wrong conclusion → add checkpoints.

RAG: Retrieve 3–5 chunks per query with a similarity threshold, and define a fallback for when no relevant chunks are found. Failure signal: the model ignores the retrieved context → strengthen the grounding instruction.
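The few-shot layout described above (input/output pairs placed before the task) can be sketched as a prompt builder. The `Input:`/`Output:` labels are an illustrative convention, not a required format:

```python
def build_few_shot_prompt(examples, task_input):
    """Assemble a few-shot prompt: 3-5 example pairs BEFORE the real task.

    examples: list of (input_text, output_text) pairs
    task_input: the new input the model should complete
    """
    parts = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    # The real task comes last, with a trailing "Output:" for the model
    # to complete in the same format as the examples.
    parts.append(f"Input: {task_input}\nOutput:")
    return "\n\n".join(parts)
```

Keeping the final line an open `Output:` nudges the model to continue the pattern rather than comment on it.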

Step 4 — Compare strategies for the specific task

Run a comparison across: quality, consistency, setup time, ongoing cost, latency, knowledge freshness, and hallucination risk.
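One way to keep the comparison honest is to start from an empty scorecard over exactly these dimensions and fill it in per task. The 1–5 scoring convention and the single filled-in cell below are illustrative placeholders, not benchmark results:

```python
# The seven comparison dimensions from Step 4.
DIMENSIONS = ["quality", "consistency", "setup_time", "ongoing_cost",
              "latency", "knowledge_freshness", "hallucination_risk"]

def blank_comparison(strategies):
    """Return an empty scorecard (fill each cell 1-5 for the specific task)."""
    return {s: {d: None for d in DIMENSIONS} for s in strategies}

card = blank_comparison(["zero-shot", "few-shot", "CoT", "RAG"])
card["RAG"]["knowledge_freshness"] = 5  # example entry, not a benchmark
```

An unfilled cell is a visible reminder that the comparison is incomplete, which supports the "filled in for the specific task" quality check below.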

Quality check before delivering

  • Decision tree was followed — not a generic recommendation
  • Comparison table is filled in for the specific task
  • Recommendation includes a starter prompt to test
  • "When to reconsider" criteria are specific
  • Fine-tuning is only recommended if 100+ labelled examples exist

Suggested next step: Build and test the starter prompt with 5–10 real inputs before briefing engineering. The strategy is a hypothesis until it's tested.
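That next step can be scripted as a small smoke-test harness. This is a minimal sketch: the model is passed in as a plain `prompt -> output` callable so it stays client-agnostic, and `judge` is a hypothetical pass/fail check you define per task:

```python
def smoke_test(build_prompt, real_inputs, judge, model):
    """Run the starter prompt over 5-10 real inputs and record pass/fail.

    build_prompt: maps a raw input to the full prompt string
    judge: maps a model output to True (pass) / False (fail)
    model: any callable taking a prompt string, returning an output string
    """
    results = []
    for item in real_inputs:
        output = model(build_prompt(item))
        results.append((item, output, judge(output)))
    passed = sum(1 for _, _, ok in results if ok)
    print(f"{passed}/{len(results)} inputs passed")
    return results
```

If fewer inputs pass than you expected, revisit the decision tree before briefing engineering: the failure signals in Step 3 point to the next strategy to try.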