Context vs Retrieval vs Tools — 2026 Decision Checklist

Use this checklist before you add a vector database or rebuild your assistant stack. The goal is to pick the simplest primitive that can be operated, audited, and improved.

1) Name the source of truth (no exceptions)
- For every answer type, write one authoritative source: a DB table, an API, a policy doc (with owner), a contract template version, a runbook.
- If you can’t name it, you can’t promise correctness. Treat the feature as “drafting” or “suggestion,” not “answering.”

2) Classify the user’s intent
- Search intent: user is trying to discover documents/clauses/threads. Retrieval is likely.
- Synthesis intent: user wants a decision or summary over a bounded set (one incident, one account, one PR). Prefer a context packet.
- Operational intent: user wants current state or an action (status, totals, create/update). Prefer tool calls to systems of record.

3) Choose the primitive
A) Tool calls first if:
- The truth lives in systems of record (billing, CRM, ticketing, observability).
- You can define a typed schema for inputs/outputs.
- You can enforce auth and log every call.

B) Context packet first if:
- You can deterministically gather the relevant artifacts per request.
- The corpus is bounded (repo, customer workspace, incident channel).
- You can define truncation and priority rules.

C) Retrieval (RAG) if:
- The corpus is large and messy, and the product experience is discovery.
- You can staff relevance work: chunking strategy, metadata, hybrid signals, reranking, evals.
- You can implement ACL-aware filtering end-to-end (docs → chunks → embeddings → query).

D) Fine-tuning/adapters if:
- The task is stable formatting/classification/extraction.
- You have curated examples and a plan for drift.
- You still fetch fresh facts via tools or explicit context.

4) Minimum logging required (ship-blockers)
- Context packet: log IDs + versions of included artifacts and what got dropped.
- RAG: log query, retrieved doc IDs, scores (if available), reranker inputs/outputs.
- Tools: log tool name, parameters, caller identity, authorization scope, tool response.
- Final response: log structured output and citations/references.

5) Evals you must run before rollout
- Stale policy test: conflicting versions of a doc.
- Permission test: user asks for something they should not access.
- Ambiguity test: missing identifier (invoice ID, customer ID) should trigger a clarifying question.
- Regression test: same prompt/data should not flip answers after a content update.

If you can’t meet the logging + eval bar, do not add more complexity. Reduce scope, tighten contracts, and ship a smaller assistant you can actually operate.