Context & Tool Governance Checklist (2026)

Use this as a pre-flight before you ship (or refactor) an LLM feature. The goal: fewer moving parts, clearer contracts, and fast debugging.

1) Define the job
- Write the single sentence “job to be done” for the model (not the user story).
- List the top 3 failure modes that would be unacceptable (e.g., wrong entitlement, unauthorized disclosure, irreversible action).

2) Choose your source of truth
- For each required fact, mark it as: API/DB field, computed value, or prose document.
- If it’s API/DB: do not use retrieval as the first option. Plan a tool call.
- If it’s prose: decide whether you need citations. If yes, plan stable excerpt IDs.

3) Context assembly (deterministic)
- Specify allowed sources by type (e.g., Jira tickets, GitHub issues, internal policy pages).
- Define an ordering and structure (headers, sections, and a fixed template).
- Include provenance for every included snippet (URL/record ID + timestamp/version).
- Add a hard cap for context size and a rule for what gets dropped first.

4) Retrieval (only if it earns its keep)
- Define what retrieval is allowed to search (and what it must never search).
- Use metadata filters for permissions, but do not rely on them as your only control.
- Treat retrieved text as untrusted input; strip instruction-like patterns where possible.
- Create a small regression set of queries that must keep working as the corpus changes.

5) Tool contracts (strict)
- Every tool has: name, purpose, schema, and explicit permission checks outside the model.
- Validate tool inputs with a schema validator before execution.
- Prefer idempotent tools; for non-idempotent actions, require confirmation.
- Record tool outputs and errors in traces; never hide failures behind a natural-language apology.

6) Guardrails that actually work
- Add refusal criteria tied to missing required fields or insufficient evidence.
- Constrain outputs: structured JSON for actions, templated formats for reports.
- Separate “analysis” from “final answer” in your internal prompting; log both if your policy allows.

7) Tracing & auditability
- Log: assembled context, model config, prompt version, tool calls, tool outputs, and final response.
- Make a single run reconstructible from an ID.
- Define retention and redaction rules (PII, secrets, credentials).

8) Evaluation in CI
- Build a small suite: correctness checks, permission checks, injection tests, and refusal tests.
- Pin prompt/context templates by version; review changes like code.

Exit criteria (ship gate)
- You can explain one wrong answer end-to-end using traces.
- You can prove the model didn’t access unauthorized data.
- You can disable a tool or a data source without breaking the whole feature.