Production LLM System Readiness Checklist (2026)

Use this as a go/no-go gate before you expand an LLM feature beyond a pilot.

1) Scope and action surface
- Define the primary job-to-be-done in one sentence.
- List the exact tools/actions the system is allowed to call (aim for 5–15).
- For each tool, write: purpose, required inputs, expected outputs, and failure modes.
- Write a “cannot do” list (e.g., money movement, access changes, production deploys) and the approval path required for each item.
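
A minimal sketch of how the allowlist and the “cannot do” list above might be encoded so that anything unlisted is denied by default. The tool name, approver roles, and the `authorize` helper are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """One allowlist entry: the four fields the checklist asks for."""
    name: str
    purpose: str
    required_inputs: tuple[str, ...]
    expected_output: str
    failure_modes: tuple[str, ...]


# Hypothetical allowlist; a real one would list every permitted tool.
ALLOWED_TOOLS = {
    spec.name: spec
    for spec in [
        ToolSpec(
            name="search_tickets",
            purpose="Find incident tickets matching a query",
            required_inputs=("query",),
            expected_output="list of ticket IDs",
            failure_modes=("timeout", "no results"),
        ),
    ]
}

# Hypothetical "cannot do" list, mapping each action to its approval path.
CANNOT_DO = {
    "move_money": "finance approver",
    "change_access": "security approver",
    "deploy_production": "release manager",
}


def authorize(action: str) -> tuple[str, str | None]:
    """Return (decision, approval_path) for a requested action."""
    if action in CANNOT_DO:
        return ("escalate", CANNOT_DO[action])
    if action in ALLOWED_TOOLS:
        return ("allow", None)
    # Anything not explicitly listed is denied by default.
    return ("deny", None)
```

Checking the “cannot do” list first means an action cannot become callable just because someone later adds it to the allowlist by mistake.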

2) Retrieval quality (ground truth first)
- Identify canonical sources (e.g., Confluence for policies, GitHub for code, Jira for incidents).
- Assign an owner per source who is accountable for freshness.
- Choose retrieval strategy explicitly: semantic, keyword, or hybrid; define metadata filters.
- Implement ACL-aware retrieval (user permissions must constrain results).
- Create a small gold set of queries with expected documents (not expected answers) and review it weekly.
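
One way to picture ACL-aware retrieval: filter candidates by the caller's group membership before ranking, so restricted documents never reach the prompt. This is a sketch under assumptions; the `Doc` fields and the group-based permission model are hypothetical, and a real system would usually push the filter into the search backend.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    source: str                    # e.g. "confluence", "github", "jira"
    allowed_groups: frozenset[str]
    score: float                   # relevance score from the retriever


def acl_filter(candidates: list[Doc], user_groups: set[str], top_k: int = 3) -> list[Doc]:
    """Keep only documents the user may read, then return the best top_k."""
    visible = [d for d in candidates if d.allowed_groups & user_groups]
    return sorted(visible, key=lambda d: d.score, reverse=True)[:top_k]


candidates = [
    Doc("hr-001", "confluence", frozenset({"hr"}), 0.95),
    Doc("eng-014", "github", frozenset({"engineering"}), 0.80),
    Doc("pol-007", "confluence", frozenset({"engineering", "hr"}), 0.60),
]

# An engineer never sees the HR-only document, even though it scored highest.
results = acl_filter(candidates, user_groups={"engineering"})
```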

3) Tool safety and API design
- Keep tools narrow and typed (validate inputs against a schema before execution).
- Make tools idempotent (retries don’t duplicate side effects).
- Add rate limits and per-user quotas.
- Require human approval or step-up authentication for sensitive tools.
- Log every tool call with input, output, user identity, and timestamp.
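
A rough sketch of two of the items above, using only the standard library: validate arguments against a simple schema before running a tool, and use a caller-supplied idempotency key so a retry returns the stored result instead of repeating the side effect. The `{field: type}` schema format, `IdempotentExecutor`, and `create_ticket` are hypothetical; production systems would more likely use a schema library and a durable key store.

```python
def validate(args: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the args are valid."""
    errors = []
    for name, expected_type in schema.items():
        if name not in args:
            errors.append(f"missing field: {name}")
        elif not isinstance(args[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    for name in args:
        if name not in schema:
            errors.append(f"unexpected field: {name}")
    return errors


class IdempotentExecutor:
    """Runs a tool at most once per idempotency key (in-memory sketch)."""

    def __init__(self):
        self._results = {}

    def run(self, idempotency_key, args, schema, tool_fn):
        errors = validate(args, schema)
        if errors:
            # Reject before execution so invalid input has no side effects.
            raise ValueError("; ".join(errors))
        if idempotency_key not in self._results:
            self._results[idempotency_key] = tool_fn(**args)
        return self._results[idempotency_key]


created = []  # stands in for an external side effect


def create_ticket(title: str, priority: int) -> str:
    created.append(title)
    return f"TICKET-{len(created)}"


executor = IdempotentExecutor()
schema = {"title": str, "priority": int}
first = executor.run("req-123", {"title": "Disk full", "priority": 2}, schema, create_ticket)
retry = executor.run("req-123", {"title": "Disk full", "priority": 2}, schema, create_ticket)
```

Here the retry with the same key returns the first result and `created` still holds a single entry.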

4) Observability and auditability
- Log: user prompt, system prompt version, retrieved doc IDs, model output, tool calls.
- Store enough context to reproduce a decision during an incident.
- Define retention and access to logs (who can see what, for how long).
- Add dashboards for tool error rates and refusal rates.
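
As an illustration, the logged fields above could be captured in one structured record per request, from which the dashboard rates are derived. The field names, the `status` values on tool calls, and the `refused` flag are assumptions for this sketch, not a prescribed schema.

```python
import json
from datetime import datetime, timezone


def build_audit_record(*, user_id, user_prompt, system_prompt_version,
                       retrieved_doc_ids, model_output, tool_calls, refused=False):
    """Assemble one JSON-serializable audit record for a single request."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "user_prompt": user_prompt,
        "system_prompt_version": system_prompt_version,
        "retrieved_doc_ids": list(retrieved_doc_ids),
        "model_output": model_output,
        "tool_calls": list(tool_calls),  # each: {"name": ..., "status": "ok" | "error"}
        "refused": refused,
    }


def tool_error_rate(records):
    """Fraction of all tool calls that ended in an error."""
    calls = [call for record in records for call in record["tool_calls"]]
    if not calls:
        return 0.0
    return sum(1 for call in calls if call["status"] == "error") / len(calls)


def refusal_rate(records):
    """Fraction of requests the system refused."""
    if not records:
        return 0.0
    return sum(1 for record in records if record["refused"]) / len(records)


records = [
    build_audit_record(
        user_id="u-1", user_prompt="Restart the job", system_prompt_version="v12",
        retrieved_doc_ids=["runbook-3"], model_output="Restarted.",
        tool_calls=[{"name": "get_job", "status": "ok"},
                    {"name": "restart_job", "status": "error"}],
    ),
    build_audit_record(
        user_id="u-2", user_prompt="Wire $500", system_prompt_version="v12",
        retrieved_doc_ids=[], model_output="I can't do that.",
        tool_calls=[], refused=True,
    ),
]
log_line = json.dumps(records[0], sort_keys=True)  # one line per request
```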

5) Evaluation and release discipline
- Maintain a fixed evaluation set covering: common tasks, edge cases, prompt injection attempts.
- Track retrieval metrics separately from generation metrics (did it fetch the right doc?).
- Run evals on every prompt/tool change (CI gate or scheduled job).
- Use staged rollout (internal, small cohort, wider release) with rollback.
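
A small sketch of a retrieval-only eval gate: score each gold-set query by recall@k against its expected documents and fail the run when the mean falls below a threshold. The gold set, the stub retriever, `k`, and the 0.8 threshold are all made up for illustration.

```python
def recall_at_k(retrieved, expected, k):
    """Share of expected doc IDs that appear in the top-k retrieved IDs."""
    if not expected:
        raise ValueError("each gold query needs at least one expected document")
    top_k = set(retrieved[:k])
    return len(top_k & set(expected)) / len(expected)


def eval_gate(gold_set, retrieve, k=5, threshold=0.8):
    """Return (passed, mean_recall) for a list of (query, expected_doc_ids)."""
    scores = [recall_at_k(retrieve(query), expected, k) for query, expected in gold_set]
    mean_recall = sum(scores) / len(scores)
    return mean_recall >= threshold, mean_recall


# Hypothetical gold set: queries paired with expected documents, not answers.
gold_set = [
    ("vacation policy", ["doc-1"]),
    ("deploy rollback", ["doc-2", "doc-3"]),
]

# Stub retriever standing in for the real search call.
fake_index = {
    "vacation policy": ["doc-1", "doc-9"],
    "deploy rollback": ["doc-2", "doc-8"],
}

passed, mean_recall = eval_gate(gold_set, lambda q: fake_index[q])
```

In a CI job, a failed gate would typically exit non-zero so the prompt or tool change cannot merge.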

6) Security and compliance alignment
- Document data flow: what leaves your system, what’s stored, where.
- Confirm vendor terms for data handling match your requirements.
- Ensure SSO, RBAC, and least privilege are enforced end-to-end.
- Create an incident runbook: kill switch, comms owner, investigation steps.
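
The kill switch in the runbook can be as simple as a flag checked before every model call. The sketch below assumes an environment variable named `LLM_FEATURE_DISABLED`, which is a hypothetical name; a feature-flag service would serve the same purpose and allow flipping the switch without a restart.

```python
import os
from typing import Callable, Mapping

KILL_SWITCH_VAR = "LLM_FEATURE_DISABLED"  # hypothetical flag name
FALLBACK_MESSAGE = "This feature is temporarily unavailable."


def feature_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """The feature stays on unless the kill switch is explicitly set."""
    return env.get(KILL_SWITCH_VAR, "").strip().lower() not in {"1", "true", "on"}


def handle_request(prompt: str, call_model: Callable[[str], str],
                   env: Mapping[str, str] = os.environ) -> str:
    """Check the kill switch first so a disabled feature never reaches the model."""
    if not feature_enabled(env):
        return FALLBACK_MESSAGE
    return call_model(prompt)
```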

Go/No-Go rule of thumb:
If you can’t answer “what data did it use, what action did it take, and was it allowed,” it’s a No-Go for production.