Production LLM System Readiness Checklist (2026)

Use this as a go/no-go gate before you expand an LLM feature beyond a pilot.

1) Scope and action surface
- Define the primary job-to-be-done in one sentence.
- List the exact tools/actions the system is allowed to call (aim for 5–15).
- For each tool, write: purpose, required inputs, expected outputs, and failure modes.
- Write a “cannot do” list (money movement, access changes, production deploys) and required approval paths.

2) Retrieval quality (ground truth first)
- Identify canonical sources (e.g., Confluence for policies, GitHub for code, Jira for incidents).
- Assign an owner per source who is accountable for freshness.
- Choose retrieval strategy explicitly: semantic, keyword, or hybrid; define metadata filters.
- Implement ACL-aware retrieval (user permissions must constrain results).
- Create a small gold set of queries with expected documents (not expected answers) and review weekly.

3) Tool safety and API design
- Tools are narrow and typed (schema validation before execution).
- Tools are idempotent (retries don’t duplicate side effects).
- Add rate limits and per-user quotas.
- Sensitive tools require human approval or step-up authentication.
- Every tool call is logged with input, output, user identity, and timestamp.

4) Observability and auditability
- Log: user prompt, system prompt version, retrieved doc IDs, model output, tool calls.
- Store enough context to reproduce a decision during an incident.
- Define retention and access to logs (who can see what, for how long).
- Add dashboards for tool error rates and refusal rates.

5) Evaluation and release discipline
- Maintain a fixed evaluation set covering: common tasks, edge cases, prompt injection attempts.
- Track retrieval metrics separately from generation (did it fetch the right doc?).
- Run evals on every prompt/tool change (CI gate or scheduled job).
- Use staged rollout (internal, small cohort, wider release) with rollback.

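The retrieval items in sections 2 and 5 (a gold set of queries with expected documents, scored separately from generation) can be sketched as a small recall@k check. This is a minimal illustration, not a prescribed harness: `GoldQuery`, `recall_at_k`, `evaluate_retrieval`, the `retrieve` callable, and the 0.8 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class GoldQuery:
    """One gold-set entry: a query plus the document IDs retrieval should surface."""
    query: str
    expected_doc_ids: FrozenSet[str]


def recall_at_k(retrieved: Sequence[str], expected: FrozenSet[str], k: int) -> float:
    """Fraction of the expected documents found in the top-k retrieved IDs."""
    if not expected:
        raise ValueError("a gold entry must list at least one expected document")
    return len(set(retrieved[:k]) & expected) / len(expected)


def evaluate_retrieval(
    gold_set: Sequence[GoldQuery],
    retrieve: Callable[[str], Sequence[str]],  # assumed interface: query -> ranked doc IDs
    k: int = 5,
    threshold: float = 0.8,  # illustrative gate value, not a recommendation
) -> Tuple[float, bool]:
    """Return (mean recall@k, passed) so a CI job can gate on the boolean."""
    if not gold_set:
        raise ValueError("gold set is empty")
    scores = [recall_at_k(retrieve(g.query), g.expected_doc_ids, k) for g in gold_set]
    mean_recall = sum(scores) / len(scores)
    return mean_recall, mean_recall >= threshold


if __name__ == "__main__":
    # Stand-in for a real retriever: a fixed mapping from query to ranked doc IDs.
    fake_index = {
        "refund policy": ["policy-12", "policy-40"],
        "sev1 escalation": ["runbook-3", "policy-12"],
    }
    gold = [
        GoldQuery("refund policy", frozenset({"policy-12"})),
        GoldQuery("sev1 escalation", frozenset({"runbook-3", "inc-7"})),
    ]
    mean_recall, passed = evaluate_retrieval(gold, lambda q: fake_index.get(q, []), k=2)
    print(f"recall@2={mean_recall:.2f} passed={passed}")  # recall@2=0.75 passed=False
```

Because the gold set stores expected documents rather than expected answers, this check keeps working when the prompt or model changes, and a failing score points at retrieval rather than generation.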
6) Security and compliance alignment
- Document data flow: what leaves your system, what’s stored, where.
- Confirm vendor terms for data handling match your requirements.
- Ensure SSO, RBAC, and least-privilege are enforced end-to-end.
- Create an incident runbook: kill switch, comms owner, investigation steps.

Go/No-Go rule of thumb: If you can’t answer “what data did it use, what action did it take, and was it allowed,” it’s a No-Go for production.
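One way to make the Go/No-Go question answerable is to route every tool call through a single gate that enforces the allowlist, validates input, deduplicates retries, and writes an audit record. The sketch below is an assumption-laden illustration: `Tool`, `AuditRecord`, and `ToolGate` are hypothetical names, the type check is a stand-in for real schema validation, and the in-memory dict and list stand in for durable storage.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class Tool:
    name: str
    required_fields: Dict[str, type]  # minimal stand-in for a real input schema
    handler: Callable[[dict], Any]
    needs_approval: bool = False      # sensitive tools require human approval


@dataclass
class AuditRecord:
    timestamp: float
    user_id: str
    tool: str
    tool_input: dict
    retrieved_doc_ids: List[str]  # what data did it use
    allowed: bool                 # was it allowed
    reason: str
    output: Any = None            # what action did it take


class ToolGate:
    def __init__(self, tools: Sequence[Tool]) -> None:
        self._tools = {t.name: t for t in tools}         # the allowlist
        self._results: Dict[Tuple[str, str], Any] = {}   # (tool, idempotency key) -> output
        self.audit_log: List[AuditRecord] = []

    def _deny_reason(self, tool: Optional[Tool], tool_input: dict, approved: bool) -> Optional[str]:
        if tool is None:
            return "tool not on allowlist"
        for field_name, field_type in tool.required_fields.items():
            if not isinstance(tool_input.get(field_name), field_type):
                return f"invalid input: {field_name}"
        if tool.needs_approval and not approved:
            return "approval required"
        return None

    def call(self, user_id: str, tool_name: str, tool_input: dict, idempotency_key: str,
             retrieved_doc_ids: Sequence[str] = (), approved: bool = False) -> AuditRecord:
        tool = self._tools.get(tool_name)
        reason = self._deny_reason(tool, tool_input, approved)
        output = None
        if reason is None:
            key = (tool_name, idempotency_key)
            if key not in self._results:  # a retry with the same key reuses the first result
                self._results[key] = tool.handler(tool_input)
            output = self._results[key]
        record = AuditRecord(time.time(), user_id, tool_name, tool_input,
                             list(retrieved_doc_ids), reason is None, reason or "ok", output)
        self.audit_log.append(record)  # denied calls are logged too
        return record
```

A call such as `gate.call("u1", "create_ticket", {"title": "VPN down"}, "k1", ["doc-1"])` returns a record holding the retrieved document IDs, the action output, and the allow/deny decision, which are the three facts the rule of thumb asks for. Rate limits, per-user quotas, and ACL checks would slot into `_deny_reason` in the same way.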