RAG + Agents Production Readiness Gate
(paste into PRDs / launch checklists)

Goal: ship an AI feature (RAG or tool-using agent) that is explainable, permissioned, testable, and operable.

1) Scope & blast radius
- Define the workflow in one sentence (what action or decision does the system support?).
- List every external system the agent can touch (email, Slack, CRM, billing, ticketing, GitHub, databases).
- Decide the maximum allowed damage from a single bad run (e.g., “can draft but not send,” “can suggest but not execute”).

2) Provenance & content lifecycle
- Every chunk must map to: source system, stable document ID, version/revision, and timestamp.
- Decide canonical sources (what wins when duplicates conflict?).
- Set a deletion/archival policy: how you handle removed or superseded docs without breaking citations.
- Require citations in the UI for any non-trivial claim; citations must resolve to a stable URL or snapshot.

3) Identity & permissions (non-negotiable)
- Authenticate the user at the edge; pass a real user identity into retrieval and tools.
- Enforce authorization at query time (filters/ACLs in retrieval), not after generation.
- Never rely on the model to “refuse” as your only control.
- Log denied retrievals and blocked tool calls with reason codes.

4) Tool governance
- Maintain a tool allowlist per workflow (not global). The default is no tools.
- Make write tools idempotent; define safe retries and timeouts.
- Validate tool arguments (schema + business rules) before execution.
- Add per-user and per-agent quotas (tool-call caps, spend caps, rate limits).

5) Evaluation & release gating
- Build an eval dataset from real queries (scrubbed) plus adversarial cases.
- Track at least: retrieval relevance, groundedness, policy compliance, tool correctness.
- Run evals in CI on every change to prompts, retrieval settings, tool schemas, or model versions.
- Define failure policies: which regressions block the merge and which only open an issue.
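
One way to make the failure policy in section 5 concrete is a small gate function that CI runs after the eval suite. The sketch below is illustrative only: the `MetricPolicy` type, the `gate` function, and every threshold are assumptions made for this example, not values the checklist prescribes. It compares a candidate run against a stored baseline and sorts regressions into merge blockers and issue-only findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPolicy:
    min_score: float  # absolute floor for the metric
    max_drop: float   # largest tolerated drop relative to the baseline
    blocking: bool    # True: block the merge; False: only open an issue


# Hypothetical thresholds; tune these to your own eval dataset.
POLICIES = {
    "retrieval_relevance": MetricPolicy(0.80, 0.02, True),
    "groundedness": MetricPolicy(0.90, 0.01, True),
    "policy_compliance": MetricPolicy(1.00, 0.00, True),
    "tool_correctness": MetricPolicy(0.95, 0.02, False),
}


def gate(baseline, candidate, policies=POLICIES):
    """Return (blockers, issues) for a candidate eval run.

    baseline and candidate map metric name -> score in [0, 1].
    A metric missing from the candidate run always blocks, so a
    silently skipped eval cannot pass the gate.
    """
    blockers, issues = [], []
    for name, policy in policies.items():
        score = candidate.get(name)
        if score is None:
            blockers.append(f"{name}: metric missing from eval run")
            continue
        # With no baseline for this metric, only the floor applies.
        drop = baseline.get(name, score) - score
        if score < policy.min_score or drop > policy.max_drop:
            message = f"{name}: score={score:.3f}, drop={drop:.3f}"
            (blockers if policy.blocking else issues).append(message)
    return blockers, issues
```

In a CI job, the caller would exit non-zero when `blockers` is non-empty and file a ticket for each entry in `issues`. Keeping the thresholds in one table means a change to the failure policy is itself a reviewable diff.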

6) Observability & incident response
- Trace every request: prompt version, model, retrieval results, tool calls, final output.
- Assign an on-call owner and an escalation path (product + engineering + security).
- Establish rollback: how to revert prompt/model/tool changes quickly.
- Add monitoring: latency, error rate, token spend, tool-call volume, refusal rate.

Definition of “ship-ready”
- You can answer: Who asked? What was retrieved? What permissions applied? What tools were called? Why did the system choose those sources? What changed since yesterday?
- You can reproduce a bad output from logs and replay it against a fixed snapshot.
- You have an eval gate that would have caught at least one realistic failure case.
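
The “ship-ready” questions are easier to answer when each request writes one structured trace record. The following is a minimal sketch under stated assumptions: the `TraceRecord`, `RetrievedChunk`, and `ToolCall` types and all field names are hypothetical, and `applied_filters` and tool arguments are assumed to be JSON-serializable. It records who asked, what was retrieved, which permission filters applied, and which tools ran, and it derives a key from the inputs so a bad output can be matched to a replay against the same snapshot.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    source_system: str
    document_id: str  # stable document ID
    revision: str
    score: float


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict
    status: str            # e.g. "ok", "blocked", "error"
    reason_code: str = ""  # filled in when the call is blocked


@dataclass(frozen=True)
class TraceRecord:
    request_id: str
    user_id: str          # who asked
    prompt_version: str
    model: str
    index_snapshot: str   # fixed snapshot the retrieval ran against
    query: str
    applied_filters: dict  # permissions enforced at query time
    retrieved: tuple = ()
    tool_calls: tuple = ()
    output: str = ""

    def to_json(self):
        """Serialize the full record as one log line."""
        return json.dumps(asdict(self), sort_keys=True)

    def replay_key(self):
        """Hash only the inputs that determine a replay.

        Two requests with the same key ran the same query under the
        same identity, filters, prompt, model, and snapshot.
        """
        inputs = {
            "user_id": self.user_id,
            "prompt_version": self.prompt_version,
            "model": self.model,
            "index_snapshot": self.index_snapshot,
            "query": self.query,
            "applied_filters": self.applied_filters,
        }
        encoded = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()
```

Comparing `replay_key()` values between yesterday's and today's traces for the same query is one cheap way to answer “What changed since yesterday?”: if the key differs, one of the recorded inputs changed; if it matches but the output differs, the change lies outside the recorded inputs (for example, model nondeterminism).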