RAG + Agents Production Readiness Gate (paste into PRDs / launch checklists)

Goal: ship an AI feature (RAG or tool-using agent) that is explainable, permissioned, testable, and operable.

1) Scope & blast radius
- Define the workflow in one sentence (what action or decision does the system support?).
- List every external system the agent can read from or write to (email, Slack, CRM, billing, ticketing, GitHub, databases).
- Decide the maximum allowed damage from a single bad run (e.g., “can draft but not send,” “can suggest but not execute”).
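One way to make the blast-radius decision enforceable is to encode it as a per-workflow capability ceiling and check every requested action against it. Below is a minimal Python sketch. The capability levels, the workflow names, and `is_within_blast_radius` are hypothetical illustrations, not a standard API:

```python
from enum import IntEnum


class Capability(IntEnum):
    """Ordered capability levels: a higher value means more potential damage."""
    READ = 0      # look things up only
    SUGGEST = 1   # propose an action for a human to take
    DRAFT = 2     # create an unsent artifact (e.g., an email draft)
    EXECUTE = 3   # perform the side effect (send, pay, merge)


# Hypothetical per-workflow ceilings: the most damage a single run may cause.
WORKFLOW_CEILINGS: dict[str, Capability] = {
    "support_reply": Capability.DRAFT,    # "can draft but not send"
    "refund_triage": Capability.SUGGEST,  # "can suggest but not execute"
}


def is_within_blast_radius(workflow: str, requested: Capability) -> bool:
    """True if the requested capability does not exceed the workflow's ceiling.

    Workflows without an explicit ceiling are limited to READ (fail closed).
    """
    ceiling = WORKFLOW_CEILINGS.get(workflow, Capability.READ)
    return requested <= ceiling
```

Because the levels are ordered, one comparison answers "is this action allowed here?", and an unregistered workflow can only read.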

2) Provenance & content lifecycle
- Every chunk must map to: source system, stable document ID, version/revision, and timestamp.
- Decide canonical sources (what wins when duplicates conflict?).
- Set a deletion/archival policy: define how removed or superseded docs are handled without breaking existing citations.
- Require citations in the UI for any non-trivial claim; citations must resolve to a stable URL or snapshot.
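The provenance fields and the "citations must keep resolving" rule can be combined in a single chunk record. Below is a minimal Python sketch. `ChunkProvenance`, `resolve_citation`, the chunk-ID format, and the live/snapshot URL prefixes are all hypothetical. It assumes you keep an archived snapshot of every ingested version:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChunkProvenance:
    source_system: str   # e.g., "wiki", "gdrive" (illustrative)
    doc_id: str          # stable ID from the source system, not a file path
    version: str         # revision identifier at ingestion time
    timestamp: datetime  # when this version was modified or ingested
    chunk_index: int     # position of the chunk within this document version

    @property
    def chunk_id(self) -> str:
        # Deterministic: re-ingesting the same version yields the same chunk IDs.
        return f"{self.source_system}:{self.doc_id}@{self.version}#{self.chunk_index}"


def resolve_citation(chunk: ChunkProvenance,
                     live_versions: dict[tuple[str, str], str],
                     live_base: str, snapshot_base: str) -> str:
    """Return a citation target that stays valid after deletion or supersession.

    live_versions maps (source_system, doc_id) -> current version; a deleted
    document is simply absent. Both base URLs are hypothetical prefixes.
    """
    current = live_versions.get((chunk.source_system, chunk.doc_id))
    if current == chunk.version:
        # The cited version is still live: link to the document itself.
        return f"{live_base}/{chunk.source_system}/{chunk.doc_id}"
    # Deleted or superseded: link to the frozen snapshot of the cited version.
    return f"{snapshot_base}/{chunk.source_system}/{chunk.doc_id}/{chunk.version}"
```

Citing the exact version means a citation never silently points to text that has changed since the answer was generated.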

3) Identity & permissions (non-negotiable)
- Authenticate the user at the edge; pass the real user identity into retrieval and tools.
- Enforce authorization at query time (filters/ACLs in retrieval), not after generation.
- Never rely on the model to “refuse” as your only control.
- Log denied retrievals and blocked tool calls with reason codes.
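The sketch below shows query-time enforcement: ACL filtering happens inside retrieval, so unauthorized text never reaches the prompt, and relevant-but-denied chunks are logged with a reason code. The `User` and `Chunk` types, the reason codes, and the term-overlap scoring are hypothetical. The scoring stands in for vector similarity. In a real vector store you would push the ACL down as a metadata filter instead of scanning in Python:

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("retrieval.authz")


@dataclass(frozen=True)
class User:
    user_id: str
    groups: frozenset[str]  # group memberships from the identity provider


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: frozenset[str]  # ACL copied from the source system at ingestion


def authorized_search(user: Optional[User], query: str,
                      index: list[Chunk], k: int = 3):
    """Return (results, denials). Only permitted chunks are ranked and returned."""
    if user is None:
        # No authenticated identity: fail closed and retrieve nothing.
        logger.warning("retrieval denied reason=NO_IDENTITY")
        return [], [("*", "NO_IDENTITY")]
    terms = set(query.lower().split())
    permitted, denials = [], []
    for chunk in index:
        words = set(chunk.text.lower().split())
        if chunk.allowed_groups & user.groups:
            permitted.append((len(terms & words), chunk))
        elif terms & words:
            # Relevant, but the user lacks access: record it, never return it.
            denials.append((chunk.chunk_id, "ACL_NO_GROUP_MATCH"))
            logger.info("retrieval denied user=%s chunk=%s reason=ACL_NO_GROUP_MATCH",
                        user.user_id, chunk.chunk_id)
    # Toy relevance (shared-term count) standing in for vector similarity.
    ranked = sorted((p for p in permitted if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]], denials
```

The model only ever sees what `authorized_search` returns, so a prompt injection cannot talk it into revealing a chunk it was never given.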

4) Tool governance
- Maintain a tool allowlist per workflow (not a global one). The default is no tools.
- Make write tools idempotent; define safe retries and timeouts.
- Validate tool arguments (schema + business rules) before execution.
- Add per-user and per-agent quotas (tool-call caps, spend caps, rate limits).
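The four governance rules can sit in one gate that every tool call passes through. Below is a minimal Python sketch. `ToolGate`, `ToolSpec`, and the reason codes are hypothetical, and the type-only "schema" is a stand-in for a real schema validator. It assumes the caller derives the idempotency key from the run ID and step, so a retry reuses the same key:

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class ToolCallRejected(Exception):
    """Raised when the gate blocks a call; reason_code goes into the audit log."""

    def __init__(self, reason_code: str):
        super().__init__(reason_code)
        self.reason_code = reason_code


@dataclass
class ToolSpec:
    required_args: dict[str, type]                   # minimal schema: arg name -> type
    business_rule: Callable[[dict[str, Any]], bool]  # e.g., amount caps, recipient domains
    is_write: bool = False


@dataclass
class ToolGate:
    allowlist: dict[str, set[str]]  # workflow -> tool names; absent workflow = no tools
    specs: dict[str, ToolSpec]
    max_calls_per_user: int = 20
    calls: dict[str, int] = field(default_factory=dict)      # user_id -> calls used
    completed: dict[str, Any] = field(default_factory=dict)  # idempotency key -> result

    def call(self, workflow: str, user_id: str, tool_name: str, args: dict[str, Any],
             execute: Callable[[dict[str, Any]], Any], idempotency_key: str) -> Any:
        # 1. Per-workflow allowlist, failing closed.
        if tool_name not in self.allowlist.get(workflow, set()):
            raise ToolCallRejected("TOOL_NOT_ALLOWLISTED")
        spec = self.specs.get(tool_name)
        if spec is None:
            raise ToolCallRejected("NO_SCHEMA")
        # 2. Schema check, then business rules, before anything executes.
        for arg, expected in spec.required_args.items():
            if not isinstance(args.get(arg), expected):
                raise ToolCallRejected("SCHEMA_INVALID")
        if not spec.business_rule(args):
            raise ToolCallRejected("BUSINESS_RULE_VIOLATION")
        # 3. Idempotency: a retried write with the same key returns the stored result.
        if spec.is_write and idempotency_key in self.completed:
            return self.completed[idempotency_key]
        # 4. Quota: cap executed tool calls per user.
        used = self.calls.get(user_id, 0)
        if used >= self.max_calls_per_user:
            raise ToolCallRejected("QUOTA_EXCEEDED")
        self.calls[user_id] = used + 1
        result = execute(args)
        if spec.is_write:
            self.completed[idempotency_key] = result
        return result
```

Rejections happen before execution and do not consume quota. A retry after a timeout returns the first result instead of writing twice. In production, the `completed` map would live in durable storage rather than in memory.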

5) Evaluation & release gating
- Build an eval dataset from real user queries (scrubbed of personal and sensitive data) plus adversarial cases (e.g., prompt injection, permission probing).
- Track at least: retrieval relevance, groundedness, policy compliance, tool correctness.
- Run evals in CI on every change to prompts, retrieval settings, tool schemas, or model versions.
- Define failure policies: which regressions block a merge and which only open an issue.
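A failure policy can be expressed as a per-metric tolerance plus a flag that decides whether a regression blocks the merge. Below is a minimal Python sketch of the comparison a CI step might run against a stored baseline. `MetricPolicy`, `evaluate_gate`, the metric names, and the thresholds are hypothetical. Producing the scores is left to your eval harness:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPolicy:
    name: str
    max_drop: float  # allowed absolute drop vs. baseline before the policy triggers
    blocking: bool   # True: fail the merge; False: open an issue and continue


def evaluate_gate(baseline: dict[str, float], candidate: dict[str, float],
                  policies: list[MetricPolicy]) -> tuple[bool, list[str]]:
    """Return (blocked, regressions).

    blocked is True if any blocking metric dropped more than its tolerance.
    A metric missing from the candidate run counts as a regression.
    """
    blocked = False
    regressions: list[str] = []
    for p in policies:
        base = baseline[p.name]
        cand = candidate.get(p.name)
        if cand is None or base - cand > p.max_drop:
            shown = "missing" if cand is None else f"{cand:.3f}"
            regressions.append(f"{p.name}: {base:.3f} -> {shown}")
            blocked = blocked or p.blocking
    return blocked, regressions
```

A CI wrapper would exit non-zero when `blocked` is True and file the non-blocking `regressions` as issues.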

6) Observability & incident response
- Trace every request: prompt version, model, retrieval results, tool calls, final output.
- Assign an on-call owner and an escalation path (product + engineering + security).
- Establish a rollback path: how to quickly revert prompt, model, or tool changes.
- Add monitoring: latency, error rate, token spend, tool-call volume, refusal rate.
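A per-request trace that carries every field listed above, serialized as one structured log line, is enough to feed both debugging and the monitoring metrics. Below is a minimal Python sketch. `Trace`, `ToolCallRecord`, and `config_fingerprint` are hypothetical names, not an OpenTelemetry or vendor API. The fingerprint hashes prompt version, model, and retrieval settings so that a changed value flags "something changed since yesterday":

```python
import hashlib
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class ToolCallRecord:
    name: str
    args: dict
    outcome: str       # "ok", "error", or a rejection reason code
    latency_ms: float


@dataclass
class Trace:
    user_id: str
    prompt_version: str
    model: str
    retrieval_settings: dict  # e.g., top-k, filters, index version
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: float = field(default_factory=time.time)
    retrieved_chunk_ids: list[str] = field(default_factory=list)
    tool_calls: list[ToolCallRecord] = field(default_factory=list)
    final_output: str = ""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def config_fingerprint(self) -> str:
        # Short hash of the behavior-changing configuration.
        payload = json.dumps({"prompt": self.prompt_version, "model": self.model,
                              "retrieval": self.retrieval_settings}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]

    def to_log_line(self) -> str:
        # asdict() recursively converts the nested ToolCallRecord entries.
        record = asdict(self)
        record["config_fingerprint"] = self.config_fingerprint()
        return json.dumps(record, sort_keys=True)
```

Grouping dashboards by `config_fingerprint` lets a latency or refusal-rate shift be tied to the specific prompt, model, or retrieval change that caused it.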

Definition of “ship-ready”
- You can answer: Who asked? What was retrieved? What permissions applied? What tools were called? Why did the system choose those sources? What changed since yesterday?
- You can reproduce a bad output from logs and replay it against a fixed snapshot.
- You have an eval gate that would have caught at least one realistic failure case.
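Replay is possible only if the trace records the retrieved chunk IDs and the prompt version, and the snapshot keeps the chunk text exactly as it was at request time. Below is a minimal Python sketch of that replay step. `LoggedRun`, `replay`, and the `{context}`/`{question}` template placeholders are hypothetical. `generate` stands for your model call pinned to the logged model. Model outputs may not be bit-identical even at temperature 0, so treat an exact match as a strong signal rather than a guarantee:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LoggedRun:
    question: str
    prompt_version: str
    retrieved_chunk_ids: tuple[str, ...]
    final_output: str


def replay(run: LoggedRun, snapshot: dict[str, str], prompts: dict[str, str],
           generate: Callable[[str], str]) -> tuple[bool, str]:
    """Re-run a logged request against a frozen snapshot.

    snapshot: chunk_id -> chunk text as it existed at request time.
    prompts: prompt_version -> template with {context} and {question}.
    Returns (reproduced, new_output).
    """
    missing = [cid for cid in run.retrieved_chunk_ids if cid not in snapshot]
    if missing:
        # Cannot replay faithfully if the snapshot lost any cited chunk.
        raise LookupError(f"snapshot is missing chunks: {missing}")
    context = "\n\n".join(snapshot[cid] for cid in run.retrieved_chunk_ids)
    prompt = prompts[run.prompt_version].format(context=context, question=run.question)
    output = generate(prompt)
    return output == run.final_output, output
```

Replaying retrieval from the snapshot separates the two failure sources. If the replayed prompt is identical and the output is still bad, the problem lies in generation. If the live index now returns different chunks, the problem lies in retrieval or content changes.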