Agentic RAG Production Readiness Checklist (2026)

Use this checklist before launching (or expanding) an agentic RAG feature into production.

1) Scope & success metrics
- Define the workflow you’re automating (one persona, one job-to-be-done).
- Set success metrics with numbers: task success rate %, deflection %, or time saved per ticket.
- Set reliability thresholds: max severe error rate (e.g., <=2%) and required citation coverage (e.g., >=90%).

2) Data ingestion & indexing
- Document sources are inventoried (Drive/Confluence/Slack/GitHub/Jira/etc.).
- Ingestion is structure-aware (headings, sections, tables when possible).
- Every chunk includes: source URL/ID, updated_at, author/owner, and document type.
- ACL metadata is attached at ingestion and enforced at query time.
- Embedding model is versioned; you can re-embed and keep old vectors until evals pass.

3) Retrieval quality
- Hybrid retrieval is implemented (lexical + dense) unless you can prove dense-only is sufficient.
- Reranking is available and can be toggled per route/use case.
- You track: top-k precision, context waste %, and “no-evidence” rate.
- You have a clear policy for low-evidence responses (clarify vs refuse).

4) Generation constraints & verification
- Output schema is typed (JSON) and validated.
- Citations are mandatory per sentence (or per claim) for knowledge answers.
- A citation validator checks: citation exists, points to retrieved chunk, and is relevant.
- “Write” actions (creating tickets, refunds, approvals) require confirmation and idempotency keys.

5) Evaluation & release process
- You maintain a labeled eval set (start with 50–100 tasks) tied to real workflows.
- Evals run in CI on every change to: prompts, chunking, embeddings, reranker, tool schemas.
- You can roll back retrieval configs and prompts the same way you roll back code.
- You log traces for debugging: retrieved chunk IDs, tool calls, model route, latency breakdown.
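
The reliability thresholds in section 1 and the CI evals in section 5 could be wired together as a simple gate. The following is a minimal sketch, not a definitive implementation: `EvalResult`, `count_cited_claims`, `eval_gate`, and all field names are hypothetical, and the defaults reuse the example thresholds above (severe error rate <=2%, citation coverage >=90%). It only checks that each claim has a citation pointing at a retrieved chunk; the relevance check from section 4 is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class EvalResult:
    """One graded eval task. All field names are illustrative assumptions."""
    task_id: str
    severe_error: bool
    # One list of cited chunk IDs per claim (or sentence) in the answer.
    claim_citations: list[list[str]]
    # Chunk IDs that retrieval actually returned for this task.
    retrieved_chunk_ids: set[str] = field(default_factory=set)


def count_cited_claims(result: EvalResult) -> tuple[int, int]:
    """Return (claims with valid citations, total claims) for one task.

    A claim counts as cited when it has at least one citation and every
    citation points at a chunk that was actually retrieved.
    """
    valid = sum(
        1
        for cites in result.claim_citations
        if cites and all(c in result.retrieved_chunk_ids for c in cites)
    )
    return valid, len(result.claim_citations)


def eval_gate(
    results: list[EvalResult],
    max_severe_error_rate: float = 0.02,
    min_citation_coverage: float = 0.90,
) -> dict:
    """Aggregate eval results and decide whether the release gate passes."""
    if not results:
        raise ValueError("eval set is empty")
    severe_rate = sum(r.severe_error for r in results) / len(results)
    valid = total = 0
    for r in results:
        v, t = count_cited_claims(r)
        valid += v
        total += t
    # With no claims at all there is nothing to measure, so coverage is 0.
    coverage = valid / total if total else 0.0
    passed = (
        severe_rate <= max_severe_error_rate
        and coverage >= min_citation_coverage
    )
    return {
        "severe_error_rate": severe_rate,
        "citation_coverage": coverage,
        "passed": passed,
    }
```

In CI, a job might call `eval_gate` on the graded results and fail the build when `passed` is false, so that prompt, chunking, or reranker changes cannot ship below the agreed thresholds.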

6) Security, privacy, and audits
- Permission-aware retrieval is tested with adversarial prompts (leak tests).
- PII/PHI redaction is applied to logs and optionally to model inputs.
- Trace retention meets enterprise requirements (e.g., 30–90 days) with access controls.
- You can produce an audit report: who asked what, what sources were retrieved, what actions were taken.

7) Performance & cost
- Latency budget is defined (interactive p95 target, e.g., <=5s).
- Tool calls are cached where safe; slow APIs have timeouts and fallbacks.
- You track cost per successful task (not just cost per request).
- Model routing exists (small/fast model for simple steps; premium model only when needed).

Exit criteria (launch-ready)
- Eval gates pass for your critical workflow suite.
- Leak tests pass; ACL enforcement verified.
- Observability dashboards exist for quality, latency, tool failures, and spend.
- On-call runbook exists: how to disable tools, disable rerank, or force “refuse” mode during incidents.
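
The two performance metrics in section 7 are easy to confuse with simpler ones, so a small worked sketch may help. It assumes a hypothetical `TaskTrace` record with per-task cost, latency, and a success flag; none of these names come from a specific library. Cost per successful task divides total spend (including failed tasks) by the number of successes, and p95 latency uses the nearest-rank method, which is one of several common percentile definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskTrace:
    """Per-task record. Field names are illustrative assumptions."""
    cost_usd: float
    latency_s: float
    succeeded: bool


def cost_per_request(traces: list[TaskTrace]) -> float:
    """Average spend per task, regardless of outcome."""
    if not traces:
        raise ValueError("no traces")
    return sum(t.cost_usd for t in traces) / len(traces)


def cost_per_successful_task(traces: list[TaskTrace]) -> float:
    """Total spend, including failed tasks, divided by successful tasks."""
    successes = sum(t.succeeded for t in traces)
    if successes == 0:
        return math.inf  # money was spent but nothing succeeded
    return sum(t.cost_usd for t in traces) / successes


def p95_latency(traces: list[TaskTrace]) -> float:
    """Nearest-rank 95th percentile of task latency, in seconds."""
    if not traces:
        raise ValueError("no traces")
    ordered = sorted(t.latency_s for t in traces)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


traces = [
    TaskTrace(cost_usd=1.0, latency_s=1.0, succeeded=True),
    TaskTrace(cost_usd=1.0, latency_s=2.0, succeeded=False),
    TaskTrace(cost_usd=2.0, latency_s=3.0, succeeded=True),
    TaskTrace(cost_usd=2.0, latency_s=6.0, succeeded=True),
]
within_budget = p95_latency(traces) <= 5.0  # example budget from section 7
```

With these sample traces, cost per request is 1.5 while cost per successful task is 2.0, because the failed task's spend is still counted; the p95 latency is 6.0 s, so the example 5 s budget is missed.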