Agentic RAG Production Readiness Checklist (2026)

Use this checklist before launching (or expanding) an agentic RAG feature into production.

1) Scope & success metrics
- Define the workflow you’re automating (one persona, one job-to-be-done).
- Set numeric success targets: task success rate (%), deflection rate (%), or time saved per ticket.
- Set reliability thresholds: max severe error rate (e.g., <=2%) and required citation coverage (e.g., >=90%).
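The thresholds above only help if they are computed the same way on every run. Here is a minimal Python sketch. It assumes each eval task is graded into a hypothetical `TaskResult` record. The 80% task-success floor is a placeholder. The 2% and 90% defaults mirror the examples in this section.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # One graded eval outcome; field names are hypothetical.
    succeeded: bool
    severe_error: bool
    claims: int        # factual claims in the answer
    cited_claims: int  # claims backed by at least one valid citation


def launch_metrics(results: list[TaskResult]) -> dict[str, float]:
    if not results:
        raise ValueError("need at least one graded task")
    n = len(results)
    total_claims = sum(r.claims for r in results)
    return {
        "task_success_rate": sum(r.succeeded for r in results) / n,
        "severe_error_rate": sum(r.severe_error for r in results) / n,
        # Coverage is pooled at the claim level across the whole eval set.
        "citation_coverage": (
            sum(r.cited_claims for r in results) / total_claims if total_claims else 1.0
        ),
    }


def meets_thresholds(
    m: dict[str, float],
    min_success: float = 0.80,   # placeholder; set from your own workflow
    max_severe: float = 0.02,    # e.g., <=2% severe errors
    min_coverage: float = 0.90,  # e.g., >=90% citation coverage
) -> bool:
    return (
        m["task_success_rate"] >= min_success
        and m["severe_error_rate"] <= max_severe
        and m["citation_coverage"] >= min_coverage
    )
```

Pooling coverage at the claim level means a few long answers weigh more than many short ones. Averaging per answer is an equally reasonable choice, as long as it stays fixed between releases.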

2) Data ingestion & indexing
- Document sources inventoried (Drive/Confluence/Slack/GitHub/Jira/etc.).
- Ingestion is structure-aware (headings, sections, tables when possible).
- Every chunk includes: source URL/ID, updated_at, author/owner, and document type.
- ACL metadata is attached at ingestion and enforced at query time.
- Embedding model is versioned; you can re-embed into a new index while the old vectors keep serving until evals pass on the new one.
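Below is one way to carry the per-chunk metadata and to enforce ACLs and embedding versions at query time, sketched in Python. The `Chunk` fields, the group-based ACL model, and the version strings are assumptions. They do not come from any specific vector store's schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    source_url: str
    updated_at: datetime
    owner: str
    doc_type: str
    embedding_version: str          # which embedding model/version produced the vector
    allowed_groups: frozenset[str]  # ACL snapshot taken at ingestion


def query_time_filter(
    candidates: list[Chunk], user_groups: set[str], active_version: str
) -> list[Chunk]:
    """Keep only chunks the caller may read, from the embedding version currently serving."""
    return [
        c
        for c in candidates
        if c.embedding_version == active_version and c.allowed_groups & user_groups
    ]
```

This post-filter is a safety net. Where your vector store supports metadata filters, push the same group and version conditions into the query itself. Otherwise forbidden chunks still occupy top-k slots and can leave the caller with too little context.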

3) Retrieval quality
- Hybrid retrieval is implemented (lexical + dense) unless you can prove dense-only is sufficient.
- Reranking is available and can be toggled per route/use case.
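- You track top-k precision, context waste % (share of retrieved context the answer never uses), and “no-evidence” rate (share of queries where retrieval finds nothing usable).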
- You have a clear policy for low-evidence responses: ask a clarifying question or refuse.
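The following sketch covers the retrieval pieces above. It merges lexical and dense result lists with reciprocal rank fusion (RRF). RRF is one common fusion method, not necessarily the one your stack uses. The constant `k=60` is the value usually quoted for RRF. The thresholds in `evidence_action` are hypothetical and would need tuning against your eval set.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the top-k results that are labeled relevant.
    return len(set(retrieved[:k]) & relevant) / k


def evidence_action(
    rerank_scores: list[float], threshold: float = 0.5, min_support: int = 2
) -> str:
    """Hypothetical low-evidence policy applied to reranker scores."""
    strong = sum(s >= threshold for s in rerank_scores)
    if strong >= min_support:
        return "answer"
    if strong >= 1:
        return "clarify"
    return "refuse"
```

RRF scores are not calibrated relevance. For that reason, the answer/clarify/refuse decision is made on reranker scores instead.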

4) Generation constraints & verification
- Output schema is typed (JSON) and validated.
- Citations are mandatory per sentence (or per claim) for knowledge answers.
- A citation validator checks that each citation exists, points to a chunk that was actually retrieved, and is relevant to the claim it supports.
- “Write” actions (creating tickets, refunds, approvals) require confirmation and idempotency keys.
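Here is a minimal citation validator, plus an idempotency-key helper for write actions. The relevance check is a crude token-overlap heuristic. It stands in for whatever relevance model you actually use. `Claim`, `min_overlap`, and the key format are all assumptions.

```python
import hashlib
import json
import re
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    citations: list[str]  # chunk IDs the model cited for this claim


def _tokens(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def validate_citations(
    claims: list[Claim], retrieved: dict[str, str], min_overlap: float = 0.3
) -> list[str]:
    """Return a list of problems; an empty list means every claim passed."""
    problems: list[str] = []
    for i, claim in enumerate(claims):
        if not claim.citations:
            problems.append(f"claim {i}: no citation")
            continue
        claim_toks = _tokens(claim.text)
        for cid in claim.citations:
            if cid not in retrieved:
                problems.append(f"claim {i}: {cid} was not retrieved")
                continue
            # Crude relevance proxy: share of claim tokens found in the cited chunk.
            overlap = len(claim_toks & _tokens(retrieved[cid])) / max(len(claim_toks), 1)
            if overlap < min_overlap:
                problems.append(f"claim {i}: {cid} looks irrelevant (overlap {overlap:.2f})")
    return problems


def idempotency_key(action: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so equal payloads hash identically.
    canonical = json.dumps(
        {"action": action, "payload": payload}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The key is derived from the action and its payload. A retried or double-submitted request therefore produces the same key, and the downstream system can deduplicate it. If two identical requests can both be legitimate, include a request or confirmation ID in the payload.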

5) Evaluation & release process
- You maintain a labeled eval set (start with 50–100 tasks) tied to real workflows.
- Evals run in CI on every change to: prompts, chunking, embeddings, reranker, tool schemas.
- You can roll back retrieval configs and prompts the same way you roll back code.
- You log traces for debugging: retrieved chunk IDs, tool calls, model route, latency breakdown.
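The sketch below is an eval gate that CI could run on every prompt, chunking, embedding, reranker, or tool-schema change. It compares per-task results against the last released configuration. Several details are illustrative assumptions: the function name, the 2-point tolerance on overall pass rate, and the rule that any critical-task regression blocks the change.

```python
def regression_gate(
    baseline: dict[str, bool],
    candidate: dict[str, bool],
    critical: set[str],
    max_drop: float = 0.02,
) -> tuple[bool, list[str]]:
    """Compare per-task pass/fail from the released config against a candidate change."""
    if not baseline:
        raise ValueError("baseline eval run is empty")
    reasons: list[str] = []
    missing = sorted(set(baseline) - set(candidate))
    if missing:
        reasons.append(f"tasks missing from candidate run: {missing}")
    # Any critical task that passed before and fails now blocks the change.
    for task in sorted(critical):
        if baseline.get(task, False) and not candidate.get(task, False):
            reasons.append(f"critical task regressed: {task}")
    # Overall pass rate is measured over the baseline's task set.
    base_rate = sum(baseline.values()) / len(baseline)
    cand_rate = sum(candidate.get(t, False) for t in baseline) / len(baseline)
    if base_rate - cand_rate > max_drop:
        reasons.append(f"pass rate dropped from {base_rate:.1%} to {cand_rate:.1%}")
    return (not reasons, reasons)
```

In CI, a failing gate would print `reasons` and exit non-zero. Keeping prompts and retrieval configs in version control next to the code makes rolling back a plain revert.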

6) Security, privacy, and audits
- Permission-aware retrieval is tested with adversarial prompts (leak tests).
- PII/PHI redaction is applied to logs and optionally to model inputs.
- Trace retention meets enterprise requirements (e.g., 30–90 days) with access controls.
- You can produce an audit report: who asked what, what sources were retrieved, what actions were taken.
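This is a deliberately simple redaction pass for logs, written as a standard-library `logging.Filter`. The regex patterns are hypothetical. They catch only obvious emails, US-style SSNs, and phone-like digit runs; real PII/PHI coverage needs a dedicated detector. The phone pattern will also over-redact things like dates, which is usually the safer failure for logs.

```python
import logging
import re

# Hypothetical, intentionally narrow patterns. Order matters: SSN runs before PHONE,
# otherwise the looser phone pattern would swallow SSNs first.
_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text: str) -> str:
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


class RedactingFilter(logging.Filter):
    """Rewrite each record's formatted message before any handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # args are already merged into msg
        return True


# Usage: attach to every handler that writes traces, e.g.
# handler.addFilter(RedactingFilter())
```

The filter only rewrites the formatted message. Exception text and structured `extra` fields would need the same treatment.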

7) Performance & cost
- Latency budget is defined (interactive p95 target, e.g., <=5s).
- Tool calls are cached where safe; slow APIs have timeouts and fallbacks.
- You track cost per successful task (not just cost per request).
- Model routing exists (small/fast model for simple steps; premium model only when needed).
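The next sketch tracks spend per successful task, next to a toy router. Costs are assumed to be attributed per task, covering all model and tool calls with retries included. The step names and model labels in `route` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    succeeded: bool
    cost_usd: float  # all model + tool spend for the task, retries included


def cost_per_request(runs: list[TaskRun]) -> float:
    if not runs:
        raise ValueError("no runs")
    return sum(r.cost_usd for r in runs) / len(runs)


def cost_per_successful_task(runs: list[TaskRun]) -> float:
    """Total spend divided by tasks that actually succeeded; failed runs still cost money."""
    successes = sum(r.succeeded for r in runs)
    if successes == 0:
        return float("inf")
    return sum(r.cost_usd for r in runs) / successes


# Hypothetical step labels that justify the premium model.
PREMIUM_STEPS = {"plan", "synthesize"}


def route(step: str, prior_failures: int = 0) -> str:
    # Escalate to the premium model for hard steps or after a failed attempt.
    if step in PREMIUM_STEPS or prior_failures > 0:
        return "premium-model"
    return "small-model"
```

Consider four runs at $0.50 each, two of which succeed. Cost per request is $0.50, but cost per successful task is $1.00, and the latter reflects what the workflow actually costs.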

Exit criteria (launch-ready)
- Eval gates pass for your critical workflow suite.
- Leak tests pass; ACL enforcement verified.
- Observability dashboards exist for quality, latency, tool failures, and spend.
- On-call runbook exists: how to disable tools, disable reranking, or force “refuse” mode during incidents.
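Runbook steps are easier to execute when each one maps to a single switch. The sketch below reads hypothetical environment-variable names into a small `IncidentSwitches` object. In practice these switches would more likely live in whatever feature-flag or config service you already run, so on-call can flip them without a redeploy.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


def _flag(name: str, env: Mapping[str, str]) -> bool:
    return env.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class IncidentSwitches:
    tools_enabled: bool
    rerank_enabled: bool
    refuse_mode: bool

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "IncidentSwitches":
        env = os.environ if env is None else env
        return cls(
            tools_enabled=not _flag("RAG_DISABLE_TOOLS", env),   # hypothetical name
            rerank_enabled=not _flag("RAG_DISABLE_RERANK", env),  # hypothetical name
            refuse_mode=_flag("RAG_FORCE_REFUSE", env),           # hypothetical name
        )


def plan_request(switches: IncidentSwitches) -> dict:
    # Refuse mode short-circuits everything else.
    if switches.refuse_mode:
        return {"mode": "refuse"}
    return {
        "mode": "answer",
        "use_tools": switches.tools_enabled,
        "use_rerank": switches.rerank_enabled,
    }
```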