Agentic RAG 2.0 Launch Checklist (90-Day Plan)

Goal: Ship one production workflow (not a generic chatbot) that is grounded, measurable, and safe.

1) Pick the workflow + KPI (Week 1)
- Choose ONE workflow with a clear owner (Support, SOC, Sales Ops, Finance Ops).
- Define a single primary KPI: e.g., reduce handle time by 15%, increase first-contact resolution by 10%, or cut onboarding time from 10 days to 7.
- Define “stop conditions”: actions the agent must never take without approval (money movement, permission changes, outbound email).

2) Build the gold dataset (Weeks 1–3)
- Collect 200–1,000 real historical cases.
- For each case, store: the user question, the correct outcome, required citations (doc URLs/IDs), required tool actions (if any), and “unacceptable outputs.”
- Redact PII and secrets, and document your redaction approach.

3) Retrieval architecture (Weeks 2–5)
- List your corpora: policies, product docs, runbooks, tickets, knowledge base, CRM notes.
- For each source, define a freshness SLA (e.g., 15 minutes for pricing pages; daily for the handbook).
- Implement hybrid retrieval where identifiers matter (keyword + vectors + reranker).
- Add metadata filters (tenant, product, region, version) to prevent cross-customer leakage.

4) Routing + memory (Weeks 4–7)
- Add an intent classifier to choose retrieval/tools per request.
- Implement scoped memory:
  - Session state: ephemeral, expires quickly.
  - Long-term profile: structured fields, user-consented, deletable.
- Avoid storing policy text as memory; keep it in retrieval.

5) Tool contracts + guardrails (Weeks 5–9)
- For every tool: JSON schema, validation, error codes, retries, idempotency.
- Add a preview/dry-run mode for write actions.
- Add approvals for high-risk actions; log the approver, timestamp, and trace ID.
- Implement post-conditions (verify the write actually changed the intended record).

6) Verification + refusal behavior (Weeks 6–10)
- Citation verification: ensure the cited text actually supports the claim.
- Add “refuse to answer” rules for when evidence is missing or conflicting.
- Add clarifying-question behavior for ambiguous requests.

7) Evals + CI (Weeks 7–12)
- Set thresholds for: groundedness (% valid citations), tool success rate, policy violation rate, p95 latency, and cost per resolved task.
- Run evals automatically on every prompt, index, or tool change.
- Add regression alerts and a rollback mechanism (model-routing fallback; disable risky tools).

8) Launch + monitoring (Weeks 10–13)
- Start with a limited cohort (internal users or 5–10% of traffic).
- Monitor daily: escalation rate, deflection rate, incident count, and cost per success.
- Create an incident runbook: how to replay traces, isolate bad sources, and roll back changes.

Definition of done
- You can replay any agent run end-to-end (retrieval, model calls, tool calls).
- You can answer procurement questions: retention, deletion, training use, and audit logs.
- The workflow KPI improves meaningfully without an increase in high-severity incidents.
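The tool-contract requirements in step 5 (schema validation, dry-run, idempotency, post-conditions) can be sketched as a single wrapper. This is a minimal illustration, not a prescribed implementation: the names `CrmStore` and `UpdateRecordTool`, the field names, and the in-memory idempotency cache are all hypothetical stand-ins for your real system of record and schema validator.

```python
from dataclasses import dataclass, field

@dataclass
class CrmStore:
    """Hypothetical stand-in for a real system of record."""
    records: dict = field(default_factory=dict)

class UpdateRecordTool:
    # Contract: required fields and types (a stand-in for a full JSON schema).
    SCHEMA = {"record_id": str, "field": str, "value": str}

    def __init__(self, store: CrmStore):
        self.store = store
        self._seen: dict = {}  # idempotency_key -> cached result (safe retries)

    def validate(self, args: dict) -> list[str]:
        errors = [f"missing: {k}" for k in self.SCHEMA if k not in args]
        errors += [f"wrong type: {k}" for k, t in self.SCHEMA.items()
                   if k in args and not isinstance(args[k], t)]
        return errors

    def call(self, args: dict, idempotency_key: str, dry_run: bool = True) -> dict:
        if errors := self.validate(args):
            return {"status": "invalid_args", "errors": errors}
        if idempotency_key in self._seen:
            # Retry with a key we already executed: replay the prior result,
            # never write twice.
            return self._seen[idempotency_key]
        if dry_run:
            # Preview mode: show what would change, write nothing.
            return {"status": "preview", "would_set": args}
        self.store.records.setdefault(args["record_id"], {})[args["field"]] = args["value"]
        # Post-condition: verify the write actually changed the intended record.
        ok = self.store.records[args["record_id"]].get(args["field"]) == args["value"]
        result = {"status": "ok" if ok else "postcondition_failed"}
        self._seen[idempotency_key] = result
        return result
```

In a real deployment the dry-run result is what you would show an approver, and the approval record (approver, timestamp, trace ID) would be logged before the non-dry-run call is allowed.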
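The citation-verification bullet in step 6 can be made concrete with a deliberately strict baseline check, assumed here for illustration: a citation counts as support only if the quoted evidence appears verbatim (after whitespace and case normalization) in the cited source. Real systems usually layer an entailment model on top; the function names below are hypothetical.

```python
def normalize(text: str) -> str:
    # Collapse whitespace and case so trivial formatting differences don't fail the check.
    return " ".join(text.lower().split())

def citation_supported(quoted_evidence: str, source_text: str) -> bool:
    return normalize(quoted_evidence) in normalize(source_text)

def unsupported_claims(claims: list[tuple[str, str]], sources: dict[str, str]) -> list[str]:
    """Return the quotes whose cited doc (by ID) does not actually contain them."""
    return [quote for quote, doc_id in claims
            if not citation_supported(quote, sources.get(doc_id, ""))]
```

Any non-empty result from `unsupported_claims` would trigger the step-6 refusal rules rather than letting the answer ship with a bad citation.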
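The eval gate in step 7 can be sketched as a small function that runs in CI on every prompt, index, or tool change. The threshold values and the per-case result fields (`citations_valid`, `tool_ok`, `policy_violation`) are illustrative assumptions, not recommended numbers.

```python
def evaluate(results: list[dict]) -> dict:
    """Aggregate per-case eval results into the step-7 metrics."""
    n = len(results)
    return {
        "groundedness": sum(r["citations_valid"] for r in results) / n,
        "tool_success_rate": sum(r["tool_ok"] for r in results) / n,
        "policy_violation_rate": sum(r["policy_violation"] for r in results) / n,
    }

# Illustrative thresholds; tune per workflow and gold dataset.
HIGHER_IS_BETTER = {"groundedness": 0.90, "tool_success_rate": 0.95}
LOWER_IS_BETTER = {"policy_violation_rate": 0.01}

def gate(metrics: dict) -> list[str]:
    """Return failed checks; an empty list means the change may ship."""
    failures = [f"{k}={metrics[k]:.3f} below {t}"
                for k, t in HIGHER_IS_BETTER.items() if metrics[k] < t]
    failures += [f"{k}={metrics[k]:.3f} above {t}"
                 for k, t in LOWER_IS_BETTER.items() if metrics[k] > t]
    return failures
```

A non-empty `gate` result is what would trip the regression alert and trigger the rollback path (model-routing fallback, disabling risky tools).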