Agentic RAG 2.0 Launch Checklist (90-Day Plan)

Goal: Ship one production workflow (not a generic chatbot) that is grounded, measurable, and safe.

1) Pick the workflow + KPI (Week 1)
- Choose ONE workflow with a clear owner (Support, SOC, Sales Ops, Finance Ops).
- Define a single primary KPI: e.g., reduce handle time by 15%, increase first-contact resolution by 10%, or cut onboarding time from 10 days to 7.
- Define “stop conditions”: actions the agent must never take without approval (money movement, permission changes, outbound email).

2) Build the gold dataset (Weeks 1–3)
- Collect 200–1,000 real historical cases.
- For each case, store: the user question, the correct outcome, required citations (doc URLs/IDs), required tool actions (if any), and “unacceptable outputs.”
- Redact PII and secrets, and document your redaction approach.

3) Retrieval architecture (Weeks 2–5)
- List your corpora: policies, product docs, runbooks, tickets, knowledge base, CRM notes.
- For each source, define a freshness SLA (e.g., 15 minutes for pricing pages; daily for the handbook).
- Implement hybrid retrieval where identifiers matter (keyword + vectors + reranker).
- Add metadata filters (tenant, product, region, version) to prevent cross-customer leakage.

4) Routing + memory (Weeks 4–7)
- Add an intent classifier to choose retrieval/tools per request.
- Implement scoped memory:
  - Session state: ephemeral, expires quickly.
  - Long-term profile: structured fields, user-consented, deletable.
- Avoid storing policy text as memory; keep it in retrieval.

5) Tool contracts + guardrails (Weeks 5–9)
- For every tool: JSON schema, validation, error codes, retries, idempotency.
- Add a preview/dry-run mode for write actions.
- Add approvals for high-risk actions; log the approver, timestamp, and trace ID.
- Implement post-conditions (verify the write actually changed the intended record).

6) Verification + refusal behavior (Weeks 6–10)
- Citation verification: ensure the cited text actually supports the claim.
- Add “refuse to answer” rules for when evidence is missing or conflicting.
- Add clarifying-question behavior for ambiguous requests.

7) Evals + CI (Weeks 7–12)
- Set thresholds for: groundedness (% valid citations), tool success rate, policy violation rate, p95 latency, and cost per resolved task.
- Run evals automatically on every prompt, index, or tool change.
- Add regression alerts and a rollback mechanism (model-routing fallback; disable risky tools).

8) Launch + monitoring (Weeks 10–13)
- Start with a limited cohort (internal users or 5–10% of traffic).
- Monitor daily: escalation rate, deflection rate, incident count, and cost per success.
- Create an incident runbook: how to replay traces, isolate bad sources, and roll back changes.

Definition of done
- You can replay any agent run end-to-end (retrieval, model calls, tool calls).
- You can answer procurement questions: retention, deletion, training use, and audit logs.
- The workflow KPI improves meaningfully without an increase in high-severity incidents.
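The tool-contract requirements in step 5 (schema validation, dry-run, idempotency, post-conditions) can be sketched as a single wrapper. This is a minimal illustration, not a prescribed implementation: the names `CrmStore` and `UpdateRecordTool`, the field names, and the in-memory idempotency cache are all hypothetical stand-ins for your real system of record and schema validator.

```python
from dataclasses import dataclass, field

@dataclass
class CrmStore:
    """Hypothetical stand-in for a real system of record."""
    records: dict = field(default_factory=dict)

class UpdateRecordTool:
    # Contract: required fields and types (a stand-in for a full JSON schema).
    SCHEMA = {"record_id": str, "field": str, "value": str}

    def __init__(self, store: CrmStore):
        self.store = store
        self._seen: dict = {}  # idempotency_key -> cached result (safe retries)

    def validate(self, args: dict) -> list[str]:
        errors = [f"missing: {k}" for k in self.SCHEMA if k not in args]
        errors += [f"wrong type: {k}" for k, t in self.SCHEMA.items()
                   if k in args and not isinstance(args[k], t)]
        return errors

    def call(self, args: dict, idempotency_key: str, dry_run: bool = True) -> dict:
        if errors := self.validate(args):
            return {"status": "invalid_args", "errors": errors}
        if idempotency_key in self._seen:
            # Retry with a key we already executed: replay the prior result,
            # never write twice.
            return self._seen[idempotency_key]
        if dry_run:
            # Preview mode: show what would change, write nothing.
            return {"status": "preview", "would_set": args}
        self.store.records.setdefault(args["record_id"], {})[args["field"]] = args["value"]
        # Post-condition: verify the write actually changed the intended record.
        ok = self.store.records[args["record_id"]].get(args["field"]) == args["value"]
        result = {"status": "ok" if ok else "postcondition_failed"}
        self._seen[idempotency_key] = result
        return result
```

In a real deployment the dry-run result is what you would show an approver, and the approval record (approver, timestamp, trace ID) would be logged before the non-dry-run call is allowed.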
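The citation-verification bullet in step 6 can be made concrete with a deliberately strict baseline check, assumed here for illustration: a citation counts as support only if the quoted evidence appears verbatim (after whitespace and case normalization) in the cited source. Real systems usually layer an entailment model on top; the function names below are hypothetical.

```python
def normalize(text: str) -> str:
    # Collapse whitespace and case so trivial formatting differences don't fail the check.
    return " ".join(text.lower().split())

def citation_supported(quoted_evidence: str, source_text: str) -> bool:
    return normalize(quoted_evidence) in normalize(source_text)

def unsupported_claims(claims: list[tuple[str, str]], sources: dict[str, str]) -> list[str]:
    """Return the quotes whose cited doc (by ID) does not actually contain them."""
    return [quote for quote, doc_id in claims
            if not citation_supported(quote, sources.get(doc_id, ""))]
```

Any non-empty result from `unsupported_claims` would trigger the step-6 refusal rules rather than letting the answer ship with a bad citation.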
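The eval gate in step 7 can be sketched as a small function that runs in CI on every prompt, index, or tool change. The threshold values and the per-case result fields (`citations_valid`, `tool_ok`, `policy_violation`) are illustrative assumptions, not recommended numbers.

```python
def evaluate(results: list[dict]) -> dict:
    """Aggregate per-case eval results into the step-7 metrics."""
    n = len(results)
    return {
        "groundedness": sum(r["citations_valid"] for r in results) / n,
        "tool_success_rate": sum(r["tool_ok"] for r in results) / n,
        "policy_violation_rate": sum(r["policy_violation"] for r in results) / n,
    }

# Illustrative thresholds; tune per workflow and gold dataset.
HIGHER_IS_BETTER = {"groundedness": 0.90, "tool_success_rate": 0.95}
LOWER_IS_BETTER = {"policy_violation_rate": 0.01}

def gate(metrics: dict) -> list[str]:
    """Return failed checks; an empty list means the change may ship."""
    failures = [f"{k}={metrics[k]:.3f} below {t}"
                for k, t in HIGHER_IS_BETTER.items() if metrics[k] < t]
    failures += [f"{k}={metrics[k]:.3f} above {t}"
                 for k, t in LOWER_IS_BETTER.items() if metrics[k] > t]
    return failures
```

A non-empty `gate` result is what would trip the regression alert and trigger the rollback path (model-routing fallback, disabling risky tools).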