ICMD AI Teammate Launch Checklist (90-Day Plan)

Use this checklist to ship an accountable AI teammate (agentic workflow) that can scale in production.

1) Define the workflow (Week 1)
- Name one job-to-be-done with a clear start/end (e.g., “triage support ticket” not “support customers”).
- List 5–10 common variants and 5 edge cases.
- Define success metrics: success rate (%), escalation rate (%), and a quality metric (CSAT, approval rate, or defect rate).
- Set “blast radius” rules: what the AI is allowed to read, write, and never touch.
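One way to make the blast-radius rules above enforceable is to write them down as data and check every action against them. The sketch below is only an illustration: the BlastRadius class and the resource names (kb_articles, tickets, billing_db) are hypothetical and not tied to any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastRadius:
    """Hypothetical scope: what the AI may read, write, and never touch."""
    can_read: frozenset
    can_write: frozenset
    never_touch: frozenset

    def allows(self, action: str, resource: str) -> bool:
        # The deny list always wins, even if a resource is also listed elsewhere.
        if resource in self.never_touch:
            return False
        if action == "read":
            return resource in self.can_read
        if action == "write":
            return resource in self.can_write
        # Unknown actions are denied by default.
        return False


# Illustrative scope for a "triage support ticket" workflow.
triage_scope = BlastRadius(
    can_read=frozenset({"kb_articles", "tickets"}),
    can_write=frozenset({"tickets"}),
    never_touch=frozenset({"billing_db"}),
)
```

Denying unknown actions by default keeps the scope safe when a new tool is added before its rules are written.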

2) Set budgets and constraints (Week 1–2)
- Hard cap cost per run (e.g., $0.10–$1.00 depending on workflow value).
- Hard cap tool calls per run and max runtime, with separate limits for sync and async runs.
- Define irreversible actions and require human approval for them.
- Decide the default behavior when confidence is low: abstain and escalate.
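The caps and defaults in this step can be sketched as a small per-run budget object plus a decision function. Everything here is an assumption for illustration: the class and function names, the default limits, and the 0.8 confidence threshold should be replaced with values that fit the workflow.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a run goes over one of its hard caps."""


@dataclass
class RunBudget:
    # All limits are illustrative assumptions; tune them per workflow.
    max_cost_usd: float = 0.50
    max_tool_calls: int = 10
    max_runtime_s: float = 30.0
    spent_usd: float = 0.0
    tool_calls: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge_tool_call(self, cost_usd: float) -> None:
        """Record one tool call and stop the run if any cap is exceeded."""
        self.tool_calls += 1
        self.spent_usd += cost_usd
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool call cap exceeded")
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded("cost cap exceeded")
        if time.monotonic() - self.started_at > self.max_runtime_s:
            raise BudgetExceeded("runtime cap exceeded")


def next_step(confidence: float, irreversible: bool,
              min_confidence: float = 0.8) -> str:
    """Decide what to do with a proposed action (threshold is an assumption)."""
    if confidence < min_confidence:
        return "escalate"        # abstain and hand off to a human
    if irreversible:
        return "needs_approval"  # human approval before irreversible actions
    return "execute"
```

Checking confidence before irreversibility means a low-confidence irreversible action is escalated rather than queued for approval.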

3) Data grounding + permissions (Week 3–4)
- Connect only authoritative sources first (policy docs, KB, runbooks).
- Implement least privilege via SSO (Okta/Entra/Google) and role mapping.
- Add citations: every answer should link to exact source passages.
- Add freshness controls: index rebuild schedule + doc owners + staleness alerts.
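A staleness alert can be as simple as comparing each document's last review date against a maximum age and notifying its owner. The sketch below assumes a hypothetical SourceDoc record and a 90-day threshold; neither comes from a specific indexing tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SourceDoc:
    doc_id: str
    owner: str
    last_reviewed: datetime


def stale_docs(docs, now, max_age_days=90):
    """Return docs whose last review is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [d for d in docs if d.last_reviewed < cutoff]


def staleness_alerts(docs, now, max_age_days=90):
    """Build one alert message per stale doc, addressed to its owner."""
    return [
        f"{d.owner}: '{d.doc_id}' last reviewed "
        f"{(now - d.last_reviewed).days} days ago"
        for d in stale_docs(docs, now, max_age_days)
    ]
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the check easy to test and to run on the same schedule as the index rebuild.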

4) Evaluation plan (Week 5–7)
- Build a golden set of 200–1,000 real examples with expected outcomes.
- Define regression gates: block the deploy if safety violations rise or if success rate drops beyond a set threshold.
- Track at least: citation coverage, factuality/groundedness, and policy compliance.
- Create a weekly failure review: cluster errors into categories and fix systematically.
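The regression gate described above can be sketched as a comparison between a baseline evaluation and a candidate evaluation on the same golden set. The EvalResult fields and the 2-point default threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    success_rate: float     # fraction of golden-set examples passed, 0.0 to 1.0
    safety_violations: int  # number of examples that violated policy


def regression_gate(baseline, candidate, max_success_drop=0.02):
    """Return blocking reasons; an empty list means the deploy may proceed."""
    reasons = []
    if candidate.safety_violations > baseline.safety_violations:
        reasons.append(
            f"safety violations rose from {baseline.safety_violations} "
            f"to {candidate.safety_violations}"
        )
    drop = baseline.success_rate - candidate.success_rate
    if drop > max_success_drop:
        reasons.append(
            f"success rate dropped by {drop:.1%} "
            f"(limit {max_success_drop:.1%})"
        )
    return reasons
```

Returning the reasons instead of a bare boolean gives the weekly failure review a starting point when a deploy is blocked.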

5) Guardrails (Week 5–7)
- PII detection + redaction (inputs and outputs).
- Prompt injection defenses for untrusted text (email bodies, web pages).
- Policy checks before side effects: schema validation, allow/deny lists, role checks.
- Safe degradation modes: connector down → read-only or cached snapshot.
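As a rough illustration of the redaction guardrail, the sketch below masks email addresses and one common phone number format with regular expressions. The patterns are deliberately naive assumptions; a production system would likely rely on a dedicated PII detector and cover many more categories.

```python
import re

# Naive patterns for illustration only; they will miss many real-world formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

The same function can be applied twice per run: once to untrusted input before it reaches the model, and once to the model's output before it is shown or logged.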

6) Accountability UI + admin console (Week 8–10)
- Run history with run_id, inputs, outputs, citations, tool calls, cost, latency.
- “Why this” explanations and confidence display.
- Approve/deny workflows for high-risk actions.
- Exportable audit log (for SOC 2 / incident response).
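The run history fields listed above map naturally onto one record per run, which can then be exported as JSON Lines for auditors. This is a minimal sketch; the RunRecord shape and the export_audit_log helper are hypothetical names, and the field types are assumptions.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class RunRecord:
    run_id: str
    inputs: dict
    outputs: dict
    citations: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    cost_usd: float = 0.0
    latency_ms: int = 0


def export_audit_log(records):
    """Serialize run records as JSON Lines, one run per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)
```

One line per run keeps the export easy to append to, filter by run_id, and hand over during incident response.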

7) Rollout and operations (Week 11–13)
- Start with 5–10% traffic or one team.
- Monitor cost per outcome (not just tokens), latency, and escalation rate.
- Add incident playbook: how to disable a workflow, rotate keys, roll back indexes.
- Assign ownership: name who maintains connectors, evals, and policy rules.
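A percentage rollout with a kill switch can be sketched with deterministic hashing, so the same user always lands in the same group. The function names, the salt string, and the "ai"/"human" route labels are illustrative assumptions.

```python
import hashlib


def in_pilot(user_id: str, percent: int, salt: str = "ai-teammate-pilot") -> bool:
    """Deterministically place a user in the pilot for `percent` of traffic."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent


def route(user_id: str, percent: int, workflow_enabled: bool = True) -> str:
    """Send a request to the AI workflow or to the existing human path."""
    if not workflow_enabled:
        return "human"  # kill switch: disabling the workflow overrides the rollout
    return "ai" if in_pilot(user_id, percent) else "human"
```

Raising the percentage only adds users to the pilot, because each user's bucket never changes; nobody already in the pilot is moved out.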

Exit criteria to scale beyond pilot
- 100% of runs logged, with the log exportable.
- Stable unit economics: bounded cost per outcome.
- Measurable quality: success rate target met with acceptable CSAT/defect impact.
- Clear escalation path that users trust and actually use.
- Security sign-off: permissions, data handling, and auditability documented.
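To make "bounded cost per outcome" concrete, the sketch below divides total spend, including failed runs, by the number of successful outcomes. The input format, a list of (cost, succeeded) pairs, and the function names are assumptions for illustration.

```python
def cost_per_outcome(runs):
    """Total cost of all runs divided by the number of successful runs.

    `runs` is an iterable of (cost_usd, succeeded) pairs. Returns None when
    there are no successful runs, since the ratio is undefined.
    """
    runs = list(runs)
    total_cost = sum(cost for cost, _ in runs)
    successes = sum(1 for _, succeeded in runs if succeeded)
    if successes == 0:
        return None
    return total_cost / successes


def within_bound(runs, max_cost_per_outcome_usd):
    """True if cost per outcome is defined and at or below the bound."""
    value = cost_per_outcome(runs)
    return value is not None and value <= max_cost_per_outcome_usd
```

For example, three runs costing $0.20, $0.30, and $0.50 with two successes average about $0.33 per run but $0.50 per outcome, which is why the checklist tracks the latter.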