ICMD AI Teammate Launch Checklist (90-Day Plan)

Use this checklist to ship an accountable AI teammate (agentic workflow) that can scale in production.

1) Define the workflow (Week 1)
- Name one job-to-be-done with a clear start/end (e.g., “triage support ticket” not “support customers”).
- List 5–10 common variants and 5 edge cases.
- Define success metrics: success rate (%), escalation rate (%), and quality metric (CSAT, approval rate, defect rate).
- Set “blast radius” rules: what the AI is allowed to read, write, and never touch.

2) Set budgets and constraints (Week 1–2)
- Hard cap cost per run (e.g., $0.10–$1.00 depending on workflow value).
- Hard cap tool calls per run and max runtime (sync vs async).
- Define irreversible actions and require human approval for them.
- Decide your default behavior when low confidence: abstain + escalate.

3) Data grounding + permissions (Week 3–4)
- Connect only authoritative sources first (policy docs, KB, runbooks).
- Implement least privilege via SSO (Okta/Entra/Google) and role mapping.
- Add citations: every answer should link to exact source passages.
- Add freshness controls: index rebuild schedule + doc owners + staleness alerts.

4) Evaluation plan (Week 5–7)
- Build a golden set of 200–1,000 real examples with expected outcomes.
- Define regression gates: no deploy if safety violations rise, or success drops beyond threshold.
- Track at least: citation coverage, factuality/groundedness, and policy compliance.
- Create a weekly failure review: cluster errors into categories and fix systematically.

5) Guardrails (Week 5–7)
- PII detection + redaction (inputs and outputs).
- Prompt injection defenses for untrusted text (email bodies, web pages).
- Policy checks before side effects: schema validation, allow/deny lists, role checks.
- Safe degradation modes: connector down → read-only or cached snapshot.
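
Example (sketch): the policy checks in step 5 can be combined with the budget, approval, and low-confidence rules from step 2 in a single gate that runs before any side effect. The Python below is a minimal illustration under stated assumptions, not a reference implementation. Every name in it (RunBudget, Action, check_action, the tool names, the roles, and the numeric limits) is hypothetical and should be replaced with your own workflow's values.

```python
from dataclasses import dataclass

# All names, tools, roles, and limits here are hypothetical placeholders.

@dataclass
class RunBudget:
    max_cost_usd: float = 0.50    # hard cap on cost per run
    max_tool_calls: int = 10      # hard cap on tool calls per run
    min_confidence: float = 0.70  # below this, abstain and escalate

@dataclass
class Action:
    tool: str          # tool the AI wants to call
    role: str          # role of the user the AI acts for
    confidence: float  # model-reported confidence in [0, 1]

ALLOWED_TOOLS = {"kb_search", "ticket_comment", "refund_issue"}  # allow list
IRREVERSIBLE_TOOLS = {"refund_issue"}  # require human approval
ROLE_TOOLS = {
    "support_agent": {"kb_search", "ticket_comment"},
    "support_lead": {"kb_search", "ticket_comment", "refund_issue"},
}

def check_action(action, budget, cost_so_far, calls_so_far):
    """Return 'deny', 'escalate', 'needs_approval', or 'allow'."""
    # Allow-list check: unknown tools are never executed.
    if action.tool not in ALLOWED_TOOLS:
        return "deny"
    # Role check: the acting role must be permitted to use this tool.
    if action.tool not in ROLE_TOOLS.get(action.role, set()):
        return "deny"
    # Budget check: stop and hand off once a hard cap is reached.
    if cost_so_far >= budget.max_cost_usd or calls_so_far >= budget.max_tool_calls:
        return "escalate"
    # Low confidence: abstain and escalate by default.
    if action.confidence < budget.min_confidence:
        return "escalate"
    # Irreversible actions always wait for a human decision.
    if action.tool in IRREVERSIBLE_TOOLS:
        return "needs_approval"
    return "allow"
```

Usage: call check_action before each tool call, e.g. check_action(Action("refund_issue", "support_lead", 0.9), RunBudget(), cost_so_far=0.10, calls_so_far=2), and record the returned decision in the run history. Deny checks run first so that a request outside the blast radius is rejected even when budget remains.
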

6) Accountability UI + admin console (Week 8–10)
- Run history with run_id, inputs, outputs, citations, tool calls, cost, latency.
- “Why this” explanations and confidence display.
- Approve/deny workflows for high-risk actions.
- Exportable audit log (for SOC 2 / incident response).

7) Rollout and operations (Week 11–13)
- Start with 5–10% traffic or one team.
- Monitor cost per outcome (not just tokens), latency, and escalation.
- Add incident playbook: how to disable a workflow, rotate keys, roll back indexes.
- Create ownership: who maintains connectors, evals, and policy rules.

Exit criteria to scale beyond pilot
- 100% of runs logged with export.
- Stable unit economics: bounded cost per outcome.
- Measurable quality: success rate target met with acceptable CSAT/defect impact.
- Clear escalation path that users trust and actually use.
- Security sign-off: permissions, data handling, and auditability documented.
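
Example (sketch): the quantitative exit criteria can be checked from the run history. The Python below is one possible way to compute cost per outcome, success rate, escalation rate, and logging coverage. The record fields ("run_id", "outcome", "cost_usd", "exported"), the function names, and the thresholds are assumptions made for illustration; the checklist itself does not prescribe a schema. Cost per outcome is taken here as total spend divided by successful runs, which is one reasonable reading of "cost per outcome (not just tokens)".

```python
def pilot_summary(runs):
    """Summarize a list of run records (dicts with hypothetical fields)."""
    total = len(runs)
    if total == 0:
        raise ValueError("no runs to summarize")
    successes = sum(1 for r in runs if r["outcome"] == "success")
    escalations = sum(1 for r in runs if r["outcome"] == "escalated")
    # A run counts as logged only if it has a run_id and was exported.
    logged = sum(1 for r in runs if r.get("run_id") and r.get("exported"))
    total_cost = sum(r["cost_usd"] for r in runs)
    return {
        "success_rate": successes / total,
        "escalation_rate": escalations / total,
        "logged_fraction": logged / total,
        # Failed and escalated runs still cost money, so divide all spend
        # by the number of successful outcomes.
        "cost_per_outcome_usd": total_cost / successes if successes else None,
    }

def ready_to_scale(summary, min_success_rate, max_cost_per_outcome_usd):
    """Check only the measurable exit criteria; sign-offs stay manual."""
    cost = summary["cost_per_outcome_usd"]
    return (
        summary["logged_fraction"] == 1.0
        and summary["success_rate"] >= min_success_rate
        and cost is not None
        and cost <= max_cost_per_outcome_usd
    )
```

Usage: build the summary from the exported audit log, then call ready_to_scale(summary, min_success_rate=0.85, max_cost_per_outcome_usd=0.50) with thresholds agreed for your workflow (the numbers shown are placeholders). The escalation-trust and security sign-off criteria are not captured by this check and still need human review.
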