ICMD AI Coworker Leadership Operating System — 90-Day Rollout Checklist

Goal: Introduce AI agents/copilots into core workflows without triggering a quality, security, or culture crisis. Use this as a leadership checklist for engineering, product, ops, and support.

PHASE 0 (Days 1–7): Define the rules of the game
1) Name an executive sponsor (VP Eng/COO/Head of Support) and a single program owner.
2) Pick 2 pilot lanes only (one technical, one customer-facing). Examples:
   - Engineering: test generation + low-risk refactors
   - Support: ticket triage + draft replies (human approval)
3) Write a one-page AI policy: what’s allowed, what’s prohibited, and what requires human sign-off (pricing, legal, security incidents, account changes).

PHASE 1 (Days 8–30): Build accountability + auditability first
4) Create an “Agent Register”:
   - Agent name, purpose, owner (DRI), backup owner
   - Allowed actions (allowlist), forbidden areas, escalation path
   - Data sources allowed (KB, docs, CRM fields) and prohibited (PII beyond X)
5) Logging requirements:
   - Store prompts, tool calls, outputs, approvals, and deployments
   - Set retention (e.g., 365 days) and access controls
6) Budgeting:
   - Establish a monthly AI spend cap per function
   - Define unit metrics: $/resolved ticket, $/merged PR, $/1k automated triage actions

PHASE 2 (Days 31–60): Install quality gates + evaluation
7) Define acceptance tests per lane:
   - Engineering: tests pass, lint/static analysis, dependency scan, CODEOWNERS approval
   - Support: policy compliance, correct routing, safe language, mandatory “escalate to human” triggers
8) Build a golden dataset for evals (minimum 50–200 real examples).
9) Set an error budget and rollback condition:
   - Example: if policy violations exceed 0.5% weekly, reduce autonomy and require approvals.
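The error-budget check in item 9 can be sketched as a small weekly job. This is a minimal illustration, not a prescribed implementation: the names (WEEKLY_VIOLATION_BUDGET, AgentWeekStats, check_error_budget) and the returned action strings are assumptions chosen for this sketch.

```python
# Illustrative sketch of the Phase 2 error-budget check (item 9).
# All names here are assumptions for illustration, not part of the checklist.
from dataclasses import dataclass

WEEKLY_VIOLATION_BUDGET = 0.005  # 0.5% policy violations per week, per the example in item 9

@dataclass
class AgentWeekStats:
    agent: str
    total_actions: int
    policy_violations: int

def check_error_budget(stats: AgentWeekStats) -> str:
    """Return the recommended autonomy action for the week."""
    if stats.total_actions == 0:
        return "ok"  # nothing ran this week; nothing to roll back
    rate = stats.policy_violations / stats.total_actions
    if rate > WEEKLY_VIOLATION_BUDGET:
        # Budget exceeded: drop the agent down the autonomy ladder
        # and require human approvals until the rate recovers.
        return "reduce_autonomy_require_approvals"
    return "ok"

# Example: 8 violations in 1,000 actions is a 0.8% rate, over the 0.5% budget.
print(check_error_budget(AgentWeekStats("triage-bot", 1000, 8)))
# → reduce_autonomy_require_approvals
```

Wiring this to the audit logs from item 5 keeps the rollback condition mechanical rather than a judgment call made mid-incident.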
PHASE 3 (Days 61–90): Increase autonomy gradually
10) Autonomy ladder:
    - Draft only → open PR/ticket → request review → limited merge → limited runbook execution
11) Incident process:
    - Any agent-caused regression, policy violation, or customer escalation gets a postmortem.
    - Track root causes: missing constraint, missing eval case, missing permission boundary.
12) Performance + culture:
    - Add “quality and judgment” criteria to reviews (not just output volume).
    - Normalize disclosure: label work as “AI-assisted” where relevant.

Ongoing (Monthly/Quarterly): Governance
13) Quarterly permission review for each agent (tighten scopes; remove unused actions).
14) Monthly metrics review:
    - Throughput, defect rate, MTTR, customer CSAT impact, spend vs. cap
15) Vendor risk review:
    - Portability of prompts/policies/eval sets; exit plan if pricing or terms change.

Definition of Done (for a successful rollout)
- Every agent has a human DRI, explicit permissions, audit logs, evals, and a spend cap.
- Quality metrics hold or improve (defects, incidents, escalations).
- Teams report higher velocity without reduced trust in reviews or customer outcomes.