ICMD AI Coworker Leadership Operating System — 90-Day Rollout Checklist

Goal: Introduce AI agents/copilots into core workflows without triggering a quality, security, or culture crisis. Use this as a leadership checklist for engineering, product, ops, and support.

PHASE 0 (Days 1–7): Define the rules of the game
1) Name an executive sponsor (VP Eng/COO/Head of Support) and a single program owner.
2) Pick 2 pilot lanes only (one technical, one customer-facing). Examples:
   - Engineering: test generation + low-risk refactors
   - Support: ticket triage + draft replies (human approval)
3) Write a one-page AI policy: what’s allowed, what’s prohibited, and what requires human sign-off (pricing, legal, security incidents, account changes).

PHASE 1 (Days 8–30): Build accountability + auditability first
4) Create an “Agent Register”:
   - Agent name, purpose, owner (DRI), backup owner
   - Allowed actions (allowlist), forbidden areas, escalation path
   - Data sources allowed (KB, docs, CRM fields) and prohibited (PII beyond X)
5) Logging requirements:
   - Store prompts, tool calls, outputs, approvals, and deployments
   - Set retention (e.g., 365 days) and access controls
6) Budgeting:
   - Establish a monthly AI spend cap per function
   - Define unit metrics: $/resolved ticket, $/merged PR, $/1k automated triage actions

PHASE 2 (Days 31–60): Install quality gates + evaluation
7) Define acceptance tests per lane:
   - Engineering: tests pass, lint/static analysis, dependency scan, CODEOWNERS approval
   - Support: policy compliance, correct routing, safe language, mandatory “escalate to human” triggers
8) Build a golden dataset for evals (minimum 50–200 real examples).
9) Set an error budget and rollback condition:
   - Example: if policy violations exceed 0.5% weekly, reduce autonomy and require approvals.
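The error-budget check in item 9 can be sketched as a small weekly job. This is a minimal illustration, not a prescribed implementation: the names (WEEKLY_VIOLATION_BUDGET, AgentWeekStats, check_error_budget) and the returned action strings are assumptions chosen for this sketch.

```python
# Illustrative sketch of the Phase 2 error-budget check (item 9).
# All names here are assumptions for illustration, not part of the checklist.
from dataclasses import dataclass

WEEKLY_VIOLATION_BUDGET = 0.005  # 0.5% policy violations per week, per the example in item 9

@dataclass
class AgentWeekStats:
    agent: str
    total_actions: int
    policy_violations: int

def check_error_budget(stats: AgentWeekStats) -> str:
    """Return the recommended autonomy action for the week."""
    if stats.total_actions == 0:
        return "ok"  # nothing ran this week; nothing to roll back
    rate = stats.policy_violations / stats.total_actions
    if rate > WEEKLY_VIOLATION_BUDGET:
        # Budget exceeded: drop the agent down the autonomy ladder
        # and require human approvals until the rate recovers.
        return "reduce_autonomy_require_approvals"
    return "ok"

# Example: 8 violations in 1,000 actions is a 0.8% rate, over the 0.5% budget.
print(check_error_budget(AgentWeekStats("triage-bot", 1000, 8)))
# → reduce_autonomy_require_approvals
```

Wiring this to the audit logs from item 5 keeps the rollback condition mechanical rather than a judgment call made mid-incident.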
PHASE 3 (Days 61–90): Increase autonomy gradually
10) Autonomy ladder:
    - Draft only → open PR/ticket → request review → limited merge → limited runbook execution
11) Incident process:
    - Any agent-caused regression, policy violation, or customer escalation gets a postmortem.
    - Track root causes: missing constraint, missing eval case, missing permission boundary.
12) Performance + culture:
    - Add “quality and judgment” criteria to reviews (not just output volume).
    - Normalize disclosure: label work as “AI-assisted” where relevant.

Ongoing (Monthly/Quarterly): Governance
13) Quarterly permission review for each agent (tighten scopes; remove unused actions).
14) Monthly metrics review:
    - Throughput, defect rate, MTTR, customer CSAT impact, spend vs. cap
15) Vendor risk review:
    - Portability of prompts/policies/eval sets; exit plan if pricing or terms change.

Definition of Done (for a successful rollout)
- Every agent has a human DRI, explicit permissions, audit logs, evals, and a spend cap.
- Quality metrics hold or improve (defects, incidents, escalations).
- Teams report higher velocity without reduced trust in reviews or customer outcomes.