ICMD Agent Governance Checklist (2026)

Use this checklist to operationalize “AI-first leadership” without creating a bureaucracy. Target: complete an initial pass in 2–4 weeks.

1) Inventory (Week 1)
- List every agent/automation that can take actions (not just draft text).
- For each: name, version, owner, vendor/model (e.g., OpenAI/Anthropic/Gemini or self-hosted), and where it runs (Slack, web app, CI, CRM).
- Document systems it can read from and write to (GitHub, Stripe, Salesforce, Zendesk, prod DB).
- Identify any shadow agents: personal accounts, browser plugins, unofficial Zapier/Make workflows.
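
One possible shape for an inventory row, as a minimal sketch. The class name, field names, and sample agents below are illustrative assumptions, not a prescribed schema; adapt them to whatever tracker you already use.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    """One inventory row per agent (hypothetical field names)."""
    name: str
    version: str
    owner: str                      # accountable human
    vendor_model: str               # e.g. "self-hosted"
    runs_in: str                    # e.g. "Slack", "CI", "CRM"
    reads_from: list[str] = field(default_factory=list)
    writes_to: list[str] = field(default_factory=list)
    sanctioned: bool = True         # False marks a shadow agent

    def can_take_actions(self) -> bool:
        # Any write access means the agent can act, not just draft text.
        return bool(self.writes_to)


# Sample rows (made-up agents for illustration only).
inventory = [
    AgentRecord("refund-bot", "1.4.0", "j.doe", "self-hosted", "Zendesk",
                reads_from=["Zendesk", "Stripe"], writes_to=["Stripe"]),
    AgentRecord("summary-bot", "0.9.1", "a.lee", "self-hosted", "Slack",
                reads_from=["GitHub"]),
]

# Only agents that can change state are in scope for the rest of the checklist.
in_scope = [a.name for a in inventory if a.can_take_actions()]
```

Filtering on write access keeps the inventory focused on agents that can take actions, while read-only drafting tools stay listed but out of scope.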

2) Assign Accountability (Week 1)
- Assign one human “DRI” (directly responsible individual) per agent.
- Assign a security partner for High/Very High risk agents; revisit the assignment once the risk classification in step 3 is complete.
- Define on-call/escalation for agent-caused incidents (who gets paged, within what SLA).

3) Risk Classify (Week 2)
- Classify each workflow: Low / Medium / High / Very High.
- High/Very High triggers: money movement, customer commitments, legal/compliance, production changes, identity/permissions.
- For each, define what “failure” means (e.g., wrong refund, wrong email, broken deploy, leaked data).
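
The classification can be sketched as a small rule function. The trigger tags mirror the list above; the mapping from trigger count to level is an assumption made for illustration, since the checklist does not say how to separate High from Very High.

```python
# Tags corresponding to the High/Very High triggers listed above.
HIGH_RISK_TRIGGERS = {
    "money_movement",
    "customer_commitments",
    "legal_compliance",
    "production_changes",
    "identity_permissions",
}


def classify(tags: set[str], writes_state: bool) -> str:
    """Return Low / Medium / High / Very High for one workflow.

    Assumed rule (not from the checklist): two or more triggers -> Very High,
    one trigger -> High, any other state change -> Medium, read-only -> Low.
    """
    hits = len(tags & HIGH_RISK_TRIGGERS)
    if hits >= 2:
        return "Very High"
    if hits == 1:
        return "High"
    return "Medium" if writes_state else "Low"
```

For example, `classify({"money_movement"}, writes_state=True)` returns `"High"` under this assumed rule. Replace the counting rule with your own criteria if, say, any production change should always be Very High.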

4) Guardrails (Week 2–3)
- Enforce least privilege: read-only by default; narrow write scopes; time-bound tokens.
- Add thresholds (e.g., refunds under $50 autonomous; $50–$200 approval; >$200 finance review).
- Require dry-run or staging for production actions; use feature flags and rollback plans.
- Add human approval gates for irreversible or hard-to-undo actions (customer emails, merges to main, pricing changes).
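
The refund thresholds above could be encoded as a single routing function. This is a sketch: the function name and return labels are hypothetical, and it assumes the $50 and $200 boundaries both fall in the approval tier, which is how the example ranges read.

```python
from decimal import Decimal


def route_refund(amount: Decimal) -> str:
    """Map a refund amount in USD to the required level of review."""
    if amount <= 0:
        raise ValueError("refund amount must be positive")
    if amount < Decimal("50"):
        return "autonomous"        # under $50: agent may act alone
    if amount <= Decimal("200"):
        return "approval"          # $50 to $200: human approval gate
    return "finance_review"        # over $200: finance review
```

Using `Decimal` rather than `float` avoids rounding surprises at the boundaries. Keeping the thresholds in one function also makes them easy to version alongside the policy.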

5) Logging & Auditability (Week 3)
- Log every agent action with: timestamp, agent/version, policy version, inputs/outputs, systems touched, trace ID, approval status.
- Store logs centrally (SIEM/log pipeline) with defined retention (e.g., 90–180 days, longer for regulated workflows).
- Make logs searchable by customer ID, order ID, pull request, or incident.
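
A minimal sketch of one action log record, emitted as a JSON line so a log pipeline can index it. The function name, key names, and the `refs` field (carrying customer ID, order ID, pull request, or incident for search) are assumptions, not a required schema.

```python
import json
import uuid
from datetime import datetime, timezone


def build_action_log(agent, agent_version, policy_version, inputs, outputs,
                     systems_touched, approval_status, refs=None,
                     trace_id=None):
    """Return one JSON line describing a single agent action."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "agent_version": agent_version,
        "policy_version": policy_version,
        "inputs": inputs,
        "outputs": outputs,
        "systems_touched": systems_touched,
        "trace_id": trace_id or str(uuid.uuid4()),
        "approval_status": approval_status,
        # Searchable references, e.g. {"customer_id": "...", "order_id": "..."}
        "refs": refs or {},
    }
    return json.dumps(record, sort_keys=True)
```

In practice, redact or hash sensitive input and output fields before logging; how to do that depends on your data classification rules and is not shown here.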

6) Metrics (Week 3–4)
- Track outcome metrics per workflow (CSAT, MTTR, change failure rate, forecast error).
- Track reliability metrics: override rate, escalations per 1,000 actions, rollback frequency.
- Set alert thresholds (e.g., override rate rises more than 20% week-over-week; escalations per 1,000 actions double versus the prior week).
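
The reliability metrics and example alerts can be sketched as follows. This assumes the 20% override threshold is a relative week-over-week increase and that a 2x escalation spike is measured against the prior week; the function names and the dict keys are hypothetical.

```python
def override_rate(overrides: int, actions: int) -> float:
    """Fraction of agent actions that a human overrode."""
    return overrides / actions if actions else 0.0


def escalations_per_1000(escalations: int, actions: int) -> float:
    """Escalations normalized per 1,000 agent actions."""
    return 1000 * escalations / actions if actions else 0.0


def weekly_alerts(prev: dict, curr: dict) -> list[str]:
    """Compare two weeks of counts; each dict has
    'actions', 'overrides', and 'escalations' keys (assumed shape)."""
    alerts = []
    prev_or = override_rate(prev["overrides"], prev["actions"])
    curr_or = override_rate(curr["overrides"], curr["actions"])
    # Relative increase of more than 20% over the prior week.
    if prev_or > 0 and (curr_or - prev_or) / prev_or > 0.20:
        alerts.append("override_rate")
    prev_esc = escalations_per_1000(prev["escalations"], prev["actions"])
    curr_esc = escalations_per_1000(curr["escalations"], curr["actions"])
    # Escalation rate at least double the prior week.
    if prev_esc > 0 and curr_esc >= 2 * prev_esc:
        alerts.append("escalations")
    return alerts
```

Normalizing by action volume matters: a week with twice the traffic and twice the escalations should not page anyone. A prior week with zero overrides or escalations produces no alert in this sketch, so pair it with an absolute floor if that case matters to you.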

7) Evaluation & Change Control (Ongoing)
- Maintain a small eval set per agent (20–200 representative cases).
- Require eval pass before increasing autonomy or shipping new policy versions.
- Version policies like code; publish changelogs; run postmortems on agent-caused incidents.
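
An eval gate can be as simple as the sketch below. It assumes each case is a (prompt, expected output) pair checked by exact match, and the 0.95 pass threshold is a placeholder; neither comes from this checklist, and real evals often need fuzzier scoring.

```python
from typing import Callable, Sequence


def eval_gate(cases: Sequence[tuple[str, str]],
              agent: Callable[[str], str],
              min_pass_rate: float = 0.95) -> tuple[bool, float]:
    """Run the agent over the eval set; return (gate_passed, pass_rate)."""
    if not cases:
        raise ValueError("eval set is empty")
    # Exact-match scoring: count cases where the agent output equals expected.
    passed = sum(1 for prompt, expected in cases if agent(prompt) == expected)
    rate = passed / len(cases)
    return rate >= min_pass_rate, rate
```

Run the gate in CI against the candidate policy version, and block any autonomy increase when the first element of the result is `False`.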

8) Culture & Training (Ongoing)
- Train teams on “draft vs decide”: AI can propose; humans own outcomes.
- Treat review as a skill: publish review standards and worked examples of good and bad approvals.
- Publicize a safe channel for reporting risky automations; reward early escalation.

Definition of Done: Every agent that can change state has (1) a human owner, (2) a risk classification, (3) least-privilege permissions, (4) action logging, and (5) at least one outcome KPI that is reviewed monthly.
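
The Definition of Done can be checked mechanically, as in this sketch. The key names are hypothetical and assume each agent is stored as a dict; a missing or empty value counts as not done.

```python
# One key per Definition-of-Done item (names are illustrative).
REQUIRED = (
    "owner",
    "risk_class",
    "least_privilege",
    "action_logging",
    "outcome_kpi",
)


def missing_controls(agent: dict) -> list[str]:
    """List the Definition-of-Done items an agent record still lacks."""
    return [key for key in REQUIRED if not agent.get(key)]
```

An empty list means the agent meets the Definition of Done; anything else is the remaining to-do list for its owner.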