ICMD Agent Governance Checklist (2026)

Use this checklist to operationalize “AI-first leadership” without creating a bureaucracy. Target: complete an initial pass in 2–4 weeks.

1) Inventory (Week 1)
- List every agent/automation that can take actions (not just draft text).
- For each: name, version, owner, vendor/model (e.g., OpenAI/Anthropic/Gemini or self-hosted), and where it runs (Slack, web app, CI, CRM).
- Document systems it can read from and write to (GitHub, Stripe, Salesforce, Zendesk, prod DB).
- Identify any shadow agents: personal accounts, browser plugins, unofficial Zapier/Make workflows.

2) Assign Accountability (Week 1)
- Assign one human “DRI” (directly responsible individual) per agent.
- Assign a security partner for high/very-high risk agents.
- Define on-call/escalation for agent-caused incidents (who gets paged, within what SLA).

3) Risk Classify (Week 2)
- Classify each workflow: Low / Medium / High / Very High.
- High/Very High triggers: money movement, customer commitments, legal/compliance, production changes, identity/permissions.
- For each, define what “failure” means (e.g., wrong refund, wrong email, broken deploy, leaked data).

4) Guardrails (Week 2–3)
- Enforce least privilege: read-only by default; narrow write scopes; time-bound tokens.
- Add thresholds (e.g., refunds under $50 autonomous; $50–$200 approval; >$200 finance review).
- Require dry-run or staging for production actions; use feature flags and rollback plans.
- Add human approval gates for irreversible actions (customer emails, merges to main, pricing changes).

5) Logging & Auditability (Week 3)
- Log every agent action with: timestamp, agent/version, policy version, inputs/outputs, systems touched, trace ID, approval status.
- Store logs centrally (SIEM/log pipeline) with defined retention (e.g., 90–180 days, longer for regulated workflows).
- Make logs searchable by customer ID, order ID, pull request, or incident.
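
The refund thresholds in section 4 and the log fields in section 5 can be sketched together as a small routing function that emits one audit record per decision. This is a minimal sketch, not a reference implementation: the names `route_refund`, `AgentActionLog`, and `log_refund_decision`, the use of integer cents, and the `approval_status` values are all assumptions made for illustration. Only the dollar thresholds and the list of log fields come from the checklist.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Example thresholds from the checklist, expressed in cents (an assumption,
# chosen to avoid floating-point comparisons on money).
AUTONOMOUS_LIMIT_CENTS = 50_00    # refunds under $50 run autonomously
APPROVAL_LIMIT_CENTS = 200_00     # $50-$200 needs approval; above goes to finance


def route_refund(amount_cents: int) -> str:
    """Return the hypothetical route for a refund of the given size."""
    if amount_cents <= 0:
        raise ValueError("refund amount must be positive")
    if amount_cents < AUTONOMOUS_LIMIT_CENTS:
        return "autonomous"
    if amount_cents <= APPROVAL_LIMIT_CENTS:
        return "approval"
    return "finance_review"


@dataclass
class AgentActionLog:
    """One record per agent action, using the fields listed in section 5."""
    timestamp: str
    agent: str
    agent_version: str
    policy_version: str
    inputs: dict
    outputs: dict
    systems_touched: list
    trace_id: str
    approval_status: str


def log_refund_decision(agent: str, agent_version: str, policy_version: str,
                        order_id: str, amount_cents: int) -> str:
    """Route a refund and return the audit record as a JSON line."""
    route = route_refund(amount_cents)
    record = AgentActionLog(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent=agent,
        agent_version=agent_version,
        policy_version=policy_version,
        inputs={"order_id": order_id, "amount_cents": amount_cents},
        outputs={"route": route},
        systems_touched=["Stripe"],  # illustrative; list what the agent really wrote to
        trace_id=str(uuid.uuid4()),
        # Hypothetical status values: autonomous actions need no approval,
        # everything else waits for a human.
        approval_status="not_required" if route == "autonomous" else "pending",
    )
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    print(log_refund_decision("refund-bot", "1.2.0", "2026-01", "ord_123", 75_00))
```

Emitting one JSON object per line keeps the records easy to ship to a central log pipeline and to search by order ID or trace ID. Where the record is actually stored is left out of the sketch.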
6) Metrics (Week 3–4)
- Track outcome metrics per workflow (CSAT, MTTR, change failure rate, forecast error).
- Track reliability metrics: override rate, escalations per 1,000 actions, rollback frequency.
- Set alert thresholds (e.g., override rate > 20% week-over-week; escalations spike 2x).

7) Evaluation & Change Control (Ongoing)
- Maintain a small eval set per agent (20–200 representative cases).
- Require eval pass before increasing autonomy or shipping new policy versions.
- Version policies like code; publish changelogs; run postmortems on agent-caused incidents.

8) Culture & Training (Ongoing)
- Train teams on “draft vs decide”: AI can propose; humans own outcomes.
- Make review a skill with standards and examples.
- Publicize a safe channel for reporting risky automations; reward early escalation.

Definition of Done: Every agent that can change state has (1) a human owner, (2) risk classification, (3) least-privilege permissions, (4) action logging, and (5) at least one outcome KPI reviewed monthly.
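
The Definition of Done above can be checked mechanically against the inventory from section 1. The following is a minimal sketch under stated assumptions: `AgentRecord`, its field names, and `missing_requirements` are hypothetical, and the two boolean fields stand in for whatever evidence a team actually keeps for permission reviews and logging. The sketch also assumes that `outcome_kpis` lists only KPIs that are reviewed monthly.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Risk classes from section 3 of the checklist.
RISK_CLASSES = {"Low", "Medium", "High", "Very High"}


@dataclass
class AgentRecord:
    """Hypothetical inventory entry for one agent or automation."""
    name: str
    can_change_state: bool
    owner: Optional[str] = None                # human DRI
    risk_class: Optional[str] = None           # one of RISK_CLASSES
    least_privilege_reviewed: bool = False     # permissions narrowed and reviewed
    action_logging_enabled: bool = False       # every action is logged
    outcome_kpis: List[str] = field(default_factory=list)  # KPIs reviewed monthly


def missing_requirements(agent: AgentRecord) -> List[str]:
    """Return the Definition of Done items this agent still lacks."""
    if not agent.can_change_state:
        # The Definition of Done only covers agents that can change state.
        return []
    gaps = []
    if not agent.owner:
        gaps.append("human owner")
    if agent.risk_class not in RISK_CLASSES:
        gaps.append("risk classification")
    if not agent.least_privilege_reviewed:
        gaps.append("least-privilege permissions")
    if not agent.action_logging_enabled:
        gaps.append("action logging")
    if not agent.outcome_kpis:
        gaps.append("outcome KPI")
    return gaps


if __name__ == "__main__":
    inventory = [
        AgentRecord("refund-bot", True, owner="alice", risk_class="High",
                    least_privilege_reviewed=True, action_logging_enabled=True,
                    outcome_kpis=["CSAT"]),
        AgentRecord("unofficial-zap", True),
    ]
    for agent in inventory:
        gaps = missing_requirements(agent)
        print(agent.name, "done" if not gaps else "missing: " + ", ".join(gaps))
```

Running a check like this over the full inventory gives a per-agent gap list that can be reviewed at the end of the initial 2–4 week pass.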