ICMD Agent Production Readiness Kit (APR-Kit) Use this checklist to ship ONE agent workflow into production safely. Copy into Notion/Jira and assign owners. 1) Workflow Manifest (Owner: Product) - Name the workflow (e.g., “refund_request_v2”). - Define input sources (tickets, emails, forms) and required fields. - Define “success” and “failure” outcomes in plain English. - List explicit out-of-scope cases (edge cases you will escalate). 2) Permission Matrix (Owner: Eng + Security) - Choose identity model: delegated service identity vs user impersonation. - Enumerate tools the agent can access (read vs write permissions). - Add step-up approvals for high-risk actions (money movement, customer comms). - Produce a one-page permission matrix (tool → scopes → risk level). 3) Data Governance (Owner: Security + Legal) - Document what data is sent to model providers and what is stored. - Set retention limits for prompts, traces, and artifacts (e.g., 30/90 days). - Define PII redaction rules before model calls. - List subprocessors and regions (US/EU) and link to DPA. 4) Golden Task Set (Owner: Agent Ops) - Collect 200+ representative tasks; remove PII. - Define expected outcomes (labels, actions, or structured results). - Tag tasks by difficulty tier (T1/T2/T3) and risk. 5) Regression Gate (Owner: Platform Eng) - Add CI step that runs evals on every prompt/tool/policy change. - Set pass thresholds (e.g., success rate ≥ 92% on T1, ≥ 80% on T2). - Require explicit approval to ship if thresholds fail. 6) Observability & Tracing (Owner: Platform Eng) - Log every tool call with args, result, latency, and errors. - Store structured traces (task_id, workflow, plan, guardrails, outcome). - Build dashboards: success rate, escalation rate, retries, cost per success. 7) Escalation & Fallback (Owner: Product + CS) - Define when to escalate (low confidence, tool failure, policy conflict). - Define fallback behavior (draft-only, read-only mode, or human takeover). - Write an on-call runbook and a “how to disable automation” switch. 8) Cost Controls (Owner: Eng + Finance) - Set budgets: max model calls per task, max retries, max context size. - Track cost per successful task weekly. - Define target gross margin and alert when margin drops. 9) Pilot-to-Production Rollout Plan (Owner: CS + Product) - Phase 1: read-only drafts (1–2 weeks). - Phase 2: propose actions with approvals (2–4 weeks). - Phase 3: limited autonomy under constraints (4–8 weeks). - Agree on success criteria for each phase with the customer. 10) Monthly Reliability Review (Owner: Agent Ops) - Publish a scorecard: success %, containment %, escalations, incidents. - Run postmortems on the top 3 failure modes. - Version prompts/policies and document changes. Definition of Done: You can show (a) a workflow manifest, (b) a permission matrix, (c) an eval report, (d) a trace sample, and (e) an incident runbook to a security-conscious buyer in under 30 minutes.