ICMD Agent Production Readiness Kit (2026) Use this kit to launch one agent workflow safely and measurably in ~90 days. 1) Workflow Selection (Score 1–5 each) - Repeatability: Is the task performed >500 times/month? - Structure: Are inputs mostly structured (forms, tickets, CRM fields)? - Action space: Can you constrain actions to 3–10 tool calls? - Value: Is the fully-loaded human cost >$8 per task or does it reduce churn/revenue risk? - Blast radius: Can mistakes be reversed easily (or routed to approval)? Pick the top workflow with high repeatability + value and low blast radius. 2) Define Success Metrics (write these down before building) - Task success rate (no human needed): target 70% by week 6, 85% by week 12. - Escalation rate: target <20% by week 12 with clear reasons logged. - p95 latency: target <10s interactive, <60s background. - Safety: target 0 unauthorized writes; 0 PII leaks in logs. - Unit economics: cost per completed task vs baseline human cost. 3) Identity & Permissions Template - Create a dedicated agent service principal per workflow. - Grant least privilege: separate read scope from write scope. - Use short-lived tokens (minutes, not days); rotate secrets. - Implement budgets: per-session write limit and per-day spend cap. - Require approval for irreversible actions (refunds, payouts, account closures). 4) Tooling Guardrails (minimum viable) - Tool allowlist (explicitly enumerate callable tools). - JSON schema validation for tool inputs. - Rate limiting per agent identity. - Policy gate: allow/deny based on tool, action, args, and remaining budget. - Safe failure: retries capped (e.g., 2), then escalate with context. 5) Evaluation Plan - Golden task suite: start with 200 labeled examples; grow to 1,000+. - Adversarial set: 30–100 prompt-injection and data-exfil attempts. - Regression cadence: run on every prompt/template or tool change. - Canary releases: ship to 5% traffic; auto-rollback on KPI regression. - Drift monitoring: weekly sample review of 50 traces for new failure types. 6) Observability & Audit Requirements - Log: request, prompt/template version, tool calls, outputs (redacted), policy decisions. - Correlation ID per run; replayable trace for incidents. - Dashboards: success rate, cost/run, latency, top escalation reasons, tool error rates. - Incident runbook: disable switch, rollback plan, escalation contacts. 7) 90-Day Rollout Schedule Weeks 1–2: baseline metrics, select workflow, define SLOs. Weeks 3–4: tool gateway + identity + allowlist. Weeks 5–6: copilot (human approval for writes) + trace capture. Weeks 7–9: eval suite + canary gates + rollback automation. Weeks 10–12: expand autonomy for low-risk actions; keep high-risk behind approval. Copy/paste this kit into your internal doc, assign owners (Security, Platform Eng, Applied AI, Ops), and don’t raise autonomy until the evidence (evals + traces + metrics) says you can.