ICMD Agent Production Readiness Kit (2026)

Use this kit to launch one agent workflow safely and measurably in ~90 days.

1) Workflow Selection (Score 1–5 each)
- Repeatability: Is the task performed >500 times/month?
- Structure: Are inputs mostly structured (forms, tickets, CRM fields)?
- Action space: Can you constrain actions to 3–10 tool calls?
- Value: Is the fully-loaded human cost >$8 per task or does it reduce churn/revenue risk?
- Blast radius: Can mistakes be reversed easily (or routed to approval)?
Pick the top workflow with high repeatability + value and low blast radius.

2) Define Success Metrics (write these down before building)
- Task success rate (no human needed): target 70% by week 6, 85% by week 12.
- Escalation rate: target <20% by week 12 with clear reasons logged.
- p95 latency: target <10s interactive, <60s background.
- Safety: target 0 unauthorized writes; 0 PII leaks in logs.
- Unit economics: cost per completed task vs baseline human cost.

3) Identity & Permissions Template
- Create a dedicated agent service principal per workflow.
- Grant least privilege: separate read scope from write scope.
- Use short-lived tokens (minutes, not days); rotate secrets.
- Implement budgets: per-session write limit and per-day spend cap.
- Require approval for irreversible actions (refunds, payouts, account closures).

4) Tooling Guardrails (minimum viable)
- Tool allowlist (explicitly enumerate callable tools).
- JSON schema validation for tool inputs.
- Rate limiting per agent identity.
- Policy gate: allow/deny based on tool, action, args, and remaining budget.
- Safe failure: retries capped (e.g., 2), then escalate with context.

5) Evaluation Plan
- Golden task suite: start with 200 labeled examples; grow to 1,000+.
- Adversarial set: 30–100 prompt-injection and data-exfil attempts.
- Regression cadence: run on every prompt/template or tool change.
- Canary releases: ship to 5% traffic; auto-rollback on KPI regression.
- Drift monitoring: weekly sample review of 50 traces for new failure types.

6) Observability & Audit Requirements
- Log: request, prompt/template version, tool calls, outputs (redacted), policy decisions.
- Correlation ID per run; replayable trace for incidents.
- Dashboards: success rate, cost/run, latency, top escalation reasons, tool error rates.
- Incident runbook: disable switch, rollback plan, escalation contacts.

7) 90-Day Rollout Schedule
Weeks 1–2: baseline metrics, select workflow, define SLOs.
Weeks 3–4: tool gateway + identity + allowlist.
Weeks 5–6: copilot (human approval for writes) + trace capture.
Weeks 7–9: eval suite + canary gates + rollback automation.
Weeks 10–12: expand autonomy for low-risk actions; keep high-risk behind approval.

Copy/paste this kit into your internal doc, assign owners (Security, Platform Eng, Applied AI, Ops), and don’t raise autonomy until the evidence (evals + traces + metrics) says you can.