ICMD Agent Layer Launch Kit (30-Day Checklist + Policy Templates) Use this to ship one production agent safely in 30 days. 1) Pick the right workflow (Day 1–3) - Choose a single owner (Eng Platform, Support Ops, RevOps). - Volume: target >= 500 runs/month or >= 2 hours/week of repetitive work. - Systems touched: list every integration (GitHub, Jira, Zendesk, Salesforce, Stripe, NetSuite). - Define “done” in one sentence (binary or scoreable). Example: “Ticket is categorized, priority set, and routed to correct queue.” - Define top 5 failure modes (e.g., wrong refund, wrong customer, wrong repo, policy violation, brand tone). 2) Define access + identity (Day 4–7) - Create a dedicated agent identity (service account) in your IAM. - Enforce least privilege: split READ vs WRITE identities if needed. - Inventory secrets/tokens; store in a secrets manager; forbid prompts from containing raw secrets. - Set approval thresholds (examples): * Refunds > $100 require human approval * Production deploys require CI + reviewer * Customer-facing emails require template + review until quality proven 3) Build the “safe agent contract” (Day 8–14) - Require all actions to be typed tool calls (no free-form actions). - Implement policy gates: * allow/deny by tool name, target system, and resource * per-task budget (max steps, max time, max $) * rate limits per identity - Add deterministic validators (amount <= last_payment, email domain checks, required fields). - Add sandbox/staging mode for each write tool. 4) Logging + auditability (Day 15–18) - Log every tool call: timestamp, tool name, parameters, result, latency. - Record model metadata: provider, model name, version, routing choice. - Store evidence pointers: retrieved doc IDs/hashes, ticket IDs, PR IDs. - Redaction rules: remove PII/PHI/PCI from stored prompts; store hashes where needed. 5) Evaluation harness (Day 19–23) - Assemble 200–500 historical cases. - Define metrics: * success rate (meets “done” criteria) * human override rate (and reason codes) * policy violation rate * cost per run ($) * time saved (minutes/run) - Create a regression suite: the same cases run nightly to detect drift. 6) Rollout plan (Day 24–30) - Ship to 10–20% traffic with human-in-the-loop. - Daily dashboard review: volume, errors, spend, top override reasons. - Stop-the-line criteria: >1% policy violations or spend anomaly >2x baseline. - Expand to 50%, then 100% only after 7 consecutive days meeting targets. Policy templates (copy/paste) - Budget policy: “This agent may spend up to $X/day and max Y tool calls/run; if exceeded, it must escalate to a human.” - Data policy: “The agent may use only these fields in prompts: [list]. All other fields must be redacted.” - Action policy: “The agent may create drafts freely; it may execute writes only in sandbox unless approved or within thresholds.” Board-ready ROI summary (one paragraph) - Baseline: cycle time, cost, and error rate before agent. - After: success rate, time saved per run, cost per run, incident count. - Net: monthly hours saved, $ saved, and payback period (weeks).