ICMD Agent Layer Launch Kit (30-Day Checklist + Policy Templates)

Use this to ship one production agent safely in 30 days.

1) Pick the right workflow (Day 1–3)
- Choose a single owner (Eng Platform, Support Ops, RevOps).
- Volume: target >= 500 runs/month or >= 2 hours/week of repetitive work.
- Systems touched: list every integration (GitHub, Jira, Zendesk, Salesforce, Stripe, NetSuite).
- Define “done” in one sentence (binary or scoreable). Example: “Ticket is categorized, priority set, and routed to correct queue.”
- Define top 5 failure modes (e.g., wrong refund, wrong customer, wrong repo, policy violation, brand tone).

2) Define access + identity (Day 4–7)
- Create a dedicated agent identity (service account) in your IAM.
- Enforce least privilege: split READ vs WRITE identities if needed.
- Inventory secrets/tokens; store in a secrets manager; forbid prompts from containing raw secrets.
- Set approval thresholds (examples):
  * Refunds > $100 require human approval
  * Production deploys require CI + reviewer
  * Customer-facing emails require template + review until quality proven

3) Build the “safe agent contract” (Day 8–14)
- Require all actions to be typed tool calls (no free-form actions).
- Implement policy gates:
  * allow/deny by tool name, target system, and resource
  * per-task budget (max steps, max time, max $)
  * rate limits per identity
- Add deterministic validators (amount <= last_payment, email domain checks, required fields).
- Add sandbox/staging mode for each write tool.

4) Logging + auditability (Day 15–18)
- Log every tool call: timestamp, tool name, parameters, result, latency.
- Record model metadata: provider, model name, version, routing choice.
- Store evidence pointers: retrieved doc IDs/hashes, ticket IDs, PR IDs.
- Redaction rules: remove PII/PHI/PCI from stored prompts; store hashes where needed.

5) Evaluation harness (Day 19–23)
- Assemble 200–500 historical cases.
- Define metrics:
  * success rate (meets “done” criteria)
  * human override rate (and reason codes)
  * policy violation rate
  * cost per run ($)
  * time saved (minutes/run)
- Create a regression suite: the same cases run nightly to detect drift.

6) Rollout plan (Day 24–30)
- Ship to 10–20% traffic with human-in-the-loop.
- Daily dashboard review: volume, errors, spend, top override reasons.
- Stop-the-line criteria: >1% policy violations or spend anomaly >2x baseline.
- Expand to 50%, then 100% only after 7 consecutive days meeting targets.

Policy templates (copy/paste)
- Budget policy: “This agent may spend up to $X/day and max Y tool calls/run; if exceeded, it must escalate to a human.”
- Data policy: “The agent may use only these fields in prompts: [list]. All other fields must be redacted.”
- Action policy: “The agent may create drafts freely; it may execute writes only in sandbox unless approved or within thresholds.”

Board-ready ROI summary (one paragraph)
- Baseline: cycle time, cost, and error rate before agent.
- After: success rate, time saved per run, cost per run, incident count.
- Net: monthly hours saved, $ saved, and payback period (weeks).
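
Added example (not part of the original kit): one way to express the step 3 policy gate in code, combining the allow-list, the per-task budget, the amount <= last_payment validator, and the "refunds > $100 require human approval" threshold from step 2. The class, function, tool, and identity names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical names throughout (ToolCall, Policy, RunState, gate, tool and
# identity strings); thresholds mirror the checklist's examples.
@dataclass(frozen=True)
class ToolCall:
    identity: str         # dedicated agent service account
    tool: str             # typed tool name, no free-form actions
    system: str           # target system
    params: dict
    sandbox: bool = True  # write tools default to sandbox/staging

@dataclass
class Policy:
    allow: frozenset                    # (identity, tool, system) triples
    max_steps: int = 8                  # per-task step budget
    max_spend_usd: float = 2.0          # per-task dollar budget
    refund_approval_over: float = 100.0

@dataclass
class RunState:
    steps: int = 0
    spend_usd: float = 0.0

def gate(call, policy, run, last_payment=None, cost_usd=0.0, approved=False):
    """Return (decision, reason); decision is ALLOW, DENY or ESCALATE.
    `approved` comes from the approval system, never from agent-supplied params."""
    if (call.identity, call.tool, call.system) not in policy.allow:
        return "DENY", "identity/tool/system not on allow-list"
    if run.steps >= policy.max_steps or run.spend_usd + cost_usd > policy.max_spend_usd:
        return "ESCALATE", "per-task budget exceeded"
    if call.tool == "issue_refund":
        amount = call.params.get("amount")
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            return "DENY", "amount missing or not a number"
        # Written as a positive range check so NaN fails closed.
        if last_payment is None or not (0 < amount <= last_payment):
            return "DENY", "amount must be > 0 and <= last_payment"
        if amount > policy.refund_approval_over and not (call.sandbox or approved):
            return "ESCALATE", "refund over threshold requires human approval"
    run.steps += 1            # budget is charged only for calls that pass the gate
    run.spend_usd += cost_usd
    return "ALLOW", "ok"

if __name__ == "__main__":
    # Constructed sample call: a $250 production refund against a $300 payment.
    policy = Policy(allow=frozenset({("agent-support-write", "issue_refund", "stripe")}))
    run = RunState()
    call = ToolCall("agent-support-write", "issue_refund", "stripe", {"amount": 250}, sandbox=False)
    print(gate(call, policy, run, last_payment=300))                 # escalates
    print(gate(call, policy, run, last_payment=300, approved=True))  # allowed
```

Every decision and its reason string is a natural record for the step 4 tool-call log, and the ESCALATE and DENY counts feed the override and policy-violation metrics in step 5.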