ICMD Agentic Feature Launch Pack (2026)

Use this framework to take an agent feature from prototype to production without blowing up trust, security, or unit economics.

1) Define the job (narrow scope)
- Name the workflow in six words or fewer (e.g., “Resolve shipping status tickets”).
- Define “done” as an outcome, not a response (e.g., “Customer receives correct status + ETA”).
- List exclusions (what the agent must refuse), including regulated data, high-dollar actions, and ambiguous requests.

2) Pick the autonomy rung (Trust Ladder)
Choose ONE as your initial launch target:
- Suggest: surfaces info/recommendations.
- Draft: produces a draft for a human to edit.
- Execute with review: performs actions only after explicit approval.
- Execute with audit: auto-executes low-risk actions; always logs traces.
- Policy-based autonomy: executes based on thresholds (amounts, customer tier, data sensitivity).
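
As a rough illustration of the top rung, the threshold idea could be sketched as a small policy function. Everything here is an assumption for illustration: the names (Rung, ActionRequest, allowed_rung), the tier and sensitivity labels, and the 50 USD limit are hypothetical, not part of this pack.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    """The five Trust Ladder rungs, ordered from least to most autonomous."""
    SUGGEST = 1
    DRAFT = 2
    EXECUTE_WITH_REVIEW = 3
    EXECUTE_WITH_AUDIT = 4
    POLICY_BASED = 5


@dataclass(frozen=True)
class ActionRequest:
    amount_usd: float
    customer_tier: str      # hypothetical labels: "standard", "enterprise"
    data_sensitivity: str   # hypothetical labels: "low", "high"


def allowed_rung(req: ActionRequest, max_auto_amount_usd: float = 50.0) -> Rung:
    """Return the most autonomous rung this request may run at.

    The thresholds are placeholders; a real policy would come from
    your own risk review.
    """
    # Sensitive data never reaches an Execute rung in this sketch.
    if req.data_sensitivity != "low":
        return Rung.DRAFT
    # High amounts or high-touch customers need explicit approval.
    if req.amount_usd > max_auto_amount_usd or req.customer_tier == "enterprise":
        return Rung.EXECUTE_WITH_REVIEW
    # Everything else may auto-execute, with traces logged.
    return Rung.EXECUTE_WITH_AUDIT
```

The function only ever lowers autonomy as risk rises, so a missing or unexpected field value falls back to a more supervised rung rather than a less supervised one.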

3) Tool contract checklist (before any “Execute”)
- Schemas: strict JSON schema for every tool call.
- Idempotency: repeat calls do not duplicate effects.
- Error taxonomy: consistent error codes (auth, not_found, conflict, rate_limited, validation).
- Rollback: define undo steps or compensating transactions.
- Actor identity: every tool call must record which user/service identity was used.
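
One way the checklist above might look in code is sketched below. It is a minimal in-memory example, not a reference implementation: RefundTool, its argument names, and the idempotency_key parameter are hypothetical, and the hand-written validation stands in for a real JSON Schema validator.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class ToolError(str, Enum):
    """Error taxonomy taken from the checklist."""
    AUTH = "auth"
    NOT_FOUND = "not_found"
    CONFLICT = "conflict"
    RATE_LIMITED = "rate_limited"
    VALIDATION = "validation"


@dataclass
class RefundTool:
    """Hypothetical tool that issues refunds (in memory only)."""
    _results: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def call(self, args: dict, idempotency_key: str, actor: str) -> dict:
        # Schema: reject anything that is not exactly the expected shape.
        order_id = args.get("order_id")
        amount = args.get("amount_cents")
        valid = (
            set(args) == {"order_id", "amount_cents"}
            and isinstance(order_id, str)
            and isinstance(amount, int)
            and not isinstance(amount, bool)
            and amount > 0
        )
        if not valid:
            return {"ok": False, "error": ToolError.VALIDATION.value}

        # Idempotency: a repeated key returns the stored result
        # instead of issuing a second refund.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        result = {"ok": True, "refund_id": str(uuid.uuid4())}
        self._results[idempotency_key] = result

        # Actor identity: record who performed the effect.
        self.audit_log.append(
            {"actor": actor, "key": idempotency_key, "args": dict(args)}
        )
        return result
```

Calling the tool twice with the same key yields the same refund_id and a single audit entry, which is the property a retrying agent depends on. Rollback is not shown; in this sketch it would be a separate compensating call keyed by refund_id.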

4) Instrumentation you must have
- Run trace per task: intent, plan steps, retrieval sources, tool calls, outputs.
- Budgets: max tool calls, max retries, max cost per run, max latency per run.
- Outcomes: success, safe_refusal, handoff, rollback, human_override.
- Unit economics: cost per successful task (not cost per token), broken down by task type.
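
The budget and unit-economics items can be sketched as follows. The names (RunBudget, RunStats, exceeded_budgets, cost_per_successful_task) are hypothetical. The cost metric here assumes one particular definition: total spend on all runs, including refusals and handoffs, divided by the number of successful runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunBudget:
    max_tool_calls: int
    max_retries: int
    max_cost_usd: float
    max_latency_s: float


@dataclass
class RunStats:
    tool_calls: int = 0
    retries: int = 0
    cost_usd: float = 0.0
    latency_s: float = 0.0


def exceeded_budgets(budget: RunBudget, stats: RunStats) -> list:
    """Return the names of every budget this run has gone over."""
    over = []
    if stats.tool_calls > budget.max_tool_calls:
        over.append("tool_calls")
    if stats.retries > budget.max_retries:
        over.append("retries")
    if stats.cost_usd > budget.max_cost_usd:
        over.append("cost")
    if stats.latency_s > budget.max_latency_s:
        over.append("latency")
    return over


def cost_per_successful_task(runs: list) -> Optional[float]:
    """runs is a list of (outcome, cost_usd) pairs for one task type.

    Failed and refused runs still cost money, so their spend is
    charged against the successes. Returns None if nothing succeeded.
    """
    total = sum(cost for _, cost in runs)
    successes = sum(1 for outcome, _ in runs if outcome == "success")
    return total / successes if successes else None
```

A run loop would typically call exceeded_budgets after each step and end the run with a handoff outcome as soon as the list is non-empty.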

5) Release gates (ship/no-ship)
- Tool correctness: ≥99.5% schema-valid calls in staging.
- Safe completion: ≥95% correct-or-refuse on your low-risk scenario suite.
- Latency: P95 within your UX budget (or provide async UI).
- Cost: P95 cost/run within target, with stable week-over-week variance.
- Auditability: 100% action traces with timestamps + actor identity.
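
The gates above could be evaluated mechanically, for example in CI before a release. The sketch below uses the thresholds from the list. StagingMetrics and failed_gates are hypothetical names, the latency budget and cost target are inputs you supply, and the week-over-week variance check and the async-UI exception are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagingMetrics:
    schema_valid_rate: float       # fraction of tool calls that were schema-valid
    correct_or_refuse_rate: float  # fraction of low-risk scenarios handled safely
    p95_latency_s: float
    p95_cost_usd: float
    traced_action_rate: float      # fraction of actions with timestamp + actor


def failed_gates(m: StagingMetrics, latency_budget_s: float,
                 cost_target_usd: float) -> list:
    """Return the names of the gates that block shipping (empty list = ship)."""
    checks = {
        "tool_correctness": m.schema_valid_rate >= 0.995,
        "safe_completion": m.correct_or_refuse_rate >= 0.95,
        "latency": m.p95_latency_s <= latency_budget_s,
        "cost": m.p95_cost_usd <= cost_target_usd,
        "auditability": m.traced_action_rate >= 1.0,
    }
    return [name for name, passed in checks.items() if not passed]
```

Returning the list of failed gates, rather than a single boolean, makes the no-ship decision explainable in the release record.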

6) Rollout plan (2–4 weeks)
- Week 1: internal dogfood with traces; fix top 10 failure modes.
- Week 2: 1–5 design partners; “Execute with review” only.
- Week 3: expand to 10–20% traffic; enable auto-exec for a single low-risk category.
- Week 4: GA with clear admin controls, retention settings, and a rollback policy.

7) Operator playbook (ongoing)
- Daily: regression suite run; investigate any >1% drop in safe completion.
- Weekly: cost/task review; tune routing and caching.
- Monthly: expand autonomy to one new category only after passing gates.
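
The daily check can be reduced to a one-line comparison, sketched below. It assumes the 1% threshold means one percentage point of absolute drop against a baseline rate; needs_investigation and its parameters are hypothetical names.

```python
def needs_investigation(baseline_rate: float, today_rate: float,
                        max_drop: float = 0.01) -> bool:
    """True if safe completion fell by more than max_drop (absolute).

    Rates are fractions in [0, 1]; improvements never trigger.
    """
    return (baseline_rate - today_rate) > max_drop
```

If the threshold is meant as a relative drop instead, compare (baseline_rate - today_rate) / baseline_rate against max_drop.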

If you can’t answer “What did the agent change, why, and how do we undo it?”, you’re not ready to ship Execute.