ICMD Agent Readiness Checklist (2026)

Use this checklist to take an agent from idea to production without surprise costs, security gaps, or untestable behavior. Copy/paste into your ticket as acceptance criteria.

1) Scope the Unit of Work (UoW)
- Define a single job-to-be-done in one page: trigger, inputs, outputs, and “done” condition.
- List the top 10 edge cases (missing data, conflicting records, rate limits, ambiguous user intent).
- Identify the worst-case failure and ensure it’s reversible (draft vs. send; recommend vs. execute).

2) Tooling & Data Contracts
- Every tool has a typed interface (JSON schema / function signature) and returns machine-readable errors.
- Actions are idempotent (safe to retry) and carry request IDs.
- Retrieval sources are explicit (which KBs, which repos, which DB tables), with access controls.

3) Identity, Permissions, and Secrets
- No shared API keys: use per-run short-lived tokens (TTL of minutes) or delegated OAuth.
- Enforce least privilege: allowlist endpoints and constrain parameters (e.g., refund <= $50).
- Implement an emergency kill switch and automated credential revocation.

4) Budgets and Stop Conditions
- Set per-run budgets: token budget, tool-call budget, and wall-clock timeout.
- Define stop conditions in code: goal met, budget exceeded, policy violation, low confidence.
- Add degrade modes: switch to summarization or human escalation at 70–90% budget use.

5) Observability & Auditability
- Log every run with prompt/tool versions, model version, tool inputs/outputs, latency, and token counts.
- Ensure replayability for incident response (target: >99% of runs replayable).
- Redact or tokenize sensitive fields in logs (PII, secrets, regulated data).

6) Evaluation (Evals)
- Build a golden set (50–300 real tasks) with expected outcomes.
- Add adversarial tests: prompt-injection attempts, malformed tool outputs, missing permissions.
- Run evals on every deployment; fail the release on regression thresholds.
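The budgets and stop conditions in step 4 can be sketched as a small per-run guard. This is a minimal sketch, not a prescribed implementation: the class name, budget values, and 80% degrade threshold are illustrative assumptions.

```python
import time
from dataclasses import dataclass, field
from enum import Enum

class RunState(Enum):
    CONTINUE = "continue"
    DEGRADE = "degrade"   # switch to summarization or escalate to a human
    STOP = "stop"         # hard stop: a budget has been exceeded

@dataclass
class BudgetGuard:
    """Hypothetical per-run guard: token, tool-call, and wall-clock budgets."""
    max_tokens: int = 50_000
    max_tool_calls: int = 25
    max_seconds: float = 120.0
    degrade_at: float = 0.8  # enter degrade mode at 80% of any budget
    tokens_used: int = 0
    tool_calls: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def record(self, tokens: int = 0, tool_calls: int = 0) -> None:
        """Call after each model or tool step to account for usage."""
        self.tokens_used += tokens
        self.tool_calls += tool_calls

    def check(self) -> RunState:
        """Return the run state based on the worst (highest) budget ratio."""
        elapsed = time.monotonic() - self.started_at
        worst = max(
            self.tokens_used / self.max_tokens,
            self.tool_calls / self.max_tool_calls,
            elapsed / self.max_seconds,
        )
        if worst >= 1.0:
            return RunState.STOP
        if worst >= self.degrade_at:
            return RunState.DEGRADE
        return RunState.CONTINUE
```

The agent loop calls `guard.check()` before each step and exits (or escalates) on anything other than `CONTINUE`; the "goal met / policy violation / low confidence" stop conditions from step 4 would sit alongside this budget check.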
7) Human-in-the-Loop (HITL) Gates
- Start in “draft mode,” where the agent proposes actions for approval.
- Define escalation rules (low confidence, missing data, high-risk action, budget exceeded).
- Measure escalation rate and approval latency; iterate until both are stable.

8) Rollout Plan
- Canary to internal users first (e.g., 5–10%), then expand by risk tier.
- Monitor success rate, cost per outcome, policy violations, and customer-impact metrics.
- Document rollback: model downgrade, tool disable, feature flag off, revert prompt version.

Success definition (fill in):
- KPI: ______________________ (e.g., cost per resolved ticket, mean time to mitigate)
- Target success rate: _______%
- Target cost per outcome: $________
- Max acceptable policy violations per week: ________
- Max time-to-resolution: ________
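The escalation rules in step 7 can be sketched as a single gate function that decides whether a proposed action stays in draft mode. The action names, fields, and 0.8 confidence threshold below are illustrative assumptions, not part of the checklist.

```python
from dataclasses import dataclass

# Hypothetical high-risk action names; a real deployment would load these
# from the allowlist defined in step 3 (Identity, Permissions, and Secrets).
HIGH_RISK_ACTIONS = {"send_email", "issue_refund", "delete_record"}

@dataclass
class ProposedAction:
    name: str
    confidence: float        # model self-reported confidence in [0, 1]
    has_required_data: bool  # False when inputs are missing or conflicting
    budget_exceeded: bool    # set by the run's budget guard (step 4)

def needs_human_approval(action: ProposedAction,
                         min_confidence: float = 0.8) -> bool:
    """Return True if the action must be routed to a reviewer (draft mode).

    Mirrors the step-7 escalation rules: low confidence, missing data,
    budget exceeded, or a high-risk action.
    """
    return (
        action.confidence < min_confidence
        or not action.has_required_data
        or action.budget_exceeded
        or action.name in HIGH_RISK_ACTIONS
    )
```

Logging each gate decision with its triggering rule gives you the escalation rate and approval latency that step 7 asks you to measure.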