ICMD Agent Readiness Scorecard (2026)

Use this scorecard to move from “cool demo” to “production agent” with predictable ROI.

A) Workflow Fit (0–10)

1) Clear Done State: Can you define an unambiguous completion condition (e.g., ticket closed with correct tag; refund issued with policy justification)? If not, stop.
2) Repeatability: Do you have at least 500 similar tasks/month? If <100/month, ROI is harder unless the task is high-value (security, legal).
3) Toolability: Are the required actions available via APIs (Jira, Salesforce, Stripe, Workday, ServiceNow)? If the workflow depends on manual UI clicks, budget time for RPA or integration work.
4) Error Tolerance: What’s the cost of a wrong action? If a single error can cost >$5,000 or create regulatory exposure, require stricter verification and approvals.

B) Data & Context Readiness (0–10)

5) Source of Truth: List the systems of record. If more than 3 systems must be reconciled, plan a data consolidation step.
6) Retrieval Quality: Can you retrieve the right policy/docs reliably? Run 50 queries and measure the % with the correct doc in the top-5 results. Target 85%+ before scaling.
7) Data Hygiene: Identify PII/PHI/PCI fields. Define redaction rules and minimum context. “Retrieve less” is a security and cost win.

C) Safety, Governance, and Controls (0–10)

8) Identity Model: Create an agent identity per workflow. No shared keys. Use least privilege per tool.
9) Tool Gateway: Put tools behind a gateway that centralizes secrets, enforces policy, and logs every call.
10) Approvals & Thresholds: Define human approval rules (e.g., refunds >$500, customer-impacting changes, P0 incidents).
11) Auditability: Ensure replayable traces: prompt version, retrieved sources, tool calls, outputs, and final actions.

D) Reliability & Cost Management (0–10)

12) Golden Set: Build 200 real examples with expected outcomes and edge cases.
13) Verifiers: Add schema validation, deterministic checks, and (for risky tasks) a second-pass verifier.
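The verifier layer in item 13 can be sketched as code. This is a minimal illustration, not a library API: the field names (`ticket_id`, `refund_amount`, `policy_ref`) and the $500 auto-approval threshold (borrowed from item 10) are assumptions for a hypothetical refund workflow.

```python
# Illustrative verifier: schema validation plus deterministic business-rule
# checks. All field names and thresholds are hypothetical.
REQUIRED_FIELDS = {"ticket_id": str, "refund_amount": float, "policy_ref": str}
MAX_AUTO_REFUND = 500.0  # above this, require human sign-off (item 10)

def validate_schema(output: dict) -> list[str]:
    """Deterministic check 1: every required field present with the right type."""
    errors = []
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], ftype):
            errors.append(f"wrong type for {field}: expected {ftype.__name__}")
    return errors

def verify(output: dict) -> tuple[bool, list[str]]:
    """Returns (auto_approve, errors). Any error forces escalation to a human;
    a second-pass model verifier could be added for risky tasks."""
    errors = validate_schema(output)
    if errors:
        return False, errors
    if output["refund_amount"] <= 0:
        errors.append("refund must be positive")
    if output["refund_amount"] > MAX_AUTO_REFUND:
        errors.append("refund exceeds auto-approval threshold")
    return (not errors), errors
```

The point is that these checks are cheap and deterministic: they run before any write action, and every failure path ends in escalation rather than silent retry.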
14) SLOs: Define SLOs such as: 95% of tasks complete in <90s; escalation rate <5%; tool-call error rate <1%.
15) Cost per Completion: Track total cost per verified completion (model + tools + human review). Set a budget (e.g., <$1.50/task) and alert on anomalies.

Scoring Guidance

- 32–40: Ready for production write-access (with canary rollout).
- 24–31: Pilot in sandbox or read-only mode; close gaps in retrieval, verification, or permissions.
- <24: Not ready; redesign the workflow or choose a narrower, bounded scope.

Rollout Template

Week 1: Define the contract (schemas, tools, permissions); build the golden set.
Week 2: Implement the tool gateway, logging, and verifiers; run an offline eval.
Week 3: Shadow mode in production (no writes); measure cost and error modes.
Week 4: Canary write-access (5–10% of tasks) + weekly regression + incident runbook.

If you can’t measure cost per verified completion and escalation rate, you don’t have an agent—you have a prototype.
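The two metrics the closing line demands can be computed from per-task logs. A minimal sketch, assuming each task is logged with its model, tool, and human-review spend; the `TaskRecord` fields and the $1.50 budget are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    # Hypothetical per-task log entry; field names are assumptions.
    model_cost: float   # LLM spend, USD
    tool_cost: float    # tool/API spend, USD
    review_cost: float  # human review time priced in USD
    verified: bool      # passed verifiers and reached the done state
    escalated: bool     # handed off to a human

BUDGET_PER_TASK = 1.50  # example budget from item 15

def cost_per_verified_completion(records: list[TaskRecord]) -> float:
    """Total spend (including failed tasks) amortized over verified completions."""
    verified = [r for r in records if r.verified]
    if not verified:
        return float("inf")  # no verified completions: cost is unbounded
    total = sum(r.model_cost + r.tool_cost + r.review_cost for r in records)
    return total / len(verified)

def escalation_rate(records: list[TaskRecord]) -> float:
    """Fraction of tasks escalated to a human (SLO target from item 14: <5%)."""
    return sum(r.escalated for r in records) / len(records)

def over_budget(records: list[TaskRecord]) -> bool:
    return cost_per_verified_completion(records) > BUDGET_PER_TASK
```

Note that failed and escalated tasks still count in the numerator: the agent's real unit economics include the work it wasted, which is why dividing total spend by verified completions (rather than by attempts) is the honest metric.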