AGENTIC AI LAUNCH READINESS CHECKLIST (2026)

Use this checklist to move from a promising prototype to a production-grade agent that can safely execute workflows.

1) WORKFLOW DEFINITION (SCOPE)
- Define ONE workflow with a hard boundary (start event → end state).
- List systems touched (e.g., Zendesk, Salesforce, Stripe) and whether access is read vs write.
- Define success in system terms: fields updated, messages sent, tickets closed, refunds issued, etc.
- Identify the “blast radius” if wrong (how many users/records could be impacted per run).
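
One way to make the scope above checkable is to write it down as data. The sketch below is illustrative only: WorkflowSpec, its field names, and the example values are assumptions, not part of any particular framework.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkflowSpec:
    """Hypothetical scope record for one workflow."""
    name: str
    start_event: str                            # hard boundary: where a run begins
    end_state: str                              # hard boundary: where a run ends
    access: dict = field(default_factory=dict)  # system name -> "read" or "write"
    max_records_per_run: int = 1                # blast radius cap per run

    def write_systems(self) -> list:
        """Systems the agent may modify, sorted by name."""
        return sorted(s for s, mode in self.access.items() if mode == "write")

    def within_blast_radius(self, records_touched: int) -> bool:
        """True if a run stays inside the declared blast radius."""
        return records_touched <= self.max_records_per_run


# Example values are made up for illustration.
refund_flow = WorkflowSpec(
    name="refund_request",
    start_event="ticket_created",
    end_state="ticket_closed",
    access={"Zendesk": "write", "Salesforce": "read", "Stripe": "write"},
    max_records_per_run=1,
)
```

Keeping the spec frozen means a scope change requires an explicit new version rather than a silent edit at runtime.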

2) AUTONOMY MODEL (GRADED AUTONOMY)
- Set autonomy levels:
  L0 Draft only (no tools)
  L1 Read-only tools (search, fetch status)
  L2 Safe writes (tags, drafts, PRs, non-financial updates)
  L3 High-impact writes (money movement, irreversible actions)
- For each tool/action, decide: allowed/blocked + approval required (Y/N).
- Ensure least-privilege defaults: every tool stays read-only until the workflow passes its launch gates (see section 5).
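
The per-tool decision above can be sketched as a small policy table. This is a minimal sketch under stated assumptions: the tool names, the ToolPolicy class, and the decide function are hypothetical, and a real system would load policies from configuration rather than hard-code them.

```python
from dataclasses import dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    L0_DRAFT = 0        # draft only, no tools
    L1_READ = 1         # read-only tools
    L2_SAFE_WRITE = 2   # tags, drafts, PRs, non-financial updates
    L3_HIGH_IMPACT = 3  # money movement, irreversible actions


@dataclass(frozen=True)
class ToolPolicy:
    level: Autonomy              # minimum autonomy level the tool requires
    allowed: bool = True         # False blocks the tool outright
    needs_approval: bool = False


# Hypothetical tool names, for illustration only.
POLICIES = {
    "search_tickets": ToolPolicy(Autonomy.L1_READ),
    "add_tag": ToolPolicy(Autonomy.L2_SAFE_WRITE),
    "issue_refund": ToolPolicy(Autonomy.L3_HIGH_IMPACT, needs_approval=True),
    "delete_account": ToolPolicy(Autonomy.L3_HIGH_IMPACT, allowed=False),
}


def decide(tool: str, granted: Autonomy = Autonomy.L1_READ) -> str:
    """Return 'allow', 'needs_approval', or 'block' for a tool call."""
    policy = POLICIES.get(tool)
    if policy is None or not policy.allowed:
        return "block"  # unknown or blocked tools are denied by default
    if policy.level > granted:
        return "block"  # the workflow has not unlocked this level yet
    return "needs_approval" if policy.needs_approval else "allow"
```

The default of L1 for granted mirrors the least-privilege rule: a caller has to pass a higher level explicitly to unlock writes.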

3) PERMISSIONS & GOVERNANCE
- Implement role-based access (user/admin) and per-connector scopes.
- Add spend controls: budgets, alerts, throttles, and model tier restrictions per workflow.
- Set a data retention policy for logs and traces (e.g., 30 or 90 days) and support deletion on request.
- PII handling: redaction in logs; clear memory rules (what is stored, where, for how long).
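
As an illustration of the spend controls above, here is a minimal budget guard with a hard cap and an alert threshold. The Budget class, the 80% alert ratio, and the dollar amounts are assumptions chosen for the example; throttles and model tier restrictions are left out.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical per-workflow spend guard."""
    limit_usd: float           # hard cap for the billing period
    alert_ratio: float = 0.8   # alert once this fraction of the cap is spent
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> str:
        """Record a cost and return 'ok', 'alert', or 'blocked'."""
        if self.spent_usd + cost_usd > self.limit_usd:
            return "blocked"   # refuse the run; nothing is recorded
        self.spent_usd += cost_usd
        if self.spent_usd >= self.alert_ratio * self.limit_usd:
            return "alert"     # still allowed, but notify the admin
        return "ok"


budget = Budget(limit_usd=10.0)
first = budget.charge(5.0)    # under the alert threshold
second = budget.charge(4.0)   # crosses 80% of the cap
third = budget.charge(2.0)    # would exceed the cap
```

Checking the cap before recording the cost keeps the guard fail-closed: a blocked run never counts against the budget.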

4) PROOFS, AUDITABILITY, AND ROLLBACK
- Every run gets a trace ID and structured run record (prompt version, tool calls, outputs).
- Proof UI: show the evidence used (sources), actions taken, and a diff of record changes.
- Rollback coverage target: at least 90% of write actions reversible.
- Incident workflow: how to pause the agent, revert changes, and notify affected users.
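
The run record, diff, and rollback items above can be sketched together. This is a minimal sketch, not a definitive design: RunRecord, diff_record, and rollback are hypothetical names, records are assumed to be flat dictionaries, and a field value of None is assumed to mean "absent".

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


def diff_record(before: dict, after: dict) -> dict:
    """Map each changed field to its (old, new) values."""
    keys = before.keys() | after.keys()
    return {
        k: (before.get(k), after.get(k))
        for k in sorted(keys)
        if before.get(k) != after.get(k)
    }


def rollback(record: dict, diff: dict) -> dict:
    """Undo a diff by restoring each field's old value."""
    restored = dict(record)
    for key, (old, _new) in diff.items():
        if old is None:
            restored.pop(key, None)  # assumption: None means the field was absent
        else:
            restored[key] = old
    return restored


@dataclass
class RunRecord:
    prompt_version: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    tool_calls: list = field(default_factory=list)

    def log_write(self, tool: str, before: dict, after: dict) -> None:
        """Append a write action together with its field-level diff."""
        self.tool_calls.append({"tool": tool, "diff": diff_record(before, after)})

    def to_json(self) -> str:
        """Serialize the record for structured logging."""
        return json.dumps(asdict(self), sort_keys=True)
```

Storing the diff at write time is what makes both the proof UI and the rollback possible later, since the "before" state may no longer exist in the source system.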

5) EVALUATION & LAUNCH GATES
- Build an offline eval set from real cases (at least 200 examples per core workflow).
- Track four trust metrics:
  - Task success rate (end state correct)
  - Intervention rate (% of runs needing human edits or retries)
  - Time-to-complete (median + P95)
  - Blast radius per failure (records/users impacted)
- Suggested launch thresholds:
  - Success rate ≥ 95% on top flows
  - Intervention rate ≤ 20% for L2 autonomy
  - P95 time-to-complete ≤ 60s for interactive workflows
  - 100% of runs traceable via trace ID
- Run a 2-week shadow mode (the agent proposes actions but executes none) before enabling any writes.
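
The suggested thresholds above can be turned into an automated gate over eval or shadow-mode runs. The sketch below assumes a hypothetical Run record and uses the nearest-rank method for P95; other percentile definitions give slightly different values on small samples.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    success: bool      # end state correct
    intervened: bool   # needed a human edit or retry
    seconds: float     # time to complete
    trace_id: str      # empty string if the run was not traced


def p95(values: list) -> float:
    """Nearest-rank 95th percentile (integer math avoids float rounding)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def launch_gate(runs: list) -> dict:
    """Compute trust metrics and a go/no-go flag for a non-empty list of runs."""
    n = len(runs)
    metrics = {
        "success_rate": sum(r.success for r in runs) / n,
        "intervention_rate": sum(r.intervened for r in runs) / n,
        "median_seconds": median(r.seconds for r in runs),
        "p95_seconds": p95([r.seconds for r in runs]),
        "traced": sum(bool(r.trace_id) for r in runs) / n,
    }
    metrics["go"] = (
        metrics["success_rate"] >= 0.95
        and metrics["intervention_rate"] <= 0.20
        and metrics["p95_seconds"] <= 60
        and metrics["traced"] == 1.0
    )
    return metrics
```

Blast radius per failure is not computed here because it depends on per-run impact data that this Run record does not carry.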

6) OPERATIONAL READINESS
- Version prompts, policies, and tool schemas in git; use staged rollouts (5%→25%→100%).
- Define an on-call owner, escalation path, and an incident template.
- Add connector monitoring for API errors, rate limits, and schema changes.
- Create a “kill switch” to disable writes instantly.
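
Staged rollouts and the kill switch above can share one gate in front of every write. This is a sketch under assumptions: WriteGate and rollout_bucket are hypothetical names, rollout is keyed on an account ID, and the kill flag lives in process memory, whereas a production system would keep it in shared configuration so that it applies to all workers at once.

```python
import hashlib


def rollout_bucket(account_id: str) -> int:
    """Stable bucket in [0, 100) derived from the account ID."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


class WriteGate:
    """Hypothetical gate consulted before any write action."""

    def __init__(self, rollout_percent: int = 5):
        self.rollout_percent = rollout_percent  # e.g. 5, then 25, then 100
        self.killed = False

    def kill(self) -> None:
        """Kill switch: disable writes for every account immediately."""
        self.killed = True

    def writes_enabled(self, account_id: str) -> bool:
        if self.killed:
            return False
        return rollout_bucket(account_id) < self.rollout_percent
```

Because the bucket is a pure function of the account ID, an account enabled at 5% stays enabled at 25% and 100%, so widening the rollout never flips an existing account back to read-only.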

7) PRICING & PACKAGING
- Choose the economic unit: per ticket, per invoice, per lead, or per 1,000 tasks.
- Bundle governance (SSO, audit logs, admin policies) into an enterprise tier.
- Avoid punishing efficiency: align price with value delivered, not raw action count.

FINAL GO/NO-GO QUESTIONS
- If the agent is wrong once, can we prove what happened and undo it?
- Can admins restrict autonomy by workflow, role, and connector?
- Do we have clear thresholds that determine when to unlock higher autonomy?
- Can customers predict and cap spend?

If you can answer “yes” to all four, you’re ready to ship an agent users will trust.