AGENTIC AI PRODUCTION READINESS CHECKLIST (2026)

1) SCOPE & SUCCESS METRICS
- Define one narrow job-to-be-done (e.g., “close low-risk billing tickets” or “enrich inbound leads”).
- Specify a single primary KPI (completion rate, % automated, $ cost-to-complete, or cycle-time reduction).
- Set a minimum “safe failure” outcome: when the agent can’t finish, it must generate a structured handoff package.
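
The structured handoff package in the last item can be sketched as a small serializable record. This is a minimal illustration only: the class name `HandoffPackage` and every field in it are assumptions, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class HandoffPackage:
    """Hypothetical record an agent emits when it cannot finish a task."""
    task_id: str
    reason: str                                         # why the agent stopped
    steps_completed: List[str] = field(default_factory=list)
    pending_action: Optional[str] = None                # what a human should do next
    evidence: List[str] = field(default_factory=list)   # trace or document references

    def to_json(self) -> str:
        # sort_keys keeps the output stable so handoffs can be diffed
        return json.dumps(asdict(self), sort_keys=True)


pkg = HandoffPackage(
    task_id="T-1",
    reason="refund amount exceeds policy cap",
    steps_completed=["looked up invoice"],
    pending_action="approve or reject refund",
)
print(pkg.to_json())
```

A fixed shape like this lets the receiving queue validate handoffs the same way it validates any other input.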

2) WORKFLOW DESIGN
- Map the workflow as a state machine or DAG (inputs → plan → tool calls → verification → side effects → audit).
- Identify irreversible actions (refunds, deletes, deploys) and add approval gates.
- Add hard budgets: max steps, max tool calls, max tokens, and a wall-clock timeout.
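
The hard budgets in the last item can be enforced with a single counter object that the agent loop charges on every step. The sketch below is one possible shape; the `Budget` class, the `BudgetExceeded` exception, and the default limits are illustrative assumptions.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses any hard limit."""


@dataclass
class Budget:
    # Limit values are illustrative assumptions, not recommendations.
    max_steps: int = 20
    max_tool_calls: int = 10
    max_tokens: int = 50_000
    timeout_s: float = 120.0
    steps: int = 0
    tool_calls: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, steps: int = 0, tool_calls: int = 0, tokens: int = 0) -> None:
        """Record usage, then raise if any limit is exceeded."""
        self.steps += steps
        self.tool_calls += tool_calls
        self.tokens += tokens
        if self.steps > self.max_steps:
            raise BudgetExceeded("max steps exceeded")
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("max tool calls exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("max tokens exceeded")
        if time.monotonic() - self.started > self.timeout_s:
            raise BudgetExceeded("wall-clock timeout exceeded")
```

Calling `budget.charge(steps=1)` at the top of each loop iteration, and treating `BudgetExceeded` as the trigger for the safe-failure handoff, keeps a runaway plan from consuming unbounded time or spend.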

3) TOOLS & CONTRACTS
- Implement typed tool interfaces (JSON Schema / function calling) with strict validation.
- Use allowlists for actions; deny by default.
- Make tool calls idempotent where possible; add correlation IDs and retries with backoff.
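
The allowlist, idempotency, and retry items above can be combined in one wrapper around every tool call. The following is a minimal sketch under stated assumptions: the tool names, the `call_tool` helper, and the in-memory result cache are hypothetical, and a real deployment would likely need a durable idempotency store.

```python
import time
import uuid

ALLOWED_TOOLS = {"lookup_invoice", "add_ticket_note"}  # hypothetical tool names
_completed = {}  # idempotency key -> result of a call that already succeeded


class ToolDenied(PermissionError):
    """Raised for any tool that is not explicitly allowlisted."""


def call_tool(name, fn, args, idempotency_key=None,
              max_attempts=3, base_delay=0.5, sleep=time.sleep):
    if name not in ALLOWED_TOOLS:          # deny by default
        raise ToolDenied(name)
    key = idempotency_key or str(uuid.uuid4())
    if key in _completed:                  # replayed request: do not repeat the side effect
        return _completed[key]
    for attempt in range(1, max_attempts + 1):
        try:
            # The key doubles as the correlation ID passed to the tool.
            result = fn(correlation_id=key, **args)
        except TimeoutError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        else:
            _completed[key] = result
            return result
```

Only `TimeoutError` is retried here, on the assumption that validation and permission errors will not succeed on a second attempt.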

4) PERMISSIONS & SECURITY
- Use scoped, short-lived credentials for tools (minutes, not days).
- Map agent permissions to RBAC roles defined in your existing IdP (e.g., Okta or Microsoft Entra ID).
- Add PII handling rules (redaction in logs; field-level access controls).
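
Log redaction, from the last item, can be sketched as a filter on Python's standard `logging` module. The two patterns below (email addresses and card-like digit runs) are illustrative assumptions and far from a complete PII detector; the `redact` and `RedactingFilter` names are hypothetical.

```python
import logging
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def redact(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


class RedactingFilter(logging.Filter):
    """Rewrites each record's formatted message before handlers see it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()   # message is already formatted
        return True        # keep the record, now redacted


logger = logging.getLogger("agent")
logger.addFilter(RedactingFilter())
```

Attaching the filter to the logger, rather than relying on each call site to redact, means new log statements are covered without extra effort.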

5) VERIFICATION & GUARDRAILS
- Add a verifier stage for any data mutation (deterministic checks + optional small-model judge).
- Require citations for decisions that depend on retrieved context.
- Enforce policy constraints (e.g., refund cap $50, SLA promises prohibited, restricted keywords).
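
The policy constraints in the last item are good candidates for deterministic checks that run before any side effect. A minimal sketch, assuming a hypothetical action dictionary with `type`, `amount`, and `customer_message` keys and a made-up restricted phrase list:

```python
import re
from decimal import Decimal
from typing import List

REFUND_CAP = Decimal("50.00")                               # cap from the example above
RESTRICTED_PHRASES = ("guarantee", "sla", "legal advice")   # hypothetical list


def check_policy(action: dict) -> List[str]:
    """Return a list of violation codes; an empty list means the action may proceed."""
    violations = []
    if action.get("type") == "refund":
        # Decimal avoids float rounding on currency amounts
        amount = Decimal(str(action.get("amount", "0")))
        if amount > REFUND_CAP:
            violations.append("refund_over_cap")
    message = action.get("customer_message", "").lower()
    for phrase in RESTRICTED_PHRASES:
        # Whole-word match so "sla" does not fire on words like "Tesla"
        if re.search(rf"\b{re.escape(phrase)}\b", message):
            violations.append(f"restricted_phrase:{phrase}")
    return violations
```

Returning codes instead of a boolean gives the audit log a concrete reason for each blocked action.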

6) EVALUATION & TESTING
- Build a regression set of 200–500 real historical cases; label pass/fail criteria.
- Track failure modes: tool mismatch, stale context, permission breach attempts, and “silent wrong” results (incorrect output that raises no error).
- Re-run the suite on every prompt, model, or config change; block the deploy if the primary KPI falls below its agreed threshold.
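
The deploy gate in the last item can be expressed as a comparison against a stored baseline, with failure modes counted along the way. This is a sketch under assumptions: the result records, the `evaluate_gate` name, and the 1-point tolerance are all hypothetical.

```python
from collections import Counter


def evaluate_gate(results, baseline_pass_rate, max_drop=0.01):
    """Return (deploy_allowed, pass_rate, failure_modes) for one suite run.

    Each result is a dict with a boolean "passed" and, for failures,
    a "failure_mode" label.
    """
    if not results:
        raise ValueError("regression set is empty")
    passed = sum(1 for r in results if r["passed"])
    pass_rate = passed / len(results)
    failure_modes = Counter(r["failure_mode"] for r in results if not r["passed"])
    # Block the deploy when the pass rate falls more than max_drop below baseline
    deploy_allowed = pass_rate >= baseline_pass_rate - max_drop
    return deploy_allowed, pass_rate, failure_modes
```

In CI, a `False` first element would map to a non-zero exit code so the pipeline stops before release.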

7) OBSERVABILITY & AUDIT
- Log: request, plan, retrieval references, each tool call (inputs/outputs), verifier result, final action.
- Measure P50/P95 latency and P95 cost-to-complete.
- Retain traces for at least 30 days (longer if regulated) and support export for audits.
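
The P50/P95 figures above can be computed directly from stored traces. The sketch below uses the nearest-rank method, which is one of several common percentile definitions; the trace fields `latency_s` and `cost_usd` are assumed names.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile; p is an integer from 1 to 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


# Synthetic traces purely for illustration
traces = [{"latency_s": float(i), "cost_usd": i / 100} for i in range(1, 21)]
latencies = [t["latency_s"] for t in traces]
costs = [t["cost_usd"] for t in traces]
print(percentile(latencies, 50), percentile(latencies, 95), percentile(costs, 95))
```

Whichever percentile definition is chosen, using the same one in dashboards and in SLO checks avoids disagreements at the boundary.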

8) ROLLOUT & OPERATIONS
- Start with a canary rollout (1–5% of traffic) in human-approval mode, with clear rollback triggers.
- Define SLOs (example targets): containment ≥ 99.9%; completion ≥ 85% (narrow tasks); P95 cost ≤ $0.20; P95 latency ≤ 60s (async).
- Create an incident runbook: disable mutations, revoke credentials, and switch to summary-only mode.
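
The example SLO targets above can double as rollback triggers during the canary. A minimal sketch, assuming hypothetical metric names and treating a missing metric as a breach:

```python
import operator

# Targets copied from the example SLOs; metric names are assumptions.
SLO_TARGETS = {
    "containment": (operator.ge, 0.999),
    "completion": (operator.ge, 0.85),
    "p95_cost_usd": (operator.le, 0.20),
    "p95_latency_s": (operator.le, 60.0),
}


def breached_slos(metrics):
    """Return the names of SLOs that the given metrics fail to meet."""
    breached = []
    for name, (compare, target) in SLO_TARGETS.items():
        value = metrics.get(name)
        # A metric that was not reported cannot be shown to be healthy
        if value is None or not compare(value, target):
            breached.append(name)
    return breached
```

One possible policy is to roll back automatically when `breached_slos` returns anything during the canary window, and to page a human once the agent is at full traffic.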

GO-LIVE GATES (RECOMMENDED)
- 30 days of traces with no high-severity containment incidents.
- Regression suite stable across 3 consecutive releases.
- Security review completed (RBAC + scoped delegation + log retention + data isolation).
- Business owner sign-off on KPI, failure handling, and escalation path.

If you can’t answer “what changed, who authorized it, and why” in under 60 seconds, you’re not ready to ship an agent that takes action.