Agentic AI Production Readiness Checklist (2026)

Use this checklist before moving any agent from pilot to production. It’s designed for founders, platform teams, and operators who need predictable cost, reliable behavior, and auditability.

1) Define the unit of work (UoW)
- Write the UoW in one line (e.g., “resolve a Tier-1 support ticket”, “process one invoice”, “close one access request”).
- Define success and failure in measurable terms (resolution time, escalation rate, policy violations).
- Decide the maximum acceptable cost per UoW (hard budget) and the p95 latency target.

2) Choose an autonomy level (L0–L4)
- L0 Suggest: no tool access.
- L1 Recommend: tool calls proposed, human approves.
- L2 Execute reversible actions: allowlist + rollback.
- L3 Execute bounded actions: policy limits + confidence gates + sampled review.
- L4 High autonomy: segregation of duties + incident response + kill switch.
- Document what must never be automated (e.g., payouts over $X, deletions, external legal commitments).

3) Instrument cost and behavior
- Log per-task: tokens in/out, model used, retrieval hits, tool calls, retries, and wall-clock time.
- Create dashboards: cost per UoW, p95 latency, tool-call counts, retry rate, and violation rate.
- Add runtime budgets: max tokens per task, max tool calls per task, max retries per tool.

4) Build evaluation coverage
- Maintain a “golden set” of 50–200 real cases with expected outcomes.
- Add adversarial cases: poisoned docs, conflicting policies, malformed tool outputs, and prompt-injection attempts.
- Define release gates: no deploy if golden-set quality drops by more than an agreed threshold (e.g., 2–5%).
- Run shadow mode for 2–4 weeks for new workflows; compare against human outcomes.

5) Guardrails and policy enforcement
- Implement allowlisted tools and least-privilege scopes (read vs write separated).
- Require human approval for irreversible or high-dollar actions.
- Add confidence gates (route to human when confidence < threshold).
- Use citation requirements for any customer-facing or policy-sensitive output.

6) Security and compliance essentials
- Use short-lived credentials; avoid long-lived API keys embedded in prompts.
- Redact secrets in logs; define what prompt/context can be stored.
- Produce an audit trail: prompt, retrieved doc references, tool inputs/outputs, final action + rationale.
- Establish an incident runbook: detection, containment (kill switch), remediation, and postmortem.

7) Operational readiness
- Define SLOs and error budgets for each workflow.
- Implement circuit breakers (stop executing tools when anomalies spike).
- Create a rollback/reconciliation process to repair partial executions.
- Assign clear ownership: who is on call for agent incidents?

Exit criteria to go live
- Meets quality targets on golden set and in shadow mode.
- Cost per UoW is within budget at p95.
- Audit logs and policy gates verified.
- Kill switch tested end-to-end.
- On-call and incident response process documented and rehearsed.