PRODUCTION AI AGENT READINESS CHECKLIST (2026)

Use this checklist to move an AI agent from prototype to a production system you can operate, audit, and scale.

1) SCOPE & OUTCOMES
- Define the primary outcome (e.g., “resolve Tier-1 tickets”) and 2–3 secondary metrics (CSAT, escalation rate, time-to-first-response).
- Classify risk level per action: Low (draft/recommend), Medium (write internal state), High (money movement, permissions, deletes).
- Set explicit success targets before rollout (example: ≥80% draft-approval rate; ≤5% policy violations; cost ≤$0.50/task).
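
The risk classes and example targets above can be written down as data so they are checked rather than remembered. A minimal sketch, assuming hypothetical action names (draft_reply, update_ticket_status, issue_refund) and using the example thresholds from this section; substitute your own:

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"        # draft/recommend
    MEDIUM = "medium"  # write internal state
    HIGH = "high"      # money movement, permissions, deletes


# Hypothetical action names, for illustration only.
ACTION_RISK = {
    "draft_reply": Risk.LOW,
    "update_ticket_status": Risk.MEDIUM,
    "issue_refund": Risk.HIGH,
}


def risk_of(action: str) -> Risk:
    # Unclassified actions default to HIGH so new tools stay gated
    # until someone classifies them explicitly.
    return ACTION_RISK.get(action, Risk.HIGH)


def meets_targets(draft_approval_rate: float,
                  policy_violation_rate: float,
                  cost_per_task_usd: float) -> bool:
    # Example targets from the checklist: >=80% draft approval,
    # <=5% policy violations, cost <= $0.50 per task.
    return (draft_approval_rate >= 0.80
            and policy_violation_rate <= 0.05
            and cost_per_task_usd <= 0.50)
```

Defaulting unknown actions to High is a design choice in this sketch, not a requirement of the checklist.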

2) IDENTITY & PERMISSIONS
- Create a dedicated service identity for each agent capability (not a shared bot user).
- Implement least-privilege roles across every connected system (Zendesk/Jira/Salesforce/GitHub/Stripe/etc.).
- Require time-bound, scope-bound delegated access for sensitive actions (capability tokens that expire).
- Confirm every action is attributable: run_id, agent identity, tool name, timestamp, and target object (ticket_id, invoice_id).
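
One way to picture a time-bound, scope-bound capability token and the attribution record it feeds. This is a sketch only: CapabilityToken, issue, authorize, and audit_record are hypothetical names, and a real token would also be signed and verified, which is omitted here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CapabilityToken:
    agent_id: str         # dedicated service identity, not a shared bot user
    tool: str             # the single tool this token permits
    target: str           # the single object it permits (e.g. an invoice_id)
    expires_at: datetime  # hard expiry


def issue(agent_id: str, tool: str, target: str,
          ttl_seconds: int, now: datetime) -> CapabilityToken:
    return CapabilityToken(agent_id, tool, target,
                           now + timedelta(seconds=ttl_seconds))


def authorize(token: CapabilityToken, tool: str, target: str,
              now: datetime) -> bool:
    # Scope-bound: tool and target must match exactly.
    # Time-bound: the token is invalid at or after expires_at.
    return (token.tool == tool
            and token.target == target
            and now < token.expires_at)


def audit_record(run_id: str, token: CapabilityToken, now: datetime) -> dict:
    # The attribution fields named in the checklist.
    return {
        "run_id": run_id,
        "agent_id": token.agent_id,
        "tool": token.tool,
        "timestamp": now.isoformat(),
        "target": token.target,
    }
```

Passing `now` explicitly keeps the expiry check testable without patching the clock.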

3) TOOL DESIGN (WRITE TOOLS MUST BE SAFE)
- Prefer narrow primitives over generic “updateRecord(payload)” tools.
- Validate all tool inputs server-side (schema + business rules + contextual checks).
- Require idempotency keys for any tool that causes side effects.
- Add rate limits and per-run tool-call budgets (example: max 20 tool calls/run; max 3 retries/step).
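
The points above can be combined in one narrow write tool. A minimal in-memory sketch, assuming a hypothetical RefundTool and RunBudget; a production version would persist idempotency keys in a durable store and call the real payment system where the comment indicates.

```python
class BudgetExceeded(Exception):
    """Raised when a run exceeds its tool-call budget."""


class RunBudget:
    def __init__(self, max_calls: int = 20) -> None:
        self.max_calls = max_calls
        self.calls = 0

    def charge(self) -> None:
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"tool-call budget of {self.max_calls} exhausted")
        self.calls += 1


class RefundTool:
    """Narrow primitive: refunds one invoice, nothing else."""

    def __init__(self) -> None:
        self._results = {}  # idempotency_key -> stored result
        self.executed = 0   # number of side effects actually performed

    def refund(self, budget: RunBudget, idempotency_key: str,
               invoice_id: str, amount_cents: int) -> dict:
        budget.charge()  # every call counts against the run, replays included
        # Server-side validation: never trust model-produced arguments.
        if not idempotency_key or not invoice_id:
            raise ValueError("idempotency_key and invoice_id are required")
        if type(amount_cents) is not int or amount_cents <= 0:
            raise ValueError("amount_cents must be a positive integer")
        # Replay of a known key returns the stored result, no second refund.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.executed += 1  # the real side effect would happen here
        result = {"invoice_id": invoice_id,
                  "refunded_cents": amount_cents,
                  "status": "ok"}
        self._results[idempotency_key] = result
        return result
```

Calling refund twice with the same key returns the same result and increments `executed` only once, which is the property that makes retries safe.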

4) POLICY & GUARDRAILS
- Define approval tiers with clear thresholds (example: refunds under $50 auto-approve; $50–$500 require human approval; over $500 require two-person approval).
- Add kill switches: disable specific tools or all writes within minutes.
- Use allow-lists for domains, recipients, and destinations (especially for outbound email and web actions).
- Enforce “no secret exfiltration”: redact tokens/PII in logs; block tools from returning raw secrets.
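
A sketch of the four guardrails as plain functions, using the example refund thresholds from this section. The allow-listed domain (example.com) and the secret format (tok_ followed by 8 or more alphanumerics) are assumptions for illustration; match them to your own systems.

```python
import re


def approval_tier(amount_usd: float) -> str:
    # Example thresholds: under $50 auto, $50 to $500 human, over $500 two-person.
    if amount_usd < 50:
        return "auto"
    if amount_usd <= 500:
        return "human"
    return "two_person"


class KillSwitch:
    def __init__(self) -> None:
        self.disabled_tools = set()
        self.all_writes_disabled = False

    def allows(self, tool: str, is_write: bool) -> bool:
        # Checked before every tool call; flipping a flag takes effect at once.
        if is_write and self.all_writes_disabled:
            return False
        return tool not in self.disabled_tools


ALLOWED_DOMAINS = {"example.com"}  # assumed allow-list


def recipient_allowed(email: str) -> bool:
    local, at, domain = email.rpartition("@")
    return bool(local) and at == "@" and domain.lower() in ALLOWED_DOMAINS


# Assumed secret format, for illustration only.
SECRET_PATTERN = re.compile(r"\btok_[A-Za-z0-9]{8,}\b")


def redact(text: str) -> str:
    # Apply to log lines and tool returns before they are stored or shown.
    return SECRET_PATTERN.sub("[REDACTED]", text)
```

Pattern-based redaction only catches formats you have listed, so treat it as one layer alongside keeping raw secrets out of tool outputs in the first place.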

5) OBSERVABILITY & REPLAY
- Persist a trace per run: inputs, retrieved docs, prompt versions, tool calls/returns, validations, policy decisions, outcome.
- Store a replay capsule (prompt template version, tool version, policy version, retrieval snapshot identifiers).
- Monitor weekly: success rate, escalation rate, approval/denial reasons, median and p95 latency, cost per successful outcome.
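
The weekly metrics can be computed directly from stored traces. A minimal sketch, assuming a hypothetical RunTrace record that carries the version identifiers needed for replay; p95 uses the nearest-rank method, and costs are kept in integer cents to avoid float drift.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RunTrace:
    run_id: str
    prompt_version: str   # replay capsule: which prompt template ran
    tool_version: str     # replay capsule: which tool build ran
    policy_version: str   # replay capsule: which policy set applied
    latency_ms: int
    cost_cents: int
    success: bool


def percentile_nearest_rank(values, p: int):
    # Nearest-rank percentile: smallest value with at least p% of data at or below it.
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceiling of p*n/100
    return ordered[rank - 1]


def weekly_summary(traces) -> dict:
    if not traces:
        raise ValueError("no traces to summarize")
    successes = sum(1 for t in traces if t.success)
    latencies = [t.latency_ms for t in traces]
    total_cost = sum(t.cost_cents for t in traces)
    return {
        "success_rate": successes / len(traces),
        "median_latency_ms": median(latencies),
        "p95_latency_ms": percentile_nearest_rank(latencies, 95),
        # Failed runs still cost money, so divide total cost by successes only.
        "cost_cents_per_success": total_cost / successes if successes else None,
    }
```

Dividing total cost by successful runs, rather than by all runs, is what makes the figure "cost per successful outcome".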

6) ROLLOUT & OPERATIONS
- Roll out in phases: read-only → draft mode → narrow writes → expanded writes → broader coverage.
- Use canary percentages (1% → 5% → 20% → 50% → 100%) with stop conditions (CSAT drop, spike in escalations, anomaly in refunds).
- Write runbooks for top failure modes (loops, duplicate writes, permission errors, overly cautious behavior).
- Assign an Agent Owner responsible for metrics, incidents, and weekly review.
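
The canary ladder with stop conditions can be expressed as a single decision function. A sketch only: the thresholds (a CSAT drop of 0.05 and an escalation ratio of 1.5x baseline) and the function name are assumptions; pick values that fit your baseline.

```python
CANARY_STEPS = [1, 5, 20, 50, 100]  # percentages from the checklist


def next_canary_percent(current: int,
                        csat_drop: float,
                        escalation_ratio: float,
                        refund_anomaly: bool,
                        max_csat_drop: float = 0.05,
                        max_escalation_ratio: float = 1.5) -> int:
    """Return the next traffic percentage, or 0 to halt the rollout."""
    if current not in CANARY_STEPS:
        raise ValueError(f"unknown canary step: {current}")
    # Stop conditions: any one of them halts the rollout.
    if (csat_drop > max_csat_drop
            or escalation_ratio > max_escalation_ratio
            or refund_anomaly):
        return 0
    index = CANARY_STEPS.index(current)
    # Hold at 100% once fully rolled out.
    return CANARY_STEPS[min(index + 1, len(CANARY_STEPS) - 1)]
```

Returning 0 on any stop condition, instead of holding at the current step, is the conservative choice; some teams prefer to drop back one step instead.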

7) SECURITY & COMPLIANCE (MINIMUM BAR)
- Ensure SOC 2-relevant controls are in place: access reviews, audit logs, change management for prompts/tools/policies.
- Document data flow: what data the agent can read, where it’s stored, retention period, and who can access traces.
- Perform threat modeling for prompt injection and tool misuse; test with adversarial examples.

8) GO/NO-GO GATE
- Go to production only if: (a) high-risk actions are gated by approvals, (b) writes are idempotent and validated, (c) traces are replayable, (d) kill switches exist, and (e) cost per successful outcome is at or below your ceiling.
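
The gate can be encoded so a release fails loudly and names what is missing. A minimal sketch, assuming a hypothetical Readiness record filled in by the Agent Owner or by automated checks:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Readiness:
    high_risk_gated: bool                  # (a)
    writes_idempotent_and_validated: bool  # (b)
    traces_replayable: bool                # (c)
    kill_switches_exist: bool              # (d)
    cost_per_outcome_usd: float            # (e) measured
    cost_ceiling_usd: float                # (e) target


def go_no_go(r: Readiness) -> tuple[bool, list[str]]:
    """Return (go, failures); go is True only when every condition holds."""
    failures = []
    if not r.high_risk_gated:
        failures.append("high-risk actions are not gated by approvals")
    if not r.writes_idempotent_and_validated:
        failures.append("writes are not idempotent and validated")
    if not r.traces_replayable:
        failures.append("traces are not replayable")
    if not r.kill_switches_exist:
        failures.append("kill switches are missing")
    if r.cost_per_outcome_usd > r.cost_ceiling_usd:
        failures.append("cost per outcome exceeds the ceiling")
    return (not failures, failures)
```

Returning the list of failures alongside the verdict gives the review meeting a concrete punch list instead of a bare "no".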

If you can’t concisely explain what your agent can do, what it cannot do, and how you would stop it in 60 seconds, it’s not production-ready.