Agent Production Readiness Checklist (2026)

Use this checklist to move from a working demo to a production agent that is reliable, cost-contained, and auditable.

1) Define the workflow boundary
- Name the single workflow (e.g., “triage support tickets,” “enrich inbound leads”).
- List allowed systems (Zendesk, Jira, Salesforce, internal DB).
- Write a definition of “success” (what must be true at the end of a run).

2) Write an authority spec (permissions)
- Separate read-only tools from write tools.
- Identify “high-risk” actions (refunds, cancellations, deletions, PII exposure).
- Specify approval rules (human review required, dual control, or auto-approved).

3) Build a tool gateway (don’t call SaaS APIs directly)
- Implement allowlists per tool + method.
- Enforce tenant isolation and least privilege.
- Use short-lived scoped tokens; never place secrets in prompts.

4) Enforce budgets in code
- Step budget: max tool calls per run (start with 6–10).
- Spend budget: max $/run, max tokens/run.
- Rate limits: per tool, per tenant; include backoff and retries.

5) Implement trace logging and retention
- Log: user request, system prompt version, tool calls, tool responses, policy decisions, final output.
- Store traces immutably with a retention policy (commonly 30–180 days depending on compliance).

6) Create an evaluation set from real data
- Collect 50–200 historical cases for the workflow.
- Label outcomes: success, partial success, failure; note risk tier.
- Add adversarial and edge cases (timeouts, missing fields, ambiguous requests).

7) Choose metrics that map to operations
- Task success rate (segmented by risk tier).
- Cost per successful task (include retries and tool calls).
- Latency (median and P95).
- Escalation rate and “reason for escalation” taxonomy.

8) Ship in phases
- Phase A: shadow mode (agent runs, humans act) for 1–2 weeks.
- Phase B: human-in-the-loop (agent drafts actions; humans approve).
- Phase C: limited autonomy (only low-risk actions) with tight monitoring.

9) Add safety rails for write actions
- Require structured outputs and schema validation.
- Use confirmation prompts for irreversible actions.
- Add “undo” capability where possible (revert field changes, cancel queued actions).

10) Operationalize ownership
- Assign an on-call owner for incidents.
- Create a rollback plan: model revert, prompt revert, tool disable switch.
- Review traces weekly; update eval set monthly.

If you can’t answer “what can this agent do, how much can it spend, and who approved its actions?” you’re not production-ready.
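
Example for items 2 and 3: one possible shape for a gateway policy check, sketched in Python. It combines a per-tenant allowlist keyed by tool and method with a flag that routes high-risk actions to human review. The tenant ID, the tool and method names, and the names `Rule`, `Decision`, and `check_call` are all hypothetical; a real gateway would load rules from configuration and attach short-lived scoped tokens after the check passes.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Rule:
    tool: str
    method: str
    high_risk: bool = False  # high-risk actions are routed to human review


# Hypothetical per-tenant allowlist. Anything not listed is denied.
ALLOWLIST: dict[str, list[Rule]] = {
    "tenant-a": [
        Rule("zendesk", "get_ticket"),
        Rule("zendesk", "add_internal_note"),
        Rule("billing", "issue_refund", high_risk=True),
    ],
}


def check_call(tenant_id: str, tool: str, method: str) -> Decision:
    """Return the policy decision for one proposed tool call."""
    for rule in ALLOWLIST.get(tenant_id, []):
        if rule.tool == tool and rule.method == method:
            return Decision.NEEDS_APPROVAL if rule.high_risk else Decision.ALLOW
    # Default deny: unknown tenants and unlisted methods never pass.
    return Decision.DENY
```

Defaulting to deny means a new tool method stays unavailable until someone adds it to the allowlist, which keeps the authority spec and the running system in agreement.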
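
Example for item 4: a minimal sketch of per-run budget enforcement, assuming the agent loop calls `charge` before every tool call. The class names and the default spend and token limits are illustrative placeholders, not recommendations; only the step limit of 8 comes from the 6–10 starting range in the checklist. Rate limiting, backoff, and retries are out of scope here.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a run would go past one of its hard limits."""


@dataclass
class RunBudget:
    max_steps: int = 8         # within the 6-10 starting range
    max_usd: float = 0.50      # placeholder value (assumption)
    max_tokens: int = 50_000   # placeholder value (assumption)
    steps: int = 0
    usd: float = 0.0
    tokens: int = 0

    def charge(self, *, usd: float = 0.0, tokens: int = 0) -> None:
        """Record one tool call, raising before any limit is exceeded."""
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded("step budget exhausted")
        if self.usd + usd > self.max_usd:
            raise BudgetExceeded("spend budget exhausted")
        if self.tokens + tokens > self.max_tokens:
            raise BudgetExceeded("token budget exhausted")
        # Counters change only when every check has passed.
        self.steps += 1
        self.usd += usd
        self.tokens += tokens
```

Because the check lives in code rather than in the prompt, the model cannot talk its way past it; the loop catches `BudgetExceeded` and escalates or stops the run.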
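
Example for item 7: a rough sketch of how the core metrics could be computed from stored run records. The `Run` fields and the `summarize` function are assumptions about what a trace store might expose, and P95 is computed here with the nearest-rank method, which is one of several common percentile definitions.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    success: bool
    cost_usd: float    # total for the run, including retries and tool calls
    latency_s: float


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def summarize(runs: list[Run]) -> dict[str, float]:
    """Compute success rate, cost per successful task, and latency stats."""
    if not runs:
        raise ValueError("need at least one run")
    successes = sum(r.success for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    latencies = [r.latency_s for r in runs]
    return {
        "success_rate": successes / len(runs),
        # Failed runs still cost money, so total spend is divided by successes.
        "cost_per_success": total_cost / successes if successes else float("inf"),
        "latency_median_s": median(latencies),
        "latency_p95_s": p95(latencies),
    }
```

To segment by risk tier, the same function can be called once per tier on the filtered list of runs.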