Agent Production Readiness Checklist (2026) Use this checklist to graduate an agent from demo to production. It’s written for founders, tech leads, and operators. 1) Define the workflow boundary (scope) - Name the workflow (e.g., “Refund handling,” “RMA creation,” “On-call remediation”). - List systems touched (Stripe, Salesforce, Jira, AWS, internal admin). - Classify actions as READ vs WRITE vs DESTRUCTIVE. - Define a rollback path for each write (revert PR, void invoice, undo permission grant). 2) Identity & permissions (least privilege) - Create a dedicated agent identity (service account) per workflow. - Grant tool permissions per action, not “admin” access. - Separate read-only credentials from write credentials. - Set explicit thresholds (e.g., refunds <= $50 auto; $51–$300 requires approval; >$300 blocked). 3) Policy enforcement (outside the model) - Put a policy gate between agent output and tool execution. - Validate tool inputs with strict schemas (JSON Schema / Pydantic). - Enforce destination allowlists (email domains, Slack channels, export buckets, webhook hosts). - Add rate limits: max tool calls per run; max runs per minute; max spend per hour. 4) Observability & audit - Assign a unique run_id per execution and propagate it to every tool call. - Log: tool name, args (redacted), response codes, latency, and side-effect IDs (refund_id, ticket_id, PR#). - Store immutable audit events for write actions. - Set retention rules: short TTL for sensitive payloads; long-term retention for redacted metadata. 5) Quality gates & evals - Define success metrics: completion rate, critical error rate, escalation rate, AHT, CSAT impact. - Create a labeled dataset of real cases (at least 200) and run regression tests. - Add canary releases: start at 1–5% traffic, then ramp weekly. - Require a “kill switch” to disable writes instantly. 6) Cost governance - Track cost per successful outcome (inference + tool costs + human review time). - Implement model routing (cheap model for triage/extraction; frontier model for high-stakes reasoning). - Add caching for repeated intents and retrieval results (with redaction). - Cap retries and enforce timeouts to prevent runaway loops. 7) Security & data handling - Redact secrets/PII before sending to LLMs when possible. - Block direct access to raw credential stores; use short-lived tokens. - Review prompt/tool injection risks for any retrieval sources. - Document data residency requirements and subprocessors for compliance. Graduation rule of thumb: - You can move from “assist” to “autopilot” only when you can prove (a) scoped permissions, (b) 100% trace coverage for actions, (c) critical error rate <= 0.1% on write paths, and (d) cost per success below your ROI threshold.