PRODUCTION AGENT READINESS CHECKLIST (2026)

Use this checklist to decide whether an agent is ready to move from demo/pilot into real customer workflows.

1) DEFINE THE JOB + SUCCESS METRICS
- Target workflow defined as a “job to be done” (e.g., resolve tier-1 tickets; onboard vendors; reset MFA).
- Baseline captured: average handle time, backlog, SLA breach rate, escalation rate, error rate.
- Success metrics set in customer terms: cost-per-resolved-case, time-to-resolution, % deflection, CSAT delta.
- Clear scope boundary: what the agent will NOT do (at launch) and what requires approval.

2) ARCHITECTURE + CONTROLLED AUTONOMY
- Tool permissions are least-privilege by default (allowlisted tools + actions).
- High-risk actions are gated (payments, deletes, access grants, compliance filings).
- Timeouts, retries, and fallbacks exist for every tool call.
- Human escalation path is explicit and fast (ticket handoff, Slack/Teams ping, on-call rotation).

3) OBSERVABILITY + AUDIT TRAILS
- Every run produces a trace: prompts, retrieved docs, tool inputs/outputs, decision points.
- Replay is possible for debugging (at least for 7 days; longer if regulated).
- Logs redact PII/secrets; secrets are never stored in plaintext.
- You can answer “Why did the agent do this?” within minutes.

4) EVALUATIONS (EVEN IF SMALL)
- A “golden set” exists: 50–200 representative tasks with expected outputs or scoring criteria.
- Offline eval runs before any major change (prompt, model, retrieval, tool schema).
- Online canaries: start with 1–5% traffic; monitor success rate + escalation rate.
- Regression thresholds defined (e.g., block rollout if task success drops below 95%).

5) COST + MARGIN MANAGEMENT
- Track cost-per-successful-run (not just cost per 1k tokens).
- Identify top cost drivers: retries, long-context retrieval, verification passes.
- Implement routing where possible (cheap model for easy steps; strong model for hard steps).
- Pricing aligned to outcomes (per task / per resolution / share of savings), with usage caps.

6) SECURITY + COMPLIANCE BASICS
- SOC 2 plan (or equivalent) documented; customer security questionnaire prepared.
- Data boundaries documented: what is sent to model providers, what is retained, and for how long.
- Tenant isolation validated (no cross-customer leakage in retrieval or logs).
- Incident response playbook exists (who responds, how to disable risky tools, customer comms).

GO/NO-GO RULE (SIMPLE)
GO if: you can prove measurable ROI in 30–60 days, hit a defined success rate on a golden set, and provide traces + permission controls that a security team can understand.
NO-GO if: success depends on manual prompt babysitting, you can’t reproduce runs, or a single tool failure breaks the entire workflow.
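The least-privilege + gating items in section 2 can be sketched as a tiny authorization check. The tool names and the `authorize` function here are illustrative assumptions, not a real API; the key property is that unknown tools are denied by default and high-risk tools require an explicit approver:

```python
ALLOWED_TOOLS = {"search_kb", "draft_reply", "reset_mfa"}  # allowlist, not blocklist
GATED_TOOLS = {"reset_mfa"}  # high-risk: needs human approval before execution

def authorize(tool_name: str, *, approved_by: str | None = None) -> str:
    """Least-privilege check: deny anything not explicitly allowlisted,
    and hold gated (high-risk) tools until a named human approves."""
    if tool_name not in ALLOWED_TOOLS:
        return "deny"
    if tool_name in GATED_TOOLS and approved_by is None:
        return "needs_approval"
    return "allow"
```

The deny-by-default shape matters: adding a new tool requires a deliberate allowlist change, which is the review point a security team can audit.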
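Section 2's "timeouts, retries, and fallbacks for every tool call" can be one wrapper rather than per-tool logic. A minimal sketch, assuming tools are plain Python callables (real clients would also pass a request timeout to the underlying SDK and catch its specific error types):

```python
import time

def call_with_guardrails(tool, args, *, retries=2, backoff_s=0.5, fallback=None):
    """Call a tool with bounded retries and exponential backoff; if all
    attempts fail, route to a fallback (e.g., human escalation) instead of
    letting one tool failure break the whole workflow."""
    last_err = None
    for attempt in range(retries + 1):
        try:
            return tool(**args)
        except Exception as err:  # narrow this to your tool's error types
            last_err = err
            time.sleep(backoff_s * (2 ** attempt))
    if fallback is not None:
        return fallback(args, last_err)  # e.g., open a ticket for a human
    raise last_err
```

The fallback hook is what makes the escalation path explicit: a failed tool call becomes a handoff, not a dead run.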
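The log-redaction item in section 3 is usually pattern-based before anything is persisted. The patterns below (an "sk-" style API key and a 16-digit card number) are illustrative assumptions only; real deployments need a vetted pattern set for their own secret formats:

```python
import re

# Illustrative patterns; replace with your organization's secret/PII formats.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]+"),  # API-key-like tokens
    re.compile(r"\b\d{16}\b"),       # bare 16-digit card numbers
]

def redact(text: str) -> str:
    """Scrub known secret patterns from a log line before it is stored."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Redacting at write time (not read time) is the point: a trace you can replay for 7+ days must never have contained the secret at all.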
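The canary + regression-threshold checks in section 4 reduce to a simple gate. The 95% success floor matches the checklist's example; the escalation-rate ceiling and the function name are assumptions for illustration:

```python
def rollout_decision(successes: int, total: int, *, escalations: int = 0,
                     min_success_rate: float = 0.95,
                     max_escalation_rate: float = 0.10) -> str:
    """Decide whether a canary (e.g., 1-5% of traffic) may be promoted:
    block if task success drops below the regression threshold or if
    escalations to humans spike."""
    success_rate = successes / total
    escalation_rate = escalations / total
    if success_rate < min_success_rate or escalation_rate > max_escalation_rate:
        return "block"
    return "promote"
```

Running this same gate offline against the golden set before any prompt/model/retrieval/tool-schema change gives you the "offline eval before rollout" step with no extra machinery.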
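Cost-per-successful-run (section 5) is simple arithmetic, but it is easy to get wrong by dropping failed runs from the numerator. A sketch, assuming each run record is a (cost, succeeded) pair:

```python
def cost_per_successful_run(runs: list[tuple[float, bool]]) -> float:
    """runs: (cost_usd, succeeded) pairs. Failed runs still cost money,
    so all spend goes in the numerator; only successes count in the
    denominator. This is why retries are a top cost driver."""
    total_cost = sum(cost for cost, _ok in runs)
    successes = sum(1 for _cost, ok in runs if ok)
    if successes == 0:
        return float("inf")  # all spend, no outcomes
    return total_cost / successes
```

With outcome-based pricing (per resolution), this number versus your price per task is your margin; tracking it per workflow is what makes routing decisions (cheap model vs. strong model) measurable.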