Production Agent Readiness Checklist (2026) Use this checklist before you let an agent take real actions in core systems (CRM, payments, ticketing, code, IT). 1) Scope & Success Metrics - Define ONE workflow the agent owns (single system of record + 1–2 downstream actions). - Write numeric acceptance criteria (e.g., 40% auto-triage rate, <2% misroute rate, p95 <30s). - Define “stop conditions” (max wall-clock time, max tool calls, max retries). - Define escalation rules (what goes to a human; what requires manager approval). 2) Tooling Contracts (Propose vs Execute) - Every tool has a strict JSON/schema contract with required fields + enumerations. - Validate every model output before executing tools. - Implement idempotency keys for any write action (refunds, tickets, PRs, CRM updates). - Separate “propose” from “execute”: model suggests; policy service approves/denies. 3) Identity, Permissions, and Secrets - Create a dedicated agent identity (OAuth client / service account) per agent. - Enforce least privilege roles; document why each permission exists. - No static secrets in prompts, logs, or vector stores; rotate credentials on a schedule (target <30 days). - Rate-limit high-impact tools (e.g., max 10 writes/minute) and add circuit breakers. 4) Data & Memory Hygiene - Classify memory types: session state, long-term knowledge, episodic logs. - Define retention per type (e.g., session 7 days, logs 30–90 days, knowledge versioned). - Add PII detection/redaction before storage and before sending to model APIs. - Ensure RAG sources are versioned and traceable (which doc/section supported a decision). 5) Evaluation & Release Process - Build a regression suite of real tasks (start with 50–200) and keep it under version control. - Track metrics: task success rate, tool-call accuracy, escalation rate, hallucination on grounded Qs. - Run canary releases: 5% volume for 48 hours, then 25%, with instant rollback. - Set a “no-ship” threshold (e.g., success rate drops >2 points vs baseline). 6) Observability & Incident Response - Emit structured traces for each run: prompts, retrieval hits, tool calls, tool outputs, validations. - Dashboard p50/p95 latency, failure rate, cost per run, and top escalation reasons. - Add alerts for abnormal behavior: duplicate actions, spike in retries, spend anomalies. - Write an incident playbook: disable switch, rollback procedure, and customer comms plan. 7) Unit Economics & Budget Controls - Define unit cost targets (e.g., <$0.50 per resolved ticket or <$2 per invoice matched). - Use model routing (cheap router + expensive solver) with explicit escalation criteria. - Enforce spend caps per day/week and per tenant; fail safe to human handoff. - Track wasted spend: tokens on failed runs, retries, and rejected tool calls. Launch Gate: You’re ready when (a) schemas + policy gates exist for every write tool, (b) you can quantify success + cost per completed task, (c) you can audit any action end-to-end, and (d) you can shut the agent off in one click.