PRODUCTION AGENT READINESS CHECKLIST (2026) Use this checklist before giving an agent write access to real systems. 1) SCOPE & ROI - Workflow defined in one sentence (example: “Triage and route inbound support tickets for Product A”). - Success metric chosen (e.g., +12% first-contact resolution OR -20% handle time within 60 days). - Clear “stop conditions” documented: when the agent must escalate to a human. 2) ACTIONS, PERMISSIONS, AND THRESHOLDS - Tool list is explicit (e.g., Zendesk, Salesforce, Stripe, Jira) and each tool has least-privilege scopes. - High-impact actions have caps (e.g., refunds <= $25 auto; $26–$200 requires approval; >$200 blocked). - Time-bound credentials in place (short-lived tokens; no shared admin keys). - Rate limits and kill switch implemented (per-tenant and global). 3) TOOL CONTRACTS (NON-NEGOTIABLE) - Every tool call has a schema (types, required fields, enums for reason codes). - Validation runs before execution; invalid payloads return deterministic errors. - Idempotency keys used for side-effecting actions (refunds, emails, record updates). 4) MEMORY DESIGN - Tiered memory defined: short-term (per run), episodic (per case), semantic (facts with citations). - Provenance required on stored facts (source system + record ID + timestamp). - Data retention and deletion policy documented (PII handling, customer deletion requests). 5) EVALUATION & RELEASE GATING - Golden set created (200–1,000 historical cases) with expected outcomes. - Metrics tracked: trajectory success rate, tool-call error rate, policy violation rate, human takeover rate, cost per successful outcome. - Regression gating in CI: releases blocked if key metrics drop beyond thresholds. 6) OBSERVABILITY & INCIDENT RESPONSE - Full tracing per run: inputs, retrieval results, tool calls, outputs. - Audit logs exportable for security/compliance. - On-call runbook exists: common failures, rollback steps, kill switch location. 7) ROLLOUT PLAN - Phase 1: shadow mode (recommendations only). - Phase 2: assist mode (human approves actions). - Phase 3: autopilot for low-risk actions only. - A/B or staged rollout plan across tenants/queues with explicit exit criteria. 8) COST CONTROLS - Reasoning budgets set per workflow (max $/case) and enforced in code. - Model routing implemented (cheap model for classification; frontier model for complex reconciliation). - Caching strategy defined (TTL, what can be cached, what cannot). If you can’t check every item above, do not grant write permissions. Start in shadow/assist mode, tighten contracts, and build evals until you can prove safety and ROI.