PRODUCTION AGENT LAUNCH CHECKLIST (2026) Use this checklist to move from “cool demo” to “trusted automation” in a customer workflow. 1) SCOPE THE FIRST DEPLOYMENT - Pick one workflow with a single primary KPI (examples: reduce ticket backlog 25%; cut invoice processing time 40%; reduce PR review cycle time 20%). - Define the unit of work you’ll price and measure (ticket resolved, invoice posted, account updated). - Document what “done” means in deterministic terms (fields updated, statuses set, confirmation recorded). 2) DESIGN SAFETY BOUNDARIES - Create a permission model (RBAC) for tools: read vs write vs admin actions. - Add approval gates for high-impact actions (external email send, payment release, permission changes). - Implement a kill switch: disable tool execution globally while keeping read-only analysis available. - Build idempotency into tool calls (avoid duplicate writes on retries). 3) INSTRUMENT COST AND RELIABILITY - Track per-task: token usage, tool calls, retries, latency, and final outcome. - Set caps: maximum inference cost per completed task and maximum average retries. - Add model routing rules (small model for routine classification/extraction; large model for ambiguous reasoning). 4) BUILD EVALUATION BEFORE SCALE - Create a golden set (50–200 real tasks) with expected outputs and success criteria. - Add regression tests for every prompt/toolchain change. - Run shadow mode in the customer environment: agent suggests actions, humans execute. - Define thresholds for rollback (e.g., unsafe_action_rate > 0.1% OR task_success_p95 < 90%). 5) AUDITABILITY AND COMPLIANCE BASICS - Log every agent action: inputs, tool calls, outputs, timestamps, actor (agent/human), and policy decisions. - Ensure logs are exportable (CSV/API) and SIEM-friendly. - Document data handling: retention, redaction, and tenant isolation. - Prepare enterprise minimums: SSO, SCIM, and a roadmap to SOC 2 Type II if selling >$100k ACV. 6) CUSTOMER ROLLOUT PLAN - Phase 1: Shadow (recommendations only) - Phase 2: Assisted (draft + human approve) - Phase 3: Autonomous (bounded by policy + continuous monitoring) - Train operators on: approvals, exceptions, and how to interpret audit logs. 7) POST-LAUNCH OPERATIONS - Create agent incident response (AIR): severity definitions, on-call rotation, and rollback playbooks. - Run weekly review: top failure categories, cost outliers, and tool error hotspots. - Maintain a change log for model/provider updates and re-run evals on the golden set. If you can’t measure success, cost, and safety per unit of work, you don’t have an agent product—you have an experiment.