PRODUCTION AGENT LAUNCH CHECKLIST (2026)

Use this checklist to move from “cool demo” to “trusted automation” in a customer workflow.

1) SCOPE THE FIRST DEPLOYMENT
- Pick one workflow with a single primary KPI (examples: reduce ticket backlog 25%; cut invoice processing time 40%; reduce PR review cycle time 20%).
- Define the unit of work you’ll price and measure (ticket resolved, invoice posted, account updated).
- Document what “done” means in deterministic terms (fields updated, statuses set, confirmation recorded).

2) DESIGN SAFETY BOUNDARIES
- Create a permission model (RBAC) for tools: read vs write vs admin actions.
- Add approval gates for high-impact actions (external email send, payment release, permission changes).
- Implement a kill switch: disable tool execution globally while keeping read-only analysis available.
- Build idempotency into tool calls (avoid duplicate writes on retries).

3) INSTRUMENT COST AND RELIABILITY
- Track per-task: token usage, tool calls, retries, latency, and final outcome.
- Set caps: maximum inference cost per completed task and maximum average retries.
- Add model routing rules (small model for routine classification/extraction; large model for ambiguous reasoning).

4) BUILD EVALUATION BEFORE SCALE
- Create a golden set (50–200 real tasks) with expected outputs and success criteria.
- Add regression tests for every prompt/toolchain change.
- Run shadow mode in the customer environment: agent suggests actions, humans execute.
- Define thresholds for rollback (e.g., unsafe_action_rate > 0.1% OR task_success_p95 < 90%).

5) AUDITABILITY AND COMPLIANCE BASICS
- Log every agent action: inputs, tool calls, outputs, timestamps, actor (agent/human), and policy decisions.
- Ensure logs are exportable (CSV/API) and SIEM-friendly.
- Document data handling: retention, redaction, and tenant isolation.
- Prepare enterprise minimums: SSO, SCIM, and a roadmap to SOC 2 Type II if selling >$100k ACV.

6) CUSTOMER ROLLOUT PLAN
- Phase 1: Shadow (recommendations only)
- Phase 2: Assisted (draft + human approve)
- Phase 3: Autonomous (bounded by policy + continuous monitoring)
- Train operators on: approvals, exceptions, and how to interpret audit logs.

7) POST-LAUNCH OPERATIONS
- Create agent incident response (AIR): severity definitions, on-call rotation, and rollback playbooks.
- Run weekly review: top failure categories, cost outliers, and tool error hotspots.
- Maintain a change log for model/provider updates and re-run evals on the golden set.

If you can’t measure success, cost, and safety per unit of work, you don’t have an agent product—you have an experiment.