AGENTIC PRODUCT LAUNCH CHECKLIST (2026)

Use this checklist to move from “AI demo” to “trusted AI teammate” in production.

1) WORKFLOW SELECTION (DO THIS FIRST)
- Choose ONE workflow with: (a) clear success criteria, (b) repeatable inputs/outputs, (c) measurable business value.
- Minimum viability target: 1,000 runs/month across customers or 50+ runs/day in a single large account.
- Define the action boundary in writing: what state changes are allowed (create/update/delete/refund/deploy) and what is forbidden.

2) AUTONOMY TIERS (PRODUCT + PRICING)
- Implement 3 modes:
  A) Suggest: insights only, no state changes.
  B) Draft: generate a diff/plan; user approves.
  C) Autopilot (scoped): execute within explicit limits (thresholds, objects, time ranges).
- Attach monetization to autonomy: higher tiers include better controls (audit exports, admin policies, approvals).

3) PERMISSIONS + POLICY
- Enforce least privilege: scopes per tool, per action, per tenant.
- Prefer short-lived credentials; log scope, TTL, and actor per run.
- Add policy-as-code where possible (e.g., “refund <= $100”, “no external browsing”, “no prod deploy without approval”).

4) RECEIPTS + REVERSIBILITY
- Every run must produce an “AI receipt”:
  - User request, plan, tools called, data sources, timestamps.
  - Links to changed objects (ticket IDs, PR URLs, record IDs).
  - Before/after snapshot or diff.
- Implement undo/rollback paths:
  - Idempotency keys for tool calls.
  - Compensating actions for external systems.
  - Dry-run mode that shows expected effects.

5) INSTRUMENTATION (ROI + RELIABILITY)
- Track per-run: token usage, retrieval queries, tool calls, retries, latency, estimated cost.
- Track quality metrics:
  - No-touch completion rate (% runs with no human intervention).
  - Correction rate within 24h (proxy for “silent wrong”).
  - Failure taxonomy (permission denied, missing data, tool error, model error, policy blocked).
- Tie to business outcome metrics (pick 1–2): TTR, deflection, match rate, MTTR, PR cycle time, forecast accuracy.

6) SAFETY GATES
- Require approvals for high-risk actions (money movement, deletion, prod changes).
- Add thresholds (amounts, record counts, time windows).
- Provide a clear “stop” control and incident escalation path.

7) ENTERPRISE READINESS
- Audit logs: immutable, exportable, searchable.
- Retention controls: configurable per tenant; support redaction and PII handling.
- Admin console: tool enable/disable, policy toggles, run history, budget limits.

8) LAUNCH PLAN
- Start with Design Partners: 3–10 customers, weekly reviews of receipts and failures.
- Ship Draft mode broadly before Autopilot.
- Publish a “What the agent can/can’t do” doc inside the product.
- Set SLOs: e.g., p95 latency, max cost/run, max correction rate.

EXIT CRITERIA (YOU’RE READY TO SCALE)
- Cost/run is predictable within a defined budget ceiling.
- You can show ≥ 1 measurable outcome improvement (e.g., 15% lower TTR or 25% higher reconciliation match rate) in at least one cohort.
- Audit + rollback story is proven in production.
- Support/CS has an incident playbook and clear customer messaging.