AGENTIC WORKFLOW LAUNCH CHECKLIST (2026) Goal: ship one agentic workflow that reliably completes work, stays within policy, and can be priced profitably. 1) Pick the workflow (scope) - Name the job in one sentence (e.g., “Collect overdue invoices under $10k”). - Define the user (operator) and the system of record (e.g., NetSuite + Gmail). - Write acceptance criteria as a checklist (5–12 items). Must be objectively verifiable. - Identify exclusions up front (countries, customer tiers, edge cases). 2) Map tools + permissions - List required tools/APIs and exact actions (read vs write). - Define default permission boundary: least privilege, purpose-limited. - Decide approval model: no writes, step approvals, or bounded autonomy. - Create an allowlist of tool functions + writable fields. 3) Build “receipts” into UX - Action preview: show diffs for every write. - Source preview: link to records, tickets, clauses, or docs used. - Cost/time preview: show expected duration and run budget. - Post-run receipt: structured timeline of tool calls and outcomes. 4) Safety + verification rails - Deterministic validators: schemas, required fields, numerical bounds. - Policy checks: PII patterns, forbidden destinations, restricted records. - Rate limits + circuit breakers for flaky tools. - Rollback plan: how to undo writes or compensate downstream. 5) Evaluation plan (before GA) - Create a replay set: 50–200 real tasks (redact sensitive data). - Track invariants: policy breaches, invalid payloads, timeouts, retries. - Set launch thresholds (example): * Verified completion ≥ 40% in pilot * Incident rate (policy) < 1 per 1,000 runs * Median time-to-done < 5 minutes (for micro-ops) 6) Instrumentation + weekly metrics - Verified completion rate (not just “successful responses”). - Escalation rate + top 5 escalation reasons. - Time-to-done (p50/p90) and tool latency breakdown. - Gross margin per 1,000 runs (revenue vs inference/tool costs). 7) Rollout sequence - Shadow mode (2–4 weeks): propose actions, humans execute. - Guided execution: agent executes with approvals + diffs. - Bounded autonomy: allow run-to-completion within budgets/scopes. - Exception handling: clear takeover path and escalation UI. 8) Operating cadence - Assign an “agent on-call” owner for failures and urgent fixes. - Weekly eval review: compare release-to-release drift. - Monthly scope expansion review: add one tool/action at a time. - Security/compliance review before expanding to higher-stakes actions. 9) Packaging + pricing sanity check - Separate assist vs act tiers. - Price per workflow bundle (runs), with caps and overage rules. - Make governance a paid unlock (audit retention, BYO-key, admin controls). Definition of Done: one workflow runs in production with stable acceptance criteria, visible receipts, measurable ROI, and guardrails that prevent high-severity mistakes.