AGENTIC WORKFLOW LAUNCH CHECKLIST (2026)

Goal: ship one agentic workflow that reliably completes work, stays within policy, and can be priced profitably.

1) Pick the workflow (scope)
- Name the job in one sentence (e.g., “Collect overdue invoices under $10k”).
- Define the user (operator) and the system of record (e.g., NetSuite + Gmail).
- Write acceptance criteria as a checklist (5–12 items). Must be objectively verifiable.
- Identify exclusions up front (countries, customer tiers, edge cases).

2) Map tools + permissions
- List required tools/APIs and exact actions (read vs write).
- Define default permission boundary: least privilege, purpose-limited.
- Decide approval model: no writes, step approvals, or bounded autonomy.
- Create an allowlist of tool functions + writable fields.

3) Build “receipts” into UX
- Action preview: show diffs for every write.
- Source preview: link to records, tickets, clauses, or docs used.
- Cost/time preview: show expected duration and run budget.
- Post-run receipt: structured timeline of tool calls and outcomes.

4) Safety + verification rails
- Deterministic validators: schemas, required fields, numerical bounds.
- Policy checks: PII patterns, forbidden destinations, restricted records.
- Rate limits + circuit breakers for flaky tools.
- Rollback plan: how to undo writes or compensate downstream.

5) Evaluation plan (before GA)
- Create a replay set: 50–200 real tasks (redact sensitive data).
- Track invariants: policy breaches, invalid payloads, timeouts, retries.
- Set launch thresholds (example):
  * Verified completion ≥ 40% in pilot
  * Incident rate (policy) < 1 per 1,000 runs
  * Median time-to-done < 5 minutes (for micro-ops)

6) Instrumentation + weekly metrics
- Verified completion rate (not just “successful responses”).
- Escalation rate + top 5 escalation reasons.
- Time-to-done (p50/p90) and tool latency breakdown.
- Gross margin per 1,000 runs (revenue vs inference/tool costs).

7) Rollout sequence
- Shadow mode (2–4 weeks): propose actions, humans execute.
- Guided execution: agent executes with approvals + diffs.
- Bounded autonomy: allow run-to-completion within budgets/scopes.
- Exception handling: clear takeover path and escalation UI.

8) Operating cadence
- Assign an “agent on-call” owner for failures and urgent fixes.
- Weekly eval review: compare release-to-release drift.
- Monthly scope expansion review: add one tool/action at a time.
- Security/compliance review before expanding to higher-stakes actions.

9) Packaging + pricing sanity check
- Separate assist vs act tiers.
- Price per workflow bundle (runs), with caps and overage rules.
- Make governance a paid unlock (audit retention, BYO-key, admin controls).

Definition of Done: one workflow runs in production with stable acceptance criteria, visible receipts, measurable ROI, and guardrails that prevent high-severity mistakes.