AGENTIC WORKFLOW LAUNCH CHECKLIST (2026)

Use this checklist to take an AI workflow from prototype to production autonomy without losing reliability or blowing up costs.

1) DEFINE THE WORKFLOW CONTRACT
- Name the workflow and its business owner (e.g., “Refund approvals — Head of Support”).
- Define the outcome in business terms (e.g., “Resolve refund request within 2 hours”).
- Set explicit limits (maximum dollar amount, maximum number of recipients, allowed systems) and define when the agent must escalate.
- Identify what “incorrect” means (financial loss, compliance breach, customer harm, data exposure).
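
The contract limits above can be encoded as data, so every proposed action is checked before it runs. The following is a minimal Python sketch. The names `WorkflowContract`, `ProposedAction`, `check_action`, and `route` are hypothetical, as is the $500 refund cap used in the usage example. It only shows the shape of the check, not any particular platform's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkflowContract:
    """Hypothetical contract record; field names are illustrative."""
    name: str
    owner: str
    max_amount_usd: float
    max_recipients: int
    allowed_systems: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class ProposedAction:
    system: str
    amount_usd: float
    recipients: int


def check_action(contract: WorkflowContract, action: ProposedAction) -> list[str]:
    """Return the contract limits this action violates; empty means in bounds."""
    violations = []
    if action.system not in contract.allowed_systems:
        violations.append(f"system '{action.system}' is not allowed")
    if action.amount_usd > contract.max_amount_usd:
        violations.append(f"amount {action.amount_usd} exceeds {contract.max_amount_usd}")
    if action.recipients > contract.max_recipients:
        violations.append(f"{action.recipients} recipients exceeds {contract.max_recipients}")
    return violations


def route(contract: WorkflowContract, action: ProposedAction) -> str:
    # Any violation escalates to a human instead of executing.
    return "escalate" if check_action(contract, action) else "proceed"
```

Usage: with `WorkflowContract("Refund approvals", "Head of Support", 500.0, 1, frozenset({"billing"}))`, a $120 billing refund routes to "proceed" and a $900 one routes to "escalate". Returning the list of violations, rather than a bare boolean, gives the escalation message its reason.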

2) DESIGN TOOLING + PERMISSIONS
- Split tools into READ vs WRITE operations.
- Implement least-privilege access per role/workspace; document who can enable auto-execution.
- Add idempotency keys to every write tool so that a retried call never applies the same write twice.
- Build an undo path for every write (revert field diff, cancel/void, delete created artifacts).
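
Idempotency keys and undo paths fit together naturally. The first write under a key records the previous value, retries with that key return the stored result, and undo restores the captured diff. The following is a minimal sketch against a hypothetical in-memory store (`IdempotentStore` is illustrative). A real system would persist keys and diffs durably and handle concurrent writers.

```python
from dataclasses import dataclass, field
from typing import Any

_MISSING = object()  # sentinel: the field did not exist before the write


@dataclass
class WriteResult:
    record_id: str
    field_name: str
    old_value: Any
    new_value: Any


@dataclass
class IdempotentStore:
    """Hypothetical in-memory stand-in for a system of record."""
    records: dict[str, dict[str, Any]] = field(default_factory=dict)
    # Idempotency key -> result of the first write made under that key.
    applied: dict[str, WriteResult] = field(default_factory=dict)

    def update_field(self, key: str, record_id: str, field_name: str, value: Any) -> WriteResult:
        # A retry with a known key returns the original result and writes nothing.
        if key in self.applied:
            return self.applied[key]
        record = self.records.setdefault(record_id, {})
        result = WriteResult(record_id, field_name, record.get(field_name, _MISSING), value)
        record[field_name] = value
        self.applied[key] = result
        return result

    def undo(self, key: str) -> None:
        # Revert the field diff captured at write time. The key stays recorded,
        # so a late retry of the original call cannot re-apply the write.
        result = self.applied[key]
        record = self.records[result.record_id]
        if result.old_value is _MISSING:
            record.pop(result.field_name, None)
        else:
            record[result.field_name] = result.old_value
```

This undo restores the old value unconditionally. A production version would first check that the field still holds `new_value`, so that it does not overwrite a later change.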

3) INSTRUMENTATION (REQUIRED FIELDS)
Log per run: run_id, workflow name, model/version, tokens in/out, latency, tool calls, retrieved documents, policy blocks, human-review flag, outcome label, estimated value ($), and cost ($).
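
One way to keep these fields consistent across workflows is a typed record serialized as one JSON line per run. The following sketch assumes Python 3.9+. The field names are hypothetical and map onto the list above. Adapt the names and units (here, milliseconds and USD) to your logging pipeline.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunLog:
    """One record per run; field names are illustrative."""
    run_id: str
    workflow: str
    model_version: str
    tokens_in: int
    tokens_out: int
    latency_ms: int
    tool_calls: list[str] = field(default_factory=list)
    retrieval_doc_ids: list[str] = field(default_factory=list)
    policy_blocks: list[str] = field(default_factory=list)
    human_review: bool = False
    outcome_label: str = "unlabeled"
    est_value_usd: float = 0.0
    cost_usd: float = 0.0

    def to_json_line(self) -> str:
        # Sorted keys keep lines diff-friendly; one line per run suits append-only logs.
        return json.dumps(asdict(self), sort_keys=True)
```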

4) EVALUATION PLAN
- Create a golden set (50–300 cases) that reflects the real input distribution and covers edge cases and known failure modes.
- Define 3 core metrics:
  a) Task success rate (end-to-end correctness)
  b) Cost per completed task (tokens + tools + human review)
  c) Time to resolution (including escalations)
- Add weekly sampling in production (e.g., review 200 runs/week or 1% of volume).
- Set an error budget (e.g., “<0.5% reversals/month”).
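
The three core metrics, the error-budget check, and the weekly review sample can be computed from labeled results roughly as follows. This is a sketch under stated assumptions. `EvalResult` and the function names are hypothetical. Cost per completed task is taken as total spend, including failed runs, divided by the number of successful tasks.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalResult:
    case_id: str
    success: bool                 # end-to-end correct
    token_cost_usd: float
    tool_cost_usd: float
    review_cost_usd: float
    minutes_to_resolution: float  # includes escalation time


def core_metrics(results: list[EvalResult]) -> dict[str, float]:
    completed = sum(r.success for r in results)
    total_cost = sum(r.token_cost_usd + r.tool_cost_usd + r.review_cost_usd for r in results)
    return {
        "task_success_rate": completed / len(results),
        # Assumption: failed runs still cost money, so their spend is charged to completed tasks.
        "cost_per_completed_task": total_cost / completed if completed else float("inf"),
        "mean_minutes_to_resolution": mean(r.minutes_to_resolution for r in results),
    }


def within_error_budget(reversals: int, actions: int, budget_rate: float = 0.005) -> bool:
    # The default mirrors the "<0.5% reversals/month" example.
    if actions == 0:
        return True  # nothing executed, nothing reversed
    return reversals / actions < budget_rate


def weekly_sample(run_ids: list[str], n: int, seed: int = 0) -> list[str]:
    # Uniform random sample for human review; seeded so the draw is reproducible.
    return random.Random(seed).sample(run_ids, min(n, len(run_ids)))
```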

5) COST CONTROLS
- Add per-run token ceilings and per-workflow dollar budgets.
- Implement model routing: cheap models for triage and extraction, premium models only for hard or high-stakes steps.
- Add caching for retrieval and deterministic tool outputs; consider output caching for safe templates.
- When a limit would be exceeded, have the agent emit a “budget request” that states the reason and triggers escalation instead of continuing.
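
The following is a minimal sketch of these cost controls. It assumes hypothetical names (`CostGuard`, `BudgetRequest`, `pick_model`) and placeholder model tiers `cheap-model` and `premium-model`. The guard checks a step before recording its spend, so a breach produces a budget request rather than a partially charged overrun.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BudgetRequest:
    """Emitted instead of continuing when a limit would be exceeded."""
    run_id: str
    reason: str
    escalate_to: str


@dataclass
class CostGuard:
    token_ceiling_per_run: int
    workflow_budget_usd: float
    owner: str
    spent_usd: float = 0.0
    tokens_by_run: dict[str, int] = field(default_factory=dict)

    def charge(self, run_id: str, tokens: int, cost_usd: float) -> Optional[BudgetRequest]:
        """Record a step's spend, or return a BudgetRequest if it would breach a limit."""
        used = self.tokens_by_run.get(run_id, 0)
        if used + tokens > self.token_ceiling_per_run:
            return BudgetRequest(run_id, f"token ceiling of {self.token_ceiling_per_run} would be exceeded", self.owner)
        if self.spent_usd + cost_usd > self.workflow_budget_usd:
            return BudgetRequest(run_id, f"workflow budget of ${self.workflow_budget_usd:.2f} would be exceeded", self.owner)
        # Only charge once both checks pass.
        self.tokens_by_run[run_id] = used + tokens
        self.spent_usd += cost_usd
        return None


def pick_model(hard: bool, high_stakes: bool) -> str:
    # Cheap by default (triage, extraction); premium only when the step needs it.
    return "premium-model" if hard or high_stakes else "cheap-model"
```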

6) ROLLOUT STAGES
- Stage 1 (Draft): agent proposes; user executes.
- Stage 2 (Assisted): agent executes low-risk writes with confirmation.
- Stage 3 (Auto, bounded): auto-executes within caps; strict policies; fast undo.
- Stage 4 (Auto, adaptive): continuous evals + drift alerts; kill switch always available.
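
The rollout stages can be expressed as an ordered enum with a single decision point per proposed write. The following sketch uses hypothetical names. The fallback for out-of-cap actions in stages 3 and 4 is an assumption, and the stage 4 additions (continuous evals, drift alerts) live outside this function.

```python
from enum import IntEnum


class Stage(IntEnum):
    DRAFT = 1          # agent proposes; user executes
    ASSISTED = 2       # agent executes low-risk writes after confirmation
    AUTO_BOUNDED = 3   # auto-executes within caps
    AUTO_ADAPTIVE = 4  # as stage 3, plus continuous evals and drift alerts


def execution_mode(stage: Stage, low_risk: bool, within_caps: bool) -> str:
    """Return 'execute', 'confirm', or 'propose' for one proposed write."""
    if stage >= Stage.AUTO_BOUNDED and within_caps:
        return "execute"
    # Assumption: anything not auto-executable falls back to the earlier stages' behavior.
    if stage >= Stage.ASSISTED and low_risk:
        return "confirm"
    return "propose"
```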

7) GOVERNANCE + SECURITY
- Store prompts and tool traces securely, under a defined retention policy.
- Document data boundaries (what leaves your VPC, what’s redacted, what’s encrypted).
- Implement a kill switch to disable auto-execution instantly while keeping draft mode.
- Prepare procurement-ready answers: audit logs, role matrix, incident response, versioning.
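
A kill switch is most useful when it is checked on every action and degrades to draft mode rather than stopping the agent. The following is a process-local sketch using `threading.Event`. The `KillSwitch` class and `handle` function are hypothetical. A multi-instance deployment would need a shared flag, for example in a config service, instead.

```python
import threading


class KillSwitch:
    """Process-local sketch; a real deployment would back this with a shared flag store."""

    def __init__(self) -> None:
        self._tripped = threading.Event()
        self.reason = ""

    def trip(self, reason: str) -> None:
        self.reason = reason
        self._tripped.set()

    def reset(self) -> None:
        self.reason = ""
        self._tripped.clear()

    def is_tripped(self) -> bool:
        return self._tripped.is_set()


def handle(action: str, auto_allowed: bool, switch: KillSwitch) -> str:
    # Checked on every action, so tripping takes effect on the next write, not the next deploy.
    if auto_allowed and not switch.is_tripped():
        return f"executed: {action}"
    # Draft mode stays available: the agent still proposes the action for a human to run.
    return f"draft: {action}"
```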

8) GO/NO-GO GATE
Ship autonomy only if: golden set passes thresholds, production sampling is in place, undo works, budgets are enforced, and a business owner accepts the error budget in writing.
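
The gate can be made explicit, so that a launch review produces a list of unmet conditions instead of a yes/no opinion. The following sketch uses a hypothetical `LaunchEvidence` record whose fields mirror the five conditions above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LaunchEvidence:
    golden_set_passes: bool
    production_sampling_live: bool
    undo_verified: bool
    budgets_enforced: bool
    error_budget_signed_off: bool  # business owner accepted it in writing


def go_no_go(evidence: LaunchEvidence) -> tuple[bool, list[str]]:
    """Ship only if every check holds; otherwise report which ones failed."""
    failed = [f.name for f in fields(evidence) if not getattr(evidence, f.name)]
    return (not failed, failed)
```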