AGENTIC WORKFLOW LAUNCH CHECKLIST (2026)

Goal: Launch one production agent workflow with measurable ROI and controlled risk in 30–60 days.

1) Workflow Selection (Week 0)
- Pick a workflow with: (a) ≥200 tasks/month, (b) clear start/end states, (c) low-to-moderate risk.
- Write a one-page “Definition of Done”: inputs, outputs, edge cases, and escalation rules.
- Define the KPI trio:
  - Business: cost per outcome (e.g., $/ticket resolved, $/qualified lead)
  - Quality: severity-weighted accuracy (define P0/P1/P2 errors)
  - Ops: tool success rate + latency (p95)

2) Data + Evals (Week 1)
- Build an initial eval set of 100–200 historical cases.
- Label each case with:
  - Expected action (approve/deny/escalate)
  - Required evidence (which fields/docs must be present)
  - Severity of wrong output (P0/P1/P2)
- Establish a regression rule: no model/prompt/tool change ships without running the eval suite.

3) Tooling + Safety (Weeks 1–2)
- Wrap every tool/API with:
  - Typed schemas (structured inputs/outputs)
  - Idempotency keys for write actions
  - Allowlisted parameters (no free-form deletes/updates)
  - Least-privilege credentials (no shared admin tokens)
- Add explicit human approval gates for high-severity actions (e.g., refunds above $X, production deploys).

4) Build + Shadow Mode (Weeks 2–3)
- Run the agent in shadow mode for 1–2 weeks.
- Track:
  - Containment rate (how often the agent could have completed the task without a human)
  - Disagreement rate vs. humans
  - P0 error rate (must be near-zero before autonomy)
- Create a weekly review ritual: top failures, root causes, fixes, and new eval cases.

5) Graduated Autonomy (Weeks 4–6)
- Start with low-risk actions only (e.g., tagging, drafting, suggesting).
- Move to limited write actions with caps (e.g., refunds ≤$50, changes only in sandbox).
- Define rollback triggers (examples):
  - P0 error detected
  - Tool failure rate >2% daily
  - Cost per outcome increases week-over-week for 2 weeks

6) Monitoring + Governance (Ongoing)
- Log everything needed to replay incidents: prompts, retrieved docs, tool calls, outputs, reviewer decisions.
- Build dashboards:
  - Cost per successful outcome
  - Escalation rate + reasons
  - Tool call success rate
  - Drift indicators (changes in input mix, rising uncertainty)
- Assign owners:
  - Functional owner (Support/Sales/Finance) for KPI outcomes
  - Platform/Security owner for permissions, logging, and credential hygiene
  - Agent Ops owner for eval maintenance and incident reviews

Ship criteria (minimum):
- ≥200-case eval suite in CI
- 100% tool calls behind typed wrappers
- Human gate for all high-severity actions
- Ability to reproduce any incident within 24 hours
- Clear unit economics dashboard tied to business KPIs
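
Example (illustrative only): the three rollback triggers listed under Graduated Autonomy could be checked once a day with something like the Python sketch below. The names DailyStats and rollback_reasons are hypothetical, not part of any tool or library. The sketch also assumes that "increases week-over-week for 2 weeks" means two consecutive weekly increases, so it needs three weekly data points; adjust if your team reads that trigger differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DailyStats:
    """Hypothetical daily rollup; the field names are assumptions."""
    p0_errors: int      # P0 errors detected today
    tool_calls: int     # total tool calls today
    tool_failures: int  # failed tool calls today


def rollback_reasons(
    today: DailyStats,
    weekly_cost_per_outcome: List[float],  # oldest first, one value per week
    max_tool_failure_rate: float = 0.02,   # the checklist's 2% daily threshold
) -> List[str]:
    """Return the list of rollback triggers that fired (empty means healthy)."""
    reasons: List[str] = []

    # Trigger 1: any P0 error.
    if today.p0_errors > 0:
        reasons.append("p0_error")

    # Trigger 2: daily tool failure rate strictly above the threshold.
    if today.tool_calls > 0:
        failure_rate = today.tool_failures / today.tool_calls
        if failure_rate > max_tool_failure_rate:
            reasons.append("tool_failure_rate")

    # Trigger 3 (assumed reading): cost per outcome rose in each of the
    # last two week-over-week comparisons.
    if len(weekly_cost_per_outcome) >= 3:
        w1, w2, w3 = weekly_cost_per_outcome[-3:]
        if w1 < w2 < w3:
            reasons.append("cost_per_outcome_rising")

    return reasons
```

A non-empty result would be the signal to drop the agent back to the previous autonomy level and open an incident review. Returning the reasons rather than a bare boolean keeps the trigger visible in the logs.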