AGENTIC WORKFLOW PRODUCTION READINESS CHECKLIST (2026)

Use this checklist to ship an agent workflow that is measurable, governable, and safe.

1) SCOPE & SUCCESS METRICS
- Define a single workflow (start narrow): trigger → steps → “done.”
- Choose 1–2 KPIs and a baseline (e.g., time-to-first-response, deflection rate, hours saved/week, error rate).
- Decide which actions are READ vs WRITE vs IRREVERSIBLE.
- Set target SLOs by action type (example: READ 99.5% acceptable; WRITE 99.9% with verification; IRREVERSIBLE requires approval).

2) DATA & CONTEXT
- List required data sources (tickets, CRM, billing, logs) and classify sensitivity (PII, PCI, PHI).
- Implement data minimization: retrieve only what the step needs.
- Add redaction/filters for PII where feasible.
- Document retention rules for prompts, retrieved context, and tool outputs.

3) TOOLING & STATE
- Keep workflow state outside the model (DB record + explicit state machine).
- Require idempotency keys for every tool action.
- Implement timeouts, retries with backoff, and circuit breakers.
- Add an explicit kill switch (feature flag) and a safe fallback path.

4) PERMISSIONS & GOVERNANCE
- Separate identities: requester (human), agent runtime, tool/service account.
- Enforce least privilege: per-tool scopes, per-action allowlists.
- Add approval gates for high-impact actions (e.g., refunds > $500, permission changes, mass outbound messages).
- Log: who requested, what context was used, what actions were taken, and what was changed.

5) EVALUATION (EVALS)
- Build a golden set (200–2,000 real examples) with expected outcomes.
- Run regression evals nightly and on every release candidate.
- Track: schema adherence, tool-call accuracy, hallucination rate, escalation rate, latency, and cost per successful task.
- Add adversarial tests: prompt injection attempts, malformed inputs, missing context.

6) VERIFICATION LAYER
- Implement deterministic business rules first (schemas, invariants, constraints).
- Add probabilistic checks where needed (LLM judge, cross-model critique) with thresholds.
- Define escalation rules: when confidence is low or policies block actions.
- Require “evidence” for decisions (links to ticket, records, retrieved docs).

7) OBSERVABILITY & COST CONTROLS
- Trace each request end-to-end (model calls + tool calls + retries).
- Compute dollars per successful task and alert on regressions.
- Use routing (small → large) and caching for repeated queries.
- Set budget alerts and per-tenant rate limits.

8) ROLLOUT PLAN
- Shadow mode first: agent suggests, human executes.
- Limited write access next: caps + canary rollout to a small cohort.
- Full rollout only after stable metrics over a defined window (e.g., 2–4 weeks).
- Post-incident review process: classification, root cause, new tests added.

If you can’t answer “Who did what, using which data, under what permission, and how we verified it?” you’re not ready for production.
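
EXAMPLE SKETCH (ILLUSTRATIVE ONLY)
The closing question can be made concrete with a small guard around tool execution. The Python sketch below is one possible shape, not a reference implementation: it combines idempotency keys and a kill switch (section 3), a per-action allowlist, an approval gate, and an audit log (section 4), and the evidence requirement (section 6). Every name in it (AgentRuntime, ToolAction, issue_refund, the "ticket/..." evidence strings) is hypothetical, state is kept in memory purely for brevity where a real system would use a database, and the $500 threshold is taken from the refund example above.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class ActionType(Enum):
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"


@dataclass
class ToolAction:
    name: str                 # hypothetical tool name, e.g. "issue_refund"
    action_type: ActionType
    params: dict
    idempotency_key: str
    evidence: list            # links to tickets, records, or retrieved docs


@dataclass
class AgentRuntime:
    agent_id: str
    allowed_tools: set                        # per-action allowlist (least privilege)
    kill_switch_on: bool = False              # stands in for a feature flag
    refund_approval_threshold: float = 500.0  # assumption: from the checklist example
    executed: dict = field(default_factory=dict)   # idempotency_key -> result
    audit_log: list = field(default_factory=list)

    def _needs_approval(self, action: ToolAction) -> bool:
        if action.action_type is ActionType.IRREVERSIBLE:
            return True
        return (action.name == "issue_refund"
                and action.params.get("amount", 0) > self.refund_approval_threshold)

    def execute(self, requester: str, action: ToolAction, approved_by=None) -> dict:
        # Replay of an already-executed key: return the stored result,
        # so a retry never causes a second side effect.
        if action.idempotency_key in self.executed:
            return self.executed[action.idempotency_key]

        if self.kill_switch_on:
            status = "blocked_kill_switch"
        elif action.name not in self.allowed_tools:
            status = "denied_not_allowlisted"
        elif not action.evidence:
            status = "escalated_no_evidence"
        elif self._needs_approval(action) and approved_by is None:
            status = "escalated_needs_approval"
        else:
            status = "executed"  # a real runtime would call the tool here

        result = {"status": status, "action": action.name}
        # Who requested, which evidence was used, under what approval, what happened.
        self.audit_log.append({
            "requester": requester,
            "agent": self.agent_id,
            "tool": action.name,
            "action_type": action.action_type.value,
            "idempotency_key": action.idempotency_key,
            "evidence": list(action.evidence),
            "approved_by": approved_by,
            "status": status,
        })
        # Only successful executions are stored, so an escalated action
        # can be resubmitted with the same key once it is approved.
        if status == "executed":
            self.executed[action.idempotency_key] = result
        return result


runtime = AgentRuntime(agent_id="support-agent",
                       allowed_tools={"issue_refund", "lookup_ticket"})
small = ToolAction("issue_refund", ActionType.WRITE, {"amount": 120.0},
                   str(uuid.uuid4()), ["ticket/123"])
large = ToolAction("issue_refund", ActionType.WRITE, {"amount": 900.0},
                   str(uuid.uuid4()), ["ticket/456"])

print(runtime.execute("alice", small)["status"])   # executed
print(runtime.execute("alice", small)["status"])   # executed (replayed from store)
print(runtime.execute("alice", large)["status"])   # escalated_needs_approval
print(runtime.execute("alice", large, approved_by="bob")["status"])  # executed
```

Each audit_log entry records the requester, the agent identity, the tool, the evidence, the approver, and the outcome, which is the minimum needed to answer the question above. In this sketch, replays are served from the stored result and do not add a second log entry; whether to log replays is a design choice left open here.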