AGENTIC WORKFLOW PRODUCTION READINESS CHECKLIST (2026)

Use this checklist to move one agentic workflow from prototype to production with bounded autonomy.

1) SCOPE & SUCCESS CRITERIA
- Pick one workflow (not a platform). Define its start state, its end state, and what “done” means.
- Define targets for:
  - Success rate (runs that complete without human edits)
  - Escalation rate (runs routed to humans)
  - Time-to-resolution and cost-per-success
- Write down “never events” (for example: refunds above a limit, emails to external domains, exports of sensitive fields).

2) TOOLING CONTRACTS (THE MAKE-OR-BREAK ITEM)
- For each tool, specify:
  - JSON schema for inputs and outputs, including required fields
  - Error codes (auth, not-found, validation, rate-limit)
  - Idempotency behavior for write actions
  - Timeouts and retry policy (including max retries and backoff)
- Prefer fewer, higher-level tools over many small tools.

3) PERMISSIONS & IDENTITY
- Avoid shared credentials. Use service principals with least privilege.
- If actions need user identity, support delegated authorization.
- Separate read authority from write authority (different scopes or different tools).
- Add a break-glass path for elevated actions that requires explicit approval.

4) POLICY-AS-CODE GUARDRAILS
- Enforce pre-tool and post-tool checks for:
  - Money movement thresholds
  - External communication restrictions
  - Data export and sensitive-field handling
  - Allowed objects/fields in systems like Salesforce, Jira, or ServiceNow
- Store policies in version control and require review for changes.

5) VERIFICATION & FALLBACKS
- After every write, re-read state and verify invariants.
- Add deterministic fallbacks (templates or a rules engine) for common, low-variance cases.
- Define human approval thresholds based on:
  - Amount limits
  - Risk tiers (such as regulated customers)
  - Low-confidence detections

6) OBSERVABILITY & AUDIT
- Log: prompt inputs, retrieval document IDs, tool calls, tool payloads, tool responses, and policy decisions.
- Add correlation IDs so runs can be replayed end-to-end.
- Track operational signals: tool latency, tool error rate, tool calls per run, and loop rate.
- Define log retention based on compliance needs.

7) EVALUATION HARNESS
- Build an evaluation set from real cases.
- Score: correctness, policy compliance, provenance/citations, and user satisfaction.
- Run evals before every release and track pass-rate trends.

8) ROLLOUT PLAN
- Start in shadow mode: the agent proposes actions and humans approve them.
- Move to limited autonomy with staged traffic and a kill switch.
- Create an incident playbook: rollback steps, disabling writes, routing to humans.

9) UNIT ECONOMICS
- Monitor cost-per-success (not cost-per-run).
- Set budgets: max tool calls and max spend per run.
- Route by uncertainty: use a cheap model first and escalate only when justified.

10) SECURITY TESTING
- Red-team the workflow for:
  - Prompt injection through retrieved docs
  - Tool output poisoning
  - Privilege escalation via chained actions
- Require structured tool outputs and validate them against their schemas.

If you can complete sections 1–6, you’re ready for a limited rollout. If you can also complete sections 7–10, you’re ready to scale into strict enterprise environments.
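The retry and idempotency items in section 2 can be sketched as a small wrapper around a write tool. This is a minimal illustration, not a prescribed implementation: `ToolError`, `RetryPolicy`, and `call_with_retries` are hypothetical names, the tool is assumed to accept an `idempotency_key` argument, and treating `"timeout"` as a retryable code is an assumption. The key point is that one idempotency key is generated per logical action and reused on every retry, so a retried write cannot be applied twice, while non-transient errors (auth, not-found, validation) fail fast.

```python
import random
import time
import uuid
from dataclasses import dataclass


class ToolError(Exception):
    """Hypothetical error type carrying one of the tool contract's error codes."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


# Assumption: only transient failures are retried; auth/not-found/validation fail fast.
RETRYABLE_CODES = {"rate-limit", "timeout"}


@dataclass
class RetryPolicy:
    max_retries: int = 3
    base_delay_s: float = 0.5
    max_delay_s: float = 8.0


def call_with_retries(tool, payload: dict, policy: RetryPolicy, sleep=time.sleep):
    """Call a write tool, reusing a single idempotency key across every attempt."""
    idempotency_key = str(uuid.uuid4())
    for attempt in range(policy.max_retries + 1):
        try:
            return tool(payload, idempotency_key=idempotency_key)
        except ToolError as err:
            if err.code not in RETRYABLE_CODES or attempt == policy.max_retries:
                raise
            # Exponential backoff with full jitter, capped at max_delay_s.
            delay = min(policy.max_delay_s, policy.base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Passing `sleep` as a parameter keeps the policy testable without real waiting; in production the default `time.sleep` applies.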
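A pre-tool guardrail from section 4, combined with the “never events” from section 1, might look like the sketch below. The tool names (`issue_refund`, `send_email`, `export_records`), thresholds, domains, and field names are all hypothetical placeholders; in a real deployment the `Policy` values would be loaded from a reviewed, version-controlled file rather than hard-coded. The check returns one of three outcomes so that “deny” (a never event) stays distinct from “require_approval” (an amount above the human approval threshold).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy values; assume these come from a versioned, reviewed file."""
    refund_approval_limit: float = 100.0   # above this, a human must approve
    refund_hard_limit: float = 1000.0      # above this is a never event
    allowed_email_domains: frozenset = frozenset({"example.com"})
    sensitive_fields: frozenset = frozenset({"ssn", "card_number"})


@dataclass
class Decision:
    outcome: str  # "allow", "require_approval", or "deny"
    reasons: list = field(default_factory=list)


def pre_tool_check(tool: str, args: dict, policy: Policy) -> Decision:
    """Evaluate a proposed tool call before it runs; deny takes precedence over approval."""
    deny, approve = [], []
    if tool == "issue_refund":
        amount = float(args["amount"])
        if amount > policy.refund_hard_limit:
            deny.append(f"refund {amount} exceeds hard limit")
        elif amount > policy.refund_approval_limit:
            approve.append(f"refund {amount} needs approval")
    elif tool == "send_email":
        domain = args["to"].rsplit("@", 1)[-1].lower()
        if domain not in policy.allowed_email_domains:
            deny.append(f"external domain {domain}")
    elif tool == "export_records":
        leaked = policy.sensitive_fields & set(args.get("fields", []))
        if leaked:
            deny.append(f"sensitive fields {sorted(leaked)}")
    if deny:
        return Decision("deny", deny)
    if approve:
        return Decision("require_approval", approve)
    return Decision("allow")
```

Logging each `Decision` alongside the tool call also covers the “policy decisions” item in section 6.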
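The “re-read state and verify invariants” item in section 5 can be sketched as a helper that performs a write, reads the record back from the system of record, and reports every failed check. The `write`/`read` callables, the record shape, and the invariant names below are hypothetical stand-ins for a real system client. An empty result means the write is verified; anything else should trigger the fallback or human escalation path.

```python
from typing import Callable, Dict, List

Invariant = Callable[[dict], bool]


def write_and_verify(write: Callable[[str, dict], None],
                     read: Callable[[str], dict],
                     record_id: str,
                     changes: dict,
                     invariants: Dict[str, Invariant]) -> List[str]:
    """Apply a write, re-read the record, and return failed checks (empty list = verified)."""
    write(record_id, changes)
    # Re-read from the system of record instead of trusting the write response.
    state = read(record_id)
    failures = [f"field {k!r} is {state.get(k)!r}, expected {v!r}"
                for k, v in changes.items() if state.get(k) != v]
    failures += [name for name, check in invariants.items() if not check(state)]
    return failures


# Usage with an in-memory stand-in for a real system:
db = {"T-1": {"status": "open", "refund_total": 0, "order_total": 100}}
invariants = {"refund_within_order_total": lambda s: s["refund_total"] <= s["order_total"]}
problems = write_and_verify(lambda rid, c: db[rid].update(c), lambda rid: dict(db[rid]),
                            "T-1", {"refund_total": 40}, invariants)
```

Checking both that the requested fields actually changed and that business invariants still hold catches silent no-op writes as well as writes that succeed but leave the record in an invalid state.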
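The two unit-economics items in section 9 can be sketched together: a per-run budget that refuses a call before it would breach either cap, and a cost-per-success metric that charges the spend of failed runs to the successful ones. `RunBudget`, `BudgetExceeded`, and `cost_per_success` are hypothetical names, and the caps are placeholder values.

```python
from dataclasses import dataclass
from typing import List


class BudgetExceeded(Exception):
    """Raised when a run would exceed its per-run caps; assume the run then escalates."""


@dataclass
class RunBudget:
    """Hypothetical per-run caps on tool calls and spend."""
    max_tool_calls: int
    max_spend_usd: float
    tool_calls: int = 0
    spend_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one call, refusing it if it would push the run past either cap."""
        if self.tool_calls + 1 > self.max_tool_calls:
            raise BudgetExceeded("max tool calls reached")
        if self.spend_usd + cost_usd > self.max_spend_usd:
            raise BudgetExceeded("max spend reached")
        self.tool_calls += 1
        self.spend_usd += cost_usd


def cost_per_success(run_costs: List[float], run_succeeded: List[bool]) -> float:
    """Total spend across ALL runs divided by the number of successful runs."""
    successes = sum(run_succeeded)
    if successes == 0:
        return float("inf")
    return sum(run_costs) / successes
```

For example, four runs costing 1.0, 1.0, 2.0, and 0.0 with two successes give a cost-per-success of 2.0, while cost-per-run would report only 1.0; the gap is the spend on runs that did not produce a result.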