Agentic AI Production Readiness Checklist (2026) Use this checklist before giving an AI agent access to real tools (CRM, billing, email, ticketing, code repos). Target: measurable task success, bounded risk, and predictable cost. 1) Workflow Definition (1–2 hours) - Write a one-sentence “workflow contract” (input → outcome). Example: “Given a Zendesk ticket, draft a policy-correct response with cited billing facts.” - Identify action types: read-only, write, irreversible (refunds, deletions), external comms (email/SMS). - Define success metrics: task success %, policy violation %, escalation rate, average latency, cost per run. 2) Context & Tooling (platform work) - Standardize tool interfaces (prefer MCP-style connectors or a single internal tool schema). - Enforce typed parameters and structured outputs (JSON), with documented failure modes. - Implement least-privilege scopes per tool: separate read vs write; staging vs production credentials. - Add idempotency keys for write operations (avoid duplicate refunds/tickets). 3) Evaluation Gates (release discipline) - Build an initial eval set of 100–300 real cases (stratify edge cases: VIP customers, refunds, angry users, missing data). - Create 3 eval tiers: a) Task success (deterministic checks where possible) b) Safety/policy (PII leakage, jailbreak prompts, forbidden actions) c) Operational behaviors (max tool calls, max latency, loop detection) - Set ship/rollback thresholds (example): policy violations must stay <0.5%; runaway tool loops <0.2%. 4) Safety & Governance (non-negotiables) - Log every tool call: user, timestamp, tool name, parameters, outputs, model version, prompt hash. - Add approval gates for high-impact actions (refunds, contract changes, outbound sends, data deletion). - Implement PII/secrets handling: redact before model calls; never place secrets in prompts; use a vault. - Require provenance for user-facing outputs (citations to docs/tool outputs; verification step “no new facts”). 5) Cost & Performance (keep margins intact) - Add token/cost metering per step and per customer. - Implement routing: small model for classification/extraction; frontier model only for planning/final. - Add caching: prompt/result cache, semantic cache, retrieval cache for stable corpora. - Set budgets: max tokens/run, max tool calls/run, max retries/step; alert on budget breaches. 6) Operations & Incident Response - Create dashboards: success rate, violation rate, latency p95, cost/run, tool error rate. - Add circuit breakers: auto-disable write actions if violation rate spikes or tools degrade. - Define human escalation paths (where uncertain, the agent must hand off with context). - Run game days monthly: simulate tool outages, permission denials, prompt injection attempts. Exit Criteria (go/no-go) - You can replay any agent run end-to-end from logs within 5 minutes. - You can prove least-privilege access for every tool. - Evals are in CI/CD and block releases when thresholds fail. - Write actions are bounded by approvals, budgets, and circuit breakers. - Unit economics are understood: cost per successful outcome is acceptable for your pricing.