AGENTIC OPS READINESS CHECKLIST (2026)

Use this checklist to move from an agent demo to a production agent that is safe, observable, and cost-bounded.

1) WORKFLOW SELECTION
- Define one narrow workflow with a measurable outcome (e.g., “close 30% of order-status tickets without human reply”).
- Write the success metric (quality), safety metric (policy violations), and efficiency metric (cost + latency).
- Identify the highest-risk action in the workflow (money movement, deletes, sensitive data) and mark it “approval required” for v1.

2) TOOLING CONTRACTS (TREAT TOOLS LIKE APIS)
- Inventory every system the agent will touch (Stripe, Salesforce, Jira, Zendesk, internal DBs).
- For each tool: define schema, versioning, rate limits, idempotency strategy, and error semantics.
- Implement read-only tools first; add write tools only after logging and rollback are in place.

3) IDENTITY, PERMISSIONS, AND POLICY
- Create a dedicated non-human identity (NHI) per agent/workflow (not shared).
- Enforce least privilege per tool and per environment (dev/staging/prod).
- Add a policy gate for high-risk actions (e.g., refunds > $100, any deletion, access to sensitive fields).
- Require short-lived credentials and explicit allowlists for tool endpoints.

4) COST AND LATENCY BUDGETS
- Set token budgets per task (avg and p95) and tool-call budgets (avg and p95).
- Add routing: small model for classify/extract; large model only for complex reasoning.
- Add caching where repeated queries occur (common policies, product docs, known issues).
- Define circuit breakers: when external APIs throttle or p95 latency spikes, switch to draft-only mode.

5) EVALUATION BEFORE AUTONOMY
- Build a scenario suite with 200–1,000 real examples (anonymized) including edge cases and adversarial inputs.
- Add unit tests for prompt/tool schema changes; block deploys on regressions.
- Define required pass rates for low-risk actions (e.g., 95%+) before enabling autonomy.
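
The evaluation gate in section 5 could be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the names ScenarioResult, GateDecision, pass_rates, and evaluate_gate are hypothetical, the risk-tier labels are assumptions, and the only threshold taken from the checklist is the 95% example for low-risk actions.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: only the low-risk threshold (95%) comes from the checklist example.
# Tiers without an entry here are never eligible for autonomy in this sketch.
REQUIRED_PASS_RATE = {"low": 0.95}


@dataclass(frozen=True)
class ScenarioResult:
    scenario_id: str
    risk: str      # hypothetical tier label, e.g. "low" or "high"
    passed: bool


@dataclass(frozen=True)
class GateDecision:
    deploy_ok: bool         # False if any tier fell below its baseline pass rate
    autonomous_tiers: list  # tiers whose pass rate meets the required threshold
    regressions: list       # tiers that regressed against the baseline
    rates: dict             # observed pass rate per tier


def pass_rates(results):
    """Compute the pass rate per risk tier from scenario-suite results."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        totals[r.risk] += 1
        if r.passed:
            passes[r.risk] += 1
    return {risk: passes[risk] / totals[risk] for risk in totals}


def evaluate_gate(results, baseline_rates):
    """Block the deploy on regressions; list tiers that may run autonomously."""
    rates = pass_rates(results)
    regressions = sorted(
        risk for risk, base in baseline_rates.items()
        if rates.get(risk, 0.0) < base
    )
    autonomous = sorted(
        risk for risk, need in REQUIRED_PASS_RATE.items()
        if rates.get(risk, 0.0) >= need
    )
    return GateDecision(
        deploy_ok=not regressions,
        autonomous_tiers=autonomous,
        regressions=regressions,
        rates=rates,
    )


# Synthetic example data: 40 low-risk scenarios (39 pass), 10 high-risk (8 pass).
results = (
    [ScenarioResult(f"low-{i}", "low", i != 0) for i in range(40)]
    + [ScenarioResult(f"high-{i}", "high", i >= 2) for i in range(10)]
)
decision = evaluate_gate(results, baseline_rates={"low": 0.97, "high": 0.9})
```

In this synthetic run the low-risk tier clears its threshold, but the high-risk tier falls below its baseline, so the deploy would be blocked even though low-risk autonomy is otherwise eligible. Keeping the two checks separate is a design choice: regressions gate the release, while pass rates gate the autonomy level.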

6) OBSERVABILITY AND AUDIT TRAILS
- Log: prompt version, model, tool schema versions, retrieved doc IDs/hashes, tool inputs/outputs, and final action.
- Enable trace visualization of the tool-call graph (one trace per task).
- Build dashboards: success rate, escalation rate, policy violations per 1k tasks, cost per task, p95 latency.

7) RELEASE AND INCIDENT PRACTICES
- Ship in stages: observe-only → draft-only → execute-with-approval → limited autonomy.
- Add a kill switch and a rollback plan for prompt/config/tool changes.
- Assign ownership: who is on-call, who triages failures, who approves policy changes.
- Run weekly reviews of top failure modes and update tools/policies before scaling.

EXIT CRITERIA FOR “PRODUCTION READY”
- Clear SLOs with alerting.
- Least-privilege identity + audited policy gates.
- Scenario suite running on every change.
- Hard budgets for tokens and tool calls.
- Documented escalation path and kill switch tested in staging.
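
One way to make the staged release and kill switch from section 7 concrete, together with the draft-only circuit breaker from section 4, is to treat the release stage as an ordered mode that runtime signals can only lower. The sketch below is illustrative only: Mode, Health, effective_mode, and may_execute are hypothetical names, the latency limit is a placeholder value, and mapping the kill switch to observe-only is an assumption rather than something the checklist specifies.

```python
from dataclasses import dataclass
from enum import IntEnum


class Mode(IntEnum):
    """Release stages from section 7, ordered from least to most autonomy."""
    OBSERVE_ONLY = 0
    DRAFT_ONLY = 1
    EXECUTE_WITH_APPROVAL = 2
    LIMITED_AUTONOMY = 3


@dataclass(frozen=True)
class Health:
    kill_switch_on: bool
    api_throttled: bool
    p95_latency_ms: float


# Placeholder limit; a real value would come from the workflow's latency budget.
P95_LATENCY_LIMIT_MS = 8000.0


def effective_mode(configured: Mode, health: Health) -> Mode:
    """Lower the configured mode when safety signals fire; never raise it."""
    if health.kill_switch_on:
        # Assumption: the kill switch drops the agent to observe-only.
        return Mode.OBSERVE_ONLY
    if health.api_throttled or health.p95_latency_ms > P95_LATENCY_LIMIT_MS:
        # Circuit breaker from section 4: fall back to draft-only at most.
        return min(configured, Mode.DRAFT_ONLY)
    return configured


def may_execute(mode: Mode, high_risk: bool, approved: bool) -> bool:
    """Decide whether a write action may run in the given mode."""
    if mode == Mode.LIMITED_AUTONOMY:
        # High-risk actions still need approval even with limited autonomy.
        return approved or not high_risk
    if mode == Mode.EXECUTE_WITH_APPROVAL:
        return approved
    return False  # observe-only and draft-only never execute


healthy = Health(kill_switch_on=False, api_throttled=False, p95_latency_ms=1200.0)
throttled = Health(kill_switch_on=False, api_throttled=True, p95_latency_ms=1200.0)
killed = Health(kill_switch_on=True, api_throttled=False, p95_latency_ms=1200.0)
```

Because the circuit breaker takes the minimum of the configured mode and draft-only, a workflow still in observe-only is never promoted by a throttling event. Testing the kill switch in staging then amounts to asserting that effective_mode returns observe-only regardless of the configured stage.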