Agentic Ops Release Gate (2026): Production Readiness Checklist

Use this checklist as a hard release gate before enabling an AI agent to take actions (create/update/delete) in any real system.

1) Scope & Success Criteria
- Define the single workflow/job the agent owns (one sentence).
- List out-of-scope actions explicitly (at least 5 “never do” items).
- Define success metrics with targets: Task Completion Rate (%), Action Error Rate (AER, %), P95 latency (seconds), escalation rate (%), and cost per task ($).
- Identify “must-escalate” cases (e.g., high-dollar refunds, security events, legal requests).

2) Tool Contracts (Deterministic Shell)
- Every tool has a typed schema (JSON schema or equivalent) with validation.
- Tools enforce idempotency keys for retries.
- Tool outputs are structured and versioned.
- Prohibit raw shell/SQL execution; require intent objects or parameterized queries.

3) Permissions & Data Boundaries
- Use least-privilege API keys per agent; no shared human tokens.
- Separate staging vs production credentials.
- Enforce RBAC/ABAC at the tool layer (not only in prompts).
- Add step-up approvals for irreversible or high-risk actions.
- Define data retention for prompts/traces; ensure PII redaction in logs.

4) Evaluation & Regression
- Build a seed dataset of 100–300 real cases (anonymized if needed).
- Add adversarial tests (prompt injection, tool misuse, missing context, contradictory instructions).
- Set pass/fail thresholds and block deploys when failing (CI gate).
- Add a replay mechanism to reproduce any production failure.

5) Observability
- Implement agent traces: prompt version, retrieved context IDs, tool calls, tool results, policy decisions, and final side effects.
- Dashboard: AER, escalation rate, latency, token spend, tool-call distribution.
- Alerts: spend spikes, AER spikes, unusual tool usage, elevated retries/loops.

6) Cost & Performance Controls
- Set per-task caps on spend (in dollars) and on the number of tool calls.
- Implement model routing (cheap model first; escalate only on low confidence).
- Add caching for stable references (policy docs, pricing tables) with TTL.
- Define latency SLOs for interactive vs async tasks.

7) Incident Response & Rollback
- Circuit breakers: when triggered, downgrade to “suggest-only” or disable write tools.
- Rollback plan for every side effect (diff-based where possible).
- On-call runbook: how to disable the agent, rotate keys, and replay traces.
- Postmortem template includes: root cause, blast radius, detection gap, and new tests added within 48 hours.

Approval Sign-off
- Product owner: ________ Date: ________
- Engineering owner: ______ Date: ________
- Security owner: ________ Date: ________
- SRE/Platform owner: ____ Date: ________
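
Illustrative sketch (sections 6 and 7): the per-task caps and the circuit breaker are easier to review when they are enforced in code rather than in prompts. The Python below is a minimal sketch under stated assumptions, not a reference implementation. The names `TaskBudget`, `CircuitBreaker`, and `Mode`, along with all thresholds, are hypothetical and not taken from any particular agent framework.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    ACT = "act"                    # write tools enabled
    SUGGEST_ONLY = "suggest_only"  # agent proposes; a human executes


class BudgetExceeded(Exception):
    """Raised when a tool call would exceed the task's spend or call cap."""


@dataclass
class TaskBudget:
    """Hypothetical per-task guard: a dollar cap plus a tool-call cap."""
    max_cost_usd: float
    max_tool_calls: int
    spent_usd: float = 0.0
    tool_calls: int = 0

    def charge(self, cost_usd: float) -> None:
        # Check both caps before recording, so a refused call leaves state unchanged.
        if self.tool_calls + 1 > self.max_tool_calls:
            raise BudgetExceeded("tool-call cap reached")
        if self.spent_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("spend cap reached")
        self.tool_calls += 1
        self.spent_usd += cost_usd


@dataclass
class CircuitBreaker:
    """Hypothetical breaker: trips to suggest-only when recent errors exceed a limit."""
    max_errors: int   # errors tolerated within the window (assumed threshold)
    window: int       # number of most recent actions considered
    mode: Mode = Mode.ACT
    outcomes: list = field(default_factory=list)

    def record(self, ok: bool) -> Mode:
        self.outcomes.append(ok)
        recent = self.outcomes[-self.window:]
        if recent.count(False) > self.max_errors:
            # Once tripped, the mode stays downgraded until reset() is called.
            self.mode = Mode.SUGGEST_ONLY
        return self.mode

    def reset(self) -> None:
        """Manual re-enable, e.g. by the on-call engineer after review."""
        self.outcomes.clear()
        self.mode = Mode.ACT


if __name__ == "__main__":
    budget = TaskBudget(max_cost_usd=0.50, max_tool_calls=5)
    breaker = CircuitBreaker(max_errors=2, window=5)
    try:
        for _ in range(3):
            budget.charge(0.20)  # the third charge would pass the spend cap
    except BudgetExceeded as exc:
        print("task stopped:", exc)
        print("agent mode:", breaker.record(ok=False).value)
```

In this sketch the breaker never re-enables write tools on its own; the explicit `reset()` mirrors the checklist's preference for a human decision after an incident. A real deployment would likely also persist this state and emit it to the traces described in section 5.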