Agentic Ops Release Gate (2026): Production Readiness Checklist

Use this checklist as a hard release gate before enabling an AI agent to take actions (create/update/delete) in any real system.

1) Scope & Success Criteria
- Define the single workflow/job the agent owns (one sentence).
- List out-of-scope actions explicitly (at least 5 “never do” items).
- Define success metrics with targets: Task Completion Rate (%), Action Error Rate (%), P95 latency (seconds), escalation rate (%), and cost per task ($).
- Identify “must-escalate” cases (e.g., high-dollar refunds, security events, legal requests).

2) Tool Contracts (Deterministic Shell)
- Every tool has a typed schema (JSON schema or equivalent) with validation.
- Tools enforce idempotency keys for retries.
- Tool outputs are structured and versioned.
- Prohibit raw shell/SQL execution; require intent objects or parameterized queries.

3) Permissions & Data Boundaries
- Use least-privilege API keys per agent; no shared human tokens.
- Separate staging vs production credentials.
- Enforce RBAC/ABAC at the tool layer (not only in prompts).
- Add step-up approvals for irreversible or high-risk actions.
- Define data retention for prompts/traces; ensure PII redaction in logs.

4) Evaluation & Regression
- Build a seed dataset of 100–300 real cases (anonymized if needed).
- Add adversarial tests (prompt injection, tool misuse, missing context, contradictory instructions).
- Set pass/fail thresholds and block deploys when failing (CI gate).
- Add a replay mechanism to reproduce any production failure.

5) Observability
- Implement agent traces: prompt version, retrieved context IDs, tool calls, tool results, policy decisions, and final side effects.
- Dashboard: AER, escalation rate, latency, token spend, tool-call distribution.
- Alerts: spend spikes, AER spikes, unusual tool usage, elevated retries/loops.

6) Cost & Performance Controls
- Set per-task budget caps in dollars and max tool calls.
- Implement model routing (cheap model first; escalate only on low confidence).
- Add caching for stable references (policy docs, pricing tables) with TTL.
- Define latency SLOs for interactive vs async tasks.

7) Incident Response & Rollback
- Circuit breakers: when triggered, downgrade to “suggest-only” or disable write tools.
- Rollback plan for every side effect (diff-based where possible).
- On-call runbook: how to disable agent, rotate keys, and replay traces.
- Postmortem template includes: root cause, blast radius, detection gap, and new tests added within 48 hours.

Approval Sign-off
- Product owner: ________  Date: ________
- Engineering owner: ______  Date: ________
- Security owner: ________  Date: ________
- SRE/Platform owner: ____  Date: ________