Agent Ops Readiness Checklist (v1)

Use this to pressure-test an “agent” that can take actions (writes) in real systems.

1) Identity & Access
- Does the agent have its own first-class identity (not a shared human account)?
- Are privileges scoped to the minimum tool set and minimum actions required?
- Can you rotate and revoke agent credentials quickly (including during an incident)?
- If the agent acts “on behalf of” a user, is that delegation explicit and logged?

2) Tool Boundary Controls
- Is there an explicit allowlist of tools the agent can call?
- Are tool schemas typed/validated so malformed arguments are rejected?
- Are high-risk verbs (delete, transfer funds, change IAM, production deploy) blocked or approval-gated by policy?
- Are rate limits and concurrency limits enforced at the tool boundary?

3) Approvals & Change Management
- Which actions require human approval? Is that encoded as policy rather than a prompt instruction?
- Do approvals capture the full context needed to decide (inputs, retrieved sources, proposed changes)?
- Can you route approvals to existing systems (ServiceNow/Jira/Slack) with clear ownership?

4) Audit Trail & Forensics
- Can you reconstruct an agent run end-to-end: inputs, retrieved context references, tool calls, outputs, and downstream object IDs?
- Are logs tamper-resistant and retained according to your compliance needs?
- Can you answer “who did what, when, and why” without scraping ad hoc text logs?

5) Safety & Data Handling
- Is sensitive data (PII, secrets) redacted or blocked from leaving defined boundaries?
- Are there clear rules for what data may be sent to external model APIs?
- Do you have a plan for prompt injection and data exfiltration via tool outputs?

6) Evals & Release Discipline
- Do you have regression tests for the top workflows the agent executes?
- Do evals run in CI/CD and block releases when critical workflows fail or policies are violated?
- Do incidents create new test cases so failures don’t repeat?

7) Operations
- Is there a kill switch that stops actions fast without taking down unrelated systems?
- Are alerts wired to real on-call processes, with actionable signals (not noisy chat transcripts)?
- Can you roll back or remediate downstream changes reliably?

Decision Rule

If you can’t (a) revoke privileges fast, (b) reconstruct a run precisely, and (c) gate risky actions by policy, you don’t have a production agent. You have a demo.
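
Example: a policy gate at the tool boundary

To make "gate risky actions by policy" concrete, here is a minimal Python sketch of a check that could sit between an agent and its tools. It is an illustration under stated assumptions, not a reference implementation: ToolPolicy, Decision, the tool names, and the argument schemas are all hypothetical, and a real deployment would likely load policy from a managed store and use a proper schema validator. The sketch covers the allowlist, argument type checks, approval gating for high-risk tools, and a kill switch; rate limits and audit logging are left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass
class ToolPolicy:
    """Hypothetical policy object; every name here is illustrative."""

    # Maps tool name -> {argument name: expected Python type}.
    allowed_tools: dict
    # Tools that must be routed to a human approver before execution.
    high_risk_tools: set = field(default_factory=set)
    # When True, every action is refused regardless of other rules.
    kill_switch_on: bool = False

    def evaluate(self, tool: str, args: dict) -> tuple:
        """Return (Decision, reason) for a proposed tool call."""
        if self.kill_switch_on:
            return Decision.DENY, "kill switch engaged"

        schema = self.allowed_tools.get(tool)
        if schema is None:
            return Decision.DENY, f"tool not on allowlist: {tool}"

        # Reject missing or unexpected arguments before type checking.
        if set(args) != set(schema):
            return Decision.DENY, "argument names do not match schema"

        for name, expected in schema.items():
            if not isinstance(args[name], expected):
                return Decision.DENY, f"bad type for argument: {name}"

        if tool in self.high_risk_tools:
            return Decision.NEEDS_APPROVAL, "high-risk action requires approval"

        return Decision.ALLOW, "ok"


# Example policy with made-up tools.
policy = ToolPolicy(
    allowed_tools={
        "create_ticket": {"title": str, "body": str},
        "delete_record": {"record_id": int},
    },
    high_risk_tools={"delete_record"},
)

decision, reason = policy.evaluate("delete_record", {"record_id": 42})
print(decision.value, "-", reason)
```

The point of the design is that the decision lives in code the agent cannot talk its way around: a prompt can ask the model to be careful, but only the gate can refuse a call. The kill switch is checked first so that flipping one flag stops all actions without touching unrelated systems.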