Agent Ops Readiness Checklist (v1)

Use this checklist to pressure-test any “agent” that can take actions (writes) in real systems.

1) Identity & Access
- Does the agent have its own first-class identity (not a shared human account)?
- Are privileges scoped to the minimum tool set and minimum actions required?
- Can you rotate and revoke agent credentials quickly (including during an incident)?
- If the agent acts “on behalf of” a user, is that delegation explicit and logged?
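
A minimal sketch of the identity checks above, assuming an in-memory registry. The names AgentCredential, CredentialRegistry, and the scope strings are hypothetical and not taken from any particular IAM product. It shows a per-agent credential with scoped privileges, an expiry, immediate revocation, and logged delegation.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentCredential:
    """Hypothetical credential issued to one agent, never shared with a human."""
    agent_id: str
    scopes: frozenset                    # minimum set of permitted actions
    expires_at: float                    # epoch seconds; short-lived by design
    on_behalf_of: Optional[str] = None   # explicit delegation, if any


class CredentialRegistry:
    def __init__(self):
        self._revoked = set()
        self.delegation_log = []

    def revoke(self, agent_id):
        # Takes effect on the next authorize() call.
        self._revoked.add(agent_id)

    def authorize(self, cred, scope, now=None):
        now = time.time() if now is None else now
        if cred.agent_id in self._revoked or now >= cred.expires_at:
            return False
        if scope not in cred.scopes:
            return False
        if cred.on_behalf_of is not None:
            # Record every delegated action so it can be attributed later.
            self.delegation_log.append((now, cred.agent_id, cred.on_behalf_of, scope))
        return True
```

In a real deployment the revocation set would live in a shared store that every tool gateway consults, so that revoking during an incident does not depend on restarting the agent.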

2) Tool Boundary Controls
- Is there an explicit allowlist of tools the agent can call?
- Are tool schemas typed/validated so malformed arguments are rejected?
- Are high-risk verbs (delete, transfer funds, change IAM, production deploy) blocked or approval-gated by policy?
- Are rate limits and concurrency limits enforced at the tool boundary?
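
The tool boundary checks above can be sketched as a single gateway that every tool call passes through. This is an illustration under assumptions: ToolSpec, ToolGateway, the verb names, and the per-minute limit are all hypothetical, and concurrency limits are left out for brevity.

```python
import time
from dataclasses import dataclass

# Hypothetical set of verbs that policy treats as high risk.
HIGH_RISK_VERBS = {"delete", "transfer_funds", "change_iam", "deploy_production"}


@dataclass
class ToolSpec:
    name: str
    verb: str
    arg_types: dict              # argument name -> expected Python type
    max_calls_per_minute: int = 30


class ToolGateway:
    def __init__(self, allowlist):
        self._tools = {spec.name: spec for spec in allowlist}
        self._calls = {}         # tool name -> timestamps of allowed calls

    def check(self, tool_name, args, approved=False, now=None):
        now = time.monotonic() if now is None else now
        spec = self._tools.get(tool_name)
        if spec is None:
            return "deny: tool not on allowlist"
        if set(args) != set(spec.arg_types):
            return "deny: unexpected or missing arguments"
        for key, expected in spec.arg_types.items():
            if not isinstance(args[key], expected):
                return f"deny: argument {key!r} must be {expected.__name__}"
        if spec.verb in HIGH_RISK_VERBS and not approved:
            return "deny: high-risk verb requires approval"
        # Sliding 60-second window over previously allowed calls.
        recent = [t for t in self._calls.get(tool_name, []) if now - t < 60]
        if len(recent) >= spec.max_calls_per_minute:
            return "deny: rate limit exceeded"
        recent.append(now)
        self._calls[tool_name] = recent
        return "allow"
```

The point of the design is that the decision is made outside the model: the agent can ask for anything, but only calls that pass the gateway reach a real system.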

3) Approvals & Change Management
- Which actions require human approval? Is that encoded as policy rather than a prompt instruction?
- Do approvals capture the full context needed to decide (inputs, retrieved sources, proposed changes)?
- Can you route approvals to existing systems (ServiceNow/Jira/Slack) with clear ownership?
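
One way to encode approvals as policy rather than as a prompt instruction is a table the agent cannot edit. The sketch below is an assumption-laden illustration: APPROVAL_POLICY, the action names, and the approver groups are hypothetical, and delivery to ServiceNow, Jira, or Slack is deliberately not shown.

```python
from dataclasses import dataclass

# Hypothetical policy table: action -> owning approver group (None = no approval needed).
APPROVAL_POLICY = {
    "update_ticket": None,
    "refund_payment": "finance-oncall",
    "deploy_production": "release-managers",
}


@dataclass
class ApprovalRequest:
    action: str
    owner: str                 # who is accountable for the decision
    inputs: dict               # what the agent was asked to do
    retrieved_sources: list    # references to the context it relied on
    proposed_changes: dict     # the exact change awaiting approval


def plan_action(action, inputs, retrieved_sources, proposed_changes):
    """Return None if the action may proceed, or an ApprovalRequest to route."""
    if action not in APPROVAL_POLICY:
        # Fail closed: an action missing from policy is never auto-approved.
        raise PermissionError(f"no approval policy defined for {action!r}")
    owner = APPROVAL_POLICY[action]
    if owner is None:
        return None
    return ApprovalRequest(action, owner, inputs, retrieved_sources, proposed_changes)
```

The returned request carries everything an approver needs, so it can be serialized into whichever ticketing or chat system already owns approvals.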

4) Audit Trail & Forensics
- Can you reconstruct an agent run end-to-end: inputs, retrieved context references, tool calls, outputs, and downstream object IDs?
- Are logs tamper-resistant and retained according to your compliance needs?
- Can you answer “who did what, when, and why” without scraping ad hoc text logs?
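
A structured, hash-chained event log is one possible way to meet the audit questions above. The sketch below uses only the Python standard library; the field names and the append_event and verify_chain helpers are hypothetical. Note that hash chaining makes tampering evident rather than impossible, so real tamper resistance would also need write-once or externally anchored storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _digest(body):
    # Canonical JSON so the same body always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log, run_id, actor, action, reason, object_ids, timestamp):
    """Append one structured event that answers who did what, when, and why."""
    body = {
        "run_id": run_id,
        "actor": actor,
        "action": action,
        "reason": reason,
        "object_ids": object_ids,   # downstream objects the action touched
        "timestamp": timestamp,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    log.append({**body, "hash": _digest(body)})


def verify_chain(log):
    """Return True only if no entry was edited, removed, or reordered."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Because each entry is a typed record keyed by run_id, reconstructing a run becomes a filter over the log instead of a search through free-text output.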

5) Safety & Data Handling
- Is sensitive data (PII, secrets) redacted or blocked from leaving defined boundaries?
- Are there clear rules for what data may be sent to external model APIs?
- Do you have a plan for handling prompt injection and data exfiltration via tool outputs?
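
As an illustration of the redaction question above, the sketch below masks two kinds of sensitive strings before text leaves a boundary such as a call to an external model API. The patterns are examples only and far from complete: the email regex is simplified, the key pattern assumes the common "AKIA" plus 16 characters form of AWS access key IDs, and the redact helper is hypothetical. It does not address prompt injection.

```python
import re

# Illustrative patterns only; a real deployment needs a vetted, broader set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(text):
    """Return (redacted_text, labels_found) for text about to cross a boundary."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            found.append(label)
    return text, found
```

Returning the labels that matched lets the caller choose between sending the redacted text and blocking the request outright, depending on the data rules in force.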

6) Evals & Release Discipline
- Do you have regression tests for the top workflows the agent executes?
- Do evals run in CI/CD and block releases when critical workflows fail or policies are violated?
- Do incidents create new test cases so failures don’t repeat?
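
The release-blocking behavior described above can be sketched as a gate function that a CI job calls after the eval suite finishes. EvalResult and release_gate are hypothetical names, and how results are produced is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    workflow: str
    passed: bool
    critical: bool               # is this one of the top workflows?
    policy_violations: int = 0   # e.g. an ungated high-risk action during the eval


def release_gate(results):
    """Return (ok, blockers). Any blocker should fail the CI job."""
    blockers = []
    for r in results:
        if r.critical and not r.passed:
            blockers.append(f"{r.workflow}: critical workflow failed")
        if r.policy_violations > 0:
            blockers.append(f"{r.workflow}: {r.policy_violations} policy violation(s)")
    return (not blockers), blockers
```

A CI step would typically print the blockers and exit with a non-zero status when ok is False. Each incident then adds a new EvalResult-producing test case, so the same failure blocks the next release instead of repeating.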

7) Operations
- Is there a kill switch that stops agent actions fast without taking down unrelated systems?
- Are alerts wired to real on-call processes, with actionable signals (not noisy chat transcripts)?
- Can you roll back or remediate downstream changes reliably?
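
A kill switch can be as small as a flag checked before every action. The sketch below is a single-process illustration using threading.Event from the standard library; the KillSwitch class is hypothetical, and a real deployment would more likely keep the flag in a shared store so that all agent workers see it.

```python
import threading


class KillSwitch:
    """Stops agent actions only; the underlying tools and systems stay up."""

    def __init__(self):
        self._halted = threading.Event()
        self.reason = None

    def trip(self, reason):
        self.reason = reason
        self._halted.set()

    def reset(self):
        self.reason = None
        self._halted.clear()

    def guard(self, action, *args, **kwargs):
        # Checked before every action, so a trip takes effect on the next call.
        if self._halted.is_set():
            raise RuntimeError(f"agent actions halted: {self.reason}")
        return action(*args, **kwargs)
```

Because the switch wraps the agent's calls rather than the target systems, tripping it leaves humans and other services free to keep using those systems during remediation.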

Decision Rule
If you can’t (a) revoke privileges fast, (b) reconstruct a run precisely, and (c) gate risky actions by policy, you don’t have a production agent. You have a demo.