Agent Sandbox Readiness Checklist (One-Page)

Use this to review any agent that can call tools, browse the web, or change internal systems.

1) Identity & Access
- Does the agent have its own identity (service principal / OAuth client), separate from any human?
- Are permissions least-privilege (resource + verb), not blanket app access?
- Are credentials short-lived (e.g., expiring OAuth access tokens) rather than long-lived API keys?
- Is there an explicit deny list (finance actions, admin settings, user management) that is enforced outside the model, so prompts cannot override it?
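
A minimal sketch of how the least-privilege and deny-list items above might be checked in code, assuming a hypothetical permission model of (resource, verb) pairs. The names ALLOWED, DENIED, and is_allowed are illustrative and do not come from any specific IAM product.

```python
# Hypothetical grant table: each entry is one (resource, verb) pair.
# A real system would load this from its IAM configuration.
ALLOWED = {
    ("invoices", "read"),
    ("invoices", "create_draft"),
    ("tickets", "comment"),
}

# Hypothetical deny table. "*" is a wildcard verb. Deny entries always win.
DENIED = {
    ("finance", "*"),
    ("admin_settings", "*"),
    ("users", "*"),
}


def is_allowed(resource: str, verb: str) -> bool:
    """Return True only if the pair is explicitly granted and not denied."""
    if (resource, "*") in DENIED or (resource, verb) in DENIED:
        return False  # checked first, so no grant can override a deny
    return (resource, verb) in ALLOWED  # anything not granted is refused
```

Because the check runs in ordinary code rather than in the prompt, no model output can widen the grant set.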

2) Sandbox Boundaries
- Does the agent run in an isolated environment (container/VM) with a clean filesystem per run?
- Is outbound network default-deny with an allowlist for required domains/endpoints?
- If a browser is used, is it a hardened remote browser with domain allowlists and no privileged saved sessions?
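
The default-deny egress rule above can be sketched as a URL check, assuming a hypothetical allowlist of exact hostnames (ALLOWED_HOSTS) and HTTPS-only traffic. In practice the rule is usually enforced at a proxy or firewall, so treat this in-process version as an illustration of the logic only.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of exact hostnames the agent needs.
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}


def is_egress_allowed(url: str) -> bool:
    """Default-deny: permit only HTTPS requests to allowlisted hosts."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # hostname excludes any userinfo and port, so "allowed@evil" tricks fail
    host = (parts.hostname or "").lower()
    return host in ALLOWED_HOSTS
```

Matching on the parsed hostname rather than on a substring of the URL avoids lookalike hosts such as api.example.com.evil.test.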

3) Tool Design
- Are tools narrowly scoped (e.g., CreateDraftInvoice vs CreateInvoice)?
- Are tool arguments validated against a strict schema (types, ranges, required fields)?
- Do tools support “dry-run” or preview modes that produce diffs without committing?
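
As a rough illustration of strict argument validation combined with a dry-run default, here is a sketch of a CreateDraftInvoice tool. The field names, the currency set, and the amount limit are assumptions made for the example, and the commit step is left as a placeholder.

```python
from dataclasses import asdict, dataclass

ALLOWED_CURRENCIES = {"USD", "EUR"}  # assumed for illustration
MAX_AMOUNT_CENTS = 1_000_000         # assumed upper bound


@dataclass(frozen=True)
class DraftInvoiceArgs:
    customer_id: str
    amount_cents: int
    currency: str


def parse_args(raw: dict) -> DraftInvoiceArgs:
    """Reject unknown or missing fields, wrong types, and out-of-range values."""
    expected = {"customer_id", "amount_cents", "currency"}
    if set(raw) != expected:
        raise ValueError(f"fields must be exactly {sorted(expected)}")
    customer_id = raw["customer_id"]
    amount = raw["amount_cents"]
    currency = raw["currency"]
    if not isinstance(customer_id, str) or not customer_id:
        raise ValueError("customer_id must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise ValueError("amount_cents must be an integer")
    if not 0 < amount <= MAX_AMOUNT_CENTS:
        raise ValueError("amount_cents out of range")
    if not isinstance(currency, str) or currency not in ALLOWED_CURRENCIES:
        raise ValueError("unsupported currency")
    return DraftInvoiceArgs(customer_id, amount, currency)


def create_draft_invoice(raw: dict, dry_run: bool = True) -> dict:
    """Return a preview by default; commit only when dry_run is False."""
    args = parse_args(raw)
    preview = {"action": "CreateDraftInvoice", "would_create": asdict(args)}
    if dry_run:
        return {"committed": False, **preview}
    # A real tool would call the billing system here (omitted in this sketch).
    return {"committed": True, **preview}
```

Making dry_run default to True means a caller has to opt in to the committing path explicitly.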

4) Policy Gates & Approvals
- Is every tool call evaluated by a policy layer before execution?
- Are there clear approval triggers (any spend, merges, external messaging, data export)?
- Does the approval UI show concrete diffs and destinations (what will change, where), not just an explanation?
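
One way to picture the policy layer above is a function that every tool call passes through before execution. This sketch assumes hypothetical tool names and a hypothetical spend_cents argument. The approval triggers mirror the ones listed above (spend, merges, external messaging, data export).

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Hypothetical tool names, grouped by how the policy treats them.
APPROVAL_TOOLS = {"MergePullRequest", "SendExternalEmail", "ExportData"}
DENIED_TOOLS = {"ChangeAdminSettings", "ManageUsers"}
KNOWN_TOOLS = APPROVAL_TOOLS | DENIED_TOOLS | {"CreateDraftInvoice", "SearchDocs"}


def evaluate(tool: str, args: dict) -> Decision:
    """Decide what happens to one tool call before it is executed."""
    if tool not in KNOWN_TOOLS or tool in DENIED_TOOLS:
        return Decision.DENY  # unknown tools are denied by default
    if tool in APPROVAL_TOOLS:
        return Decision.REQUIRE_APPROVAL
    spend = args.get("spend_cents", 0)  # assumed argument name
    if isinstance(spend, bool) or not isinstance(spend, (int, float)):
        return Decision.DENY  # malformed spend values are not trusted
    if spend > 0:
        return Decision.REQUIRE_APPROVAL  # any spend needs a human
    return Decision.ALLOW
```

A caller would execute the tool only on ALLOW and route REQUIRE_APPROVAL to a human reviewer along with the concrete diff and destination.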

5) Observability & Audit
- Are prompts, retrieved context IDs, tool calls, tool results, and final outputs logged?
- Can you reconstruct an end-to-end timeline for a single incident without guesswork?
- Are logs protected against tampering and retained per your security policy?
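
Tamper protection for logs can take several forms. One common technique is a hash chain, sketched below under the assumption that events are JSON-serializable dictionaries. The function names and the entry layout are illustrative.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list) -> bool:
    """Recompute the chain; an edited or removed middle entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this only detects tampering if the latest hash is also stored somewhere the agent and its operators cannot rewrite. Otherwise someone with write access could rebuild the whole chain.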

6) Data Handling
- Is retrieval constrained to approved sources and scopes (projects/spaces)?
- Are sensitive sources (HR/legal/customer secrets) explicitly blocked or segmented?
- Is untrusted text sanitized before it reaches the model (strip hidden text/scripts where possible)?
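
The sanitization item above could look roughly like the following best-effort filter. It assumes the untrusted input is text or HTML-like markup. The regular expressions are a simplification, not a full HTML parser, and sanitize is a hypothetical helper name.

```python
import re
import unicodedata

# Matches <script>...</script> and <style>...</style> blocks, case-insensitively.
_SCRIPT_OR_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Matches HTML comments, a common place to hide instructions.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def sanitize(text: str) -> str:
    """Strip script/style blocks, HTML comments, and invisible format characters."""
    text = _SCRIPT_OR_STYLE.sub("", text)
    text = _HTML_COMMENT.sub("", text)
    # Unicode category "Cf" covers zero-width spaces and bidirectional
    # overrides, which can hide text from a human reviewer.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
```

Dropping every format character also removes zero-width joiners that some scripts and emoji sequences rely on, so a production filter may need a narrower list. Sanitization reduces exposure but does not replace the policy gates in section 4.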

7) Rollback & Containment
- Can you quickly revoke the agent’s credentials and stop execution?
- Are agent actions reversible (drafts instead of publishes; PRs instead of direct pushes)?
- Is the blast radius defined and limited (e.g., one agent per domain, each with separate scopes)?
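
For the stop-execution item above, a minimal sketch is a stop flag checked before every step. KillSwitch and run_steps are hypothetical names, and the flag is process-local. A real deployment would also revoke the agent's credentials at the identity provider.

```python
import threading


class KillSwitch:
    """Process-local stop flag that an operator or monitor can trip."""

    def __init__(self) -> None:
        self._stopped = threading.Event()

    def trip(self) -> None:
        self._stopped.set()

    def is_tripped(self) -> bool:
        return self._stopped.is_set()


def run_steps(steps, switch: KillSwitch) -> list:
    """Run callables in order, checking the switch before each one."""
    results = []
    for step in steps:
        if switch.is_tripped():
            break  # stop before starting any further action
        results.append(step())
    return results
```

Checking before each step bounds the damage to at most the step already in flight when the switch is tripped.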

Go/No-Go Rule:
If you cannot answer “what could it do?” and “how would we prove what happened?” by pointing to specific scopes and logs, the agent is not ready for production access.