Agent Sandbox Readiness Checklist (One-Page)

Use this to review any agent that can call tools, browse the web, or change internal systems.

1) Identity & Access
- Does the agent have its own identity (service principal / OAuth client), separate from any human?
- Are permissions least-privilege (resource + verb), not blanket app access?
- Are credentials short-lived (OAuth tokens) rather than long-lived API keys?
- Is there an explicit deny list (finance actions, admin settings, user management) that cannot be overridden by prompts?

2) Sandbox Boundaries
- Does the agent run in an isolated environment (container/VM) with a clean filesystem per run?
- Is outbound network default-deny with an allowlist for required domains/endpoints?
- If a browser is used, is it a hardened remote browser with domain allowlists and no privileged saved sessions?

3) Tool Design
- Are tools narrowly scoped (e.g., CreateDraftInvoice vs CreateInvoice)?
- Are tool arguments validated against a strict schema (types, ranges, required fields)?
- Do tools support “dry-run” or preview modes that produce diffs without committing?

4) Policy Gates & Approvals
- Is every tool call evaluated by a policy layer before execution?
- Are there clear approval triggers (any spend, merges, external messaging, data export)?
- Does the approval UI show concrete diffs and destinations (what will change, where), not just an explanation?

5) Observability & Audit
- Are prompts, retrieved context IDs, tool calls, tool results, and final outputs logged?
- Can you reconstruct an end-to-end timeline for a single incident without guesswork?
- Are logs protected against tampering and retained per your security policy?

6) Data Handling
- Is retrieval constrained to approved sources and scopes (projects/spaces)?
- Are sensitive sources (HR/legal/customer secrets) explicitly blocked or segmented?
- Is untrusted text sanitized before it reaches the model (strip hidden text/scripts where possible)?
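
Example (sections 3 and 4): the tool-design and policy-gate questions can be made concrete with a small gate that every tool call passes through before execution. The Python sketch below is one possible shape, not a reference implementation. All names in it (ToolCall, Decision, evaluate, DENIED_TOOLS, APPROVAL_TOOLS, SCHEMAS) and the specific tools and limits are hypothetical; a real deployment would more likely use a dedicated policy engine and schema validator.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)
    dry_run: bool = False


# Hypothetical deny list. It lives in code, so prompt text cannot override it.
DENIED_TOOLS = {"DeleteUser", "ChangeAdminSettings", "IssuePayment"}

# Hypothetical approval triggers (spend, merges, external messaging, export).
APPROVAL_TOOLS = {"CreateInvoice", "MergePullRequest", "SendExternalEmail", "ExportData"}

# Hypothetical strict schemas: required field -> (type, min, max).
# A tool without a schema entry is rejected (default-deny).
_INVOICE_FIELDS = {"customer_id": (str, None, None), "amount_cents": (int, 1, 1_000_000)}
SCHEMAS = {"CreateDraftInvoice": _INVOICE_FIELDS, "CreateInvoice": _INVOICE_FIELDS}


def validate_args(call: ToolCall) -> list[str]:
    """Return a list of schema violations; an empty list means the args are valid."""
    schema = SCHEMAS.get(call.tool)
    if schema is None:
        return [f"unknown tool: {call.tool}"]
    errors = [f"unexpected field: {name}" for name in sorted(call.args.keys() - schema.keys())]
    for name, (expected_type, low, high) in schema.items():
        if name not in call.args:
            errors.append(f"missing field: {name}")
            continue
        value = call.args[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(f"wrong type for {name}")
            continue
        if low is not None and value < low:
            errors.append(f"{name} below minimum {low}")
        if high is not None and value > high:
            errors.append(f"{name} above maximum {high}")
    return errors


def evaluate(call: ToolCall) -> tuple[Decision, list[str]]:
    """Decide whether a tool call may run, needs human approval, or is denied."""
    if call.tool in DENIED_TOOLS:
        return Decision.DENY, ["tool is on the deny list"]
    errors = validate_args(call)
    if errors:
        return Decision.DENY, errors
    # Assumption: a dry run only produces a preview, so it skips approval.
    if call.tool in APPROVAL_TOOLS and not call.dry_run:
        return Decision.NEEDS_APPROVAL, ["approval trigger matched"]
    return Decision.ALLOW, []
```

In this sketch the narrowly scoped CreateDraftInvoice runs without approval, while CreateInvoice is held for a human unless it is called as a dry run. Whether previews should bypass approval is a policy choice, shown here only as one option.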

7) Rollback & Containment
- Can you quickly revoke the agent’s credentials and stop execution?
- Are agent actions reversible (drafts instead of publishes; PRs instead of direct pushes)?
- Is there a defined blast radius (one agent per domain, separate scopes)?

Go/No-Go Rule: If you cannot answer “what could it do?” and “how would we prove what happened?” with specific logs and scopes, it’s not ready for production access.
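
Example (section 5 and the Go/No-Go rule): answering “how would we prove what happened?” depends on logs that cannot be edited without detection. One common technique is a hash chain, where each entry's hash covers the previous entry's hash. The Python sketch below illustrates the idea under simple assumptions: the log is an in-memory list, and the record fields (run_id, event, tool) and function names are hypothetical. A hash chain only makes tampering detectable; the latest hash still has to be stored somewhere the agent cannot write, and the chain alone does not reveal entries removed from the end.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> dict:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify_chain(log: list):
    """Return the index of the first inconsistent entry, or None if the chain is intact."""
    prev_hash = GENESIS
    for index, entry in enumerate(log):
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return index
        prev_hash = entry["hash"]
    return None


# Usage: log one run, then rebuild its timeline from the verified log.
audit_log = []
append_entry(audit_log, {"run_id": "run-1", "event": "prompt", "tool": None})
append_entry(audit_log, {"run_id": "run-1", "event": "tool_call", "tool": "CreateDraftInvoice"})
append_entry(audit_log, {"run_id": "run-2", "event": "tool_call", "tool": "ExportData"})

if verify_chain(audit_log) is None:
    timeline = [e["record"]["event"] for e in audit_log if e["record"]["run_id"] == "run-1"]
```

Editing any stored record after the fact changes its digest, so verify_chain reports the index of the first entry that no longer matches.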