AGENT CONTROL PLANE MVP CHECKLIST (SHIP IN 30 DAYS) Goal: Make ONE agent workflow safe to run in production by adding enforceable permissions, tool contracts, approvals, and an audit trail. This is not a generic AI checklist. It’s the minimum to avoid “the model did it” incidents. 1) Pick the workflow (Day 1) - Choose one workflow with real downside (money movement, outbound messages, deletes, production changes). - List systems touched (e.g., Gmail/SendGrid, Stripe, Jira, GitHub, internal DB, Kubernetes). - Define “commit moment” (the step that causes irreversible change). 2) Build a Tool Gateway (Week 1) - Create a single service that executes ALL agent tool calls. - Require explicit tool registration (name, version, owner, risk level). - Validate inputs with a strict schema (reject unknown fields). - Add idempotency for side effects (idempotency key required on mutating tools). - Add dry-run mode: tool returns a preview of what it would do. 3) Identity + Permissions (Week 2) - Tie every agent session to a human identity or service account. - Use scoped credentials (OAuth scopes where possible; short-lived tokens where you control auth). - Enforce deny-by-default on high-risk tools. - Add environment constraints (prod vs staging) so “test” agents can’t touch prod. 4) Approvals (Week 2–3) - Implement “Propose → Review → Commit” for high-risk tools. - Store the proposed action payload exactly as it will be executed. - Require explicit approver identity, timestamp, and optional comment. - Add expiration: proposals can’t be executed after they go stale. - Add cancellation: user can revoke approval before commit. 5) Audit Trail + Observability (Week 3) - Write an append-only event log per agent job: - model request metadata (provider/model), tool calls, tool outputs, approvals, commits - actor identity (user/service account), job/session ID, timestamps - Redact secrets at ingestion (tokens, passwords, private keys). - Ensure you can answer in one query: who approved, what executed, what changed. 6) Rollback / Compensating Actions (Week 4) - For each mutating tool, define rollback: - true undo (preferred) or compensating action (acceptable) - Document what cannot be undone; tighten approvals for those. - Add a kill switch: disable a tool globally without redeploying. Acceptance Tests (must pass before launch) - Attempt tool bypass (direct API call) fails. - Schema fuzzing: malformed/extra fields rejected. - Privilege escalation attempts denied (cross-tenant, cross-project, external recipients, prod access). - Replay: same idempotency key does not double-execute. - Audit completeness: every commit has a traceable proposal + approval (if required). - Rollback works on partial failure. Launch Rule If you can’t show a customer (or your own on-call) the complete story of an agent action in under five minutes, you’re not ready to ship autonomous execution.