AGENT CONTROL PLANE MVP CHECKLIST (SHIP IN 30 DAYS)

Goal: Make ONE agent workflow safe to run in production by adding enforceable permissions, tool contracts, approvals, and an audit trail. This is not a generic AI checklist. It’s the minimum to avoid “the model did it” incidents.

1) Pick the workflow (Day 1)
- Choose one workflow with real downside (money movement, outbound messages, deletes, production changes).
- List the systems touched (e.g., Gmail/SendGrid, Stripe, Jira, GitHub, internal DB, Kubernetes).
- Define the “commit moment” (the step that causes irreversible change).

2) Build a Tool Gateway (Week 1)
- Create a single service that executes ALL agent tool calls.
- Require explicit tool registration (name, version, owner, risk level).
- Validate inputs with a strict schema (reject unknown fields).
- Add idempotency for side effects (idempotency key required on mutating tools).
- Add dry-run mode: the tool returns a preview of what it would do.

3) Identity + Permissions (Week 2)
- Tie every agent session to a human identity or service account.
- Use scoped credentials (OAuth scopes where possible; short-lived tokens where you control auth).
- Enforce deny-by-default on high-risk tools.
- Add environment constraints (prod vs staging) so “test” agents can’t touch prod.

4) Approvals (Week 2–3)
- Implement “Propose → Review → Commit” for high-risk tools.
- Store the proposed action payload exactly as it will be executed.
- Require explicit approver identity, timestamp, and optional comment.
- Add expiration: proposals can’t be executed after they go stale.
- Add cancellation: the user can revoke approval before commit.

5) Audit Trail + Observability (Week 3)
- Write an append-only event log per agent job:
  - model request metadata (provider/model), tool calls, tool outputs, approvals, commits
  - actor identity (user/service account), job/session ID, timestamps
- Redact secrets at ingestion (tokens, passwords, private keys).
- Ensure you can answer in one query: who approved, what executed, what changed.

6) Rollback / Compensating Actions (Week 4)
- For each mutating tool, define rollback:
  - true undo (preferred) or compensating action (acceptable)
- Document what cannot be undone; tighten approvals for those.
- Add a kill switch: disable a tool globally without redeploying.

Acceptance Tests (must pass before launch)
- Attempt tool bypass (direct API call) fails.
- Schema fuzzing: malformed/extra fields rejected.
- Privilege escalation attempts denied (cross-tenant, cross-project, external recipients, prod access).
- Replay: same idempotency key does not double-execute.
- Audit completeness: every commit has a traceable proposal + approval (if required).
- Rollback works on partial failure.

Launch Rule
If you can’t show a customer (or your own on-call) the complete story of an agent action in under five minutes, you’re not ready to ship autonomous execution.