AI CONTROL PLANE LAUNCH CHECKLIST (2026)

Use this checklist before you enable any write-capable agentic feature in production. Treat it like a pre-launch review for payments or infra changes.

1) Scope the workflow (one page)
- Define the user outcome in a single sentence (e.g., “reconcile invoice to PO and create draft journal entry”).
- List every external system touched (e.g., NetSuite, Salesforce, GitHub, Jira, Slack).
- Identify the maximum acceptable harm (“blast radius”) if the agent is wrong.
- Decide the initial mode: suggest-only, approval-required, or bounded autonomy.

2) Identity & permissions
- Confirm least-privilege access for every tool (scoped OAuth, short-lived tokens, service accounts).
- Map roles to actions (viewer/editor/admin) and document who can enable autonomy.
- Implement time-bound delegation where possible (session-based credentials).
- Add a “break glass” kill switch that can disable all execution globally within minutes.
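
A minimal sketch of how time-bound delegation and the kill switch could fit together in one execution check. The names `Delegation`, `ControlPlane`, and `may_execute` are hypothetical, and a real deployment would back the flag with a shared store rather than process memory.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delegation:
    """A time-bound grant letting an agent act for a user (hypothetical shape)."""
    user_id: str
    scopes: frozenset
    expires_at: float  # Unix timestamp after which the grant is invalid

    def allows(self, scope: str, now: float) -> bool:
        return now < self.expires_at and scope in self.scopes


@dataclass
class ControlPlane:
    execution_enabled: bool = True  # the "break glass" flag

    def kill(self) -> None:
        """Disable all tool execution globally."""
        self.execution_enabled = False

    def may_execute(self, grant: Delegation, scope: str,
                    now: Optional[float] = None) -> bool:
        # The kill switch is checked first so it overrides every grant.
        if not self.execution_enabled:
            return False
        return grant.allows(scope, time.time() if now is None else now)
```

Checking the global flag before the grant means one switch stops every workflow, regardless of which credentials are still valid.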

3) Tooling & policy gates
- Register tools with typed schemas and strict input validation.
- Create an allowlist of tools per workflow; default deny.
- Encode policy rules (limits, required approvals, disallowed destinations, data residency).
- Add confirmation gates for irreversible actions (send money, delete data, production changes).
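
The gates above can be sketched as a single decision function: default deny, then limits, then a confirmation gate for irreversible actions. The workflow name, tool names, and the USD limit are illustrative assumptions, not values from any real system.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolCall:
    workflow: str
    tool: str
    amount_usd: float = 0.0
    irreversible: bool = False


# Hypothetical per-workflow allowlist; anything not listed is denied.
ALLOWLIST = {
    "invoice_reconciliation": {
        "netsuite.read_po",
        "netsuite.create_draft_journal",
        "netsuite.post_journal",
    },
}
MAX_AMOUNT_USD = 10_000.0  # illustrative policy limit


def evaluate(call: ToolCall) -> Decision:
    # Default deny: unknown workflows and unlisted tools never run.
    if call.tool not in ALLOWLIST.get(call.workflow, set()):
        return Decision.DENY
    if call.amount_usd > MAX_AMOUNT_USD:
        return Decision.DENY
    # Irreversible actions always wait for a human confirmation.
    if call.irreversible:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```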

4) Routing & cost controls
- Set hard caps: max tokens per task, max tool calls, max retries, and max cost in USD.
- Implement multi-model routing (cheap model for classification/extraction; stronger model for synthesis).
- Define downgrade behavior when budgets are hit (smaller model, reduced context, require approval).
- Add tenant-level and user-level monthly caps; expose them in the admin UI.
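
One way to wire per-task caps to a downgrade decision is sketched below. The `Budget` and `Usage` shapes and the specific downgrade policy (a token cap alone triggers a smaller model, any other cap requires approval) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_tokens: int
    max_tool_calls: int
    max_retries: int
    max_cost_usd: float


@dataclass
class Usage:
    tokens: int = 0
    tool_calls: int = 0
    retries: int = 0
    cost_usd: float = 0.0


def caps_hit(budget: Budget, usage: Usage) -> list:
    """Return the names of every cap the run has reached or passed."""
    checks = {
        "max_tokens": usage.tokens >= budget.max_tokens,
        "max_tool_calls": usage.tool_calls >= budget.max_tool_calls,
        "max_retries": usage.retries >= budget.max_retries,
        "max_cost_usd": usage.cost_usd >= budget.max_cost_usd,
    }
    return [name for name, hit in checks.items() if hit]


def next_step(budget: Budget, usage: Usage) -> str:
    hit = caps_hit(budget, usage)
    if not hit:
        return "continue"
    # Illustrative policy: a token cap alone can be absorbed by a smaller
    # model or reduced context; anything else stops for a human.
    if hit == ["max_tokens"]:
        return "downgrade_model"
    return "require_approval"
```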

5) Evaluation & release
- Build a golden set of 50–200 real cases (redact PII) plus 20+ adversarial cases.
- Define pass/fail metrics: completion rate, policy violation rate, unsafe tool call rate, hallucination rate.
- Run canaries: 1–5% of traffic with automatic rollback thresholds.
- Require a human review step for any prompt/tool schema change that affects write paths.
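
The canary step can be sketched as two small pieces: a deterministic traffic split and a rollback check against thresholds. Hashing the tenant ID is one possible assignment scheme, and every name and threshold value here is hypothetical.

```python
import hashlib
from dataclasses import dataclass


def in_canary(tenant_id: str, percent: float) -> bool:
    """Deterministically place a tenant in the canary by hashing its ID."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # percent=5 selects buckets 0..499


@dataclass(frozen=True)
class CanaryStats:
    runs: int
    completed: int
    policy_violations: int
    unsafe_tool_calls: int


@dataclass(frozen=True)
class Thresholds:
    min_completion_rate: float
    max_policy_violation_rate: float
    max_unsafe_tool_call_rate: float


def should_rollback(stats: CanaryStats, limits: Thresholds) -> bool:
    if stats.runs == 0:
        return False  # nothing observed yet
    return (
        stats.completed / stats.runs < limits.min_completion_rate
        or stats.policy_violations / stats.runs > limits.max_policy_violation_rate
        or stats.unsafe_tool_calls / stats.runs > limits.max_unsafe_tool_call_rate
    )
```

Hashing keeps a tenant in the same cohort across requests, which makes canary results easier to attribute than per-request random sampling.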

6) Observability & forensics
- Ensure every run has a trace ID spanning retrieval, model calls, and tool calls.
- Log “receipts”: what changed, which tool calls executed, and what policy checks ran.
- Redact PII in logs; set a retention period (e.g., 30 days) and document it for customers.
- Create an incident playbook: how to replay runs, identify root cause, and notify customers.
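
A receipt can be as simple as one JSON record keyed by the trace ID. The sketch below assumes a hypothetical record shape and redacts only email addresses; real PII redaction needs broader coverage (names, phone numbers, account IDs).

```python
import json
import re
import uuid
from datetime import datetime, timezone

# Illustrative pattern only: matches common email address shapes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def new_trace_id() -> str:
    """One ID per run, attached to retrieval, model calls, and tool calls."""
    return uuid.uuid4().hex


def redact(text: str) -> str:
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def make_receipt(trace_id: str, change_summary: str,
                 tool_calls: list, policy_checks: list) -> str:
    """Serialize what changed, which tools ran, and which checks ran."""
    receipt = {
        "trace_id": trace_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "change_summary": redact(change_summary),
        "tool_calls": tool_calls,
        "policy_checks": policy_checks,
    }
    return json.dumps(receipt, sort_keys=True)
```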

7) UX controls that build trust
- Provide previews (diffs, drafts) before execution whenever possible.
- Provide post-action receipts users can export (for audit/compliance).
- Make error states legible: show what failed and the next safest action.
- Add an admin “safe mode” toggle (read-only / approval-only) for regulated teams.
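
For text-like records, a preview can be generated with Python's standard `difflib` module. The function name `preview` and the record labels are assumptions for illustration; structured objects may need a field-level diff instead.

```python
import difflib


def preview(before: str, after: str, name: str = "record") -> str:
    """Return a unified diff of the proposed change, or '' if nothing changes."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{name} (current)",
        tofile=f"{name} (proposed)",
        lineterm="",
    )
    return "\n".join(diff)
```

An empty preview is itself a useful signal: the agent proposed an action that would change nothing, so execution can be skipped.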

Exit criteria for enabling bounded autonomy
- Policy violation rate consistently below your threshold (e.g., <0.5% in canary).
- Clear rollback or containment path for every action type.
- Documented cost-per-completed-task and gross margin impact for top workflows.
- Support team trained to interpret receipts and traces; on-call rotation defined.
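
The cost criterion reduces to two small formulas. The sketch below uses one common definition, which is an assumption: total spend across all runs (including failed ones) divided by completed tasks, and gross margin as revenue minus cost over revenue. The revenue-per-task input is a hypothetical figure you would supply from your own pricing.

```python
def cost_per_completed_task(total_cost_usd: float, completed_tasks: int) -> float:
    """Spend on all runs, failed ones included, per successfully completed task."""
    if completed_tasks <= 0:
        raise ValueError("completed_tasks must be positive")
    return total_cost_usd / completed_tasks


def gross_margin(revenue_per_task_usd: float, cost_per_task_usd: float) -> float:
    """Fraction of per-task revenue left after the agent's run cost."""
    if revenue_per_task_usd <= 0:
        raise ValueError("revenue_per_task_usd must be positive")
    return (revenue_per_task_usd - cost_per_task_usd) / revenue_per_task_usd
```

For example, 120 USD of spend over 400 completed tasks gives 0.30 USD per task; at a hypothetical 1.50 USD of revenue per task, that is a gross margin of 0.8.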