AI CONTROL PLANE LAUNCH CHECKLIST (2026)

Use this checklist before you enable any write-capable agentic feature in production. Treat it like a pre-launch review for payments or infra changes.

1) Scope the workflow (one page)
- Define the user outcome in a single sentence (e.g., “reconcile invoice to PO and create draft journal entry”).
- List every external system touched (e.g., NetSuite, Salesforce, GitHub, Jira, Slack).
- Identify the maximum acceptable harm (“blast radius”) if the agent is wrong.
- Decide the initial mode: suggest-only, approval-required, or bounded autonomy.

2) Identity & permissions
- Confirm least-privilege access for every tool (scoped OAuth, short-lived tokens, service accounts).
- Map roles to actions (viewer/editor/admin) and document who can enable autonomy.
- Implement time-bound delegation where possible (session-based credentials).
- Add “break glass” kill switch: disable all execution globally within minutes.

3) Tooling & policy gates
- Register tools with typed schemas and strict input validation.
- Create an allowlist of tools per workflow; default deny.
- Encode policy rules (limits, required approvals, disallowed destinations, data residency).
- Add confirmation gates for irreversible actions (send money, delete data, production changes).

4) Routing & cost controls
- Set hard caps: max tokens per task, max tool calls, max retries, hard cost cap in USD.
- Implement multi-model routing (cheap model for classification/extraction; stronger model for synthesis).
- Define downgrade behavior when budgets are hit (smaller model, reduced context, require approval).
- Add tenant-level and user-level monthly caps; expose them in admin UI.

5) Evaluation & release
- Build a golden set of 50–200 real cases (redact PII) plus 20+ adversarial cases.
- Define pass/fail metrics: completion rate, policy violation rate, unsafe tool call rate, hallucination rate.
- Run canaries: 1–5% of traffic with automatic rollback thresholds.
- Require a human review step for any prompt/tool schema change that affects write paths.

6) Observability & forensics
- Ensure every run has a trace ID spanning retrieval, model calls, and tool calls.
- Log “receipts”: what changed, which tool calls executed, and what policy checks ran.
- Redact PII in logs; set retention (e.g., 30 days) and document it for customers.
- Create an incident playbook: how to replay runs, identify root cause, and notify customers.

7) UX controls that build trust
- Provide previews (diffs, drafts) before execution whenever possible.
- Provide post-action receipts users can export (for audit/compliance).
- Make error states legible: show what failed and the next safest action.
- Add an admin “safe mode” toggle (read-only / approval-only) for regulated teams.

Exit criteria for enabling bounded autonomy
- Policy violation rate consistently below your threshold (example: <0.5% in canary).
- Clear rollback or containment path for every action type.
- Documented cost-per-completed-task and gross margin impact for top workflows.
- Support team trained to interpret receipts and traces; on-call rotation defined.
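
Appendix: illustrative policy gate

Several items from sections 2, 3, 4, and 6 (kill switch, default-deny allowlist, confirmation gates for irreversible actions, hard caps, receipts) can be combined into one pre-execution check. The following is a minimal sketch under stated assumptions, not a reference implementation: the names PolicyGate, Budget, and check, the decision strings, and the tool names are all hypothetical, and a real system would also need typed schema validation, persistence, and authentication.

```python
from dataclasses import dataclass, field


@dataclass
class Budget:
    """Hard caps for one task (hypothetical defaults)."""
    max_tool_calls: int = 10
    max_cost_usd: float = 1.00
    tool_calls: int = 0
    cost_usd: float = 0.0


@dataclass
class PolicyGate:
    """Hypothetical pre-execution gate: every tool call passes through check()."""
    allowlist: frozenset            # tools permitted for this workflow
    irreversible: frozenset         # tools that need explicit human approval
    budget: Budget = field(default_factory=Budget)
    kill_switch: bool = False       # "break glass": denies everything when True
    receipts: list = field(default_factory=list)

    def check(self, tool: str, est_cost_usd: float, approved: bool = False) -> str:
        if self.kill_switch:
            decision = "deny:kill_switch"
        elif tool not in self.allowlist:
            # Default deny: anything not explicitly allowlisted is rejected.
            decision = "deny:not_allowlisted"
        elif self.budget.tool_calls + 1 > self.budget.max_tool_calls:
            decision = "deny:tool_call_cap"
        elif self.budget.cost_usd + est_cost_usd > self.budget.max_cost_usd:
            decision = "deny:cost_cap"
        elif tool in self.irreversible and not approved:
            decision = "needs_approval"
        else:
            decision = "allow"
            # Budget is only consumed by calls that will actually execute.
            self.budget.tool_calls += 1
            self.budget.cost_usd += est_cost_usd
        # Record a receipt for every decision, including denials.
        self.receipts.append({"tool": tool, "decision": decision})
        return decision


# Example run with made-up tool names and limits.
gate = PolicyGate(
    allowlist=frozenset({"read_invoice", "create_draft_entry", "post_entry"}),
    irreversible=frozenset({"post_entry"}),
    budget=Budget(max_tool_calls=3, max_cost_usd=0.50),
)
print(gate.check("read_invoice", 0.10))               # allow
print(gate.check("delete_ledger", 0.01))              # deny:not_allowlisted
print(gate.check("post_entry", 0.10))                 # needs_approval
print(gate.check("post_entry", 0.10, approved=True))  # allow
print(gate.check("read_invoice", 0.40))               # deny:cost_cap
```

In this sketch the kill switch is checked first so that it overrides every other rule, and denied or pending calls do not consume budget. Whether a budget breach should deny outright or trigger a downgrade (smaller model, reduced context, require approval) is a product decision the sketch does not make.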