AI CONTROL PLANE LAUNCH CHECKLIST (MVP)

Goal: ship an AI feature that you can control, test, and audit. This is the minimum set of controls that guards against the most expensive failures.

1) INVENTORY & BOUNDARIES
- List every AI entry point (feature name, where it runs, who can access it).
- For each entry point, label the risk tier:
  - Draft: the model proposes; a human reviews before any external action.
  - Act: the model triggers external side effects directly (emails, writes, refunds, permission changes).
- Write down irreversible actions (anything that can’t be trivially undone). These require stricter controls.

2) STANDARD REQUEST/RESPONSE ENVELOPE
- Every model call logs the same fields: request_id, tenant/user, feature, model/provider, prompt version, tools called, retrieval sources, output.
- Make logs searchable by request_id and tenant_id.
- Decide on retention and on who can access logs (engineering, support, security).

3) MODEL ROUTING & FALLBACKS
- Define an approved model list (even if it’s one model today).
- Implement per-feature routing (don’t hard-code the model choice across the app).
- Add an explicit fallback behavior:
  - a backup model, OR
  - a safe-mode response (ask a clarifying question, route to a human, or answer with citations only).

4) TOOL PERMISSIONS (TREAT TOOLS LIKE SCOPES)
- Create a tool catalog: name, description, inputs/outputs, data accessed, side effects.
- Allowlist tools per feature.
- Put sensitive tools behind additional constraints (human approval, narrower scopes, stricter schemas).

5) OUTPUT CONTRACTS
- Prefer structured outputs (JSON) for anything that feeds a downstream system.
- Validate outputs against a schema; if an output is invalid, reject it and retry, up to a fixed cap.
- Log invalid-output failures as first-class events.

6) RETRIEVAL CONTROLS (IF YOU USE RAG)
- Define which indexes each feature may access.
- Log retrieved document IDs (or stable references) with each response.
- Add a “no source, no answer” mode for high-risk features.
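For step 1 (inventory and boundaries), a minimal sketch of the inventory kept as data rather than a document, assuming Python 3.9+. The feature names, the `runs_in` values, and the rule that any listed irreversible action triggers stricter controls are illustrative assumptions, not part of the checklist.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    DRAFT = "draft"  # a human reviews output before any external action
    ACT = "act"      # the model triggers external side effects directly


@dataclass(frozen=True)
class EntryPoint:
    feature: str
    runs_in: str                  # where it runs (illustrative values)
    access: tuple[str, ...]       # roles that can reach this entry point
    tier: RiskTier
    irreversible_actions: tuple[str, ...] = ()

    def needs_strict_controls(self) -> bool:
        # Assumed policy: any entry point with an irreversible action gets stricter controls.
        return bool(self.irreversible_actions)


# Hypothetical inventory for two features.
INVENTORY = [
    EntryPoint("reply_drafter", "support-console", ("agent",), RiskTier.DRAFT),
    EntryPoint("refund_bot", "support-console", ("agent", "admin"), RiskTier.ACT,
               irreversible_actions=("issue_refund",)),
]

strict = [e.feature for e in INVENTORY if e.needs_strict_controls()]
print(strict)  # ['refund_bot']
```

Keeping the inventory in code (or config) means the later steps can look up a feature's tier instead of relying on tribal knowledge.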
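For step 2 (the standard envelope), one way to keep fields uniform is a single record type that every call site fills in. The field names, the `uuid4` request IDs, and the in-memory `EnvelopeLog` (a stand-in for whatever log store you actually use) are assumptions; the point is that `request_id` and `tenant_id` are both lookup keys.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Envelope:
    # Field names are illustrative; the checklist fixes what is logged, not the schema.
    tenant_id: str
    user_id: str
    feature: str
    model: str
    provider: str
    prompt_version: str
    output: str
    tools_called: list[str] = field(default_factory=list)
    retrieval_sources: list[str] = field(default_factory=list)
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class EnvelopeLog:
    """In-memory stand-in for a log store, indexed by request_id and tenant_id."""

    def __init__(self) -> None:
        self._by_request: dict[str, Envelope] = {}
        self._by_tenant: dict[str, list[Envelope]] = {}

    def write(self, env: Envelope) -> str:
        self._by_request[env.request_id] = env
        self._by_tenant.setdefault(env.tenant_id, []).append(env)
        return json.dumps(asdict(env))  # one JSON line per model call

    def by_request(self, request_id: str) -> Optional[Envelope]:
        return self._by_request.get(request_id)

    def by_tenant(self, tenant_id: str) -> list[Envelope]:
        return list(self._by_tenant.get(tenant_id, []))


log = EnvelopeLog()
env = Envelope(tenant_id="acme", user_id="u-42", feature="reply_drafter",
               model="model-a", provider="provider-x", prompt_version="v3",
               output="Draft reply...", tools_called=["search_kb"],
               retrieval_sources=["kb-17"])
line = log.write(env)
```

Because every call produces the same shape, the same queries work for every feature.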
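Step 3's routing and fallback can be sketched as a per-feature route table. The model names, `ROUTES`, and the caller-supplied `call_model(model, prompt)` function are hypothetical placeholders for your provider SDK; the shape to notice is primary model, then backup model, then a fixed safe-mode reply.

```python
from typing import Callable, Optional

APPROVED_MODELS = {"model-a", "model-b"}  # hypothetical model names

# Per-feature routes: primary model plus an optional backup (None = no backup).
ROUTES: dict[str, dict[str, Optional[str]]] = {
    "reply_drafter": {"primary": "model-a", "backup": "model-b"},
    "refund_bot": {"primary": "model-a", "backup": None},
}

SAFE_MODE_REPLY = "I can't complete this right now, so I'm routing you to a human."


class ModelError(Exception):
    """Raised by call_model when a provider call fails."""


def call_with_fallback(feature: str, prompt: str,
                       call_model: Callable[[str, str], str]) -> tuple[str, str]:
    """Return (model or "safe_mode", text) for one request."""
    route = ROUTES[feature]
    for model in (route["primary"], route["backup"]):
        if model is None:
            continue
        if model not in APPROVED_MODELS:
            raise ValueError(f"{model!r} is not on the approved model list")
        try:
            return model, call_model(model, prompt)
        except ModelError:
            continue  # fall through to the next option
    return "safe_mode", SAFE_MODE_REPLY


def flaky_model(model: str, prompt: str) -> str:
    # Simulated provider in which the primary model is down.
    if model == "model-a":
        raise ModelError("timeout")
    return f"{model} says: ok"


print(call_with_fallback("reply_drafter", "hi", flaky_model))  # ('model-b', 'model-b says: ok')
```

Keeping `ROUTES` and `APPROVED_MODELS` in configuration rather than scattered through the code is what turns a per-feature model swap into a config change.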
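For step 4 (tool permissions), a sketch of a tool catalog with a deny-by-default allowlist per feature. The tool names, the `requires_approval` flag, and `authorize()` are assumptions for illustration; input/output schemas are omitted from the catalog entries for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    data_accessed: str
    side_effects: bool
    requires_approval: bool = False  # extra constraint for sensitive tools


# Hypothetical catalog entries.
CATALOG = {t.name: t for t in [
    Tool("search_kb", "Search help articles", "public help center", side_effects=False),
    Tool("send_email", "Email a customer", "customer contact info", side_effects=True),
    Tool("issue_refund", "Refund an order", "billing records", side_effects=True,
         requires_approval=True),
]}

# Per-feature allowlist; anything not listed is denied.
ALLOWLIST = {
    "reply_drafter": {"search_kb"},
    "refund_bot": {"search_kb", "issue_refund"},
}


def authorize(feature: str, tool_name: str, approved_by: Optional[str] = None) -> Tool:
    """Return the Tool if this feature may call it now; raise PermissionError otherwise."""
    if tool_name not in ALLOWLIST.get(feature, set()):
        raise PermissionError(f"{feature} may not call {tool_name}")
    tool = CATALOG[tool_name]
    if tool.requires_approval and not approved_by:
        raise PermissionError(f"{tool_name} requires human approval")
    return tool
```

Checking the allowlist before the catalog means a tool that exists but was never granted to a feature fails closed.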
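For step 5 (output contracts), a sketch of the validate-then-retry loop using only the standard library. `CONTRACT`, the three-attempt cap, and the in-memory `EVENTS` list (standing in for your event log) are assumptions; a schema library such as `jsonschema` could replace the hand-rolled type check.

```python
import json
from typing import Callable

# Hypothetical contract for a refund action: field name -> required type.
CONTRACT = {"action": str, "order_id": str, "amount_cents": int}

EVENTS: list[dict] = []  # stand-in for a real event log


class InvalidOutput(Exception):
    pass


def validate(raw: str) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidOutput(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise InvalidOutput("top-level value must be an object")
    for key, typ in CONTRACT.items():
        if key not in data:
            raise InvalidOutput(f"missing field {key!r}")
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, typ) or (typ is int and isinstance(value, bool)):
            raise InvalidOutput(f"field {key!r} must be {typ.__name__}")
    return data


def generate_validated(generate: Callable[[], str], max_attempts: int = 3) -> dict:
    """Call generate() until its output passes validate(), up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return validate(generate())
        except InvalidOutput as exc:
            # Each invalid output is recorded as its own event, not just retried silently.
            EVENTS.append({"event": "invalid_output", "attempt": attempt, "error": str(exc)})
    raise InvalidOutput(f"no valid output after {max_attempts} attempts")


# Simulated model that fails once, then complies.
outputs = iter([
    '{"action": "refund"}',
    '{"action": "refund", "order_id": "A1", "amount_cents": 500}',
])
result = generate_validated(lambda: next(outputs))
```

The retry cap matters: without it, a prompt that reliably produces bad JSON turns into an unbounded cost and latency problem.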
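For step 6 (retrieval controls), a sketch of per-feature index access plus a “no source, no answer” mode. `INDEX_ACCESS`, `NO_SOURCE_FEATURES`, and the caller-supplied `retrieve`/`generate` functions are hypothetical; the returned `sources` list is what would be logged in the step 2 envelope.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Doc:
    doc_id: str   # stable reference to log with the response
    index: str
    text: str


# Hypothetical per-feature index permissions and high-risk features.
INDEX_ACCESS = {
    "support_answers": {"help_center"},
    "policy_answers": {"help_center", "internal_policies"},
}
NO_SOURCE_FEATURES = {"policy_answers"}
NO_SOURCE_REPLY = "I couldn't find a source for this, so I won't answer it."


def answer(feature: str, question: str,
           retrieve: Callable[[str, str], list[Doc]],
           generate: Callable[[str, list[Doc]], str]) -> dict:
    """Retrieve only from permitted indexes and return the answer with its source IDs."""
    docs: list[Doc] = []
    for index in sorted(INDEX_ACCESS.get(feature, set())):
        docs.extend(retrieve(index, question))
    if not docs and feature in NO_SOURCE_FEATURES:
        return {"text": NO_SOURCE_REPLY, "sources": []}
    return {"text": generate(question, docs), "sources": [d.doc_id for d in docs]}


def fake_retrieve(index: str, question: str) -> list[Doc]:
    # Simulated retriever with a single help-center document.
    corpus = {"help_center": [Doc("kb-1", "help_center", "Refunds take 5 business days.")]}
    return corpus.get(index, [])


def fake_generate(question: str, docs: list[Doc]) -> str:
    return f"answer from {len(docs)} doc(s)"
```

The feature, not the caller, decides which indexes are searched, so a prompt cannot talk its way into an index it was never granted.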
7) EVALUATION & RELEASE
- Maintain a small golden set of real (sanitized) prompts/tasks per feature.
- Run offline evals before release; store the results with the prompt and model versions they ran against.
- Add online monitoring: flag high-severity signals (policy violations, tool errors, repeated retries).

8) ADMIN & INCIDENT READINESS
- Provide at least one admin control: disable the feature, restrict it to specific roles, or restrict its tools.
- Ensure an incident workflow exists: how to find a bad output, reproduce it, roll back the prompt version, and export logs.

Definition of Done: for any AI output in production, you can answer who triggered it, what context it saw, what tools it used, which prompt/model version produced it, and how to change its behavior without a full redeploy.
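For step 7 (evaluation and release), the offline eval can start as a loop over the golden set that records which prompt and model version it ran against. The golden cases, the substring check, and the 100% pass threshold in `release_gate()` are illustrative assumptions; a substring match is only the simplest possible grader.

```python
from typing import Callable

# Hypothetical golden cases: sanitized input plus a required substring in the output.
GOLDEN = [
    {"id": "g1", "input": "Where is my order 123?", "must_contain": "123"},
    {"id": "g2", "input": "Cancel my plan", "must_contain": "cancel"},
]


def run_offline_eval(feature: str, prompt_version: str, model: str,
                     generate: Callable[[str], str]) -> dict:
    """Run every golden case and tag the results with the versions that produced them."""
    cases = []
    for case in GOLDEN:
        output = generate(case["input"])
        cases.append({"id": case["id"],
                      "passed": case["must_contain"].lower() in output.lower()})
    return {"feature": feature, "prompt_version": prompt_version, "model": model,
            "passed": sum(c["passed"] for c in cases), "total": len(cases),
            "cases": cases}


def release_gate(result: dict, min_pass_rate: float = 1.0) -> bool:
    """Block the release unless the pass rate meets the (assumed) threshold."""
    return result["total"] > 0 and result["passed"] / result["total"] >= min_pass_rate


result = run_offline_eval("reply_drafter", "v3", "model-a", lambda text: f"Re: {text}")
print(result["passed"], result["total"], release_gate(result))  # 2 2 True
```

Storing `prompt_version` and `model` inside each result is what lets you compare runs when either one changes.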
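For step 8 (admin and incident readiness), a sketch of admin controls read at request time, so disabling a feature, narrowing its roles, blocking a tool, or rolling back a prompt version takes effect without a redeploy. `FeatureControls`, `CONTROLS`, and both functions are hypothetical; in practice the dict would live in a config or feature-flag store.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureControls:
    enabled: bool = True
    allowed_roles: Optional[set] = None            # None means any role may use it
    blocked_tools: set = field(default_factory=set)
    prompt_version: str = "v1"


# Stand-in for a config or feature-flag store read on every request.
CONTROLS = {
    "refund_bot": FeatureControls(allowed_roles={"admin"}, prompt_version="v4"),
}


def check_request(feature: str, role: str, tool: Optional[str] = None) -> None:
    """Raise PermissionError if the current admin settings forbid this request."""
    controls = CONTROLS.get(feature)
    if controls is None:
        raise PermissionError(f"{feature} is not registered")
    if not controls.enabled:
        raise PermissionError(f"{feature} is disabled")
    if controls.allowed_roles is not None and role not in controls.allowed_roles:
        raise PermissionError(f"role {role!r} may not use {feature}")
    if tool is not None and tool in controls.blocked_tools:
        raise PermissionError(f"tool {tool!r} is blocked for {feature}")


def rollback_prompt(feature: str, to_version: str) -> str:
    """Point a feature at an earlier prompt version; return the version it replaced."""
    controls = CONTROLS[feature]
    previous = controls.prompt_version
    controls.prompt_version = to_version
    return previous
```

Unregistered features are rejected outright, which keeps this control in step with the step 1 inventory.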