AI CONTROL PLANE LAUNCH CHECKLIST (MVP)

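Goal: ship an AI feature that you can control, test, and audit. The items below are the minimum set of controls that guard against the most expensive failures: unintended side effects, outputs you can’t trace, and behavior you can’t change quickly.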

1) INVENTORY & BOUNDARIES
- List every AI entry point (feature name, where it runs, who can access it).
- For each entry point, label the risk tier:
 - Draft: human reviews before external action.
 - Act: model triggers external side effects (emails, writes, refunds, permission changes).
- Write down irreversible actions (anything that can’t be trivially undone). These require stricter controls.
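
One way to keep the inventory honest is to store it as data rather than in a wiki page. The sketch below is illustrative only: the feature names, service names, roles, and the EntryPoint type are assumptions, not part of any existing system.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    DRAFT = "draft"  # a human reviews before any external action
    ACT = "act"      # the model triggers external side effects


@dataclass(frozen=True)
class EntryPoint:
    feature: str                      # feature name
    runs_in: str                      # where it runs
    allowed_roles: tuple[str, ...]    # who can access it
    tier: RiskTier
    irreversible_actions: tuple[str, ...] = ()

    def needs_strict_controls(self) -> bool:
        # Any irreversible action puts the entry point under stricter controls.
        return bool(self.irreversible_actions)


# Hypothetical inventory with one entry point per tier.
INVENTORY = [
    EntryPoint("reply_drafting", "support-console", ("support_agent",), RiskTier.DRAFT),
    EntryPoint("refund_assistant", "billing-service", ("billing_admin",),
               RiskTier.ACT, ("issue_refund",)),
]

strict = [e.feature for e in INVENTORY if e.needs_strict_controls()]
```

Here strict holds only the entry points that list an irreversible action, which gives reviewers a short list to check against the later sections.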

2) STANDARD REQUEST/RESPONSE ENVELOPE
- Every model call logs the same fields: request_id, tenant_id, user_id, feature, model/provider, prompt version, tools called, retrieval sources, output.
- Ensure logs are searchable by request_id and tenant_id.
- Decide retention and who can access logs (engineering vs support vs security).
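
A minimal sketch of such an envelope, assuming one JSON line per model call. The class name, field names, and the timestamp field are assumptions chosen for illustration; adapt them to whatever log pipeline is already in place.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class CallEnvelope:
    tenant_id: str
    user_id: str
    feature: str
    provider: str
    model: str
    prompt_version: str
    output: str
    tools_called: list[str] = field(default_factory=list)
    retrieval_sources: list[str] = field(default_factory=list)
    # Generated per call so every log line can be looked up by request_id.
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_log_line(self) -> str:
        # Sorted keys keep the line stable for diffing and indexing.
        return json.dumps(asdict(self), sort_keys=True)


envelope = CallEnvelope(
    tenant_id="tenant-1",
    user_id="user-9",
    feature="reply_drafting",
    provider="example-provider",   # hypothetical provider name
    model="primary-model",         # hypothetical model name
    prompt_version="v3",
    output="Draft reply text",
    tools_called=["lookup_order"],
)
line = envelope.to_log_line()
```

Because every call emits the same keys, the log store can index request_id and tenant_id once and serve engineering, support, and security from the same records.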

3) MODEL ROUTING & FALLBACKS
- Define an approved model list (even if it’s one model today).
- Route per feature from one configuration point (don’t hard-code model names at call sites across the app).
- Add an explicit fallback behavior:
 - backup model OR
 - safe mode response (ask clarifying question, route to human, citations-only).
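
The routing and fallback rules above can be sketched as a small lookup plus an ordered try. Model names, the ROUTES table, and the call_model callable are hypothetical stand-ins for a real provider client.

```python
from typing import Callable

APPROVED_MODELS = {"primary-model", "backup-model"}  # hypothetical names

# One routing table for the whole app; call sites only pass a feature name.
ROUTES = {
    "reply_drafting": {"primary": "primary-model", "backup": "backup-model"},
    "refund_assistant": {"primary": "primary-model", "backup": None},
}

SAFE_MODE_RESPONSE = "I can't complete this right now, so I'm routing it to a human."


def call_with_fallback(
    feature: str, prompt: str, call_model: Callable[[str, str], str]
) -> tuple[str, str]:
    """Return (model_used, output), or ("safe_mode", canned text) if all fail."""
    route = ROUTES[feature]
    for model in (route["primary"], route["backup"]):
        if model is None:
            continue
        if model not in APPROVED_MODELS:
            raise ValueError(f"{model!r} is not on the approved model list")
        try:
            return model, call_model(model, prompt)
        except Exception:
            continue  # a real system would log the failure before moving on
    return "safe_mode", SAFE_MODE_RESPONSE


def flaky_client(model: str, prompt: str) -> str:
    # Simulated provider: the primary model is down, the backup works.
    if model == "primary-model":
        raise TimeoutError("simulated outage")
    return f"{model}: ok"


drafting = call_with_fallback("reply_drafting", "hello", flaky_client)
refund = call_with_fallback("refund_assistant", "hello", flaky_client)
```

With the simulated outage, the drafting feature lands on its backup model, while the refund feature, which has no backup configured, returns the safe mode response.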

4) TOOL PERMISSIONS (TREAT TOOLS LIKE SCOPES)
- Create a tool catalog: name, description, inputs/outputs, data accessed, side effects.
- Allowlist tools per feature.
- Put sensitive tools behind additional constraints (human approval, narrower scopes, stricter schemas).
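
A sketch of the catalog and per-feature allowlist as plain data, with a single authorization check in front of every tool call. Tool names, feature names, and the three-way result ("allow", "needs_approval", "deny") are assumptions; input/output schemas are left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    data_accessed: tuple[str, ...]
    side_effects: tuple[str, ...] = ()
    requires_human_approval: bool = False


# Hypothetical catalog: one read-only tool and one sensitive tool.
CATALOG = {
    "lookup_order": Tool("lookup_order", "Read one order by ID", ("orders",)),
    "issue_refund": Tool(
        "issue_refund", "Refund an order", ("orders", "payments"),
        side_effects=("moves money",), requires_human_approval=True,
    ),
}

# Each feature gets only the tools it needs, like OAuth-style scopes.
FEATURE_ALLOWLIST = {
    "reply_drafting": {"lookup_order"},
    "refund_assistant": {"lookup_order", "issue_refund"},
}


def authorize(feature: str, tool_name: str, human_approved: bool = False) -> str:
    """Return "allow", "needs_approval", or "deny" for a requested tool call."""
    if tool_name not in FEATURE_ALLOWLIST.get(feature, set()):
        return "deny"
    tool = CATALOG[tool_name]
    if tool.requires_human_approval and not human_approved:
        return "needs_approval"
    return "allow"
```

Usage: authorize("reply_drafting", "issue_refund") is denied because the tool is not on that feature's allowlist, while authorize("refund_assistant", "issue_refund") waits for approval until human_approved=True is passed.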

5) OUTPUT CONTRACTS
- Prefer structured outputs (JSON) for anything that feeds a downstream system.
- Validate outputs against a schema; reject invalid outputs and retry up to a fixed limit.
- Log invalid-output failures as first-class events.
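
A minimal validate-and-retry sketch using only the standard library. The required fields, the retry cap of 3, and the log_event callable are assumptions; a schema library would normally replace the hand-written check.

```python
import json
from typing import Callable

# Hypothetical contract for a refund decision consumed by a downstream system.
REQUIRED_FIELDS = {"action": str, "order_id": str, "amount_cents": int}
MAX_ATTEMPTS = 3  # assumed cap so a misbehaving model cannot loop forever


def validate(raw: str) -> dict:
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"field {name!r} missing or not {expected_type.__name__}")
    return data


def generate_validated(
    generate: Callable[[], str], log_event: Callable[[dict], None]
) -> dict:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        raw = generate()
        try:
            return validate(raw)
        except ValueError as err:
            # Invalid output is logged as its own event, not buried in a trace.
            log_event({"event": "invalid_output", "attempt": attempt, "error": str(err)})
    raise RuntimeError(f"no valid output after {MAX_ATTEMPTS} attempts")


# Simulated model that produces two bad outputs before a valid one.
outputs = iter([
    "not json",
    '{"action": "refund"}',
    '{"action": "refund", "order_id": "A1", "amount_cents": 500}',
])
events: list[dict] = []
result = generate_validated(lambda: next(outputs), events.append)
```

After the run, events holds one entry per rejected output, which is the signal the monitoring in section 7 can count.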

6) RETRIEVAL CONTROLS (IF YOU USE RAG)
- Define which indexes a feature may access.
- Log retrieved document IDs (or stable references) with each response.
- Add a “no source, no answer” mode for high-risk features.
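
The three retrieval controls can sit in one wrapper around the search call. In this sketch the index names, the search and generate callables, and the response shape are all assumptions made for illustration.

```python
from typing import Callable

# Which indexes each feature may read (hypothetical names).
FEATURE_INDEXES = {
    "reply_drafting": {"help_center"},
    "refund_assistant": {"help_center", "refund_policy"},
}

# High-risk features that must refuse when retrieval returns nothing.
NO_SOURCE_NO_ANSWER = {"refund_assistant"}


def answer_with_sources(
    feature: str,
    index: str,
    query: str,
    search: Callable[[str, str], list],    # returns [(doc_id, text), ...]
    generate: Callable[[str, list], str],  # takes the query and retrieved texts
) -> dict:
    if index not in FEATURE_INDEXES.get(feature, set()):
        raise PermissionError(f"{feature!r} may not read index {index!r}")
    docs = search(index, query)
    doc_ids = [doc_id for doc_id, _text in docs]
    if not docs and feature in NO_SOURCE_NO_ANSWER:
        return {"answer": None, "doc_ids": doc_ids, "refused": True}
    answer = generate(query, [text for _doc_id, text in docs])
    # doc_ids travel with the answer so they can be logged alongside it.
    return {"answer": answer, "doc_ids": doc_ids, "refused": False}
```

The returned doc_ids are meant to be written into the same log record as the output, so a reviewer can see exactly which sources the answer drew on.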

7) EVALUATION & RELEASE
- Maintain a small golden set of real prompts/tasks (sanitized) per feature.
- Run offline evals before release; store results with prompt/model version.
- Add online monitoring: flag high-severity signals (policy violations, tool errors, repeated retries).
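
An offline eval run can be very small and still be useful, as long as its result is stored next to the versions it tested. The sketch below assumes a substring check as the pass criterion and uses made-up cases; real golden sets would use whatever grading fits the feature.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    prompt: str
    must_contain: str  # simplest possible pass criterion, assumed for the sketch


def run_offline_eval(
    cases: list,
    generate: Callable[[str], str],
    prompt_version: str,
    model: str,
) -> dict:
    failed = [
        case.case_id
        for case in cases
        if case.must_contain.lower() not in generate(case.prompt).lower()
    ]
    # The result record carries the versions so it can be compared across releases.
    return {
        "prompt_version": prompt_version,
        "model": model,
        "total": len(cases),
        "passed": len(cases) - len(failed),
        "failed_case_ids": failed,
    }


# Hypothetical, sanitized golden set and a fake model for demonstration.
GOLDEN_SET = [
    GoldenCase("g1", "Where is my order?", "tracking"),
    GoldenCase("g2", "Can I get a refund?", "refund policy"),
]


def fake_model(prompt: str) -> str:
    return "Here is your tracking link." if "order" in prompt else "Please wait."


report = run_offline_eval(GOLDEN_SET, fake_model, prompt_version="v3", model="primary-model")
```

A release gate can then be as simple as refusing to ship when failed_case_ids is non-empty for the candidate prompt/model pair.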

8) ADMIN & INCIDENT READINESS
- Provide at least one admin control: disable feature, restrict to roles, or restrict tools.
- Ensure an incident workflow exists: how to find a bad output, reproduce it, roll back prompt version, and export logs.
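
A sketch of runtime controls that an admin can flip without a deploy: a kill switch, a role restriction, and a prompt rollback. The in-memory CONTROLS dict stands in for whatever config store is actually used, and all names and version labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureControl:
    enabled: bool = True
    allowed_roles: Optional[set] = None       # None means every role may use it
    prompt_version: str = "v2"
    previous_prompt_version: Optional[str] = "v1"


# In practice this would live in a config service or database, not in code.
CONTROLS = {"refund_assistant": FeatureControl()}


def may_run(feature: str, role: str) -> bool:
    control = CONTROLS.get(feature)
    if control is None or not control.enabled:
        return False  # unknown or disabled features never run
    return control.allowed_roles is None or role in control.allowed_roles


def roll_back_prompt(feature: str) -> str:
    """Switch the feature to its previous prompt version and return it."""
    control = CONTROLS[feature]
    if control.previous_prompt_version is None:
        raise ValueError("no earlier prompt version recorded")
    control.prompt_version, control.previous_prompt_version = (
        control.previous_prompt_version,
        None,
    )
    return control.prompt_version
```

During an incident, setting CONTROLS["refund_assistant"].enabled = False stops the feature immediately, and roll_back_prompt("refund_assistant") restores the earlier prompt once the bad output has been reproduced.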

Definition of Done: for any AI output in production, you can answer who triggered it, what context it saw, what tools it used, which prompt/model version produced it, and how to change behavior without a full redeploy.