AI CONTROL PLANE MVP CHECKLIST (CUSTOMER-AUDITABLE)

Goal: turn an “AI feature” into a customer-governable subsystem with clear controls, logs, and fallbacks.

1) SINGLE GATEWAY (NON-NEGOTIABLE)
- All LLM calls flow through one internal service or module.
- Gateway records: tenant, user, feature/workflow, timestamp, provider, model alias.
- Gateway supports provider abstraction (at least two providers or two model routes, even if one is inactive).

2) TRACE SCHEMA (STRUCTURED, NOT RAW)
Capture these buckets per request:
- Intent: feature name, workflow step, user role, tenant ID.
- Inputs: prompt template ID, retrieval query summary, sources used.
- Execution: provider, model/version alias, tool calls invoked (names only), latency class.
- Outputs: output ID, citations/grounding references if you provide them.
- Controls: policy decisions (allowed/blocked/modified) + policy IDs.

3) CUSTOMER POLICIES (SHIP 3 FIRST)
Policy A: Data boundary
- Allowlist retrieval sources (e.g., only specific indexes, connectors, or collections).
- Redaction rules for sensitive fields before leaving your system.
Policy B: Tool-use restrictions
- Per-tenant toggle for tool calling.
- Per-tool allowlist (e.g., “web browsing off”, “CRM write actions off”).
Policy C: Spend/rate caps
- Quotas per tenant (requests/day or token budget) and alerting on thresholds.
- Hard stop behavior defined (block vs downgrade to cheaper model vs disable feature).

4) CUSTOMER-FACING UI (MINIMUM VIABLE)
- “AI Activity” page: filter by date, user, feature, allowed/blocked.
- Drill-down view shows model/provider, retrieval sources, and policy decisions.
- Export option (CSV/JSON) for audits.

5) KILL SWITCH + DEGRADED MODE
- Feature flag per tenant and per group/role.
- Defined fallback behavior when provider fails (disable AI step, deterministic rules, cached response, or “manual review required”).
- Runbook link visible to internal on-call; customer messaging prepared.
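
Illustrative sketch for sections 1 to 5: one way a gateway might apply the three policies and the kill switch, then emit a structured trace. This is a minimal Python sketch under stated assumptions, not a reference implementation. Every name here (TenantPolicy, Trace, evaluate, the policy ID strings, the request dictionary keys) is hypothetical and chosen only for illustration. No provider is actually called.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
import uuid


@dataclass
class TenantPolicy:
    # Hypothetical per-tenant settings; field names are assumptions.
    ai_enabled: bool = True                    # kill switch (section 5)
    allowed_sources: frozenset = frozenset()   # Policy A: data boundary
    allowed_tools: frozenset = frozenset()     # Policy B: tool allowlist
    daily_request_cap: int = 1000              # Policy C: requests/day quota
    over_cap_action: str = "block"             # "block" or "downgrade"
    cheap_model_alias: str = "cheap"           # used when downgrading


@dataclass
class Trace:
    # Structured trace: IDs and names only, no raw prompt text.
    tenant_id: str
    user_id: str
    feature: str
    prompt_template_id: str
    sources_used: list
    tools_invoked: list
    provider: str
    model_alias: str
    decision: str            # "allowed", "blocked", or "modified"
    policy_ids: list
    output_id: Optional[str]
    timestamp: str


def evaluate(policy: TenantPolicy, request: dict, requests_today: int) -> Trace:
    """Apply policies A-C and the kill switch, then build the trace record."""
    fired = []
    blocked = False
    model_alias = request["model_alias"]

    # Policy A: drop retrieval sources that are not on the allowlist.
    sources = [s for s in request["sources"] if s in policy.allowed_sources]
    if len(sources) != len(request["sources"]):
        fired.append("policy-a-data-boundary")

    # Policy B: drop tools that are not on the allowlist.
    tools = [t for t in request["tools"] if t in policy.allowed_tools]
    if len(tools) != len(request["tools"]):
        fired.append("policy-b-tool-use")

    # Policy C: hard stop behavior once the daily quota is reached.
    if requests_today >= policy.daily_request_cap:
        fired.append("policy-c-spend-cap")
        if policy.over_cap_action == "downgrade":
            model_alias = policy.cheap_model_alias
        else:
            blocked = True

    # Kill switch overrides everything else.
    if not policy.ai_enabled:
        fired.append("kill-switch")
        blocked = True

    decision = "blocked" if blocked else ("modified" if fired else "allowed")
    return Trace(
        tenant_id=request["tenant_id"],
        user_id=request["user_id"],
        feature=request["feature"],
        prompt_template_id=request["prompt_template_id"],
        sources_used=[] if blocked else sources,
        tools_invoked=[] if blocked else tools,
        provider=request["provider"],
        model_alias=model_alias,
        decision=decision,
        policy_ids=fired,
        output_id=None if blocked else str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```

Because every decision carries its policy IDs, the “AI Activity” page and the audit export can be built from Trace records alone, without reading raw prompts.
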

6) RETENTION + PRIVACY DEFAULTS
- Explicit retention settings for traces (default short; extend only by policy).
- Avoid storing raw prompts unless required; prefer template IDs + redacted fields.
- Separate sensitive payload storage from metadata; document access controls.

7) INCIDENT READINESS
- Provider status monitoring (OpenAI/Anthropic/Azure where applicable) wired to your alerting.
- Incident log ties customer-visible impact to provider events.
- Post-incident review template: what policies fired, what routing did, what failed.

Acceptance test: pick one real workflow and prove you can answer, from your UI and exports: which model ran, what data sources were used, what policies applied, what the output ID is, and how to disable it for a single tenant in under a minute.
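
Illustrative sketch for section 6: one way the retention and privacy defaults might look in code. This is a minimal Python sketch under stated assumptions. The 7-day default, the setting key trace_retention_days, the SENSITIVE_KEYS tuple, and both function names are hypothetical; the checklist itself only says the default should be short and extended only by policy.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "short" default retention is 7 days; pick your own value.
DEFAULT_RETENTION_DAYS = 7

# Assumption: these keys hold sensitive payload and are stored separately.
SENSITIVE_KEYS = ("raw_prompt", "redacted_fields")


def purge_expired(traces, tenant_settings, now):
    """Keep only traces newer than the tenant's retention window."""
    days = tenant_settings.get("trace_retention_days", DEFAULT_RETENTION_DAYS)
    cutoff = now - timedelta(days=days)
    return [t for t in traces if t["timestamp"] >= cutoff]


def split_for_storage(record, store_raw_prompt=False):
    """Return (metadata, sensitive_payload) for separate storage.

    The raw prompt is dropped unless a policy explicitly requires it.
    """
    payload = {k: record[k] for k in SENSITIVE_KEYS if k in record}
    if not store_raw_prompt:
        payload.pop("raw_prompt", None)
    metadata = {k: v for k, v in record.items() if k not in SENSITIVE_KEYS}
    return metadata, payload


# Usage example with fixed timestamps.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
traces = [
    {"output_id": "a", "timestamp": now - timedelta(days=2)},
    {"output_id": "b", "timestamp": now - timedelta(days=30)},
]
kept_default = purge_expired(traces, {}, now)
kept_extended = purge_expired(traces, {"trace_retention_days": 60}, now)
```

Keeping metadata and sensitive payload in separate return values makes it straightforward to write them to stores with different access controls.
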