AI CONTROL PLANE MVP CHECKLIST (CUSTOMER-AUDITABLE)

Goal: turn an “AI feature” into a customer-governable subsystem with clear controls, logs, and fallbacks.

1) SINGLE GATEWAY (NON-NEGOTIABLE)
- All LLM calls flow through one internal service or module.
- Gateway records: tenant, user, feature/workflow, timestamp, provider, model alias.
- Gateway supports provider abstraction (at least two providers or two model routes, even if one is inactive).

2) TRACE SCHEMA (STRUCTURED, NOT RAW)
Capture these buckets per request:
- Intent: feature name, workflow step, user role, tenant ID.
- Inputs: prompt template ID, retrieval query summary, sources used.
- Execution: provider, model/version alias, tool calls invoked (names only), latency class.
- Outputs: output ID, citations/grounding references if you provide them.
- Controls: policy decisions (allowed/blocked/modified) + policy IDs.

3) CUSTOMER POLICIES (SHIP 3 FIRST)
Policy A: Data boundary
- Allowlist retrieval sources (e.g., only specific indexes, connectors, or collections).
- Redaction rules for sensitive fields before leaving your system.
Policy B: Tool-use restrictions
- Per-tenant toggle for tool calling.
- Per-tool allowlist (e.g., “web browsing off”, “CRM write actions off”).
Policy C: Spend/rate caps
- Quotas per tenant (requests/day or token budget) and alerting on thresholds.
- Hard stop behavior defined (block vs downgrade to cheaper model vs disable feature).

4) CUSTOMER-FACING UI (MINIMUM VIABLE)
- “AI Activity” page: filter by date, user, feature, allowed/blocked.
- Drill-down view shows model/provider, retrieval sources, and policy decisions.
- Export option (CSV/JSON) for audits.

5) KILL SWITCH + DEGRADED MODE
- Feature flag per tenant and per group/role.
- Defined fallback behavior when provider fails (disable AI step, deterministic rules, cached response, or “manual review required”).
- Runbook link visible to internal on-call; customer messaging prepared.
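Sections 1, 2 and 5 above describe one mechanism from three angles: a single gateway that every call passes through, the structured trace it writes, and the per-tenant kill switch with a defined fallback when the provider fails. The following is an added example sketching that gateway in Python; it is not code from the checklist or from any product it refers to. The provider clients are stubs, and every name (`Trace`, `Gateway`, `ROUTES`, the alias `summarize-v1`, the provider names) is an assumption chosen for illustration.

```python
"""Added example (not the checklist's own code): a minimal single gateway.
Every LLM call goes through Gateway.complete(), which writes one structured
trace record (section 2 buckets), tries the model routes behind an alias in
order (section 1 provider abstraction), and falls back deterministically when
the feature is switched off or every route fails (section 5)."""
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import uuid


class ProviderError(Exception):
    """Raised by a provider client on upstream failure (assumed interface)."""


@dataclass
class Trace:
    # Intent bucket
    tenant_id: str
    user_id: str
    user_role: str
    feature: str
    workflow_step: str
    # Inputs bucket: template ID and query summary only, never the raw prompt (section 6)
    prompt_template_id: str
    retrieval_query_summary: str
    sources_used: list
    # Execution bucket
    provider: str = ""
    model_alias: str = ""
    tool_calls: list = field(default_factory=list)      # tool names only
    latency_class: str = ""
    # Outputs bucket
    output_id: str = ""
    citations: list = field(default_factory=list)
    # Controls bucket: [{"policy_id": ..., "decision": ...}]
    decisions: list = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    fallback: str = "none"   # none | feature_disabled | provider_failed


# Assumed routing table: one alias, two routes, tried in order. A route whose
# provider is not configured counts as inactive and is skipped.
ROUTES = {
    "summarize-v1": [("provider_a", "model-a-large"), ("provider_b", "model-b-large")],
}

FALLBACK_TEXT = "AI step unavailable; showing the original record for manual review."


def classify_latency(ms: float) -> str:
    """Bucket latency rather than storing exact timings (section 2: structured, not raw)."""
    return "fast" if ms < 500 else "normal" if ms < 3000 else "slow"


class Gateway:
    def __init__(self, providers, flags, trace_sink):
        self.providers = providers    # {provider_name: callable(model, prompt) -> (text, latency_ms)}
        self.flags = flags            # {(tenant_id, role or "*"): enabled} kill switch
        self.trace_sink = trace_sink  # anything with .append(); a list here

    def enabled(self, tenant_id, role):
        # A role-specific flag wins over the tenant-wide flag; default is enabled.
        if (tenant_id, role) in self.flags:
            return self.flags[(tenant_id, role)]
        return self.flags.get((tenant_id, "*"), True)

    def complete(self, trace: Trace, alias: str, prompt: str, fallback_text=FALLBACK_TEXT):
        trace.model_alias = alias
        if not self.enabled(trace.tenant_id, trace.user_role):
            trace.fallback = "feature_disabled"
            self.trace_sink.append(asdict(trace))
            return fallback_text
        for provider, model in ROUTES[alias]:
            client = self.providers.get(provider)
            if client is None:
                continue                      # inactive route
            try:
                text, ms = client(model, prompt)
            except ProviderError:
                continue                      # try the next route
            trace.provider = provider
            trace.latency_class = classify_latency(ms)
            trace.output_id = uuid.uuid4().hex
            self.trace_sink.append(asdict(trace))
            return text
        # Every route failed: degraded mode, deterministic fallback, still traced.
        trace.fallback = "provider_failed"
        trace.decisions.append({"policy_id": "degraded-mode", "decision": "fallback"})
        self.trace_sink.append(asdict(trace))
        return fallback_text


def demo():
    """Constructed scenario: provider_a is down, provider_b is healthy, tenant-2 is switched off."""
    def flaky(model, prompt):
        raise ProviderError("upstream 503")

    def healthy(model, prompt):
        return f"[{model}] summary of {len(prompt)} chars", 120.0

    def new_trace(tenant):
        return Trace(tenant, "u-42", "analyst", "ticket_summary", "draft",
                     "tpl-summ-003", "tickets from last 7 days", ["index:tickets"])

    traces = []
    gw = Gateway({"provider_a": flaky, "provider_b": healthy}, {("tenant-2", "*"): False}, traces)
    out1 = gw.complete(new_trace("tenant-1"), "summarize-v1", "long ticket text ...")
    out2 = gw.complete(new_trace("tenant-2"), "summarize-v1", "long ticket text ...")
    gw_down = Gateway({"provider_a": flaky}, {}, traces)          # provider_b route inactive
    out3 = gw_down.complete(new_trace("tenant-1"), "summarize-v1", "long ticket text ...")
    return {"traces": traces, "out1": out1, "out2": out2, "out3": out3}


if __name__ == "__main__":
    for record in demo()["traces"]:
        print(record["tenant_id"], record["provider"] or "-", record["fallback"], record["output_id"] or "-")
```

The point of the three cases in `demo()` is that a blocked call, a failed call and a successful call all produce a trace record in the same shape, so the customer-facing activity page (section 4) never has a gap where the feature silently did nothing.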
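Section 3 lists the three policies to ship first but not how their decisions get recorded. Below is an added example, written for this page rather than taken from it, of evaluating all three before a request reaches the provider and emitting one decision record per policy that fired, in the allowed/blocked/modified vocabulary of the Controls bucket. The policy IDs (`A-source-allowlist`, `B-tools-toggle`, `C-quota-hardstop` and so on), the field names and the quota numbers are assumptions.

```python
"""Added example (not the checklist's own code): the three first-ship customer
policies from section 3, evaluated together. Each function returns the sanitized
part of the request plus a list of decision records for the trace's Controls bucket."""
from dataclasses import dataclass
import re


@dataclass
class TenantPolicy:
    allowed_sources: set           # Policy A: retrieval source allowlist
    redact_fields: set             # Policy A: fields scrubbed before leaving the system
    tools_enabled: bool            # Policy B: per-tenant toggle for tool calling
    allowed_tools: set             # Policy B: per-tool allowlist
    daily_request_quota: int       # Policy C: requests/day
    quota_alert_ratio: float = 0.8 # Policy C: alert when usage crosses this share of quota
    hard_stop: str = "block"       # Policy C: block | downgrade | disable


@dataclass
class Request:
    tenant_id: str
    sources: list           # retrieval sources the workflow wants to read
    payload: dict           # fields that would be sent to the provider
    tools_requested: list   # tool names the workflow wants to call
    model_alias: str


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")   # assumed redaction pattern


def apply_data_boundary(req, pol):
    """Policy A: drop sources outside the allowlist, redact sensitive fields."""
    decisions = []
    kept = [s for s in req.sources if s in pol.allowed_sources]
    if kept != req.sources:
        decisions.append({"policy_id": "A-source-allowlist", "decision": "modified",
                          "dropped": sorted(set(req.sources) - set(kept))})
    payload, redacted = {}, []
    for key, value in req.payload.items():
        if key in pol.redact_fields:
            payload[key] = "[REDACTED]"
            redacted.append(key)
        elif isinstance(value, str) and EMAIL.search(value):
            payload[key] = EMAIL.sub("[EMAIL]", value)
            redacted.append(key)
        else:
            payload[key] = value
    if redacted:
        decisions.append({"policy_id": "A-redaction", "decision": "modified", "fields": redacted})
    return kept, payload, decisions


def apply_tool_policy(req, pol):
    """Policy B: tenant-wide toggle first, then the per-tool allowlist."""
    if not req.tools_requested:
        return [], []
    if not pol.tools_enabled:
        return [], [{"policy_id": "B-tools-toggle", "decision": "blocked",
                     "tools": list(req.tools_requested)}]
    allowed = [t for t in req.tools_requested if t in pol.allowed_tools]
    blocked = [t for t in req.tools_requested if t not in pol.allowed_tools]
    decisions = []
    if blocked:
        decisions.append({"policy_id": "B-tool-allowlist", "decision": "modified", "blocked": blocked})
    return allowed, decisions


def apply_spend_cap(req, pol, used_today, cheaper_alias="summarize-small"):
    """Policy C: alert at the threshold, then the tenant's defined hard-stop behaviour.
    Returns (model alias to run, or None if the request must not run; decisions)."""
    decisions = []
    if used_today >= pol.quota_alert_ratio * pol.daily_request_quota:
        decisions.append({"policy_id": "C-quota-alert", "decision": "allowed", "used": used_today})
    if used_today >= pol.daily_request_quota:
        if pol.hard_stop == "downgrade":
            decisions.append({"policy_id": "C-quota-hardstop", "decision": "modified",
                              "model_alias": cheaper_alias})
            return cheaper_alias, decisions
        outcome = "blocked" if pol.hard_stop == "block" else "disabled"
        decisions.append({"policy_id": "C-quota-hardstop", "decision": outcome})
        return None, decisions
    return req.model_alias, decisions


def evaluate(req, pol, used_today):
    """Run A, B and C; returns (may_run, sanitized request or None, all decisions)."""
    sources, payload, d_a = apply_data_boundary(req, pol)
    tools, d_b = apply_tool_policy(req, pol)
    alias, d_c = apply_spend_cap(req, pol, used_today)
    decisions = d_a + d_b + d_c
    if alias is None:
        return False, None, decisions
    return True, {"sources": sources, "payload": payload, "tools": tools, "model_alias": alias}, decisions


def demo():
    """Constructed tenant at quota with 'downgrade' as its hard stop."""
    pol = TenantPolicy(allowed_sources={"index:tickets"}, redact_fields={"ssn"},
                       tools_enabled=True, allowed_tools={"search_tickets"},
                       daily_request_quota=100, hard_stop="downgrade")
    req = Request("tenant-1", ["index:tickets", "connector:crm"],
                  {"ssn": "123-45-6789", "note": "contact bob@example.com about ticket 4411"},
                  ["search_tickets", "crm_write"], "summarize-v1")
    return evaluate(req, pol, used_today=100)


if __name__ == "__main__":
    may_run, sanitized, decisions = demo()
    print(may_run, sanitized)
    for d in decisions:
        print(d["policy_id"], d["decision"])
```

Every decision carries its policy ID, so the drill-down view in section 4 can show a customer exactly which of their own settings changed or stopped a request rather than a generic "filtered" label.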
6) RETENTION + PRIVACY DEFAULTS
- Explicit retention settings for traces (default short; extend only by policy).
- Avoid storing raw prompts unless required; prefer template IDs + redacted fields.
- Separate sensitive payload storage from metadata; document access controls.

7) INCIDENT READINESS
- Provider status monitoring (OpenAI/Anthropic/Azure where applicable) wired to your alerting.
- Incident log ties customer-visible impact to provider events.
- Post-incident review template: what policies fired, what routing did, what failed.

Acceptance test: pick one real workflow and prove you can answer, from your UI and exports: which model ran, what data sources were used, what policies applied, what the output ID is, and how to disable it for a single tenant in under a minute.
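The acceptance test above is answerable only if the trace export supports the filters in section 4 and the retention split in section 6 does not throw away the metadata needed for the answers. The following added example, not part of the checklist, works over a small constructed export of trace records in the section 2 bucket layout: it implements the “AI Activity” filters, the drill-down answers the acceptance test asks for, and a retention pass that expires the sensitive payload store well before the metadata store. Record contents, field names and the retention periods are constructed for the example.

```python
"""Added example (not the checklist's own code): activity filtering, drill-down and
retention over exported trace records. The metadata records and the sensitive
payload store are kept apart (section 6) and keyed by trace_id."""
import json
from datetime import datetime, timedelta, timezone

NOW = datetime(2024, 6, 10, 12, 0, tzinfo=timezone.utc)   # fixed clock for the example


def _ts(days_ago):
    return (NOW - timedelta(days=days_ago)).isoformat()


# Constructed export. Three requests for two tenants; t2 was blocked by a quota policy
# and therefore has no provider, model or output ID.
TRACES = [
    {"trace_id": "t1", "tenant_id": "tenant-1", "user_id": "u-42", "feature": "ticket_summary",
     "timestamp": _ts(2), "provider": "provider_b", "model_alias": "summarize-v1",
     "sources_used": ["index:tickets"],
     "decisions": [{"policy_id": "A-redaction", "decision": "modified"}],
     "output_id": "out-9f1", "fallback": "none"},
    {"trace_id": "t2", "tenant_id": "tenant-1", "user_id": "u-7", "feature": "ticket_summary",
     "timestamp": _ts(40), "provider": "", "model_alias": "",
     "sources_used": [],
     "decisions": [{"policy_id": "C-quota-hardstop", "decision": "blocked"}],
     "output_id": "", "fallback": "none"},
    {"trace_id": "t3", "tenant_id": "tenant-2", "user_id": "u-9", "feature": "email_draft",
     "timestamp": _ts(400), "provider": "provider_a", "model_alias": "draft-v2",
     "sources_used": ["connector:crm"], "decisions": [],
     "output_id": "out-0c4", "fallback": "none"},
]

# Sensitive payload store, separate from the metadata above (constructed).
PAYLOADS = {
    "t1": {"prompt_template_id": "tpl-summ-003", "redacted_fields": ["ssn"]},
    "t2": {"prompt_template_id": "tpl-summ-003", "redacted_fields": []},
    "t3": {"prompt_template_id": "tpl-draft-010", "redacted_fields": ["phone"]},
}


def was_blocked(trace):
    return any(d["decision"] == "blocked" for d in trace["decisions"])


def ai_activity(traces, tenant_id, since=None, user_id=None, feature=None, blocked=None):
    """Section 4 'AI Activity' page: filter by date, user, feature, allowed/blocked.
    Always scoped to one tenant so an export can never mix customers."""
    out = []
    for t in traces:
        if t["tenant_id"] != tenant_id:
            continue
        if since is not None and datetime.fromisoformat(t["timestamp"]) < since:
            continue
        if user_id is not None and t["user_id"] != user_id:
            continue
        if feature is not None and t["feature"] != feature:
            continue
        if blocked is not None and was_blocked(t) != blocked:
            continue
        out.append(t)
    return out


def drill_down(trace):
    """The acceptance-test answers for one request: model, sources, policies, output ID."""
    model = f'{trace["provider"]}/{trace["model_alias"]}' if trace["provider"] else None
    return {"model": model,
            "sources": list(trace["sources_used"]),
            "policies": [d["policy_id"] for d in trace["decisions"]],
            "output_id": trace["output_id"] or None}


def export_json(records):
    """Audit export (section 4). Metadata only; the payload store is never exported here."""
    return json.dumps(records, indent=2, sort_keys=True)


def apply_retention(traces, payloads, now, payload_days=30, metadata_days=365):
    """Section 6: short default retention for the sensitive payload store, longer for
    metadata. Returns new stores and leaves the inputs untouched."""
    kept_traces, kept_payloads = [], dict(payloads)
    for t in traces:
        age = now - datetime.fromisoformat(t["timestamp"])
        if age > timedelta(days=payload_days):
            kept_payloads.pop(t["trace_id"], None)
        if age <= timedelta(days=metadata_days):
            kept_traces.append(t)
    return kept_traces, kept_payloads


if __name__ == "__main__":
    for t in ai_activity(TRACES, "tenant-1"):
        print(t["trace_id"], "blocked" if was_blocked(t) else "allowed", drill_down(t))
    kept, payloads = apply_retention(TRACES, PAYLOADS, NOW)
    print(len(kept), "metadata records kept;", sorted(payloads), "payloads kept")
```

After the retention pass the 40-day-old blocked request keeps its metadata (which policy fired, which user) so the activity page still accounts for it, while its payload is gone; the 400-day-old record is dropped from both stores.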