MODEL-CONTROL PLANE BUILD SHEET (30-DAY STARTER) Goal: Put ONE production workflow behind a model-control plane with routing, eval gates, policy enforcement, tracing, and budget controls. Don’t boil the ocean. 1) Pick the workflow (Day 1) - Choose a high-traffic or high-risk path (support reply draft, invoice extraction, internal Q&A, etc.). - Write the “definition of done” in one paragraph: what output is acceptable, what must never happen. 2) Create the request contract (Days 1–3) - Define a request schema: user intent, tenant/workspace, data classification (public/internal/PII), allowed tools. - Define output requirements: JSON schema if possible, citations required or not, max length, forbidden content. 3) Put a gateway in front (Days 3–7) - Centralize all model calls behind a single internal endpoint. - Implement timeouts, retries, and explicit fallbacks (model A → model B → “degraded mode”). - Tag every request with: tenant, feature name, version, and cost center. 4) Add tracing and audit logs (Days 5–10) - Log: prompt template version, system instructions, retrieval query, retrieved doc IDs, tool calls + args, final output. - Tie logs to a user action ID so you can replay incidents. - Decide retention and redaction rules (especially for PII/PHI). 5) Build evals that can fail a release (Days 7–18) - Create a small “golden set” of real inputs (sanitized) that represent core cases + edge cases. - Add property checks: “no restricted tool calls,” “no secrets,” “must include citations,” “must be valid JSON,” etc. - Run evals on every prompt/tool change. If it fails, it doesn’t ship. 6) Implement policy enforcement (Days 10–22) - Tool permissioning: least privilege by tenant and by workflow. - Validate tool inputs like an API: strict schemas, allowlists for URLs/domains, length limits. - Separate instruction from data: treat retrieved text as untrusted. 7) Put budgets and throttles in place (Days 14–26) - Set per-tenant rate limits and soft budgets. - Design graceful degradation: smaller model, shorter context, cached answer, or “human review required.” - Add alerts keyed to feature/tenant, not just cloud billing. 8) Run an incident drill (Days 20–30) - Simulate: provider outage, rate limiting, model behavior drift, prompt injection attempt. - Verify: detection (alerts), containment (fallback/degrade), diagnosis (traces), and recovery steps. Deliverables at Day 30: - One workflow fully behind the gateway. - CI eval gate with a golden dataset. - Tracing/audit logs with replay capability. - Enforced tool permissions + input validation. - Tenant budgets + graceful degradation behavior. If you only do one thing: make prompt/tool changes unshippable without evals and traces. That’s the line between “AI demo” and “AI product.”