MODEL ROUTER READINESS CHECKLIST (MVP → PRODUCTION)

Goal: make your AI product resilient to model/provider churn while improving cost control, reliability, and auditability.

1) Define the interface boundary
- Create a single internal function/API for: generate(), embed(), tool_call().
- Ensure application code never imports vendor SDKs directly.
- Standardize request metadata: tenant, user, workflow, environment, and a trace_id.

2) Instrumentation you can trust
- Log inputs/outputs with strict redaction rules (decide what is never stored).
- Add OpenTelemetry spans around model calls and tool calls.
- Record model name, provider, retries, latency class, and failure reason.
- Keep enough context to reproduce a bad output without leaking secrets.

3) Minimum evaluation gates
- Write one task-success eval suite using real, representative examples.
- Write one safety/data-handling suite: PII redaction, forbidden tools, allowed domains.
- Run evals in CI for prompt changes, tool schema changes, and routing policy changes.
- Establish a rollback rule: what metric/failure triggers an automatic revert.

4) Routing policy (start simple)
- Pick one routing objective: latency, cost ceiling, groundedness, or tool success.
- Hard-code one fallback path for a common incident (timeout, rate limit, refusal).
- Add a circuit breaker: if a provider fails repeatedly, stop sending traffic temporarily.

5) Budgets and billing hygiene
- Implement per-tenant metering for: requests, tokens/units, and tool calls.
- Add spend caps that stop usage (not just alerts) and define the user experience when capped.
- Export usage data to your billing/finance system (CSV is fine to start).

6) Data controls and governance
- Enforce workspace-level constraints: region, external calls allowed/blocked, logging on/off.
- Add role-based permissions for changing prompts, tools, and routing policies.
- Maintain an audit trail: who changed what, when, and which evals passed.

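To make steps 1 and 4 concrete, here is a minimal sketch of a single generate() entry point with one hard-coded fallback path and a circuit breaker. Everything named here (Router, CircuitBreaker, ProviderError, the threshold of 3 failures, the 30-second cooldown) is a hypothetical choice for illustration, not a recommended setting. Provider adapters are plain callables, so no vendor SDK appears in application code.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class ProviderError(Exception):
    """Hypothetical error raised by an adapter on timeout, rate limit, or refusal."""


@dataclass
class CircuitBreaker:
    max_failures: int = 3      # consecutive failures before opening (assumed value)
    cooldown_s: float = 30.0   # how long to stop sending traffic (assumed value)
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and let traffic try again.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now


@dataclass
class Router:
    providers: Dict[str, Callable[[str], str]]  # adapter name -> callable
    order: List[str]                            # primary first, then fallbacks
    clock: Callable[[], float] = time.monotonic
    breakers: Dict[str, CircuitBreaker] = field(default_factory=dict)

    def generate(self, prompt: str, meta: Dict[str, str]) -> Dict[str, object]:
        attempts = []  # (provider, failure reason) pairs, useful for logging
        for name in self.order:
            breaker = self.breakers.setdefault(name, CircuitBreaker())
            if not breaker.allow(self.clock()):
                attempts.append((name, "circuit_open"))
                continue
            try:
                text = self.providers[name](prompt)
            except ProviderError as exc:
                breaker.record(False, self.clock())
                attempts.append((name, str(exc)))
                continue
            breaker.record(True, self.clock())
            return {
                "text": text,
                "provider": name,
                "attempts": attempts,
                "trace_id": meta.get("trace_id"),
            }
        raise ProviderError(f"all providers failed: {attempts}")
```

The returned attempts list carries the provider and failure reason for each skipped or failed hop, which is the kind of record step 2 asks you to keep. A real router would also need timeouts and per-tenant policy, which this sketch leaves out.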
7) Production drills
- Run a “provider outage day” in staging: disable your primary provider and verify degradation behavior.
- Run a “prompt regression” drill: ship a known-bad change and confirm evals block it.
- Run a “budget cap” drill: hit the cap and confirm spend stops and customers get a clear message.

If you can’t do steps 1–3, don’t pretend you have an AI platform. You have a demo. Steps 4–7 are how you turn it into a product that survives 2026.
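The budget cap drill only works if the cap in step 5 actually stops usage. A minimal sketch of per-tenant metering with a hard cap and a CSV export follows. The names (Metering, TenantMeter, BudgetExceeded) and the idea of charging estimated units before the model call are assumptions for illustration; a real system would reconcile estimates against actual usage after the call.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict


class BudgetExceeded(Exception):
    """Hypothetical error raised when a charge would push a tenant past its cap."""


@dataclass
class TenantMeter:
    cap_units: int       # hard cap per billing period (assumed unit: tokens)
    requests: int = 0
    units: int = 0
    tool_calls: int = 0


@dataclass
class Metering:
    meters: Dict[str, TenantMeter] = field(default_factory=dict)

    def set_cap(self, tenant: str, cap_units: int) -> None:
        self.meters[tenant] = TenantMeter(cap_units=cap_units)

    def charge(self, tenant: str, units: int, tool_calls: int = 0) -> None:
        """Record usage, or refuse before any spend happens if the cap would be exceeded."""
        meter = self.meters[tenant]
        if meter.units + units > meter.cap_units:
            # The message is what the customer-facing layer can surface when capped.
            raise BudgetExceeded(
                f"tenant {tenant} has reached its usage cap of {meter.cap_units} units"
            )
        meter.requests += 1
        meter.units += units
        meter.tool_calls += tool_calls

    def export_csv(self) -> str:
        """Dump usage per tenant as CSV for a billing/finance import."""
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["tenant", "requests", "units", "tool_calls"])
        for tenant in sorted(self.meters):
            m = self.meters[tenant]
            writer.writerow([tenant, m.requests, m.units, m.tool_calls])
        return buf.getvalue()
```

In a drill, you would set a low cap for a staging tenant, drive traffic until charge() raises, and check both that no further provider calls go out and that the customer sees the message.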