MODEL ROUTER READINESS CHECKLIST (MVP → PRODUCTION)

Goal: make your AI product resilient to model/provider churn while improving cost control, reliability, and auditability.

1) Define the interface boundary
- Expose one internal API with three entry points: generate(), embed(), and tool_call().
- Ensure application code never imports vendor SDKs directly.
- Standardize request metadata: tenant, user, workflow, environment, and a trace_id.
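
A minimal sketch of that boundary, assuming Python. RequestMeta, Provider, EchoProvider, and ModelRouter are hypothetical names chosen for illustration; only generate(), embed(), tool_call(), and the metadata fields come from the checklist. Vendor SDK imports would live inside Provider adapters, never in application code.

```python
import uuid
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class RequestMeta:
    """Metadata attached to every call (fields taken from the checklist)."""
    tenant: str
    user: str
    workflow: str
    environment: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Provider(Protocol):
    """Hypothetical adapter interface; each vendor SDK is wrapped behind it."""
    name: str

    def generate(self, prompt: str, meta: RequestMeta) -> str: ...
    def embed(self, text: str, meta: RequestMeta) -> list[float]: ...
    def tool_call(self, tool: str, args: dict, meta: RequestMeta) -> dict: ...


class EchoProvider:
    """Stand-in provider so the sketch runs without any vendor SDK."""
    name = "echo"

    def generate(self, prompt: str, meta: RequestMeta) -> str:
        return f"[{self.name}] {prompt}"

    def embed(self, text: str, meta: RequestMeta) -> list[float]:
        return [float(len(text))]

    def tool_call(self, tool: str, args: dict, meta: RequestMeta) -> dict:
        return {"tool": tool, "args": args}


class ModelRouter:
    """The only object application code is meant to import."""

    def __init__(self, provider: Provider) -> None:
        self._provider = provider

    def generate(self, prompt: str, meta: RequestMeta) -> str:
        return self._provider.generate(prompt, meta)

    def embed(self, text: str, meta: RequestMeta) -> list[float]:
        return self._provider.embed(text, meta)

    def tool_call(self, tool: str, args: dict, meta: RequestMeta) -> dict:
        return self._provider.tool_call(tool, args, meta)
```

Swapping vendors then means writing a new adapter and changing one constructor argument, with no edits to call sites.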

2) Instrumentation you can trust
- Log inputs/outputs with strict redaction rules (decide what is never stored).
- Add OpenTelemetry spans around model calls and tool calls.
- Record model name, provider, retries, latency class, and failure reason.
- Keep enough context to reproduce a bad output without leaking secrets.
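
One way the redaction and call-record bullets could fit together, using only the standard library. The regex patterns, the latency thresholds, and the names redact(), latency_class(), CallRecord, and record_call() are all assumptions for illustration. The OpenTelemetry span itself is not shown; the record's fields are the kind of values you might attach to a span as attributes.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed redaction patterns; a real deployment would maintain its own list.
_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{8,}\b"), "[API_KEY]"),
]


def redact(text: str) -> str:
    """Replace anything matching a known-sensitive pattern before storage."""
    for pattern, label in _PATTERNS:
        text = pattern.sub(label, text)
    return text


def latency_class(ms: float) -> str:
    """Bucket latency; the thresholds here are illustrative assumptions."""
    if ms < 1000:
        return "fast"
    if ms < 5000:
        return "normal"
    return "slow"


@dataclass(frozen=True)
class CallRecord:
    trace_id: str
    provider: str
    model: str
    retries: int
    latency_class: str
    failure_reason: Optional[str]
    prompt_redacted: str
    output_redacted: str


def record_call(trace_id: str, provider: str, model: str, prompt: str,
                call: Callable[[str], str], max_retries: int = 2):
    """Run `call`, retrying on any exception, and return (output, record)."""
    start = time.monotonic()
    retries, output, failure = 0, "", None
    while True:
        try:
            output = call(prompt)
            failure = None
            break
        except Exception as exc:  # a real system would narrow this
            failure = type(exc).__name__
            if retries >= max_retries:
                break
            retries += 1
    elapsed_ms = (time.monotonic() - start) * 1000
    record = CallRecord(trace_id, provider, model, retries,
                        latency_class(elapsed_ms), failure,
                        redact(prompt), redact(output))
    return output, record
```

Only the redacted strings reach the record, so the stored log can help reproduce a bad output without holding the raw secret.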

3) Minimum evaluation gates
- Write one task-success eval suite using real, representative examples.
- Write one safety/data-handling suite: PII redaction, forbidden tools, allowed domains.
- Run evals in CI for prompt changes, tool schema changes, and routing policy changes.
- Establish a rollback rule: name the metric threshold or failure type that triggers an automatic revert.
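
A sketch of what a CI gate and rollback rule might look like, assuming evals reduce to a pass rate. EvalCase, pass_rate(), and gate() are hypothetical names, and the min_rate and max_drop defaults are placeholder values, not recommendations.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def pass_rate(model: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("eval suite is empty")
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    return passed / len(cases)


def gate(candidate_rate: float, baseline_rate: float,
         min_rate: float = 0.9, max_drop: float = 0.02) -> bool:
    """True if the change may ship; False means block it (or revert in prod).

    Two assumed conditions: an absolute floor and a maximum allowed
    regression against the current baseline.
    """
    meets_floor = candidate_rate >= min_rate
    within_drop = (baseline_rate - candidate_rate) <= max_drop
    return meets_floor and within_drop
```

In CI, a False result from gate() would fail the build; the same function evaluated on production metrics could drive the automatic revert.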

4) Routing policy (start simple)
- Pick one routing objective: latency, cost ceiling, groundedness, or tool success.
- Hard-code one fallback path for a common incident (timeout, rate limit, refusal).
- Add a circuit breaker: if a provider fails repeatedly, stop sending traffic temporarily.
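
The fallback and circuit-breaker bullets could be sketched as follows. CircuitBreaker and call_with_fallback() are hypothetical names; the threshold and cooldown defaults are assumptions, and only timeouts and connection errors are treated as provider failures here.

```python
import time
from typing import Callable, Optional, Tuple


class CircuitBreaker:
    """Stops traffic to a provider after repeated failures, for a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True if traffic may be sent to the provider right now."""
        if self._opened_at is None:
            return True
        # Once the cooldown has elapsed, traffic is allowed again; a further
        # failure re-opens the breaker immediately because the failure count
        # is still at or above the threshold.
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def call_with_fallback(prompt: str,
                       primary: Callable[[str], str],
                       fallback: Callable[[str], str],
                       breaker: CircuitBreaker) -> Tuple[str, str]:
    """Try the primary unless the breaker is open; otherwise use the fallback.

    Returns (output, route) so the chosen path can be logged.
    """
    if breaker.allow():
        try:
            result = primary(prompt)
            breaker.record_success()
            return result, "primary"
        except (TimeoutError, ConnectionError):
            breaker.record_failure()
    return fallback(prompt), "fallback"
```

Passing the clock in as a parameter keeps the breaker testable without sleeping, which also helps when rehearsing outages in staging.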

5) Budgets and billing hygiene
- Implement per-tenant metering for: requests, tokens/units, and tool calls.
- Add spend caps that stop usage (not just alerts) and define the user experience when capped.
- Export usage data to your billing/finance system (CSV is fine to start).
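
A minimal in-memory sketch of metering with a hard cap and CSV export. Meter, Usage, and SpendCapExceeded are hypothetical names; spend is tracked in integer cents to avoid float rounding, and a real system would persist usage rather than hold it in a dict.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict


class SpendCapExceeded(Exception):
    """Raised before a call is made, so capped tenants incur no new spend."""


@dataclass
class Usage:
    requests: int = 0
    tokens: int = 0
    tool_calls: int = 0
    spend_cents: int = 0


class Meter:
    def __init__(self, caps_cents: Dict[str, int]) -> None:
        self._caps = caps_cents          # tenant -> cap in cents
        self._usage: Dict[str, Usage] = {}

    def check(self, tenant: str) -> None:
        """Call before every request; raises once the cap is reached."""
        cap = self._caps.get(tenant)
        used = self._usage.get(tenant, Usage()).spend_cents
        if cap is not None and used >= cap:
            raise SpendCapExceeded(
                f"Tenant {tenant} has reached its spend cap of {cap} cents."
            )

    def record(self, tenant: str, tokens: int, tool_calls: int,
               cost_cents: int) -> None:
        """Call after every completed request."""
        usage = self._usage.setdefault(tenant, Usage())
        usage.requests += 1
        usage.tokens += tokens
        usage.tool_calls += tool_calls
        usage.spend_cents += cost_cents

    def export_csv(self) -> str:
        """One row per tenant, sorted by tenant name."""
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["tenant", "requests", "tokens", "tool_calls", "spend_cents"])
        for tenant in sorted(self._usage):
            u = self._usage[tenant]
            writer.writerow([tenant, u.requests, u.tokens, u.tool_calls, u.spend_cents])
        return buf.getvalue()
```

The exception message is where the "user experience when capped" gets defined: the caller can catch SpendCapExceeded and show the customer a clear explanation instead of a generic error.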

6) Data controls and governance
- Enforce workspace-level constraints: region, external calls allowed/blocked, logging on/off.
- Add role-based permissions for changing prompts, tools, and routing policies.
- Maintain an audit trail: who changed what, when, and which evals passed.
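
The permission and audit-trail bullets might be combined so that no change can be applied without leaving a record. The role names, ROLE_PERMISSIONS map, AuditEntry, and ChangeLog below are assumptions for illustration; workspace-level constraints such as region are not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Sequence, Tuple

# Assumed role-to-target permissions; adjust to your own roles.
ROLE_PERMISSIONS = {
    "admin": {"prompt", "tool", "routing_policy"},
    "prompt_editor": {"prompt"},
    "viewer": set(),
}


@dataclass(frozen=True)
class AuditEntry:
    actor: str                      # who
    role: str
    target: str                     # what kind of thing changed
    change: str                     # human-readable description
    evals_passed: Tuple[str, ...]   # which eval suites passed
    at: datetime                    # when (UTC)


class ChangeLog:
    """Append-only record; the only path through which changes are applied."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def apply_change(self, actor: str, role: str, target: str, change: str,
                     evals_passed: Sequence[str]) -> AuditEntry:
        # Unknown roles get no permissions at all.
        if target not in ROLE_PERMISSIONS.get(role, set()):
            raise PermissionError(f"role {role!r} may not change {target!r}")
        entry = AuditEntry(actor, role, target, change, tuple(evals_passed),
                           datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def entries(self) -> Tuple[AuditEntry, ...]:
        """Read-only view of the trail."""
        return tuple(self._entries)
```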

7) Production drills
- Run a “provider outage day” in staging: disable your primary provider and verify degradation behavior.
- Run a “prompt regression” drill: ship a known-bad change and confirm evals block it.
- Run a “budget cap” drill: hit the cap and confirm spend stops and customers get a clear message.

If you can’t do steps 1–3, don’t pretend you have an AI platform. You have a demo. Steps 4–7 are how you turn it into a product that survives 2026.