Model Routing Readiness Checklist (2026)

Goal: replace “pick a model” with a routing layer that survives vendor churn, outages, policy changes, and enterprise security review.

1) Inventory + taxonomy (1–2 hours)
- List every place your product calls an LLM (including background jobs).
- Tag each call with a task type: chat/draft, extraction, classification, summarization, code, multimodal, retrieval/RAG.
- Mark which calls are revenue-critical and which are user-facing and latency-sensitive.

2) Data classes (start with only two)
- Define: (A) OK-to-send-to-hosted and (B) must-stay-in-boundary (PII/regulated/secrets).
- Write down examples for each class so engineers don’t guess.
- Add a simple boundary check in code at the point where prompts are constructed.
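
One way to sketch the boundary check above, assuming Python and a regex-based classifier. The class names, patterns, and `assert_sendable` helper are all hypothetical and purely illustrative; real PII and secret detection needs far more than three patterns.

```python
import re
from enum import Enum


class DataClass(Enum):
    HOSTED_OK = "A"     # OK to send to a hosted provider
    IN_BOUNDARY = "B"   # must stay in boundary (PII/regulated/secrets)


# Illustrative patterns only (assumption): extend with your own written examples.
_BOUNDARY_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),           # US SSN-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),        # email address
    re.compile(r"\b(?:sk|pk)_[A-Za-z0-9]{16,}\b"),  # API-key-like token
]


class BoundaryViolation(Exception):
    """Raised when class-B data is about to leave the boundary."""


def classify_prompt(text: str) -> DataClass:
    """Return IN_BOUNDARY if any pattern matches, otherwise HOSTED_OK."""
    if any(p.search(text) for p in _BOUNDARY_PATTERNS):
        return DataClass.IN_BOUNDARY
    return DataClass.HOSTED_OK


def assert_sendable(text: str, provider_is_hosted: bool) -> None:
    """Call this in the prompt builder, before any model call is made."""
    if provider_is_hosted and classify_prompt(text) is DataClass.IN_BOUNDARY:
        raise BoundaryViolation("class-B data cannot go to a hosted provider")
```

Putting the check in the prompt builder rather than in each call site means a new feature cannot skip it by accident.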

3) Routing policy (make it enforceable)
- For each task type, define: primary model/provider, fallback model/provider, and a “degraded mode” behavior.
- Decide what triggers fallback: timeout, 5xx, rate limit, safety refusal, cost cap.
- Add cost and latency budgets per route (rough tiers are fine if you’re early; the key is that every route has an explicit cap).
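
A minimal sketch of such a policy as code, assuming Python. The `Route` fields, the `ProviderError` exception, the provider/model names, and the `call_model` callable are all hypothetical stand-ins for whatever client you already have. The cost cap is recorded here but not enforced.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class ProviderError(Exception):
    """Hypothetical: raised on timeout, 5xx, rate limit, or safety refusal."""


@dataclass(frozen=True)
class Route:
    primary: str
    fallback: str
    degraded_message: str   # what the user sees when both targets fail
    max_latency_s: float    # latency budget, passed to the call as a timeout
    max_cost_usd: float     # cost cap; recorded only, not enforced in this sketch


ROUTES = {
    "summarization": Route(
        primary="provider-a/model-x",
        fallback="provider-b/model-y",
        degraded_message="Summary temporarily unavailable.",
        max_latency_s=5.0,
        max_cost_usd=0.02,
    ),
}


def call_with_fallback(
    task_type: str,
    prompt: str,
    call_model: Callable[[str, str, float], str],
) -> Tuple[str, str]:
    """Try primary, then fallback; return (target_used, output)."""
    route = ROUTES[task_type]
    for target in (route.primary, route.fallback):
        try:
            return target, call_model(target, prompt, route.max_latency_s)
        except ProviderError:
            continue  # any fallback trigger moves on to the next target
    return "degraded", route.degraded_message
```

Returning the target that actually served the request makes fallback events easy to log later.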

4) Governance defaults
- Prompt/output logging: choose full, redacted, or none per data class.
- Secret handling: redact obvious secret patterns (API keys, tokens) before anything is written to logs.
- Tool access: default to read-only; require explicit gating for write actions.
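
The logging defaults above could look roughly like this in Python. The patterns, the `[REDACTED]` placeholder, and the policy mapping are assumptions for illustration, not a complete secret scanner.

```python
import re
from typing import Dict, Optional

# Illustrative patterns (assumption): prefixed keys, bearer tokens, key=value pairs.
_SECRET_PATTERNS = [
    re.compile(r"\b(?:sk|pk|ghp|xoxb)[-_][A-Za-z0-9_-]{16,}"),
    re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/-]+=*"),
    re.compile(r"(?i)\b(?:api[_-]?key|token|secret)\s*[:=]\s*\S+"),
]

# Logging mode per data class: "full", "redacted", or "none".
LOG_POLICY: Dict[str, str] = {"A": "redacted", "B": "none"}


def redact(text: str) -> str:
    """Replace anything that looks like a secret with a placeholder."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def log_payload(text: str, data_class: str, policy: Dict[str, str] = LOG_POLICY) -> Optional[str]:
    """Return the string that may be logged, or None if nothing may be logged."""
    mode = policy[data_class]
    if mode == "none":
        return None
    if mode == "redacted":
        return redact(text)
    return text  # mode == "full"
```

Routing every log write through one function like `log_payload` keeps the per-class choice in a single place that a security reviewer can read.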

5) Observability you’ll actually use
- Add a trace ID that ties: user request → prompt builder → model call(s) → tool calls → final output.
- Log: provider/model name, latency, token usage if available, fallback events, and refusal/safety outcomes.
- Make one dashboard view for: error rate, latency, and fallback frequency per route.
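
As a rough sketch, the fields listed above fit in one structured log line per model call. This assumes Python and JSON lines on stdout; the `ModelCallEvent` name and its field names are hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ModelCallEvent:
    trace_id: str                      # same ID across builder, model, and tool calls
    route: str                         # task type / route name
    provider_model: str
    latency_ms: float
    prompt_tokens: Optional[int] = None      # None when the provider omits usage
    completion_tokens: Optional[int] = None
    fallback_used: bool = False
    outcome: str = "ok"                # "ok", "refusal", or "error"


def new_trace_id() -> str:
    """Generate one ID per user request and pass it through every stage."""
    return uuid.uuid4().hex


def emit(event: ModelCallEvent) -> str:
    """Write the event as one JSON line and return it."""
    line = json.dumps(asdict(event), sort_keys=True)
    print(line)
    return line
```

Error rate, latency, and fallback frequency per route can then all be computed by grouping these lines on `route`.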

6) Evaluation gates (lightweight)
- Maintain a small, versioned evaluation set for your top flows (inputs + expected traits).
- Run evals before deploying prompt/template changes.
- Record which model/version produced which behavior so you can bisect regressions.
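
A lightweight gate might be sketched as follows, assuming Python and treating “expected traits” as required substrings plus a length limit. The `EvalCase` shape, the version string, and the `generate` callable are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

EVAL_SET_VERSION = "v1"  # hypothetical; bump whenever the cases change


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    must_include: Tuple[str, ...]  # expected traits, checked case-insensitively
    max_chars: int


def check(case: EvalCase, output: str) -> List[str]:
    """Return a list of failure descriptions; empty means the case passed."""
    failures = [
        f"missing {s!r}" for s in case.must_include if s.lower() not in output.lower()
    ]
    if len(output) > case.max_chars:
        failures.append(f"too long: {len(output)} > {case.max_chars}")
    return failures


def run_gate(
    cases: List[EvalCase],
    generate: Callable[[str], str],
    model_version: str,
) -> Dict[str, object]:
    """Run every case and record which model/version produced the result."""
    results = {c.case_id: check(c, generate(c.prompt)) for c in cases}
    return {
        "eval_set_version": EVAL_SET_VERSION,
        "model_version": model_version,
        "failures": {k: v for k, v in results.items() if v},
        "passed": all(not v for v in results.values()),
    }
```

Storing the returned report alongside each deploy gives you the model/version history needed to bisect a regression.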

7) Vendor resilience
- Ensure you can switch providers without touching product code (single internal endpoint/gateway).
- Keep at least one secondary provider configured for critical routes.
- For must-stay-in-boundary data, verify you have a self-hosted/open-weight path (even if lower quality).
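
These three checks can be automated against the gateway's own config. The sketch below assumes Python and a JSON config whose schema, provider names, and route names are all invented for illustration.

```python
import json
from typing import List

# Hypothetical gateway config; in practice loaded from a file or config service.
CONFIG_JSON = """
{
  "providers": {
    "provider-a": {"hosted": true},
    "provider-b": {"hosted": true},
    "local-open-weight": {"hosted": false},
    "local-small": {"hosted": false}
  },
  "routes": {
    "chat": {"data_class": "A", "critical": true,
             "targets": ["provider-a", "provider-b"]},
    "extraction": {"data_class": "B", "critical": true,
                   "targets": ["local-open-weight", "local-small"]}
  }
}
"""


def primary_for(config: dict, route: str) -> str:
    """Product code asks the gateway by route name only, never by provider."""
    return config["routes"][route]["targets"][0]


def validate(config: dict) -> List[str]:
    """Return resilience problems; an empty list means the config passes."""
    problems = []
    for name, route in config["routes"].items():
        targets = route["targets"]
        if route["critical"] and len(targets) < 2:
            problems.append(f"{name}: critical route has no secondary provider")
        if route["data_class"] == "B":
            hosted = [t for t in targets if config["providers"][t]["hosted"]]
            if hosted:
                problems.append(f"{name}: class-B route uses hosted {hosted}")
    return problems
```

With this layout, flipping a route from one provider to another is a reorder of `targets` in config, and `validate` can run in CI on every config change.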

8) Security + procurement hygiene
- Document: what data you send, where it goes, and what you store.
- Prepare answers for enterprise buyers on retention, access controls, and incident response for AI features.

Ship criteria: you can flip a route from Provider A to Provider B in minutes, you can explain what gets logged, and your critical flows keep working (in degraded mode) during an outage.