MODEL ROUTING POLICY PACK (2026)

Use this template to move from “we call an LLM” to “we operate a governed model fleet.” Copy/paste into your repo and fill the blanks.

1) TASK TAXONOMY (START SMALL)
List 5–10 tasks your product performs. Each must map to a user-visible workflow.
- Task name:
- User value (why it exists):
- Interactivity: interactive / background
- Data sensitivity: none / internal / regulated
- Tooling: none / read-only / write actions

2) ROUTING RULES (POLICY AS CODE)
For each task, define the default route and explicit escalation rules.
- Default provider + model:
- Escalate when (signals): low confidence, user asks for more depth, tool failure, retrieval weak, policy risk
- Escalation target model(s):
- Hard blocks (never allow): e.g., tool writes without confirmation; sending regulated data to public endpoints

3) DATA HANDLING & REDACTION
Write rules that engineers can implement and auditors can understand.
- Inputs to detect: PII (email/phone/address), credentials (API keys/tokens), payment data, health data
- Redaction behavior: drop, mask, or route-to-private
- Allowed destinations by sensitivity tier:
  - Public: hosted APIs allowed
  - Internal: private endpoints preferred
  - Regulated: private endpoint or self-host only
- Retention stance: what you store, for how long (if applicable), and where

4) TOOL SAFETY
Treat tools like production APIs with guardrails.
- Tool inventory (name → read/write):
- Allowlist per task:
- Confirmation rules for write actions:
- Sandboxing approach (test vs prod):
- Audit requirements: log tool args, allow/deny, result status

5) LOGGING: THE “AI REQUEST RECORD”
Every AI response should have a traceable record.
Minimum fields:
- request_id, timestamp, user/org id
- task name + version
- sensitivity flags + redaction summary
- provider, model, region (if relevant)
- retrieved sources (IDs/URLs), not just text
- tool calls (args + outcomes)
- policy path taken (which rules fired)
- final output + refusal codes (if any)

6) EVAL PLAN (FIRST 2 WEEKS)
Don’t boil the ocean.
- Pick 2 critical tasks.
- Build a “golden set” of ~20–50 examples each (realistic, edge-case heavy).
- Define pass/fail checks:
  - schema validity for structured outputs
  - citation presence when required
  - refusal correctness for disallowed requests
  - tool call correctness for allowed tools
- Run evals on any change to: model, system prompt, retrieval settings, tool schema.

7) FAILOVER & INCIDENT BEHAVIOR
Decide what the product does under stress.
- Provider outage behavior: fallback model? degrade features? queue requests?
- Rate limit behavior: retry policy, backoff, user messaging
- Quality regression behavior: automatic rollback to previous route
- On-call signals: spikes in refusals, tool errors, user reports, latency

If you fill this out and implement it faithfully, you’ll be able to answer the enterprise questions that kill deals: where data went, why a model was chosen, what happens on outage, and how you prevent unsafe actions.
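
EXAMPLE SKETCH (ROUTING RULES + AI REQUEST RECORD)
As one possible illustration of sections 2 and 5, the Python sketch below encodes a task's default route, its escalation signals, and the two example hard blocks, then emits a record carrying a subset of the minimum fields. Every name in it (TaskPolicy, route, build_record, the provider and model strings, the signal names) is a hypothetical placeholder, not part of any real library or provider API. Treat it as a starting point to adapt, not a reference implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, List, Optional, Tuple
import uuid

# Assumption: providers in this set count as "public endpoints".
PUBLIC_PROVIDERS = {"hosted-api"}


@dataclass(frozen=True)
class TaskPolicy:
    """One task from the taxonomy plus its routing rules (all values hypothetical)."""
    name: str
    version: str
    sensitivity: str                    # "none" | "internal" | "regulated"
    default_route: Tuple[str, str]      # (provider, model)
    escalation_route: Tuple[str, str]   # (provider, model)
    escalate_on: FrozenSet[str]         # signals that trigger escalation


@dataclass
class Decision:
    allowed: bool
    provider: Optional[str]
    model: Optional[str]
    policy_path: List[str]              # which rules fired, in order


def route(policy: TaskPolicy, signals, tool_write: bool = False,
          confirmed: bool = False) -> Decision:
    path: List[str] = []

    # Hard block: tool writes without confirmation are never allowed.
    if tool_write and not confirmed:
        path.append("block:tool_write_without_confirmation")
        return Decision(False, None, None, path)

    # Escalate only on signals this task's policy explicitly lists.
    fired = sorted(set(signals) & policy.escalate_on)
    if fired:
        provider, model = policy.escalation_route
        path.extend(f"escalate:{s}" for s in fired)
    else:
        provider, model = policy.default_route
        path.append("default")

    # Hard block: regulated data must not reach a public endpoint.
    if policy.sensitivity == "regulated" and provider in PUBLIC_PROVIDERS:
        path.append("block:regulated_data_to_public_endpoint")
        return Decision(False, None, None, path)

    return Decision(True, provider, model, path)


def build_record(policy: TaskPolicy, decision: Decision, user_id: str,
                 refusal_code: Optional[str] = None) -> dict:
    """Build a partial AI request record (a subset of the minimum fields)."""
    return {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "task": policy.name,
        "task_version": policy.version,
        "sensitivity": policy.sensitivity,
        "provider": decision.provider,
        "model": decision.model,
        "policy_path": decision.policy_path,
        "refusal_code": refusal_code,
    }


if __name__ == "__main__":
    # Hypothetical task and model names, for illustration only.
    policy = TaskPolicy(
        name="summarize_ticket",
        version="1",
        sensitivity="internal",
        default_route=("private-endpoint", "small-model"),
        escalation_route=("private-endpoint", "large-model"),
        escalate_on=frozenset({"low_confidence", "weak_retrieval"}),
    )
    decision = route(policy, signals=["low_confidence"])
    print(build_record(policy, decision, user_id="user-123"))
```

Keeping the policy path as an ordered list of rule names is one way to answer “why was this model chosen” from the record alone. The redaction summary, retrieved sources, tool calls, and final output would be added to the record in the same way.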