MODEL ROUTING POLICY PACK (2026)

Use this template to move from “we call an LLM” to “we operate a governed model fleet.” Copy it into your repo and fill in the blanks.

1) TASK TAXONOMY (START SMALL)
List 5–10 tasks your product performs. Each must map to a user-visible workflow.
- Task name:
- User value (why it exists):
- Interactivity: interactive / background
- Data sensitivity: none / internal / regulated
- Tooling: none / read-only / write actions
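
One way to keep the taxonomy honest is to store it as data in the repo rather than in a doc. The sketch below is a minimal illustration, not a prescribed schema: the class names, the `summarize_ticket` entry, and `validate_taxonomy` are all hypothetical, and the enum values simply mirror the options listed above.

```python
from dataclasses import dataclass
from enum import Enum


class Interactivity(Enum):
    INTERACTIVE = "interactive"
    BACKGROUND = "background"


class Sensitivity(Enum):
    NONE = "none"
    INTERNAL = "internal"
    REGULATED = "regulated"


class Tooling(Enum):
    NONE = "none"
    READ_ONLY = "read-only"
    WRITE = "write actions"


@dataclass(frozen=True)
class Task:
    name: str
    user_value: str
    interactivity: Interactivity
    sensitivity: Sensitivity
    tooling: Tooling


# Hypothetical example entry; replace with your own 5-10 tasks.
TASKS = [
    Task(
        name="summarize_ticket",
        user_value="Agents read a short summary instead of the full thread",
        interactivity=Interactivity.INTERACTIVE,
        sensitivity=Sensitivity.INTERNAL,
        tooling=Tooling.READ_ONLY,
    ),
]


def validate_taxonomy(tasks):
    """Return a list of problems: duplicate names or an empty user value."""
    problems = []
    seen = set()
    for task in tasks:
        if task.name in seen:
            problems.append(f"duplicate task name: {task.name}")
        seen.add(task.name)
        if not task.user_value.strip():
            problems.append(f"{task.name}: missing user value")
    return problems
```

Running `validate_taxonomy(TASKS)` in CI is a cheap way to stop tasks from being added without a stated user value.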

2) ROUTING RULES (POLICY AS CODE)
For each task, define the default route and explicit escalation rules.
- Default provider + model:
- Escalate when (signals): low confidence, user asks for more depth, tool failure, weak retrieval, policy risk
- Escalation target model(s):
- Hard blocks (never allow): e.g., tool writes without confirmation; sending regulated data to public endpoints
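
The rules above can be sketched as a small routing function that checks hard blocks first, then escalation signals, then falls back to the default. This is an illustrative sketch only: the `Route` class, the `ROUTES` table, the signal strings, and the model names are placeholders, not real provider identifiers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    default_model: str
    escalation_model: str
    escalate_on: frozenset = field(default_factory=frozenset)


# Hypothetical route table; model names are placeholders.
ROUTES = {
    "summarize_ticket": Route(
        default_model="provider-a/small-model",
        escalation_model="provider-a/large-model",
        escalate_on=frozenset({"low_confidence", "tool_failure", "weak_retrieval"}),
    ),
}


def choose_model(task, signals, sensitivity, endpoint_is_public, write_confirmed=True):
    """Return (decision, model_or_reason). Hard blocks are checked first."""
    if sensitivity == "regulated" and endpoint_is_public:
        return ("block", "regulated data may not go to a public endpoint")
    if not write_confirmed:
        return ("block", "tool write requires user confirmation")
    route = ROUTES[task]
    # Escalate if any observed signal is in the task's escalation set.
    if route.escalate_on & set(signals):
        return ("escalate", route.escalation_model)
    return ("default", route.default_model)
```

Returning the decision label alongside the model makes it easy to log which rule fired for each request.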

3) DATA HANDLING & REDACTION
Write rules that engineers can implement and auditors can understand.
- Inputs to detect: PII (email/phone/address), credentials (API keys/tokens), payment data, health data
- Redaction behavior: drop, mask, or route-to-private
- Allowed destinations by sensitivity tier:
 - None (public data): hosted APIs allowed
 - Internal: private endpoints preferred
 - Regulated: private endpoint or self-host only
- Retention stance: what you store, for how long (if applicable), and where
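
As a rough sketch of the "mask" behavior and the destination rules, the example below uses two deliberately simple regular expressions and a lookup table keyed by the sensitivity values from section 1. The patterns (including the `sk-` key format), the destination names, and the reading of "preferred" as "listed first" are assumptions for illustration; real detection needs far broader coverage.

```python
import re

# Illustrative patterns only; they will miss many real-world formats.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # hypothetical key format
}

# Destinations per sensitivity tier, most preferred first (assumed ordering).
ALLOWED_DESTINATIONS = {
    "none": ["hosted_api", "private_endpoint", "self_host"],
    "internal": ["private_endpoint", "self_host", "hosted_api"],
    "regulated": ["private_endpoint", "self_host"],
}


def redact(text):
    """Mask each match and return (masked_text, counts per detected kind)."""
    summary = {}
    for kind, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{kind.upper()}]", text)
        if count:
            summary[kind] = count
    return text, summary


def destination_allowed(tier, destination):
    """True if the destination is permitted for the given sensitivity tier."""
    return destination in ALLOWED_DESTINATIONS[tier]
```

The counts returned by `redact` can serve as the redaction summary without storing the sensitive values themselves.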

4) TOOL SAFETY
Treat tools like production APIs with guardrails.
- Tool inventory (name → read/write):
- Allowlist per task:
- Confirmation rules for write actions:
- Sandboxing approach (test vs prod):
- Audit requirements: log tool args, allow/deny, result status
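
A minimal sketch of the allowlist and confirmation checks might look like the following. The tool names, the `support_reply` task, and the in-memory `AUDIT_LOG` are hypothetical stand-ins; a real system would write audit entries to durable storage and also record the result status after the tool runs.

```python
# Hypothetical tool inventory (name -> read/write) and per-task allowlist.
TOOLS = {"search_orders": "read", "refund_order": "write"}
ALLOWLIST = {"support_reply": {"search_orders", "refund_order"}}

AUDIT_LOG = []  # stand-in for a durable audit store


def authorize_tool_call(task, tool, args, confirmed=False):
    """Allow or deny a tool call and append an audit entry either way."""
    if tool not in TOOLS:
        decision, reason = "deny", "unknown tool"
    elif tool not in ALLOWLIST.get(task, set()):
        decision, reason = "deny", "tool not allowlisted for task"
    elif TOOLS[tool] == "write" and not confirmed:
        decision, reason = "deny", "write action needs confirmation"
    else:
        decision, reason = "allow", "ok"
    AUDIT_LOG.append(
        {"task": task, "tool": tool, "args": args, "decision": decision, "reason": reason}
    )
    return decision == "allow"
```

Denied calls are logged too, so the audit trail shows what the model attempted as well as what it was permitted to do.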

5) LOGGING: THE “AI REQUEST RECORD”
Every AI response should have a traceable record.
Minimum fields:
- request_id, timestamp, user/org id
- task name + version
- sensitivity flags + redaction summary
- provider, model, region (if relevant)
- retrieved sources (IDs/URLs), not just text
- tool calls (args + outcomes)
- policy path taken (which rules fired)
- final output + refusal codes (if any)
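
The minimum fields above map naturally onto a single record type. The sketch below is one possible shape, assuming JSON serialization; the class name, field names, and types are illustrative choices rather than a required schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AIRequestRecord:
    user_id: str
    org_id: str
    task_name: str
    task_version: str
    provider: str
    model: str
    final_output: str
    region: Optional[str] = None
    refusal_code: Optional[str] = None
    # Generated per record so callers cannot forget them.
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    sensitivity_flags: list = field(default_factory=list)
    redaction_summary: dict = field(default_factory=dict)
    retrieved_source_ids: list = field(default_factory=list)  # IDs/URLs, not text
    tool_calls: list = field(default_factory=list)  # args + outcomes
    policy_path: list = field(default_factory=list)  # names of rules that fired

    def to_json(self):
        """Serialize the record as a JSON string with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

Emitting one such record per response gives you a single object to search when someone asks where a given request's data went.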

6) EVAL PLAN (FIRST 2 WEEKS)
Don’t boil the ocean.
- Pick 2 critical tasks.
- Build a “golden set” of ~20–50 examples each (realistic, edge-case heavy).
- Define pass/fail checks:
 - schema validity for structured outputs
 - citation presence when required
 - refusal correctness for disallowed requests
 - tool call correctness for allowed tools
- Run evals on any change to: model, system prompt, retrieval settings, tool schema.
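
A golden-set runner does not need a framework to start. The sketch below assumes each example is a dict with hypothetical `id`, `input`, and `checks` keys, and that the model under test is any function from input string to output string; the `[source:` citation marker is likewise an assumed convention.

```python
import json


def check_schema(output, required_keys):
    """Pass if output is a JSON object containing every required key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(key in data for key in required_keys)


def check_citation(output, marker="[source:"):
    """Pass if the output contains the (assumed) citation marker."""
    return marker in output


def run_evals(golden_set, model_fn):
    """Run each example through model_fn; return the ids of failing examples."""
    failures = []
    for example in golden_set:
        output = model_fn(example["input"])
        if not all(check(output) for check in example["checks"]):
            failures.append(example["id"])
    return failures


# Hypothetical golden set with one structured-output example.
GOLDEN_SET = [
    {
        "id": "ex-001",
        "input": "Summarize ticket 42",
        "checks": [lambda out: check_schema(out, ["summary"])],
    },
]
```

Wiring `run_evals` into CI so that a non-empty failure list fails the build covers the "run on any change" rule with very little machinery.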

7) FAILOVER & INCIDENT BEHAVIOR
Decide what the product does under stress.
- Provider outage behavior: fallback model? degrade features? queue requests?
- Rate limit behavior: retry policy, backoff, user messaging
- Quality regression behavior: automatic rollback to previous route
- On-call signals: spikes in refusals, tool errors, user reports, latency
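
Outage and rate-limit behavior can be sketched as an ordered provider list with bounded retries. The exception classes and the `(name, callable)` provider shape below are hypothetical; real client libraries raise their own error types, which you would map onto these two cases. Quality-regression rollback is not covered here.

```python
import random
import time


class RateLimited(Exception):
    """Assumed signal for a rate-limit response from a provider."""


class ProviderDown(Exception):
    """Assumed signal for a provider outage."""


def call_with_failover(prompt, providers, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Try providers in order; retry rate limits with exponential backoff and jitter."""
    for name, call in providers:
        for attempt in range(max_retries):
            try:
                return name, call(prompt)
            except RateLimited:
                if attempt < max_retries - 1:
                    # Delay doubles per attempt, plus jitter to avoid synchronized retries.
                    sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
            except ProviderDown:
                break  # outage: move to the next provider immediately
    raise RuntimeError("all providers failed; degrade the feature or queue the request")
```

Returning the provider name with the result lets the caller record which route actually served the request.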

If you fill this out and implement it faithfully, you’ll be able to answer the enterprise questions that kill deals: where data went, why a model was chosen, what happens on outage, and how you prevent unsafe actions.