Production LLM Router Spec (PRD + Checklist)

Goal

Build a routing layer that makes multi-model LLM features predictable, auditable, and cost-controlled. The router is responsible for model selection, policy enforcement, validation, and traceability.

Scope (what the router must own)

1) Request classification
- Detect: sensitivity (PII/regulatory), workflow risk (read vs write actions), output format requirements (free text vs JSON), latency class (interactive vs batch).
- Store classification result as a first-class field on the trace.

2) Routing policy
- Route based on: sensitivity, required modality, required structured output, customer deployment constraints (region/data retention), user tier, and cost ceilings.
- Define explicit fallbacks per route (don’t rely on implicit retries).

3) Output constraints + validation
- Support JSON schema validation for structured outputs.
- Enforce tool permissions: allowlist tools per route; validate tool arguments; cap tool-call depth.
- Define refusal behavior for blocked content and for policy violations.

4) Observability + audit
Capture per request:
- Provider, model identifier, and parameters (temperature/top_p/tool mode).
- Prompt template ID and a redacted rendered prompt.
- Retrieval context references (doc IDs/chunk hashes) and embedding/search parameters.
- Tool calls and tool outputs (status codes, latency, errors).
- Policy gate results (PII detected, jailbreak signals, allow/deny decision).
- Cost attribution fields (per-tenant, per-route) and caching decisions.

5) Evals + regression
- Maintain a small eval set per route (golden prompts + expected properties).
- On model changes: run evals automatically; block rollout on failures.
- Track failures by route and model version (not just “overall quality”).

Rollout plan (operator-friendly)

1) Start with one high-risk workflow (sensitive data or write actions).
2) Implement “deny by default” tool permissions; require explicit allows.
3) Add schema validation if any downstream code depends on structure.
4) Add fallbacks: timeout -> alternate model; schema fail -> stricter prompt -> human review queue.
5) Turn on trace logging, then do an incident rehearsal: pick a bad output and verify you can reconstruct why it happened.

Go-live checklist

- [ ] Each route has an explicit owner and an eval set.
- [ ] Logs are accessible to engineering + security (with redaction).
- [ ] Data retention and region behavior are documented per provider/deployment.
- [ ] Hard caps exist for context size and tool-call depth.
- [ ] Fallback behavior is deterministic and tested.
- [ ] A kill switch exists per route and per provider.

Definition of Done

Operators can answer: what happened, why it happened, what it cost, and how to prevent it, using the router’s artifacts, not guesswork.
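
Example sketch: classification, routing, and tool permissions

The classification, routing policy, and deny-by-default tool permissions described in the scope can be sketched as a small route table. This is a minimal illustration, not a reference implementation: the names `Classification`, `Route`, `ROUTES`, `select_route`, and `check_tool_call`, the model identifiers, and the tool name are all hypothetical, and a real policy would also weigh region, user tier, and cost ceilings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """Result of request classification; stored on the trace as-is."""
    sensitive: bool          # PII / regulatory data detected
    write_action: bool       # workflow performs writes, not just reads
    structured_output: bool  # downstream code expects JSON
    interactive: bool        # latency class: interactive vs batch


@dataclass(frozen=True)
class Route:
    name: str
    primary_model: str
    fallback_models: tuple[str, ...]          # explicit, ordered fallbacks
    allowed_tools: frozenset[str] = frozenset()  # empty set = deny everything
    max_tool_depth: int = 0                   # hard cap on tool-call depth
    enabled: bool = True                      # per-route kill switch


# Hypothetical route table; model names are placeholders.
ROUTES = {
    "high_risk": Route("high_risk", "model-a", ("model-b",),
                       frozenset({"lookup_account"}), max_tool_depth=2),
    "structured": Route("structured", "model-b", ("model-a",)),
    "default": Route("default", "model-c", ("model-b",)),
}


def select_route(c: Classification) -> Route:
    """Pick a route from the classification; risk wins over format."""
    if c.sensitive or c.write_action:
        key = "high_risk"
    elif c.structured_output:
        key = "structured"
    else:
        key = "default"
    route = ROUTES[key]
    if not route.enabled:
        raise RuntimeError(f"route {route.name!r} is disabled by kill switch")
    return route


def check_tool_call(route: Route, tool: str, depth: int) -> bool:
    """Allow a tool call only if the tool is allowlisted and depth is within the cap."""
    return tool in route.allowed_tools and depth <= route.max_tool_depth
```

Because `allowed_tools` defaults to an empty set, a newly added route permits no tools until someone lists them explicitly, which matches the "deny by default" step in the rollout plan.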
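
Example sketch: deterministic fallback chain

The fallback chain from the rollout plan (timeout -> alternate model; schema fail -> stricter prompt -> human review queue) could look roughly like the following. It is a sketch under stated assumptions: `call` is a hypothetical injected function `(model, prompt) -> raw text` that raises `TimeoutError` on timeout, `validate` is a simplified required-keys check standing in for full JSON schema validation, and returning `None` stands in for enqueueing the request for human review.

```python
import json
from typing import Callable, Optional

# Hypothetical provider call: (model, prompt) -> raw model text.
ModelCall = Callable[[str, str], str]


def validate(raw: str, required: dict[str, type]) -> Optional[dict]:
    """Return the parsed object if it has every required key with the right type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for key, typ in required.items():
        if not isinstance(data.get(key), typ):
            return None
    return data


def run_with_fallbacks(prompt: str, call: ModelCall, primary: str,
                       alternate: str, required: dict[str, type],
                       trace: list) -> Optional[dict]:
    """Run one request through the fallback chain, recording each step on the trace."""
    model = primary
    try:
        raw = call(model, prompt)
    except TimeoutError:
        # Fallback 1: timeout -> alternate model.
        # A timeout on the alternate is not caught here and propagates.
        trace.append(("timeout", primary))
        model = alternate
        raw = call(model, prompt)

    data = validate(raw, required)
    if data is not None:
        trace.append(("ok", model))
        return data

    # Fallback 2: schema fail -> one retry with a stricter prompt.
    trace.append(("schema_fail", model))
    strict = (prompt + "\nReturn only a JSON object with keys: "
              + ", ".join(sorted(required)))
    data = validate(call(model, strict), required)
    if data is not None:
        trace.append(("ok_strict", model))
        return data

    # Fallback 3: give up and hand off; None signals "send to human review".
    trace.append(("human_review", model))
    return None
```

Each branch is taken at most once and in a fixed order, so the same failure sequence always produces the same trace, which is what makes the behavior testable and lets an incident rehearsal reconstruct why a given output was returned.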