MODEL ROUTING PRD ADDENDUM (COPY/PASTE)

Purpose
Define how this feature chooses models/tools, what it is allowed to do, how it fails, and how we’ll measure drift. This document is meant to be readable by Product, Engineering, Security/Privacy, and Support.

1) User intents (keep it small)
List 3–7 intents this feature supports. For each intent:
- Intent name:
- User entry points (UI actions, API endpoints):
- Example prompts/user tasks (3 examples):
- Risk level (low/medium/high) and why:

2) Routing policy per intent
For each intent:
- Default model tier (small/fast, mid, strong reasoning, planner):
- Allowed providers (e.g., OpenAI, Anthropic, Google, open-weight fallback):
- Context rules: what sources can be included (docs, tickets, emails) and what is prohibited (PII types, secrets, credentials):
- Retrieval rules: required/optional/never; what counts as “sufficient sources”:
- Tooling: allowed tools and disallowed tools:

3) Tool contracts (assume the model is wrong)
For each tool:
- Inputs (types, required fields):
- Output schema (what is returned on success):
- Error schema (machine-readable categories):
- Idempotency strategy (how to avoid double-writes):
- Human confirmation requirement (yes/no; what text the user sees):

4) Guardrails and refusal behavior
- Refusal triggers (policy categories, missing citations, missing permissions):
- What the product shows the user on refusal (copy, next-step options):
- Escalation path (handoff to human, ticket creation, or safe-mode):

5) Logging, privacy, retention
- What we log (prompts, tool calls, retrieved snippets, outputs) and at what granularity:
- Redaction strategy (what is removed before logging):
- Retention period and access controls:
- Customer controls (opt-out, data residency constraints if applicable):

6) Reliability + fallbacks
- Provider outage plan (automatic failover, degraded mode, feature flag):
- Rate limit plan (queue, retry, backoff, user messaging):
- Max latency budget per intent (qualitative is fine if you can’t commit to a number yet):

7) Quality evaluation plan (no vanity metrics)
For each intent:
- What “correct” means (examples):
- Offline evaluation set source (real tickets/docs, synthetic data policy):
- Acceptance criteria (qualitative thresholds):
- Red-team scenarios (jailbreak attempts, prompt injection via retrieved text):

8) Observability (questions we must answer in production)
- How many requests per intent and per route?
- Where are refusals clustered (intent/provider)?
- Where do users re-prompt repeatedly (proxy for confusion)?
- What are the tool error rates by tool and by intent?
- Drift detection: what changes trigger investigation (behavior shifts, citation drop, unexpected tool usage)?

Sign-offs
- Product:
- Engineering:
- Security/Privacy:
- Support/Operations:

Notes
If any section cannot be completed without “we’ll tune prompts later,” treat that as a blocker. Prompts are an implementation detail; routing, tools, and failure behavior are the product.
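
Example (illustrative only)
One way to keep sections 2, 3, and 6 reviewable is to record each intent's routing policy as data rather than prose. The sketch below is a minimal illustration, not a prescribed implementation: the intent names, tool names, provider labels, refusal categories, and the helpers choose_provider and authorize_tool are all hypothetical placeholders to be replaced with whatever the completed template specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutePolicy:
    """One row of the 'Routing policy per intent' section (hypothetical shape)."""
    intent: str
    model_tier: str              # "small/fast", "mid", "strong reasoning", or "planner"
    allowed_providers: tuple     # ordered by preference; later entries are fallbacks
    retrieval: str               # "required", "optional", or "never"
    allowed_tools: frozenset = frozenset()
    needs_confirmation: frozenset = frozenset()  # tools that require a human "yes"


class RoutingRefusal(Exception):
    """Refusal carrying a machine-readable category, as section 3 asks for."""

    def __init__(self, category: str, detail: str):
        super().__init__(f"{category}: {detail}")
        self.category = category


# Hypothetical policies; every name here is an assumption for illustration.
POLICIES = {
    "summarize_ticket": RoutePolicy(
        intent="summarize_ticket",
        model_tier="small/fast",
        allowed_providers=("openai", "anthropic", "open-weight"),
        retrieval="required",
        allowed_tools=frozenset({"search_docs"}),
    ),
    "draft_reply": RoutePolicy(
        intent="draft_reply",
        model_tier="mid",
        allowed_providers=("anthropic", "open-weight"),
        retrieval="optional",
        allowed_tools=frozenset({"search_docs", "create_ticket"}),
        needs_confirmation=frozenset({"create_ticket"}),
    ),
}


def choose_provider(policy: RoutePolicy, healthy: set) -> Optional[str]:
    """Return the first allowed provider that is currently healthy.

    Returning None signals that the caller should enter degraded mode
    (section 6) instead of silently routing to a disallowed provider.
    """
    for provider in policy.allowed_providers:
        if provider in healthy:
            return provider
    return None


def authorize_tool(policy: RoutePolicy, tool: str, user_confirmed: bool = False) -> None:
    """Raise RoutingRefusal unless the tool call is permitted for this intent."""
    if tool not in policy.allowed_tools:
        raise RoutingRefusal("tool_not_allowed", f"{tool} for {policy.intent}")
    if tool in policy.needs_confirmation and not user_confirmed:
        raise RoutingRefusal("confirmation_required", f"{tool} for {policy.intent}")
```

Keeping the policy as plain data means a change to allowed providers or tools shows up as a reviewable diff for the sign-off groups, and the refusal categories can feed the per-intent counts in section 8 directly.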