MODEL ROUTING PRD ADDENDUM (COPY/PASTE)

Purpose
Define how this feature chooses models/tools, what it is allowed to do, how it fails, and how we’ll measure drift. This document is meant to be readable by Product, Engineering, Security/Privacy, and Support.

1) User intents (keep it small)
List 3–7 intents this feature supports. For each intent:
- Intent name:
- User entry points (UI actions, API endpoints):
- Example prompts/user tasks (3 examples):
- Risk level (low/medium/high) and why:

2) Routing policy per intent
For each intent:
- Default model tier (small/fast, mid, strong reasoning, planner):
- Allowed providers (e.g., OpenAI, Anthropic, Google, open-weight fallback):
- Context rules: what sources can be included (docs, tickets, emails) and what is prohibited (PII types, secrets, credentials):
- Retrieval rules: required/optional/never; what counts as “sufficient sources”:
- Tooling: allowed tools and disallowed tools:
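
One way to make the per-intent policy reviewable is to record it as data rather than prose. The following is a minimal sketch, not a prescribed schema: the class name RoutePolicy, the intent "summarize_ticket", the provider names, and the tool names are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutePolicy:
    """One row of the routing table (hypothetical shape)."""
    intent: str
    model_tier: str                      # "small/fast", "mid", "strong reasoning", "planner"
    allowed_providers: tuple[str, ...]
    retrieval: str                       # "required", "optional", or "never"
    allowed_tools: frozenset[str] = frozenset()
    prohibited_context: frozenset[str] = frozenset()


# Illustrative entry only; intent, provider, and tool names are assumptions.
POLICIES = {
    "summarize_ticket": RoutePolicy(
        intent="summarize_ticket",
        model_tier="small/fast",
        allowed_providers=("provider_a", "provider_b"),
        retrieval="required",
        allowed_tools=frozenset({"fetch_ticket"}),
        prohibited_context=frozenset({"credentials", "payment_card"}),
    ),
}


def is_tool_allowed(intent: str, tool: str) -> bool:
    """Deny by default: unknown intents and unlisted tools are rejected."""
    policy = POLICIES.get(intent)
    return policy is not None and tool in policy.allowed_tools
```

The deny-by-default check means a tool is disallowed unless the table lists it, so the "disallowed tools" line only needs to call out notable exclusions.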

3) Tool contracts (assume the model is wrong)
For each tool:
- Inputs (types, required fields):
- Output schema (what is returned on success):
- Error schema (machine-readable categories):
- Idempotency strategy (how to avoid double-writes):
- Human confirmation requirement (yes/no; what text the user sees):

4) Guardrails and refusal behavior
- Refusal triggers (policy categories, missing citations, missing permissions):
- What the product shows the user on refusal (copy, next-step options):
- Escalation path (handoff to human, ticket creation, or safe-mode):
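
Refusal behavior is easier to review when each trigger maps to a structured outcome. A minimal sketch, assuming two of the triggers listed above; the function name, reason strings, user-facing copy, and next-step labels are placeholders, not recommended wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Refusal:
    reason: str        # machine-readable trigger, useful for logging
    user_message: str  # copy shown to the user
    next_step: str     # escalation option offered


def check_guardrails(has_permission: bool, citations: int,
                     citations_required: bool) -> Optional[Refusal]:
    """Return a Refusal if a trigger fires, otherwise None (request may proceed)."""
    if not has_permission:
        return Refusal("missing_permissions",
                       "You don't have access to this resource.",
                       "request_access")
    if citations_required and citations == 0:
        return Refusal("missing_citations",
                       "I couldn't find sources to support an answer.",
                       "handoff_to_human")
    return None
```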

5) Logging, privacy, retention
- What we log (prompts, tool calls, retrieved snippets, outputs) and at what granularity:
- Redaction strategy (what is removed before logging):
- Retention period and access controls:
- Customer controls (opt-out, data residency constraints if applicable):
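
The redaction strategy can be illustrated with a small pre-logging filter. This is a sketch only: the two patterns (email addresses and bearer tokens) are assumed examples, and a real redaction list would come from the Security/Privacy review rather than from this snippet.

```python
import re

# Assumed example patterns; not an exhaustive PII or secrets list.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
BEARER = re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+")


def redact(text: str) -> str:
    """Replace matches with placeholders before the text reaches any log sink."""
    text = EMAIL.sub("[EMAIL]", text)
    text = BEARER.sub("[TOKEN]", text)
    return text
```

Usage: pass prompts, tool arguments, and retrieved snippets through redact() before writing them to the log, so the stored record never contains the raw value.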

6) Reliability + fallbacks
- Provider outage plan (automatic failover, degraded mode, feature flag):
- Rate limit plan (queue, retry, backoff, user messaging):
- Max latency budget per intent (qualitative is fine if you can’t commit to a number yet):
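
The outage and rate-limit plans above can be sketched as an ordered provider list with retry and exponential backoff. Everything here is an assumption for illustration: ProviderError, call_with_fallback, the retry count, and the base delay are hypothetical, and the provider callables stand in for real client calls.

```python
import time
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider call on outage or rate limiting (hypothetical)."""


def call_with_fallback(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
    retries_per_provider: int = 2,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, output) from the first success."""
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(prompt)
            except ProviderError:
                # Back off between retries on the same provider; after the
                # last retry, fall through to the next provider immediately.
                if attempt < retries_per_provider - 1:
                    sleep(base_delay_s * 2 ** attempt)
    # Caller is expected to switch to degraded mode or show user messaging.
    raise ProviderError("all providers failed")
```

The sleep parameter is injectable so the backoff schedule can be tested without waiting; in production it defaults to time.sleep.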

7) Quality evaluation plan (no vanity metrics)
For each intent:
- What “correct” means (examples):
- Offline evaluation set source (real tickets/docs, synthetic data policy):
- Acceptance criteria (qualitative thresholds):
- Red-team scenarios (jailbreak attempts, prompt injection via retrieved text):
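
An offline evaluation set can be as simple as named cases with a per-case check. The sketch below assumes a hypothetical EvalCase shape and a run_eval helper; the feature argument stands in for whatever callable wraps the routed model, and the checks are where "correct" gets written down per intent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_eval(feature: Callable[[str], str], cases: list[EvalCase]) -> list[str]:
    """Run every case and return the names of the cases that failed."""
    return [case.name for case in cases if not case.check(feature(case.prompt))]
```

A red-team case fits the same shape: the prompt carries the injected instruction (for example inside simulated retrieved text), and the check asserts that the output did not follow it.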

8) Observability (questions we must answer in production)
- How many requests per intent and per route?
- Where are refusals clustered (intent/provider)?
- Where do users re-prompt repeatedly (proxy for confusion)?
- What are the tool error rates by tool and by intent?
- Drift detection: what changes trigger investigation (behavior shifts, citation drop, unexpected tool usage)?
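
Several of these questions reduce to counters keyed by intent and route, plus a comparison against a baseline. A minimal in-memory sketch, assuming hypothetical names (record, refusal_rate, drifted) and a simple absolute-difference rule; a production system would use its existing metrics pipeline, and the tolerance is something each team has to choose.

```python
from collections import Counter

requests: Counter = Counter()   # (intent, route) -> request count
refusals: Counter = Counter()   # (intent, route) -> refusal count


def record(intent: str, route: str, refused: bool) -> None:
    """Count one request, and one refusal if it was refused."""
    requests[(intent, route)] += 1
    if refused:
        refusals[(intent, route)] += 1


def refusal_rate(intent: str, route: str) -> float:
    total = requests[(intent, route)]
    return refusals[(intent, route)] / total if total else 0.0


def drifted(current: float, baseline: float, tolerance: float) -> bool:
    """Flag for investigation when the rate moves more than `tolerance` from baseline."""
    return abs(current - baseline) > tolerance
```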

Sign-offs
- Product:
- Engineering:
- Security/Privacy:
- Support/Operations:

Notes
If any section cannot be completed without “we’ll tune prompts later,” treat that as a blocker. Prompts are an implementation detail; routing, tools, and failure behavior are the product.