Model-Agnostic LLM Feature Spec (Template)

Goal
- Feature name:
- User problem it solves (single sentence):
- What the system must do (observable behaviors):
- What the system must never do (explicit prohibitions):

1) Inputs and Trust Boundaries
- User inputs accepted:
- Auth context available (role, tenant, scopes):
- Data sources allowed (systems of record):
- Data sources forbidden:
- PII policy (redaction rules, retention):

2) Retrieval Design (if any)
- Corpus list (by source):
- Access control rule (how tenant boundaries are enforced):
- Index strategy (keyword / vector / hybrid):
- Chunking notes (what a “chunk” represents):
- Citation requirement: what claims need citations; what doesn’t
- Logging: top-k doc IDs, filters, query text, reranker decision

3) Routing Design
- Intent classes (3–8 max):
- Per-intent risk level (low/med/high):
- Per-intent model choice (fast vs reasoning vs restricted):
- Escalation rules (what triggers human review):
- Refusal rules (what triggers a hard stop):

4) Tooling and Action Gates
- Tool allowlist (per intent):
- Tool schemas (typed args; required/optional fields):
- Preconditions (what must be true before executing a tool):
- Confirmations (when the user must approve):
- Idempotency plan (how repeats are handled):
- Audit log requirements (who/what/when + inputs/outputs)

5) Output Contract
- Output format (JSON / markdown / UI fields):
- Must-include fields:
- Forbidden content:
- Style constraints (only if measurable):
- Post-generation validators (schema, length, citations):
- Failure behavior (retry? fallback model? ask clarifying question?)

6) Evaluation Gate (ship/no-ship)
- Gold set sources (real tickets, docs, queries):
- Test categories:
  - retrieval miss
  - wrong-tenant retrieval
  - tool-call schema error
  - policy violation
  - unsafe action suggestion
- Regression triggers (what changes require rerun):
- Production monitoring (what to alert on):

7) Rollout and Rollback
- Feature flags:
- Canary scope:
- Kill switch conditions:
- Rollback steps (model swap, prompt swap, retriever swap):

Final Check
- If the base model changes tomorrow, what behavior remains guaranteed by the system (contracts, retrieval, gates)? Write it explicitly here: