Model-Agnostic LLM Feature Spec (Template) Goal - Feature name: - User problem it solves (single sentence): - What the system must do (observable behaviors): - What the system must never do (explicit prohibitions): 1) Inputs and Trust Boundaries - User inputs accepted: - Auth context available (role, tenant, scopes): - Data sources allowed (systems of record): - Data sources forbidden: - PII policy (redaction rules, retention): 2) Retrieval Design (if any) - Corpus list (by source): - Access control rule (how tenant boundaries are enforced): - Index strategy (keyword / vector / hybrid): - Chunking notes (what a “chunk” represents): - Citation requirement: what claims need citations; what doesn’t - Logging: top-k doc IDs, filters, query text, reranker decision 3) Routing Design - Intent classes (3–8 max): - Per-intent risk level (low/med/high): - Per-intent model choice (fast vs reasoning vs restricted): - Escalation rules (what triggers human review): - Refusal rules (what triggers a hard stop): 4) Tooling and Action Gates - Tool allowlist (per intent): - Tool schemas (typed args; required/optional fields): - Preconditions (what must be true before executing a tool): - Confirmations (when the user must approve): - Idempotency plan (how repeats are handled): - Audit log requirements (who/what/when + inputs/outputs) 5) Output Contract - Output format (JSON / markdown / UI fields): - Must-include fields: - Forbidden content: - Style constraints (only if measurable): - Post-generation validators (schema, length, citations): - Failure behavior (retry? fallback model? ask clarifying question?) 6) Evaluation Gate (ship/no-ship) - Gold set sources (real tickets, docs, queries): - Test categories: - retrieval miss - wrong-tenant retrieval - tool-call schema error - policy violation - unsafe action suggestion - Regression triggers (what changes require rerun): - Production monitoring (what to alert on): 7) Rollout and Rollback - Feature flags: - Canary scope: - Kill switch conditions: - Rollback steps (model swap, prompt swap, retriever swap): Final Check - If the base model changes tomorrow, what behavior remains guaranteed by the system (contracts, retrieval, gates)? Write it explicitly here: