Enterprise LLM Build Spec (Retrieval + Tools + Evals)

Use this as a scoping document for one AI workflow. Keep it strict. If you can’t fill a line, that’s a risk you’re carrying.

1) Workflow definition
- User persona(s): (e.g., L2 support, SRE on-call, AE)
- Job to be done: (one sentence)
- Allowed outcomes: (what “done” means)
- Disallowed outcomes: (what must never happen)

2) System-of-record map (truth sources)
For each source, list:
- System: (Confluence, SharePoint, GitHub, Jira, Salesforce, BigQuery, etc.)
- Data type: (policies, runbooks, tickets, customer records)
- Freshness requirement: (how quickly changes must be reflected)
- Access model: (groups, roles, object-level ACLs)
- Retention/deletion requirements: (who can delete; how fast it disappears)

3) Retrieval spec (RAG)
- Index unit: (document/page/section)
- Chunking approach: (rule-based; include headings; max size)
- Metadata: (source URL, owner, timestamp, ACL tags)
- Permissioning: (how ACL trimming is enforced at query time)
- Citation requirement: (when citations are mandatory)
- “No answer” behavior: (what the assistant does when retrieval is weak)

4) Tooling spec (actions)
For each tool:
- Tool name + JSON schema (inputs/outputs)
- AuthZ model: (who can call it; scoped tokens)
- Side effects: (what it changes)
- Confirmation: (when human approval is required)
- Idempotency: (how retries won’t double-apply)
- Rate limits + timeouts: (engineering defaults)

5) Guardrails & policy
- Sensitive data rules: (PII, secrets, customer data)
- Restricted topics: (legal/HR/finance boundaries)
- Escalation path: (handoff to human; ticket creation)
- Redaction strategy: (logs, traces, stored prompts)

6) Observability & audit
- What is logged: (prompt, retrieved document IDs, tool calls, outputs)
- Where logs live: (system + retention)
- Audit trail: (who asked, what sources were accessed)
- Incident playbook: (how to disable tools / rollback)

7) Evaluation gates (ship criteria)
Create a “golden set” of real scenarios.
- Retrieval checks: correct sources retrieved; citations present where required
- Tool checks: schema-valid calls; safe failure behavior
- Policy checks: refusal/escalation where appropriate
- Regression gate: run before every prompt/retriever/model change

8) Model portability plan
- Model interface: (single abstraction point)
- Prompt versioning: (how changes are tracked)
- Fallback model: (what happens on outage/limits)
- Provider exit plan: (what breaks if you swap models)

If you do only one thing: define the truth sources and the allowed actions. Everything else becomes much easier, and fine-tuning becomes optional instead of a crutch.
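
One way to keep the template strict is to hold the truth sources and allowed actions as data and list every blank field as a carried risk. The sketch below is a minimal illustration, not a prescribed implementation: the class names (TruthSource, AllowedAction), the field names, and the helper functions are all hypothetical, chosen only to mirror sections 2 and 4 of the template.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class TruthSource:
    """One entry of the system-of-record map (section 2). Hypothetical shape."""
    system: Optional[str] = None
    data_type: Optional[str] = None
    freshness: Optional[str] = None
    access_model: Optional[str] = None
    retention: Optional[str] = None


@dataclass
class AllowedAction:
    """One tool from the tooling spec (section 4). Hypothetical shape."""
    name: Optional[str] = None
    json_schema: Optional[str] = None
    authz_model: Optional[str] = None
    side_effects: Optional[str] = None
    confirmation: Optional[str] = None
    idempotency: Optional[str] = None
    limits: Optional[str] = None


def unfilled(entry) -> List[str]:
    """Return the names of fields left as None or as a blank string."""
    return [
        f.name
        for f in fields(entry)
        if not (getattr(entry, f.name) or "").strip()
    ]


def carried_risks(sources, actions) -> List[str]:
    """Report every blank field in the spec as a risk being carried."""
    risks = []
    for entry in [*sources, *actions]:
        # Label the entry by its system or tool name, if one was given.
        label = (
            getattr(entry, "system", None)
            or getattr(entry, "name", None)
            or "<unnamed>"
        )
        for field_name in unfilled(entry):
            risks.append(f"{type(entry).__name__}[{label}].{field_name}")
    return risks


if __name__ == "__main__":
    sources = [TruthSource(system="Confluence", data_type="runbooks",
                           freshness="24h", access_model="groups")]
    actions = [AllowedAction(name="create_ticket", authz_model="scoped token")]
    for risk in carried_risks(sources, actions):
        print("RISK:", risk)
```

A check like this could run alongside the regression gate in section 7, so that a spec with open risks is visible before any prompt, retriever, or model change ships.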