LLM SUPPLY CHAIN RUNBOOK (1-PAGER TEMPLATE)

Purpose
Define how we operate an LLM dependency as a production supply chain: provider substitution, model upgrades, incident response, and auditability.

1) System Inventory (fill this out)
- Primary provider/model: __________
- Secondary provider/model (or local fallback): __________
- Workloads using LLMs (check): generation / classification / embeddings / reranking / tool-use / batch jobs
- Data classes involved: public / internal / customer content / regulated (specify)

2) Routing & Substitution Rules
- Default routing: (which workload goes to which model)
- Failover trigger conditions (examples): elevated error rate, latency over SLO, refusal spike, cost guardrail breached
- Failover action: automatic (router) or manual (on-call switch)
- Degraded-mode behavior: e.g., shorter outputs, template responses, disable agent tools, route to human review

3) Model Versioning & Release Process
- Version pinning policy: (pinned vs floating)
- Upgrade cadence: (e.g., monthly or quarterly)
- Required artifacts before upgrade:
  - Eval report against golden set
  - Tool-call/schema validation results
  - Safety/refusal regression check
  - Rollback plan and owner

4) Evaluation Gates (minimum viable)
- Golden set location: __________
- Must-pass checks:
  - Output schema validity (if structured)
  - Tool-call correctness (if tools)
  - Grounding/citation rules (if RAG)
  - Known “high-risk” cases (list 5)
- Human review required for: (list workflows where human sign-off is mandatory)

5) Observability & Cost Controls
- Metrics dashboard links: __________
- Alerts:
  - Provider errors/timeouts
  - Latency
  - Refusal rate changes
  - Token/cost budget anomalies
- Tracing requirements: log model ID/version, prompt template version, retrieval doc IDs, tool calls (redact sensitive data)

6) Incident Response
- On-call owner/team: __________
- Decision maker for failover: __________
- Customer comms owner (if needed): __________
- Post-incident checklist:
  - Identify which model/version was involved
  - Capture representative failed prompts/contexts (redacted)
  - Determine whether regression came from model change, prompt change, retrieval change, or traffic shift
  - Add at least one new golden-case test if it would have caught this

7) Governance & Audit
- Data retention policy for prompts/outputs/logs: __________
- Deletion process (prompts, outputs, embeddings): __________
- Access controls: who can view traces and sampled outputs
- Required audit artifacts: data flow diagram, provider terms review date, eval reports for each model upgrade

If this template can’t be completed in one sitting, treat that as a product risk and schedule the work. The goal is not bureaucracy; it’s predictable operations under model volatility.
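
Appendix sketch A: failover triggers (section 2)
The example failover trigger conditions in section 2 (elevated error rate, latency over SLO, refusal spike, cost guardrail breached) could be evaluated along the lines below. This is a minimal sketch, not a prescribed router design: the `WindowStats` and `Thresholds` shapes, the field names, and every threshold value are hypothetical placeholders to replace with your own SLOs and budgets.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Metrics for one provider over a recent window (hypothetical shape)."""
    requests: int
    errors: int
    refusals: int
    p95_latency_ms: float
    cost_usd: float


@dataclass
class Thresholds:
    """Placeholder limits (assumed values, not recommendations)."""
    max_error_rate: float = 0.05
    max_p95_latency_ms: float = 4000.0
    max_refusal_rate: float = 0.10
    max_cost_usd: float = 50.0


def failover_reasons(stats: WindowStats, limits: Thresholds) -> list[str]:
    """Return the trigger conditions that fired; an empty list means stay on primary."""
    if stats.requests == 0:
        return []  # no traffic in the window, nothing to judge
    reasons = []
    if stats.errors / stats.requests > limits.max_error_rate:
        reasons.append("error_rate")
    if stats.p95_latency_ms > limits.max_p95_latency_ms:
        reasons.append("latency")
    if stats.refusals / stats.requests > limits.max_refusal_rate:
        reasons.append("refusal_spike")
    if stats.cost_usd > limits.max_cost_usd:
        reasons.append("cost")
    return reasons
```

Returning the list of reasons rather than a single boolean lets the same function feed either an automatic router or a page to the on-call decision maker, and gives the post-incident review a record of why failover happened.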
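
Appendix sketch B: output schema gate (section 4)
The "Output schema validity" must-pass check in section 4 might look roughly like this when run over model outputs for the golden set. It is a sketch under stated assumptions: outputs are assumed to be JSON objects, and the function names, the `required` key-to-type mapping, and the default pass rate are all hypothetical. A team with a real schema would likely use a dedicated validator instead.

```python
import json


def schema_valid(raw_output: str, required: dict[str, type]) -> bool:
    """True if raw_output is a JSON object with every required key of the right type."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    # A missing key yields None, which fails the isinstance check.
    return all(isinstance(data.get(key), typ) for key, typ in required.items())


def gate_passes(outputs: list[str], required: dict[str, type],
                min_pass_rate: float = 1.0) -> bool:
    """Must-pass gate: the share of schema-valid outputs must reach min_pass_rate."""
    if not outputs:
        return False  # an empty golden set should never pass a gate
    valid = sum(schema_valid(o, required) for o in outputs)
    return valid / len(outputs) >= min_pass_rate
```

For example, `gate_passes(outputs, {"label": str, "citations": list})` would block an upgrade as soon as one golden-set output is malformed, since the assumed default pass rate is 1.0.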
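
Appendix sketch C: trace record with redaction (section 5)
The tracing requirements in section 5 (model ID/version, prompt template version, retrieval doc IDs, tool calls, with sensitive data redacted) can be captured as one structured log line per request. The sketch below is illustrative only: the field names and the `trace_record` helper are hypothetical, and the redaction rule masks email addresses and nothing else, so it is a stand-in for a reviewed redaction policy rather than a complete one.

```python
import json
import re

# Assumption: emails are the only sensitive pattern handled in this sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Mask email addresses; real redaction needs a broader rule set."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def trace_record(model_id: str, prompt_template_version: str,
                 retrieval_doc_ids: list[str], tool_calls: list[str]) -> str:
    """Build one JSON trace line with the fields named in the tracing requirements."""
    return json.dumps({
        "model_id": model_id,
        "prompt_template_version": prompt_template_version,
        "retrieval_doc_ids": list(retrieval_doc_ids),
        "tool_calls": [redact(call) for call in tool_calls],
    }, sort_keys=True)
```

Logging the model ID and prompt template version on every request is what makes the first post-incident checklist item (identify which model/version was involved) answerable from traces alone.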