LLM SUPPLY CHAIN RUNBOOK (1-PAGER TEMPLATE)

Purpose
Define how we operate an LLM dependency as a production supply chain: provider substitution, model upgrades, incident response, and auditability.

1) System Inventory (fill this out)
- Primary provider/model: __________
- Secondary provider/model (or local fallback): __________
- Workloads using LLMs (check all that apply): generation / classification / embeddings / reranking / tool-use / batch jobs
- Data classes involved: public / internal / customer content / regulated (specify)

2) Routing & Substitution Rules
- Default routing: (which workload goes to which model)
- Failover trigger conditions (examples): elevated error rate, latency over SLO, refusal spike, cost guardrail breached
- Failover action: automatic (router) or manual (on-call switch)
- Degraded-mode behavior: e.g., shorter outputs, template responses, disable agent tools, route to human review
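
Example (illustrative only): the trigger conditions above can be sketched as a small decision function. Every name and threshold here (WindowStats, Thresholds, the 5% error limit, and so on) is a hypothetical placeholder, not a recommended value; real limits should come from your own SLOs and budgets.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Metrics for one provider over a recent window (hypothetical shape)."""
    requests: int
    errors: int
    refusals: int
    p95_latency_ms: float
    spend_usd: float


@dataclass
class Thresholds:
    """Placeholder limits; replace with values derived from your SLOs."""
    max_error_rate: float = 0.05
    max_refusal_rate: float = 0.10
    max_p95_latency_ms: float = 4000.0
    max_spend_usd: float = 50.0
    min_requests: int = 20  # skip rate/latency checks on tiny samples


def failover_reasons(stats: WindowStats, limits: Thresholds) -> list[str]:
    """Return the breached trigger conditions; an empty list means stay on primary."""
    reasons = []
    if stats.requests >= limits.min_requests:
        if stats.errors / stats.requests > limits.max_error_rate:
            reasons.append("error_rate")
        if stats.p95_latency_ms > limits.max_p95_latency_ms:
            reasons.append("latency")
        if stats.refusals / stats.requests > limits.max_refusal_rate:
            reasons.append("refusal_spike")
    # The cost guardrail is checked regardless of sample size.
    if stats.spend_usd > limits.max_spend_usd:
        reasons.append("cost_guardrail")
    return reasons


def choose_route(stats: WindowStats, limits: Thresholds,
                 primary: str, secondary: str) -> str:
    """Route to the secondary model when any trigger condition is breached."""
    return secondary if failover_reasons(stats, limits) else primary
```

Returning the list of reasons rather than a bare boolean lets the same function drive either an automatic router or a page to the on-call owner who makes the manual switch.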

3) Model Versioning & Release Process
- Version pinning policy: (pinned to an exact model version vs. floating alias)
- Upgrade cadence: (e.g., monthly or quarterly)
- Required artifacts before upgrade:
 - Eval report against golden set
 - Tool-call/schema validation results
 - Safety/refusal regression check
 - Rollback plan and owner
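
Example (illustrative only): one way to enforce the artifact list above is a pre-upgrade check that blocks the release while anything is missing. The field names and the release-record shape are assumptions made for this sketch; map them to however your team stores release metadata.

```python
# Hypothetical keys, one per required artifact in the list above.
REQUIRED_ARTIFACTS = (
    "eval_report",
    "tool_schema_validation",
    "safety_refusal_regression",
    "rollback_plan",
    "rollback_owner",
)


def missing_artifacts(release: dict) -> list[str]:
    """Return required artifact keys that are absent or blank in the release record."""
    return [
        key for key in REQUIRED_ARTIFACTS
        if not str(release.get(key) or "").strip()
    ]


def can_upgrade(release: dict) -> bool:
    """Allow the upgrade only when every required artifact is present."""
    return not missing_artifacts(release)
```

A check like this could run in CI against the release record, so an upgrade cannot merge without a named rollback owner.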

4) Evaluation Gates (minimum viable)
- Golden set location: __________
- Must-pass checks:
 - Output schema validity (if structured)
 - Tool-call correctness (if tools)
 - Grounding/citation rules (if RAG)
 - Known “high-risk” cases (list 5)
- Human review required for: (list workflows where human sign-off is mandatory)
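
Example (illustrative only): a minimal sketch of the output-schema check run over a golden set, using only the standard library. The case format (dicts with "id" and "prompt"), the generate callable, and the required-fields mapping are assumptions for illustration; tool-call and grounding checks would be added as further must-pass checks in the same loop.

```python
import json
from typing import Callable


def schema_errors(raw_output: str, required: dict[str, type]) -> list[str]:
    """Check that the output is a JSON object with the required fields and types."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = []
    for key, expected in required.items():
        if key not in data:
            errors.append(f"missing field: {key}")
        elif not isinstance(data[key], expected):
            errors.append(f"wrong type for {key}")
    return errors


def run_gate(cases: list[dict], generate: Callable[[str], str],
             required: dict[str, type]) -> tuple[bool, dict[str, list[str]]]:
    """Run every golden case; the gate passes only if no case has errors."""
    failures = {}
    for case in cases:
        errs = schema_errors(generate(case["prompt"]), required)
        if errs:
            failures[case["id"]] = errs
    return (not failures, failures)
```

Keeping the per-case failure messages makes the eval report easy to generate from the same run.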

5) Observability & Cost Controls
- Metrics dashboard links: __________
- Alerts:
 - Provider errors/timeouts
 - Latency over SLO
 - Refusal rate changes
 - Token/cost budget anomalies
- Tracing requirements: log model ID/version, prompt template version, retrieval doc IDs, tool calls (redact sensitive data)
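
Example (illustrative only): a sketch of a trace record that carries the fields listed above and redacts tool-call arguments before logging. The TraceRecord shape is hypothetical, and the email pattern stands in for whatever redaction rules your data classes actually require; it is deliberately simple and will not catch every sensitive value.

```python
import json
import re
from dataclasses import asdict, dataclass, field

# Simplified email pattern used only to demonstrate redaction.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Replace email-like substrings with a fixed placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


@dataclass
class TraceRecord:
    model_id: str
    model_version: str
    prompt_template_version: str
    retrieval_doc_ids: list[str] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)

    def to_log_line(self) -> str:
        """Serialize to one JSON line with tool-call arguments redacted."""
        record = asdict(self)
        record["tool_calls"] = [
            {
                "name": call.get("name", ""),
                "arguments": redact(
                    json.dumps(call.get("arguments", {}), sort_keys=True)
                ),
            }
            for call in self.tool_calls
        ]
        return json.dumps(record, sort_keys=True)
```

Redacting at serialization time, rather than trusting each caller to do it, keeps unredacted arguments out of the log pipeline by default.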

6) Incident Response
- On-call owner/team: __________
- Decision maker for failover: __________
- Customer comms owner (if needed): __________
- Post-incident checklist:
 - Identify which model/version was involved
 - Capture representative failed prompts/contexts (redacted)
 - Determine whether regression came from model change, prompt change, retrieval change, or traffic shift
 - Add at least one new golden-set case that would have caught this regression
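
Example (illustrative only): if traces record model, prompt template, and retrieval versions, a first pass at attributing a regression can be a diff between a known-good trace and an incident trace. The field names (including retrieval_index_version) are assumptions for this sketch, and treating "nothing changed" as a possible traffic shift is a heuristic starting point, not a conclusion.

```python
# Hypothetical mapping from suspected cause to the trace fields that identify it.
CAUSE_FIELDS = {
    "model_change": ("model_id", "model_version"),
    "prompt_change": ("prompt_template_version",),
    "retrieval_change": ("retrieval_index_version",),
}


def suspected_causes(baseline: dict, incident: dict) -> list[str]:
    """List the dimensions that differ between a known-good and an incident trace."""
    changed = [
        cause
        for cause, keys in CAUSE_FIELDS.items()
        if any(baseline.get(k) != incident.get(k) for k in keys)
    ]
    # If no tracked version changed, a traffic shift is the remaining candidate.
    return changed or ["traffic_shift_or_unknown"]
```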

7) Governance & Audit
- Data retention policy for prompts/outputs/logs: __________
- Deletion process (prompts, outputs, embeddings): __________
- Access controls: who can view traces and sampled outputs
- Required audit artifacts: data flow diagram, provider terms review date, eval reports for each model upgrade

If this template can’t be completed in one sitting, treat that as a product risk and schedule the work. The goal is not bureaucracy; it’s predictable operations under model volatility.