MODEL-MEDIATED WORK: LEADERSHIP CONTROL SHEET (TEMPLATE) Purpose Use this to inventory every place LLMs influence decisions (product + internal), then put those surfaces under change control. If an output can ship, send, approve, merge, deny, or commit money: it belongs here. 1) Workflow inventory (fill one row per workflow) - Workflow name: - Function (Eng / Support / Sales / Finance / HR / Product / Security): - Tool(s) involved (e.g., GitHub Copilot, Cursor, Intercom, Zendesk, Microsoft 365 Copilot, OpenAI API, AWS Bedrock): - What the model does (drafts text, writes code, routes tickets, summarizes calls, approves actions): - Where the output goes (customer email, PR, production config, contract, refund): 2) Risk classification (choose one) - Low: internal drafts only; no external send; no privileged actions. - Medium: influences customer comms or product decisions but has human review. - High: can trigger money movement, account restrictions, security-sensitive changes, or production-impacting actions. 3) Minimum controls (non-negotiables) - Owner (single accountable person): - Model/provider and routing config documented (yes/no): - Prompt/system instructions versioned (repo or equivalent) (yes/no): - Prompt changes require review (who approves?): - Data access scope defined (what sources can it retrieve? what permissions?): - Logging policy (what is retained? redaction rules?): - Evaluation: what test set or checks run before changes ship? - Human gate: which actions require explicit approval? 4) Incident plan (write this before you need it) - Kill switch: what happens when you disable the model? (fallback templates, deterministic rules, manual process) - Rollback: how do you revert prompts/routing/index version? - Detection: what signals indicate failure? (customer reports, eval regression, anomaly in outputs) - Comms owner: who speaks to customers if the assistant sent a bad answer? 5) Monthly review prompts (15 minutes per workflow) - Did any prompt/routing/data-source changes ship without review? - Did variance increase? (inconsistent answers, inconsistent code patterns, policy drift) - Are logs sufficient to reproduce a disputed output? - Are evaluations still aligned with real tasks, or testing the wrong thing? How to use this template this week 1) Map 10 workflows in 60 minutes. Don’t argue; list them. 2) Pick one HIGH workflow and implement: versioned prompts + mandatory review + rollback + basic eval. 3) Repeat monthly until every MED/HIGH workflow has an owner and a kill switch.