MODEL-MEDIATED WORK: LEADERSHIP CONTROL SHEET (TEMPLATE)

Purpose
Use this to inventory every place LLMs influence decisions (product + internal), then put those surfaces under change control. If an output can ship, send, approve, merge, deny, or commit money: it belongs here.

1) Workflow inventory (fill one row per workflow)
- Workflow name:
- Function (Eng / Support / Sales / Finance / HR / Product / Security):
- Tool(s) involved (e.g., GitHub Copilot, Cursor, Intercom, Zendesk, Microsoft 365 Copilot, OpenAI API, AWS Bedrock):
- What the model does (drafts text, writes code, routes tickets, summarizes calls, approves actions):
- Where the output goes (customer email, PR, production config, contract, refund):

2) Risk classification (choose one)
- Low: internal drafts only; no external send; no privileged actions.
- Medium: influences customer comms or product decisions but has human review.
- High: can trigger money movement, account restrictions, security-sensitive changes, or production-impacting actions.

3) Minimum controls (non-negotiables)
- Owner (single accountable person):
- Model/provider and routing config documented (yes/no):
- Prompt/system instructions versioned (repo or equivalent) (yes/no):
- Prompt changes require review (who approves?):
- Data access scope defined (what sources can it retrieve? what permissions?):
- Logging policy (what is retained? redaction rules?):
- Evaluation: what test set or checks run before changes ship?
- Human gate: which actions require explicit approval?

4) Incident plan (write this before you need it)
- Kill switch: what happens when you disable the model? (fallback templates, deterministic rules, manual process)
- Rollback: how do you revert prompts/routing/index version?
- Detection: what signals indicate failure? (customer reports, eval regression, anomaly in outputs)
- Comms owner: who speaks to customers if the assistant sent a bad answer?

5) Monthly review prompts (15 minutes per workflow)
- Did any prompt/routing/data-source changes ship without review?
- Did variance increase? (inconsistent answers, inconsistent code patterns, policy drift)
- Are logs sufficient to reproduce a disputed output?
- Are evaluations still aligned with real tasks, or testing the wrong thing?

How to use this template this week
1) Map 10 workflows in 60 minutes. Don’t argue; list them.
2) Pick one HIGH workflow and implement: versioned prompts + mandatory review + rollback + basic eval.
3) Repeat monthly until every MED/HIGH workflow has an owner and a kill switch.