AI OPERATING SYSTEM — ONE-WORKFLOW ROLLOUT PACK (Template)

GOAL
Ship one AI workflow to production with measurable quality, clear ownership, and a rollback plan.

1) DEFINE THE WORKFLOW (fill this in)
- Workflow name (plain English):
- Primary user:
- Trigger (what starts the workflow):
- Output (what the system produces):
- Human-in-the-loop point (where a human approves/edits):
- “Hard stop” constraints (must never happen):

2) ASSIGN ACCOUNTABILITY (no committees)
- DRI (single person):
- Platform owner (gateway/logging):
- Security owner (policy/threat model):
- Data owner (sources/permissions/freshness):
- On-call/incident owner:

3) DATA & PERMISSIONS CHECK
- Source list (systems the AI can read):
- Permission model (how user access is enforced):
- Redaction rules (PII/secrets):
- Retention policy (how long prompts/outputs are stored):
- Audit log location:
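
Example (illustrative only): one way to apply redaction rules before prompts and outputs are stored. The patterns, the labels, and the `redact` helper are assumptions made for this sketch, not a complete PII/secret detector; a real rule set needs review by the security and data owners.

```python
import re

# Hypothetical patterns for illustration; extend and review before real use.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # assumed key format
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace matches with placeholders and count hits per rule."""
    counts: dict[str, int] = {}
    for label, pattern in REDACTION_PATTERNS.items():
        text, hits = pattern.subn(f"[REDACTED_{label}]", text)
        if hits:
            counts[label] = hits
    return text, counts
```

The per-rule counts can be written to the audit log without storing the redacted values themselves.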

4) EVALUATION HARNESS (must exist before broad rollout)
- Create a fixed eval set from real cases rather than invented ones (20–100 cases is fine to start).
- Define acceptance criteria for this workflow (examples: “must include citations,” “must not invent policy,” “must not expose restricted data”).
- Decide the release gate: which metric(s) must not regress, and by how much, before a new prompt/model/version rolls out.
- Store eval artifacts in the repo (or a shared system) so results are reproducible.
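
Example (illustrative only): a minimal sketch of a fixed eval set, string-based acceptance criteria, and a release gate. `EvalCase`, `pass_rate`, `release_gate`, and the `max_drop` tolerance are hypothetical names and values chosen for this sketch; substring checks stand in for whatever criteria the workflow actually needs.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    must_include: tuple[str, ...] = ()      # e.g. a citation marker
    must_not_include: tuple[str, ...] = ()  # e.g. restricted phrases


def passes(case: EvalCase, output: str) -> bool:
    """A case passes if all required strings and no forbidden strings appear."""
    text = output.lower()
    has_required = all(s.lower() in text for s in case.must_include)
    has_forbidden = any(s.lower() in text for s in case.must_not_include)
    return has_required and not has_forbidden


def pass_rate(cases: Sequence[EvalCase], generate: Callable[[str], str]) -> float:
    """Run every case through `generate` and return the fraction that pass."""
    if not cases:
        raise ValueError("eval set is empty")
    return sum(passes(c, generate(c.prompt)) for c in cases) / len(cases)


def release_gate(baseline: float, candidate: float, max_drop: float = 0.02) -> bool:
    """Allow rollout only if the candidate is within `max_drop` of the baseline."""
    return candidate >= baseline - max_drop
```

Run `pass_rate` once for the current version and once for the candidate, then store both numbers with the eval set so the gate decision can be reproduced.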

5) OPERABILITY CHECKLIST
- Central gateway endpoint used (Yes/No)
- Tracing: request → retrieval → model call → output (Yes/No)
- Budget alerts configured (Yes/No)
- Per-workflow cost attribution (tags/labels) (Yes/No)
- Kill switch / feature flag (Yes/No)
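
Example (illustrative only): a sketch of per-workflow cost attribution, a budget check, and an environment-variable kill switch. The `CostLedger` class, the `AI_KILL_<WORKFLOW>` variable convention, and the workflow name in the usage note are assumptions for this sketch; most teams would back these with their gateway and feature-flag system instead.

```python
import os
from collections import defaultdict
from typing import Mapping


class CostLedger:
    """Accumulates spend per workflow tag and reports budget overruns."""

    def __init__(self, budgets_usd: Mapping[str, float]):
        self.budgets_usd = dict(budgets_usd)
        self.spend_usd: dict[str, float] = defaultdict(float)

    def record(self, workflow: str, cost_usd: float) -> None:
        self.spend_usd[workflow] += cost_usd

    def over_budget(self) -> list[str]:
        """Workflows whose recorded spend exceeds their budget."""
        return sorted(
            w for w, budget in self.budgets_usd.items()
            if self.spend_usd.get(w, 0.0) > budget
        )


def workflow_enabled(workflow: str, env: Mapping[str, str] = os.environ) -> bool:
    """Kill switch: setting AI_KILL_<WORKFLOW>=1 disables the workflow."""
    flag = "AI_KILL_" + workflow.upper().replace("-", "_")
    return env.get(flag, "0") != "1"
```

Usage: check `workflow_enabled("ticket-triage")` before each request, call `record` with the cost the gateway reports, and alert whenever `over_budget()` is non-empty.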

6) SAFETY & INCIDENT RESPONSE
- Incident definition (what counts as an incident for this workflow):
- Immediate actions (kill switch, rollback, disable tool access):
- Who gets paged:
- User communication plan (internal/external):
- Post-incident review owner and format:

7) ROLLOUT PLAN (phased)
- Phase 0: internal dogfood (who, how many, what feedback channel)
- Phase 1: opt-in beta (eligibility criteria)
- Phase 2: default-on (what must be true to switch)
- Monitoring during rollout (dashboards, daily checks, escalation path)

8) MODEL/PROVIDER STRATEGY (keep it practical)
- Approved providers/models for this workflow:
- Routing rule (example: “cheapest model that passes eval gate”):
- Fallback behavior if provider fails (degraded mode, human-only path):
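
Example (illustrative only): a sketch of the "cheapest model that passes eval gate" routing rule with a human-only fallback. `ModelOption`, `pick_model`, `run_with_fallback`, and the `call_model` callable are hypothetical; the broad `except` is a simplification, and a real version would catch the provider's specific error types.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens_usd: float
    passes_eval_gate: bool


def eligible_by_cost(options: Sequence[ModelOption]) -> list[ModelOption]:
    """Approved models that pass the eval gate, cheapest first."""
    return sorted(
        (m for m in options if m.passes_eval_gate),
        key=lambda m: m.cost_per_1k_tokens_usd,
    )


def pick_model(options: Sequence[ModelOption]) -> Optional[ModelOption]:
    """Cheapest model that passes the eval gate, or None if none qualifies."""
    ranked = eligible_by_cost(options)
    return ranked[0] if ranked else None


def run_with_fallback(
    prompt: str,
    options: Sequence[ModelOption],
    call_model: Callable[[str, str], str],
) -> dict:
    """Try eligible models cheapest-first; fall back to a human-only path."""
    for model in eligible_by_cost(options):
        try:
            output = call_model(model.name, prompt)
            return {"mode": "ai", "model": model.name, "output": output}
        except Exception:  # simplification: catch provider errors specifically
            continue
    return {"mode": "human_only", "model": None, "output": None}
```

Returning an explicit `human_only` mode keeps the degraded path visible in traces instead of failing silently.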

9) WEEKLY OPERATING RHYTHM (30 minutes, no theater)
Agenda:
- Eval regressions since last week:
- Top failure modes (from traces and user feedback):
- Cost anomalies:
- One change to ship next week:

If you can’t fill out this pack for a workflow, you’re not ready to scale AI across the org. Start here, make it real, then replicate.