AI OPERATING SYSTEM — ONE-WORKFLOW ROLLOUT PACK (Template) GOAL Ship one AI workflow to production with measurable quality, clear ownership, and a rollback plan. 1) DEFINE THE WORKFLOW (fill this in) - Workflow name (plain English): - Primary user: - Trigger (what starts the workflow): - Output (what the system produces): - Human-in-the-loop point (where a human approves/edits): - “Hard stop” constraints (must never happen): 2) ASSIGN ACCOUNTABILITY (no committees) - DRI (single person): - Platform owner (gateway/logging): - Security owner (policy/threat model): - Data owner (sources/permissions/freshness): - On-call/incident owner: 3) DATA & PERMISSIONS CHECK - Source list (systems the AI can read): - Permission model (how user access is enforced): - Redaction rules (PII/secrets): - Retention policy (how long prompts/outputs are stored): - Audit log location: 4) EVALUATION HARNESS (must exist before broad rollout) - Create a fixed eval set from real cases (20–100 is fine to start; don’t guess). - Define acceptance criteria for this workflow (examples: “must include citations,” “must not invent policy,” “must not expose restricted data”). - Decide the release gate: what metric(s) must stay stable for a new prompt/model/version to roll out. - Store eval artifacts in the repo (or a shared system) so results are reproducible. 5) OPERABILITY CHECKLIST - Central gateway endpoint used? (Yes/No) - Tracing: request → retrieval → model call → output (Yes/No) - Budget alerts configured (Yes/No) - Per-workflow cost attribution (tags/labels) (Yes/No) - Kill switch / feature flag (Yes/No) 6) SAFETY & INCIDENT RESPONSE - What counts as an incident for this workflow? - Immediate actions (kill switch, rollback, disable tool access): - Who gets paged: - User communication plan (internal/external): - Post-incident review owner and format: 7) ROLLOUT PLAN (phased) - Phase 0: internal dogfood (who, how many, what feedback channel) - Phase 1: opt-in beta (eligibility criteria) - Phase 2: default-on (what must be true to switch) - Monitoring during rollout (dashboards, daily checks, escalation path) 8) MODEL/PROVIDER STRATEGY (keep it practical) - Approved providers/models for this workflow: - Routing rule (example: “cheapest model that passes eval gate”): - Fallback behavior if provider fails (degraded mode, human-only path): 9) WEEKLY OPERATING RHYTHM (30 minutes, no theater) Agenda: - Eval regressions since last week: - Top failure modes (from traces and user feedback): - Cost anomalies: - One change to ship next week: If you can’t fill this pack for a workflow, you’re not ready to scale AI across the org. Start here, make it real, then replicate.