HUMAN + AGENT OPERATING SYSTEM (HAOS) — LEADERSHIP CHECKLIST Use this checklist to roll out agent workflows safely and measurably in ~90 days. 1) PICK THE RIGHT WORKFLOWS (WEEK 1) - Choose 2 workflows with measurable outcomes (examples: draft RFCs, triage support tickets, generate test cases, summarize incidents). - Define a baseline metric and a target (e.g., reduce median time-to-first-response from 6h to 4h; reduce RFC draft time from 5 days to 2 days). - Write “red lines” (must-not-do rules): actions the agent cannot take under any circumstances. 2) DEFINE AUTONOMY LEVELS (WEEK 1–2) - Copilot-only: agent suggests, human executes. - Agent-as-intern: agent drafts/opens PRs, human approves. - Scoped autonomy: agent executes within tight thresholds (e.g., refunds < $25). - Human-only: legal commitments, large financial changes, raw PII access, security-critical merges. 3) INSTRUMENTATION + PROVENANCE (WEEK 2–4) - Log every run: model used, prompt/version, tools called, data sources retrieved, outputs. - Store logs centrally with retention (suggested: 90–180 days) and search. - Add a “why” field to workflows: what objective the agent is optimizing for. 4) PERMISSIONS + DATA ACCESS (WEEK 2–6) - Default agents to read-only; require explicit justification for write access. - Implement tiered approvals: e.g., refunds > $100 require manager approval. - Create a break-glass process for elevated access with audit trails. - Redact sensitive fields in retrieval (tokens, secrets, raw payment details, SSNs). 5) EVALS + QUALITY GATES (WEEK 3–8) - Build a regression set (start with 100–500 real historical examples). - Define pass/fail thresholds (e.g., 95% pass required to ship a prompt/model change). - Track override rate: how often humans must correct the agent. Rising overrides = drift. 6) METRICS THAT MATTER (ONGOING) - Outcome metrics: customer CSAT by cohort, defect escape rate, rollback rate, conversion rates. - Flow metrics: lead time to production, time-to-review, percent of outputs reviewed. - Cost metrics: model/tool spend per team; rework cost (incidents, escalations, refunds). 7) CHANGE MANAGEMENT (ONGOING) - Standardize templates: spec-to-eval, postmortems, decision memos. - Publish “paved roads”: approved connectors, approved actions, approved datasets. - Train managers on reviewing agent output (what to verify, what to sample, what to reject). 90-DAY SUCCESS CRITERIA - At least one workflow moved from “drafts only” to “scoped autonomy” with no major incident. - Measurable improvement (>= 20%) in a primary metric without a regression in quality metrics. - Clear ownership: named steward, eval owner, and data gatekeeper for each workflow.