HUMAN + AGENT OPERATING SYSTEM (HAOS) — LEADERSHIP CHECKLIST

Use this checklist to roll out agent workflows safely and measurably in ~90 days.

1) PICK THE RIGHT WORKFLOWS (WEEK 1)
- Choose 2 workflows with measurable outcomes (examples: draft RFCs, triage support tickets, generate test cases, summarize incidents).
- Define a baseline metric and a target (e.g., reduce median time-to-first-response from 6h to 4h; reduce RFC draft time from 5 days to 2 days).
- Write “red lines” (must-not-do rules): actions the agent cannot take under any circumstances.

2) DEFINE AUTONOMY LEVELS (WEEK 1–2)
- Copilot-only: agent suggests, human executes.
- Agent-as-intern: agent drafts/opens PRs, human approves.
- Scoped autonomy: agent executes within tight thresholds (e.g., refunds < $25).
- Human-only: legal commitments, large financial changes, raw PII access, security-critical merges.

3) INSTRUMENTATION + PROVENANCE (WEEK 2–4)
- Log every run: model used, prompt/version, tools called, data sources retrieved, outputs.
- Store logs centrally with retention (suggested: 90–180 days) and search.
- Add a “why” field to workflows: what objective the agent is optimizing for.

4) PERMISSIONS + DATA ACCESS (WEEK 2–6)
- Default agents to read-only; require explicit justification for write access.
- Implement tiered approvals: e.g., refunds > $100 require manager approval.
- Create a break-glass process for elevated access with audit trails.
- Redact sensitive fields in retrieval (tokens, secrets, raw payment details, SSNs).

5) EVALS + QUALITY GATES (WEEK 3–8)
- Build a regression set (start with 100–500 real historical examples).
- Define pass/fail thresholds (e.g., 95% pass required to ship a prompt/model change).
- Track override rate: how often humans must correct the agent. Rising overrides = drift.

6) METRICS THAT MATTER (ONGOING)
- Outcome metrics: customer CSAT by cohort, defect escape rate, rollback rate, conversion rates.
- Flow metrics: lead time to production, time-to-review, percent of outputs reviewed.
- Cost metrics: model/tool spend per team; rework cost (incidents, escalations, refunds).

7) CHANGE MANAGEMENT (ONGOING)
- Standardize templates: spec-to-eval, postmortems, decision memos.
- Publish “paved roads”: approved connectors, approved actions, approved datasets.
- Train managers on reviewing agent output (what to verify, what to sample, what to reject).

90-DAY SUCCESS CRITERIA
- At least one workflow moved from “drafts only” to “scoped autonomy” with no major incident.
- Measurable improvement (>= 20%) in a primary metric without a regression in quality metrics.
- Clear ownership: named steward, eval owner, and data gatekeeper for each workflow.