AI-Native Leadership Operating Template (90-Day Reset)

Purpose
Use this template to move from “AI experiments” to an AI-native operating model in 90 days. The output is a repeatable delivery package: decision rights + evaluation + observability + policy.

1) Pick 2 workflows (Week 1)
Choose workflows with: (a) clear inputs, (b) measurable outcomes, (c) manageable risk.
Examples:
- Support triage + draft responses (metrics: time-to-first-response, containment rate, cost per resolution)
- Engineering maintenance (dependency bumps, test generation, incident summaries) (metrics: change failure rate, MTTR)
Write a one-sentence “definition of done” for each.

2) Assign decision rights (Week 1)
For each workflow, name owners:
- Business Owner (final accountability)
- AI Quality Lead (evals + regression gates)
- Data Access Steward (data boundaries + logging rules)
- Platform Owner (tooling, routing, tracing)
Document what requires human approval (e.g., production writes, customer-facing send, refunds).

3) Write the 1-page workflow spec (Week 2)
Include:
- Intent: what the system should optimize for
- Non-goals: what it must not do
- Allowed tools: APIs, databases, ticket systems
- Data policy: what can be retrieved, stored, or used for training
- Escalation rules: when to hand off to a human

4) Build the evaluation set (Weeks 2–4)
Create a “golden set” of at least 200 cases:
- 60% typical cases
- 30% edge cases
- 10% adversarial/red-team cases
Define pass/fail criteria (accuracy, policy compliance, tone).
Set an initial pass threshold (e.g., 95–97%) and raise it as reliability improves.

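The golden-set rules in step 4 (at least 200 cases, a 60/30/10 mix, and a pass threshold) can be checked mechanically. The sketch below is one possible way to do that, not a prescribed implementation: the names EvalCase, composition_problems, and pass_rate are hypothetical, and the 5-point tolerance on the mix is an assumption that the template does not specify.

```python
from collections import Counter
from dataclasses import dataclass

# Target mix and minimum size come from step 4 of the template.
TARGET_MIX = {"typical": 0.60, "edge": 0.30, "adversarial": 0.10}
MIN_CASES = 200
# Assumption: allow each category to drift 5 percentage points from target.
MIX_TOLERANCE = 0.05


@dataclass
class EvalCase:
    """Hypothetical record for one golden-set case."""
    case_id: str
    category: str  # "typical", "edge", or "adversarial"
    passed: bool   # outcome of the pass/fail criteria for this case


def composition_problems(cases):
    """Return human-readable problems with the set's size or category mix."""
    problems = []
    if len(cases) < MIN_CASES:
        problems.append(f"only {len(cases)} cases; need at least {MIN_CASES}")
    counts = Counter(case.category for case in cases)
    total = len(cases) or 1  # avoid division by zero on an empty set
    for category, target in TARGET_MIX.items():
        share = counts.get(category, 0) / total
        if abs(share - target) > MIX_TOLERANCE:
            problems.append(f"{category}: {share:.0%} of cases, target {target:.0%}")
    return problems


def pass_rate(cases):
    """Fraction of cases that passed; 0.0 for an empty set."""
    return sum(case.passed for case in cases) / len(cases) if cases else 0.0


# Illustrative data only: 120 typical, 60 edge, 20 adversarial cases.
cases = (
    [EvalCase(f"t{i}", "typical", True) for i in range(120)]
    + [EvalCase(f"e{i}", "edge", True) for i in range(60)]
    + [EvalCase(f"a{i}", "adversarial", True) for i in range(20)]
)
for case in cases[-8:]:
    case.passed = False  # pretend 8 adversarial cases failed

PASS_THRESHOLD = 0.95  # low end of the 95-97% example range in step 4
print("composition problems:", composition_problems(cases))
print(f"pass rate: {pass_rate(cases):.1%}")
print("meets threshold:", pass_rate(cases) >= PASS_THRESHOLD)
```

Running a check like this whenever the golden set changes keeps the mix from drifting toward easy cases as new examples are added.
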
5) Instrument observability (Weeks 3–6)
Minimum dashboard:
- Latency p50/p95
- Cost per successful outcome (USD)
- Human override rate (edits, rewrites, reassignments)
- Refusal rate and top refusal reasons
- Policy violations (count and severity)

6) Establish release gates (Weeks 5–8)
Before deployment, require:
- Eval pass rate meets threshold
- Cost per case below max (set a dollar cap, e.g., $0.05–$0.20 depending on workflow)
- Latency below max (e.g., p95 under 2 seconds for interactive workflows)
- Rollback plan and on-call rotation for the workflow

7) Run weekly AI Ops review (Weeks 6–13)
Agenda (30 minutes):
- What changed (model, prompt, tools, data)
- Eval regressions and why
- Override rate movement (target: <20% by week 8)
- Cost movement (target: down 25–50% by day 90)
- Incidents and postmortems (blameless)

8) Success criteria at Day 90
You can answer, with numbers:
- Cost per successful outcome
- Override rate and top reasons
- Eval pass rate and drift over time
- Data sources used and retention policy
- Which actions are automated vs. human-approved

Copy/paste checklist (executive view)
[ ] Two workflows chosen with measurable outcomes
[ ] Owners assigned (Business, Quality, Data, Platform)
[ ] 1-page specs written and approved
[ ] Golden sets built (>=200 cases each)
[ ] Dashboards live (latency, cost, overrides, refusals)
[ ] Release gates enforced (eval thresholds + rollback)
[ ] Weekly AI Ops review running
[ ] Day-90 scorecard published to the company
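
The release gates in step 6 can also be expressed as an automated check that runs before each deployment. The following is a minimal sketch under stated assumptions: GateThresholds, ReleaseCandidate, and failed_gates are hypothetical names, the default limits are taken from the example values in the template (95% pass rate, $0.20 per case, p95 of 2 seconds), and a value exactly at a limit is treated as passing, which is a choice the template leaves open.

```python
from dataclasses import dataclass


@dataclass
class GateThresholds:
    """Limits for one workflow; defaults mirror the template's example values."""
    min_eval_pass_rate: float = 0.95
    max_cost_per_case_usd: float = 0.20
    max_p95_latency_s: float = 2.0


@dataclass
class ReleaseCandidate:
    """Hypothetical summary of one candidate release's measurements."""
    eval_pass_rate: float
    total_cost_usd: float
    cases_run: int
    p95_latency_s: float
    has_rollback_plan: bool
    has_on_call_rotation: bool


def failed_gates(candidate, thresholds=GateThresholds()):
    """Return the names of gates the candidate fails; empty list means ship."""
    failures = []
    if candidate.eval_pass_rate < thresholds.min_eval_pass_rate:
        failures.append("eval pass rate below threshold")
    # Cost per case; a run with zero cases cannot demonstrate the cost gate.
    if candidate.cases_run > 0:
        cost_per_case = candidate.total_cost_usd / candidate.cases_run
    else:
        cost_per_case = float("inf")
    if cost_per_case > thresholds.max_cost_per_case_usd:
        failures.append("cost per case above cap")
    if candidate.p95_latency_s > thresholds.max_p95_latency_s:
        failures.append("p95 latency above max")
    if not candidate.has_rollback_plan:
        failures.append("rollback plan missing")
    if not candidate.has_on_call_rotation:
        failures.append("on-call rotation missing")
    return failures


# Illustrative numbers only: $24 over 200 cases is $0.12 per case.
candidate = ReleaseCandidate(
    eval_pass_rate=0.96,
    total_cost_usd=24.0,
    cases_run=200,
    p95_latency_s=1.7,
    has_rollback_plan=True,
    has_on_call_rotation=False,
)
print(failed_gates(candidate))
```

Returning the list of failed gates, rather than a single boolean, gives the weekly AI Ops review a concrete record of why a release was blocked.
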