AI-Native Leadership Operating Template (90-Day Reset)

Purpose
Use this template to move from “AI experiments” to an AI-native operating model in 90 days. The output is a repeatable delivery package: decision rights, evaluation, observability, and policy.

1) Pick 2 workflows (Week 1)
Choose workflows with: (a) clear inputs, (b) measurable outcomes, (c) manageable risk.
Examples:
- Support triage + draft responses (metrics: time-to-first-response, containment rate, cost per resolution)
- Engineering maintenance: dependency bumps, test generation, incident summaries (metrics: change failure rate, mean time to recovery (MTTR))
Write a one-sentence “definition of done” for each.

2) Assign decision rights (Week 1)
For each workflow, name owners:
- Business Owner (final accountability)
- AI Quality Lead (evals + regression gates)
- Data Access Steward (data boundaries + logging rules)
- Platform Owner (tooling, routing, tracing)
Document which actions require human approval (e.g., production writes, customer-facing sends, refunds).
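
One way to keep decision rights from living only in a slide is to record them as data that tooling can check. The sketch below is a minimal illustration, not a prescribed format: the class name `DecisionRights`, the owner values, and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRights:
    """Named owners and human-approval rules for one workflow."""
    workflow: str
    business_owner: str        # final accountability
    ai_quality_lead: str       # evals + regression gates
    data_access_steward: str   # data boundaries + logging rules
    platform_owner: str        # tooling, routing, tracing
    human_approval_required: frozenset = frozenset()

    def needs_approval(self, action: str) -> bool:
        """True if the action must wait for a human sign-off."""
        return action in self.human_approval_required


# Hypothetical example values; replace with your own people and actions.
support_triage = DecisionRights(
    workflow="support_triage",
    business_owner="head_of_support",
    ai_quality_lead="support_quality_lead",
    data_access_steward="privacy_engineer",
    platform_owner="ml_platform_lead",
    human_approval_required=frozenset(
        {"production_write", "customer_facing_send", "refund"}
    ),
)

print(support_triage.needs_approval("refund"))
```

An agent runtime could call `needs_approval` before executing any tool action and route the request to a human queue when it returns true.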

3) Write the 1-page workflow spec (Week 2)
Include:
- Intent: what the system should optimize for
- Non-goals: what it must not do
- Allowed tools: APIs, databases, ticket systems
- Data policy: what can be retrieved, stored, or used for training
- Escalation rules: when to hand off to a human
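
The five spec sections above can be checked mechanically before a spec goes to approval. The following is a rough sketch, assuming the spec is stored as a plain dictionary; the key names and the example content are illustrative assumptions, not part of the template.

```python
# Section keys mirror the five items in the 1-page spec (names are assumed).
REQUIRED_SECTIONS = (
    "intent",
    "non_goals",
    "allowed_tools",
    "data_policy",
    "escalation_rules",
)


def missing_sections(spec: dict) -> list:
    """Return required sections that are absent or empty in the spec."""
    return [name for name in REQUIRED_SECTIONS if not spec.get(name)]


# Hypothetical spec for a support triage workflow.
example_spec = {
    "intent": "Resolve routine tickets quickly without lowering answer quality",
    "non_goals": ["Issue refunds", "Change account ownership"],
    "allowed_tools": ["ticket_system_api", "knowledge_base_search"],
    "data_policy": {
        "retrieve": ["ticket history"],
        "store": ["draft text"],
        "train_on": [],
    },
    "escalation_rules": ["Legal threat", "Customer asks for a human"],
}

print(missing_sections(example_spec))
```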

4) Build the evaluation set (Weeks 2–4)
Create a “golden set” of at least 200 cases:
- 60% typical cases
- 30% edge cases
- 10% adversarial/red-team cases
Define pass/fail criteria (accuracy, policy compliance, tone).
Set an initial pass threshold (e.g., 95–97%) and raise it as reliability improves.

5) Instrument observability (Weeks 3–6)
Minimum dashboard:
- Latency p50/p95
- Cost per successful outcome (USD)
- Human override rate (edits, rewrites, reassignments)
- Refusal rate and top refusal reasons
- Policy violations (count and severity)
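
As a rough illustration of how the minimum dashboard could be computed from per-case logs, the sketch below summarizes a list of records. The record fields (`latency_ms`, `cost_usd`, `success`, `overridden`, `refusal_reason`, `violation_severity`) are an assumed log schema, and the nearest-rank percentile is one of several common percentile definitions.

```python
import math
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records):
    """Compute the minimum dashboard metrics from per-case log records."""
    if not records:
        raise ValueError("no records to summarize")
    n = len(records)
    successes = sum(1 for r in records if r["success"])
    total_cost = sum(r["cost_usd"] for r in records)
    latencies = [r["latency_ms"] for r in records]
    refusals = Counter(r["refusal_reason"] for r in records if r["refusal_reason"])
    violations = Counter(
        r["violation_severity"] for r in records if r["violation_severity"]
    )
    return {
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
        # Cost of all attempts divided by successful outcomes only.
        "cost_per_success_usd": total_cost / successes if successes else None,
        "override_rate": sum(1 for r in records if r["overridden"]) / n,
        "refusal_rate": sum(refusals.values()) / n,
        "top_refusal_reasons": refusals.most_common(3),
        "violations_by_severity": dict(violations),
    }


# Hypothetical log records for illustration only.
sample = [
    {"latency_ms": 400, "cost_usd": 0.04, "success": True, "overridden": False,
     "refusal_reason": None, "violation_severity": None},
    {"latency_ms": 900, "cost_usd": 0.06, "success": True, "overridden": True,
     "refusal_reason": None, "violation_severity": None},
    {"latency_ms": 1500, "cost_usd": 0.05, "success": False, "overridden": False,
     "refusal_reason": "pii_request", "violation_severity": None},
    {"latency_ms": 700, "cost_usd": 0.05, "success": True, "overridden": False,
     "refusal_reason": None, "violation_severity": "low"},
]
dashboard = summarize(sample)
print(dashboard)
```

Dividing total spend by successful outcomes, rather than by all cases, means failed and refused attempts raise the reported cost instead of hiding it.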

6) Establish release gates (Weeks 5–8)
Before deployment, require:
- Eval pass rate meets threshold
- Cost per case below max (set a dollar cap, e.g., $0.05–$0.20 depending on workflow)
- Latency below max (e.g., p95 under 2 seconds for interactive workflows)
- Rollback plan and on-call rotation for the workflow
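
The gate conditions above can be expressed as a single check that returns the reasons a release is blocked. This is a minimal sketch: the default limits reuse the example figures from this template (95% pass rate, $0.20 cost cap, 2-second p95) and should be treated as placeholders, and the names `GateLimits` and `release_blockers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateLimits:
    """Per-workflow limits; the defaults are example values only."""
    min_pass_rate: float = 0.95
    max_cost_usd: float = 0.20
    max_p95_seconds: float = 2.0


def release_blockers(pass_rate, cost_per_case_usd, p95_seconds,
                     has_rollback_plan, has_on_call, limits=GateLimits()):
    """Return the reasons a release is blocked; an empty list means ship."""
    blockers = []
    if pass_rate < limits.min_pass_rate:
        blockers.append(
            f"eval pass rate {pass_rate:.1%} below {limits.min_pass_rate:.1%}"
        )
    if cost_per_case_usd > limits.max_cost_usd:
        blockers.append(
            f"cost ${cost_per_case_usd:.2f} above ${limits.max_cost_usd:.2f}"
        )
    if p95_seconds > limits.max_p95_seconds:
        blockers.append(
            f"p95 latency {p95_seconds:.1f}s above {limits.max_p95_seconds:.1f}s"
        )
    if not has_rollback_plan:
        blockers.append("no rollback plan")
    if not has_on_call:
        blockers.append("no on-call rotation")
    return blockers


print(release_blockers(0.96, 0.12, 1.4, True, True))
```

Wiring a check like this into CI makes the gate enforceable instead of advisory: the deploy job fails whenever the returned list is non-empty.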

7) Run weekly AI Ops review (Weeks 6–13)
Agenda (30 minutes):
- What changed (model, prompt, tools, data)
- Eval regressions and why
- Override rate movement (target: <20% by week 8)
- Cost movement (target: down 25–50% by day 90)
- Incidents and postmortems (blameless)
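
The two numeric targets on the agenda can be tracked with a few lines of arithmetic. The sketch below assumes day 90 falls in week 13 and that a baseline cost per successful outcome was recorded at the start; both are assumptions, as are the function names.

```python
def reduction_pct(baseline_usd, current_usd):
    """Percentage drop from baseline; negative means cost went up."""
    if baseline_usd <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_usd - current_usd) / baseline_usd * 100


def review_flags(week, override_rate, baseline_cost_usd, current_cost_usd):
    """Return the agenda targets that are currently being missed."""
    flags = []
    # Target from the agenda: override rate under 20% by week 8.
    if week >= 8 and override_rate >= 0.20:
        flags.append(f"override rate {override_rate:.0%} is not under 20%")
    # Target from the agenda: cost down at least 25% by day 90 (week 13 assumed).
    drop = reduction_pct(baseline_cost_usd, current_cost_usd)
    if week >= 13 and drop < 25:
        flags.append(f"cost down {drop:.0f}%, target is 25-50%")
    return flags


print(review_flags(week=8, override_rate=0.24,
                   baseline_cost_usd=0.20, current_cost_usd=0.18))
```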

8) Success criteria at Day 90
You can answer, with numbers:
- Cost per successful outcome
- Override rate and top reasons
- Eval pass rate and drift over time
- Data sources used and retention policy
- Which actions are automated vs. human-approved

Copy/paste checklist (executive view)
[ ] Two workflows chosen with measurable outcomes
[ ] Owners assigned (Business, Quality, Data, Platform)
[ ] 1-page specs written and approved
[ ] Golden sets built (>=200 cases each)
[ ] Dashboards live (latency, cost, overrides, refusals, policy violations)
[ ] Release gates enforced (eval threshold, cost and latency caps, rollback plan, on-call rotation)
[ ] Weekly AI Ops review running
[ ] Day-90 scorecard published to the company