AI-ERA LEADERSHIP OPERATING SYSTEM (LOS)

Use this as a 30–60–90 day rollout to keep speed and safety aligned when engineers use AI copilots.

PART 1 — WEEKLY SCORECARD (review every Monday, 20 minutes)

1) Delivery
- Deployment frequency (per service / per team): ______
- Lead time for changes (median): ______

2) Quality
- Change failure rate (% deploys causing incident/rollback): ______
- MTTR (median time to restore): ______

3) Review capacity
- PRs opened per engineer/week: ______
- PRs reviewed per engineer/week: ______
- Median PR size (files changed / LOC): ______

4) Cost
- Cloud unit cost (e.g., $ per 1k requests or $ per active user): ______
- AI/inference unit cost (e.g., $ per 1k actions or $ per 1M tokens): ______

5) Security
- Open critical vulnerabilities: ______
- Secrets detected in last 7 days: ______

(Sketch A in the appendix shows one way to compute the delivery and quality rows from deploy and incident logs.)

PART 2 — “DEFINITION OF DONE” (ship checklist)

A change is only “done” when:
- It has a named owner (service owner + on-call)
- It has automated tests or explicit risk acceptance
- It is deployable behind a flag OR has a rollback plan
- Observability exists (dashboard + key alerts)
- A success metric is identified (activation, retention, latency, cost)

(Sketch B in the appendix turns this checklist into a CI gate.)

PART 3 — DECISION ALTITUDE RULES (reduce expensive mistakes)

Implementation (PR-level):
- Refactors, tests, small features; reviewed via PR template + CI gates.

Architecture (RFC-level):
- Datastore changes, auth/billing flows, data retention, cross-service contracts.

Policy: Any irreversible change must have:
- Threat model link
- Reversibility statement (hours/days/weeks)
- Cost forecast (best case / worst case)

(Sketch C in the appendix records this policy as a structured decision-log entry.)

PART 4 — 30–60–90 DAY ROLLOUT

Days 1–30: Visibility
- Establish scorecard baselines.
- Add PR template requiring Outcome, Risk, Evidence.
- Identify top 3 recurring incident types and write runbooks.

Days 31–60: Constraints that create speed
- Enforce CI gates: critical vuln block, secrets detection (Sketch D in the appendix), required reviews on critical modules.
- Create a lightweight RFC process (1–2 pages max) and a decision log.

Days 61–90: Scale autonomy
- Build paved roads: starter repos, standard logging/metrics, deployment templates.
- Update performance expectations: reward outcomes, reliability, and leverage (not volume).

PART 5 — 1:1 QUESTION SET (manager as debugger)

Ask each week:
- What are you assuming that might be wrong?
- What decision are you stuck on, and who must make it?
- What is the biggest risk (security, reliability, cost, product) in your work?
- What proof will we use to know this worked?

If you do nothing else: make decisions explicit, automate guardrails, and measure outcomes. AI will handle the drafts; leadership must handle the truth.
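
APPENDIX — ILLUSTRATIVE SKETCHES (hedged examples; adapt to your stack)

Sketch A. One way to compute the PART 1 delivery and quality rows from exported deploy and incident records. This is a minimal sketch: the field names (merged_at, deployed_at, caused_incident, started_at, restored_at) are assumptions, not any particular CI/CD or incident tool's schema.

    # dora_scorecard.py - computes the PART 1 delivery/quality rows.
    # ASSUMPTION: deploys/incidents arrive as dicts of timezone-aware
    # datetimes with the fields below; rename to match whatever your
    # CI/CD and incident tooling actually emit.
    from datetime import datetime, timedelta, timezone
    from statistics import median

    def weekly_scorecard(deploys, incidents, now=None):
        now = now or datetime.now(timezone.utc)
        week_ago = now - timedelta(days=7)
        recent = [d for d in deploys if d["deployed_at"] >= week_ago]

        # Lead time for changes: merge -> production, in hours.
        lead_times = [
            (d["deployed_at"] - d["merged_at"]).total_seconds() / 3600
            for d in recent if d.get("merged_at")
        ]

        # Change failure rate: share of deploys tied to an incident/rollback.
        failed = sum(1 for d in recent if d.get("caused_incident"))

        # MTTR: hours from incident start to restore.
        restores = [
            (i["restored_at"] - i["started_at"]).total_seconds() / 3600
            for i in incidents
            if i["started_at"] >= week_ago and i.get("restored_at")
        ]

        return {
            "deploys_per_week": len(recent),
            "lead_time_hours_median": median(lead_times) if lead_times else None,
            "change_failure_rate": failed / len(recent) if recent else 0.0,
            "mttr_hours_median": median(restores) if restores else None,
        }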
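
Sketch B. The PART 2 checklist as a CI gate that blocks merges until the PR description carries each item. The marker strings and the PR_BODY environment variable are conventions invented for this sketch; wire it to however your CI actually exposes the PR description.

    # done_check.py - fails the build if the Definition of Done is incomplete.
    # ASSUMPTION: the CI runner exports the PR description as PR_BODY and
    # authors tick plain-text markers; both are illustrative conventions.
    import os
    import sys

    REQUIRED_MARKS = [
        "[x] owner",           # named service owner + on-call
        "[x] tests",           # automated tests or explicit risk acceptance
        "[x] rollback",        # behind a flag OR rollback plan
        "[x] observability",   # dashboard + key alerts
        "[x] success-metric",  # activation, retention, latency, or cost
    ]

    def main() -> int:
        body = os.environ.get("PR_BODY", "").lower()
        missing = [mark for mark in REQUIRED_MARKS if mark not in body]
        if missing:
            print("Definition of Done incomplete. Missing:", ", ".join(missing))
            return 1  # non-zero exit fails the CI job and blocks the merge
        return 0

    if __name__ == "__main__":
        sys.exit(main())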
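
Sketch C. The PART 3 policy expressed as a decision-log entry that cannot be created without the three required artifacts. The field names are assumptions; the point is that the log itself enforces the policy.

    # decision_log.py - a structured record for irreversible changes.
    # ASSUMPTION: field names are illustrative; keep whatever your RFC
    # process uses, as long as construction fails when an artifact is missing.
    from dataclasses import dataclass

    @dataclass
    class IrreversibleDecision:
        title: str
        threat_model_link: str   # link to the threat model
        reversibility: str       # e.g. "reversible within 72 hours"
        cost_best_case: float    # forecast, $ per month
        cost_worst_case: float

        def __post_init__(self):
            if not self.threat_model_link.startswith(("http://", "https://")):
                raise ValueError("threat model link is required and must be a URL")
            if not self.reversibility.strip():
                raise ValueError("reversibility statement is required")
            if self.cost_worst_case < self.cost_best_case:
                raise ValueError("worst-case cost cannot be below best case")

    # Usage: construction raises immediately if an artifact is missing.
    # The title and URL below are hypothetical.
    entry = IrreversibleDecision(
        title="Drop legacy sessions table",
        threat_model_link="https://wiki.example.com/tm/sessions",
        reversibility="irreversible after backup expiry (14 days)",
        cost_best_case=0.0,
        cost_worst_case=400.0,
    )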
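
Sketch D. The shape of the PART 4 secrets-detection gate: scan the changed files, print findings, exit non-zero to block the merge. The patterns are deliberately few and illustrative; in production, prefer a maintained scanner and keep this shape for the gate.

    # secrets_gate.py - blocks a change when a likely secret is detected.
    # ASSUMPTION: the patterns below are illustrative, not exhaustive.
    import re
    import sys
    from pathlib import Path

    PATTERNS = {
        "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
        "private key": re.compile(r"-----BEGIN (?:RSA|EC|OPENSSH) PRIVATE KEY-----"),
        "generic secret": re.compile(r"(?i)(?:api|secret)[_-]?key\s*[:=]\s*['\"][^'\"]{16,}"),
    }

    def scan(paths):
        hits = []
        for path in paths:
            try:
                text = Path(path).read_text(errors="ignore")
            except OSError:
                continue  # unreadable paths are skipped, not failed
            hits.extend(
                (path, name) for name, rx in PATTERNS.items() if rx.search(text)
            )
        return hits

    if __name__ == "__main__":
        findings = scan(sys.argv[1:])  # CI passes the changed files as args
        for path, name in findings:
            print(f"BLOCK: possible {name} in {path}")
        sys.exit(1 if findings else 0)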