AI-ERA LEADERSHIP OPERATING SYSTEM (LOS)

Use this as a 30–60–90 day rollout to keep speed and safety aligned when engineers use AI copilots.

PART 1 — WEEKLY SCORECARD (review every Monday, 20 minutes)

1) Delivery
- Deployment frequency (per service / per team): ______
- Lead time for changes (median): ______

2) Quality
- Change failure rate (% deploys causing incident/rollback): ______
- MTTR (median time to restore): ______

3) Review capacity
- PRs opened per engineer/week: ______
- PRs reviewed per engineer/week: ______
- Median PR size (files changed / LOC): ______

4) Cost
- Cloud unit cost (e.g., $ per 1k requests or $ per active user): ______
- AI/inference unit cost (e.g., $ per 1k actions or $ per 1M tokens): ______

5) Security
- Open critical vulnerabilities: ______
- Secrets detected in last 7 days: ______

(Sketch A in the appendix shows one way to compute the delivery and quality rows from deploy and incident logs.)

PART 2 — “DEFINITION OF DONE” (ship checklist)

A change is only “done” when:
- It has a named owner (service owner + on-call)
- It has automated tests or explicit risk acceptance
- It is deployable behind a flag OR has a rollback plan
- Observability exists (dashboard + key alerts)
- A success metric is identified (activation, retention, latency, cost)

(Sketch B in the appendix turns this checklist into a CI gate.)

PART 3 — DECISION ALTITUDE RULES (reduce expensive mistakes)

Implementation (PR-level):
- Refactors, tests, small features; reviewed via PR template + CI gates.

Architecture (RFC-level):
- Datastore changes, auth/billing flows, data retention, cross-service contracts.

Policy: Any irreversible change must have:
- Threat model link
- Reversibility statement (hours/days/weeks)
- Cost forecast (best case / worst case)

(Sketch C in the appendix records this policy as a structured decision-log entry.)

PART 4 — 30–60–90 DAY ROLLOUT

Days 1–30: Visibility
- Establish scorecard baselines.
- Add PR template requiring Outcome, Risk, Evidence.
- Identify top 3 recurring incident types and write runbooks.

Days 31–60: Constraints that create speed
- Enforce CI gates: critical vuln block, secrets detection (Sketch D in the appendix), required reviews on critical modules.
- Create a lightweight RFC process (1–2 pages max) and a decision log.

Days 61–90: Scale autonomy
- Build paved roads: starter repos, standard logging/metrics, deployment templates.
- Update performance expectations: reward outcomes, reliability, and leverage (not volume).

PART 5 — 1:1 QUESTION SET (manager as debugger)

Ask each week:
- What are you assuming that might be wrong?
- What decision are you stuck on, and who must make it?
- What is the biggest risk (security, reliability, cost, product) in your work?
- What proof will we use to know this worked?

If you do nothing else: make decisions explicit, automate guardrails, and measure outcomes. AI will handle the drafts; leadership must handle the truth.
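
APPENDIX — ILLUSTRATIVE SKETCHES (hedged examples; adapt to your stack)

Sketch A. One way to compute the PART 1 delivery and quality rows from exported deploy and incident records. This is a minimal sketch: the field names (merged_at, deployed_at, caused_incident, started_at, restored_at) are assumptions, not any particular CI/CD or incident tool's schema.

    # dora_scorecard.py - computes the PART 1 delivery/quality rows.
    # ASSUMPTION: deploys/incidents arrive as dicts of timezone-aware
    # datetimes with the fields below; rename to match whatever your
    # CI/CD and incident tooling actually emit.
    from datetime import datetime, timedelta, timezone
    from statistics import median

    def weekly_scorecard(deploys, incidents, now=None):
        now = now or datetime.now(timezone.utc)
        week_ago = now - timedelta(days=7)
        recent = [d for d in deploys if d["deployed_at"] >= week_ago]

        # Lead time for changes: merge -> production, in hours.
        lead_times = [
            (d["deployed_at"] - d["merged_at"]).total_seconds() / 3600
            for d in recent if d.get("merged_at")
        ]

        # Change failure rate: share of deploys tied to an incident/rollback.
        failed = sum(1 for d in recent if d.get("caused_incident"))

        # MTTR: hours from incident start to restore.
        restores = [
            (i["restored_at"] - i["started_at"]).total_seconds() / 3600
            for i in incidents
            if i["started_at"] >= week_ago and i.get("restored_at")
        ]

        return {
            "deploys_per_week": len(recent),
            "lead_time_hours_median": median(lead_times) if lead_times else None,
            "change_failure_rate": failed / len(recent) if recent else 0.0,
            "mttr_hours_median": median(restores) if restores else None,
        }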
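
Sketch B. The PART 2 checklist as a CI gate that blocks merges until the PR description carries each item. The marker strings and the PR_BODY environment variable are conventions invented for this sketch; wire it to however your CI actually exposes the PR description.

    # done_check.py - fails the build if the Definition of Done is incomplete.
    # ASSUMPTION: the CI runner exports the PR description as PR_BODY and
    # authors tick plain-text markers; both are illustrative conventions.
    import os
    import sys

    REQUIRED_MARKS = [
        "[x] owner",           # named service owner + on-call
        "[x] tests",           # automated tests or explicit risk acceptance
        "[x] rollback",        # behind a flag OR rollback plan
        "[x] observability",   # dashboard + key alerts
        "[x] success-metric",  # activation, retention, latency, or cost
    ]

    def main() -> int:
        body = os.environ.get("PR_BODY", "").lower()
        missing = [mark for mark in REQUIRED_MARKS if mark not in body]
        if missing:
            print("Definition of Done incomplete. Missing:", ", ".join(missing))
            return 1  # non-zero exit fails the CI job and blocks the merge
        return 0

    if __name__ == "__main__":
        sys.exit(main())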
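
Sketch C. The PART 3 policy expressed as a decision-log entry that cannot be created without the three required artifacts. The field names are assumptions; the point is that the log itself enforces the policy.

    # decision_log.py - a structured record for irreversible changes.
    # ASSUMPTION: field names are illustrative; keep whatever your RFC
    # process uses, as long as construction fails when an artifact is missing.
    from dataclasses import dataclass

    @dataclass
    class IrreversibleDecision:
        title: str
        threat_model_link: str   # link to the threat model
        reversibility: str       # e.g. "reversible within 72 hours"
        cost_best_case: float    # forecast, $ per month
        cost_worst_case: float

        def __post_init__(self):
            if not self.threat_model_link.startswith(("http://", "https://")):
                raise ValueError("threat model link is required and must be a URL")
            if not self.reversibility.strip():
                raise ValueError("reversibility statement is required")
            if self.cost_worst_case < self.cost_best_case:
                raise ValueError("worst-case cost cannot be below best case")

    # Usage: construction raises immediately if an artifact is missing.
    # The title and URL below are hypothetical.
    entry = IrreversibleDecision(
        title="Drop legacy sessions table",
        threat_model_link="https://wiki.example.com/tm/sessions",
        reversibility="irreversible after backup expiry (14 days)",
        cost_best_case=0.0,
        cost_worst_case=400.0,
    )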
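
Sketch D. The shape of the PART 4 secrets-detection gate: scan the changed files, print findings, exit non-zero to block the merge. The patterns are deliberately few and illustrative; in production, prefer a maintained scanner and keep this shape for the gate.

    # secrets_gate.py - blocks a change when a likely secret is detected.
    # ASSUMPTION: the patterns below are illustrative, not exhaustive.
    import re
    import sys
    from pathlib import Path

    PATTERNS = {
        "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
        "private key": re.compile(r"-----BEGIN (?:RSA|EC|OPENSSH) PRIVATE KEY-----"),
        "generic secret": re.compile(r"(?i)(?:api|secret)[_-]?key\s*[:=]\s*['\"][^'\"]{16,}"),
    }

    def scan(paths):
        hits = []
        for path in paths:
            try:
                text = Path(path).read_text(errors="ignore")
            except OSError:
                continue  # unreadable paths are skipped, not failed
            hits.extend(
                (path, name) for name, rx in PATTERNS.items() if rx.search(text)
            )
        return hits

    if __name__ == "__main__":
        findings = scan(sys.argv[1:])  # CI passes the changed files as args
        for path, name in findings:
            print(f"BLOCK: possible {name} in {path}")
        sys.exit(1 if findings else 0)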