AI-ERA ENGINEERING LEADERSHIP: REVIEW CAPACITY PLAYBOOK (ONE-WEEK SETUP)

Goal: increase code throughput without increasing production risk by making review, verification, and rollout explicit.

DAY 1: MAP THE BLAST RADIUS
- List your “critical surfaces”: authn/authz, billing/entitlements, PII access/export, infra/IaC, migrations, admin tools.
- For each surface, name a DRI (directly responsible individual) and a backup.
- Write a one-paragraph definition of what counts as a change to that surface (so engineers can classify work consistently).

DAY 2: DEFINE 4 RISK TIERS (T0–T3)
- T0: low-risk edits (docs/comments/non-prod scripts)
- T1: normal product changes (UI, non-critical endpoints, flagged features)
- T2: sensitive changes (auth, billing, permissions, PII handling)
- T3: systemic changes (migrations, crypto, infra primitives, incident hotfixes under pressure)
- For each tier, set minimum requirements: review rules, test expectations, and rollout controls.

DAY 3: MAKE REVIEW A SCHEDULED SYSTEM
- Add a review rotation or dedicated review blocks on calendars for maintainers.
- Set expectations for PR size: prefer smaller diffs; split refactors from behavior changes.
- Adopt CODEOWNERS (or equivalent) for critical paths so the right reviewers are automatically requested.

DAY 4: PUT GUARDRAILS INTO THE REPO
- Turn on branch protection in GitHub/GitLab: PR required, status checks required, conversation resolution required.
- Require CI checks that matter: unit tests, lint, and at least one security scan appropriate to your stack.
- Add pre-merge checks for “sensitive folders” (auth/billing/infra) that enforce maintainer approval.

DAY 5: SHIP SAFER: ROLLOUT + ROLLBACK
- Require a rollback plan for T2/T3 changes (even a simple one: revert PR, disable flag, restore backup).
- Use feature flags for user-facing behavior changes where possible.
- Define post-deploy verification: what metric/log proves the change is healthy?
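
The tiering from Day 2 and the pre-merge checks from Days 4 and 5 can be expressed as a small script that CI runs against a PR's changed files. The sketch below is one possible shape, not a prescribed implementation: the path patterns, the approval counts, and the names TIER_PATTERNS, TierPolicy, PullRequest, and unmet_requirements are all illustrative assumptions. Only the tier meanings and the "rollback plan for T2/T3" rule come from the playbook itself.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical path patterns, checked from highest tier to lowest.
# Replace them with your own repository layout.
TIER_PATTERNS = [
    (3, ["migrations/*", "infra/*", "crypto/*"]),
    (2, ["auth/*", "billing/*", "permissions/*", "pii/*"]),
    (0, ["docs/*", "*.md", "scripts/dev/*"]),
]
DEFAULT_TIER = 1  # anything unmatched is a normal product change (T1)


@dataclass(frozen=True)
class TierPolicy:
    min_approvals: int   # assumed numbers; tune per team
    owner_review: bool   # DRI or backup must approve
    rollback_plan: bool  # playbook rule: required for T2/T3


POLICIES = {
    0: TierPolicy(min_approvals=1, owner_review=False, rollback_plan=False),
    1: TierPolicy(min_approvals=1, owner_review=False, rollback_plan=False),
    2: TierPolicy(min_approvals=2, owner_review=True, rollback_plan=True),
    3: TierPolicy(min_approvals=2, owner_review=True, rollback_plan=True),
}


@dataclass
class PullRequest:
    changed_paths: list[str]
    approvals: int = 0
    owner_approved: bool = False
    rollback_plan: str = ""


def classify_file(path: str) -> int:
    """Return the tier of a single file; the first matching pattern wins."""
    for tier, patterns in TIER_PATTERNS:
        if any(fnmatch(path, pattern) for pattern in patterns):
            return tier
    return DEFAULT_TIER


def classify_change(paths: list[str]) -> int:
    """A change is as risky as its riskiest file."""
    return max((classify_file(p) for p in paths), default=0)


def unmet_requirements(pr: PullRequest) -> list[str]:
    """List the tier minimums this PR does not yet satisfy."""
    tier = classify_change(pr.changed_paths)
    policy = POLICIES[tier]
    problems = []
    if pr.approvals < policy.min_approvals:
        problems.append(
            f"T{tier} needs {policy.min_approvals} approval(s), has {pr.approvals}"
        )
    if policy.owner_review and not pr.owner_approved:
        problems.append(f"T{tier} needs approval from the surface owner")
    if policy.rollback_plan and not pr.rollback_plan.strip():
        problems.append(f"T{tier} needs a rollback plan")
    return problems
```

Taking the highest tier among the changed files keeps the check conservative: a PR that mixes a UI tweak with a migration is treated as T3, which also nudges authors toward the smaller, split diffs recommended on Day 3.
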

DAY 6: INCIDENT LOOP (NO BLAME, HIGH SIGNAL)
- Use a consistent post-incident template: trigger, timeline, contributing factors, detection gaps, corrective actions.
- Track corrective actions as engineering work, not “nice-to-have” tasks.
- Look specifically for: missing tests, missing alerts, missing owner review, unclear tiering.

DAY 7: HIRING/LEVELING UPDATE
- Update interview rubrics to test review skill and debugging, not just coding speed.
- Add a “PR critique” exercise: candidate reviews a diff, identifies risk, asks for tests, suggests rollout/rollback.
- Update leveling expectations: seniors are accountable for maintainability and safe change, not output volume.

OPERATING RHYTHM (ONGOING)
- Weekly: scan merged T2/T3 changes and validate that review + rollout rules were followed.
- Monthly: rotate maintainers; prune rules that cause workarounds; strengthen rules that prevent repeat incidents.
- Quarterly: re-map critical surfaces; update tier definitions as architecture evolves.

If you do only one thing: protect review time and formalize who owns the risky codepaths. AI makes writing faster; leadership has to make verification scalable.
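
The weekly scan under OPERATING RHYTHM can be partly automated. The sketch below assumes you can export merged changes into simple records; the MergedChange fields and the weekly_audit function are hypothetical names for illustration, and how you populate them (from your Git host, deploy logs, or a spreadsheet) is left open.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class MergedChange:
    ident: str                   # hypothetical identifier for the change
    tier: int                    # 0..3, as assigned at review time
    merged_on: date
    owner_approved: bool         # DRI or backup approved
    has_rollback_plan: bool
    verified_after_deploy: bool  # post-deploy metric/log was checked


def weekly_audit(changes, today, window_days=7):
    """Map each T2/T3 change merged in the window to the rules it skipped."""
    cutoff = today - timedelta(days=window_days)
    findings = {}
    for change in changes:
        in_window = cutoff < change.merged_on <= today
        if change.tier < 2 or not in_window:
            continue  # only sensitive and systemic changes are audited
        gaps = []
        if not change.owner_approved:
            gaps.append("missing owner review")
        if not change.has_rollback_plan:
            gaps.append("missing rollback plan")
        if not change.verified_after_deploy:
            gaps.append("missing post-deploy verification")
        if gaps:
            findings[change.ident] = gaps
    return findings
```

An empty result means every audited change followed the rules. Recurring gaps of the same kind are a signal for the monthly review: either the rule is causing workarounds or it needs stronger enforcement.
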