AI-ERA ENGINEERING LEADERSHIP — REVIEW CAPACITY PLAYBOOK (ONE-WEEK SETUP)

Goal: increase code throughput without increasing production risk by making review, verification, and rollout explicit.

DAY 1 — MAP THE BLAST RADIUS
- List your “critical surfaces”: authn/authz, billing/entitlements, PII access/export, infra/IaC, migrations, admin tools.
- For each surface, name a DRI (directly responsible individual) and a backup.
- Write a one-paragraph definition of what counts as a change to that surface (so engineers can classify work consistently).

DAY 2 — DEFINE 4 RISK TIERS (T0–T3)
- T0: low-risk edits (docs/comments/non-prod scripts)
- T1: normal product changes (UI, non-critical endpoints, flagged features)
- T2: sensitive changes (auth, billing, permissions, PII handling)
- T3: systemic changes (migrations, crypto, infra primitives, incident hotfixes under pressure)
- For each tier, set minimum requirements: review rules, test expectations, and rollout controls.

DAY 3 — MAKE REVIEW A SCHEDULED SYSTEM
- Add a review rotation or dedicated review blocks on calendars for maintainers.
- Set expectations for PR size: prefer smaller diffs; split refactors from behavior changes.
- Adopt CODEOWNERS (or equivalent) for critical paths so the right reviewers are automatically requested.

DAY 4 — PUT GUARDRAILS INTO THE REPO
- Turn on branch protection in GitHub/GitLab: PR required, status checks required, conversation resolution required.
- Require CI checks that matter: unit tests, lint, and at least one security scan appropriate to your stack.
- Add pre-merge checks for “sensitive folders” (auth/billing/infra) that enforce maintainer approval.

DAY 5 — SHIP SAFER: ROLLOUT + ROLLBACK
- Require a rollback plan for T2/T3 changes (even a simple one: revert PR, disable flag, restore backup).
- Use feature flags for user-facing behavior changes where possible.
- Define post-deploy verification: what metric/log proves the change is healthy?

DAY 6 — INCIDENT LOOP (NO BLAME, HIGH SIGNAL)
- Use a consistent post-incident template: trigger, timeline, contributing factors, detection gaps, corrective actions.
- Track corrective actions as engineering work, not “nice-to-have” tasks.
- Look specifically for: missing tests, missing alerts, missing owner review, unclear tiering.

DAY 7 — HIRING/LEVELING UPDATE
- Update interview rubrics to test review skill and debugging, not just coding speed.
- Add a “PR critique” exercise: candidate reviews a diff, identifies risk, asks for tests, suggests rollout/rollback.
- Update leveling expectations: seniors are accountable for maintainability and safe change, not output volume.

OPERATING RHYTHM (ONGOING)
- Weekly: scan merged T2/T3 changes and validate that review + rollout rules were followed.
- Monthly: rotate maintainers; prune rules that cause workarounds; strengthen rules that prevent repeat incidents.
- Quarterly: re-map critical surfaces; update tier definitions as architecture evolves.

If you do only one thing: protect review time and formalize who owns the risky codepaths. AI makes writing faster; leadership has to make verification scalable.