AI-ERA CHANGE CONTROL TEMPLATE (PROOF-FIRST ENGINEERING)

Goal: Make it explicit which changes require which proof before merge and before release. This is not a policy doc for a wiki. Keep it in version control and link it in PR templates.

1) Pick 3 “Never Break” Surfaces
List the systems where failure creates immediate customer harm:
- Surface A (e.g., authentication/session management):
  Owner/team:
  Primary metrics to watch (errors, latency, login success):
- Surface B (e.g., billing/invoicing):
  Owner/team:
  Primary metrics to watch:
- Surface C (e.g., data migrations/backfills):
  Owner/team:
  Primary metrics to watch:

2) Define Change Categories (5 max)
For each category, specify required proof.
- Category name:
  Examples (paths, services, types of changes):
  Required proof BEFORE MERGE:
  Required proof BEFORE RELEASE:
  Who must approve (CODEOWNERS group):
  Rollout rule (feature flag/canary/rings):
  Rollback rule (automatic trigger? manual runbook link?):

3) Required Proof Menu (choose from this list)
Use only proofs you can actually enforce.
- Automated tests (unit/integration/e2e) linked in CI output
- Static analysis/linting (tool + config file location)
- Security checks (secret scanning, SAST) with required status checks
- Threat model note (short, in PR description) for auth/IAM/token work
- Migration plan (dry run + backup/restore check + idempotency note)
- Observability updates (dashboard link + alert rule link)
- Staged rollout plan (canary/ring steps + success metrics)

4) Enforcement Hooks (make it real)
- GitHub branch protection: required checks + required reviews
- CODEOWNERS entries for high-risk directories
- PR template checkboxes aligned to change categories
- CI guardrail job: fail PRs that touch critical paths without required artifacts
- Release checklist: a short, repeatable runbook for staged rollout + rollback

5) Post-Incident System Change (mandatory)
After any SEV incident, require a concrete change that reduces recurrence:
- New/updated test that would have caught it
- New/updated alert with a clear threshold or condition
- Tightened rollout rule (smaller canary, longer bake time)
- Narrowed permissions (IAM/scopes)
Assign an owner and a due date, then verify completion in the next ops review.

6) Weekly Leadership Review (30 minutes)
Bring only three artifacts:
- List of escaped defects and incidents, with links to PRs and deploys
- Which proofs failed or were skipped (and why)
- One proposed policy tightening (or loosening) backed by an incident or near-miss

If you do nothing else: enforce CODEOWNERS + required checks for one “Never Break” surface this week, and require staged rollouts for it. Proof beats confidence.
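
Example: CI guardrail sketch
The "CI guardrail job" from section 4 could look roughly like the Python sketch below. It is one possible shape, not a prescribed implementation. The path globs, the marker strings, and the names RULES, missing_artifacts, and guardrail_exit_code are all hypothetical; substitute your own change categories and PR template fields. How the changed-file list and PR description reach the script depends on your CI system and is left out.

```python
from fnmatch import fnmatch

# Hypothetical rules: path glob -> markers that must appear in the PR description.
# Globs and marker strings are illustrative assumptions, not part of the template.
# Note: fnmatch's "*" also matches "/" so "services/auth/*" covers nested files.
RULES = {
    "services/auth/*": ["Threat model:", "Rollout plan:"],
    "db/migrations/*": ["Migration plan:", "Rollback:"],
}


def missing_artifacts(changed_files, pr_description, rules=RULES):
    """Return {glob: [missing markers]} for each rule triggered by the changed files."""
    description = pr_description.lower()
    missing = {}
    for pattern, markers in rules.items():
        # Skip rules whose critical path was not touched by this PR.
        if not any(fnmatch(path, pattern) for path in changed_files):
            continue
        absent = [m for m in markers if m.lower() not in description]
        if absent:
            missing[pattern] = absent
    return missing


def guardrail_exit_code(changed_files, pr_description):
    """Print any gaps and return a process exit code (0 = pass, 1 = fail)."""
    problems = missing_artifacts(changed_files, pr_description)
    for pattern, absent in problems.items():
        print(f"Changes under {pattern} require: {', '.join(absent)}")
    return 1 if problems else 0
```

A CI step would call guardrail_exit_code with the PR's changed files and description, then exit with the returned code so the required status check fails. This sketch only confirms that a marker is present, not that the section under it is filled in or adequate, so it supplements CODEOWNERS review rather than replacing it.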