AI VERIFICATION PLAYBOOK (ONE-PAGE)

Purpose
Use this to define how your team turns AI-generated drafts (code, text, configs, analyses) into verified output you can ship.

1) Pick the workflow (be specific)
- Workflow name: _______________________________
- Where it runs: (repo/service/tool) _______________________________
- What AI produces: (code, YAML, customer email, SQL, policy text) _______________________________
- Who is affected if it’s wrong: (customers, finance, security, uptime) _______________________________

2) Default risk posture (choose one)
A) Low risk (drafting only): wrong output is annoying, reversible, low blast radius.
B) Medium risk (guardrailed): wrong output can mislead users or waste time.
C) High risk (strict): wrong output can cause outages, security exposure, legal/compliance issues.
Selected posture: A / B / C

3) Proof requirements (what must exist before approval)

Citations / provenance
- For factual claims: link to source-of-truth (doc section, ticket, dashboard, contract).
- For configs: include references (module, cluster, account/project ID) and a change rationale.

Testing / validation
- Minimum checks required: _______________________________
  Examples: unit tests, integration tests, contract tests, terraform plan, kubectl dry-run, SQL sample validation query.
- “Red flag” cases that must be tested: _______________________________
  Examples: authn/authz, billing, data deletion, migrations, rate limiting.

Human gate
- Required approver(s): _______________________________
- Ownership rule: interface owners approve schema/API changes.
- Review rule: reviewers verify invariants and rollback plan, not just diff aesthetics.

Rollout / rollback
- Rollout method: (feature flag / canary / staged / manual) _______________________________
- Rollback plan written? Y/N
- Monitoring note: what metric/log proves success?
  _______________________________

4) Refusal rules (what AI must not do)
- Never include secrets (API keys, tokens, credentials) in prompts or outputs.
- Never claim a control exists (security/compliance) without linked evidence.
- Never send customer-facing messages without a human send action (unless explicitly approved for this workflow).
- Never run destructive operations by default (DROP, DELETE without WHERE, prod migrations) without explicit confirmation.

5) Evals (your repeatable test set)
Create a small fixed set of “hard cases”:
- Case 1: _______________________________
- Case 2: _______________________________
- Case 3: _______________________________
For each case, define “pass” in plain language (and what source must be cited).

6) Incident loop (what happens when it fails)
- Failure is logged as: (ticket type / label) _______________________________
- Post-incident requirement: add a guardrail (test, policy check, eval case, rollout change).
- Owner for guardrail implementation: _______________________________

Sign-off
- Workflow owner: __________________ Date: __________
- Security/Compliance (if needed): __________________ Date: __________
- Engineering/Product lead: __________________ Date: __________
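
Appendix (optional): automating part of section 4
Two of the refusal rules in section 4 (no secrets, no destructive operations by default) can be partly checked by a script before a draft reaches the human gate. The Python sketch below is one possible shape, not a prescribed implementation: the function name refusal_violations, the regular expressions, and the violation labels are all hypothetical, and the patterns are rough heuristics that will miss cases and raise false alarms. Treat it as a pre-filter that supplements the required approver, never as a replacement.

```python
import re
from typing import List

# Hypothetical heuristics: adapt to your own secret formats and SQL dialect.
# Matches things like "api_key = abc123" or "password: hunter2".
SECRET_PATTERN = re.compile(
    r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"
)
DROP_STATEMENT = re.compile(r"(?i)\bDROP\s+(TABLE|DATABASE|SCHEMA)\b")
# Captures one DELETE statement up to (not including) the next semicolon.
DELETE_STATEMENT = re.compile(r"(?i)\bDELETE\s+FROM\b[^;]*")
WHERE_CLAUSE = re.compile(r"(?i)\bWHERE\b")


def refusal_violations(text: str) -> List[str]:
    """Return labels for section 4 rules the draft text appears to break."""
    violations = []
    if SECRET_PATTERN.search(text):
        violations.append("secret")
    if DROP_STATEMENT.search(text):
        violations.append("destructive: DROP")
    for statement in DELETE_STATEMENT.finditer(text):
        # A DELETE with no WHERE clause would remove every row in the table.
        if not WHERE_CLAUSE.search(statement.group(0)):
            violations.append("destructive: DELETE without WHERE")
            break
    return violations


if __name__ == "__main__":
    print(refusal_violations("DELETE FROM users;"))
    # prints ['destructive: DELETE without WHERE']
```

A non-empty result could block the draft or route it to the approver named under "Human gate". Each miss found in production is a natural candidate for a new pattern or a new eval case under section 6.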