HUMAN+AGENT OPERATING POLICY (TEMPLATE) Purpose Define how humans and AI agents collaborate to ship software safely, quickly, and accountably. 1) Definitions - “Agent output”: code, text, plans, test cases, incident drafts, or customer responses generated by an AI system. - “Accountable human”: the named role (not a team) responsible for the final decision and outcome. 2) Decision Rights (RACI Summary) For each workflow below, fill the Accountable role: - Merge to main: Accountable = __________ (e.g., Tech Lead) - Production deploy: Accountable = __________ (e.g., On-call Engineer) - Schema migration: Accountable = __________ (e.g., Staff Engineer) - Auth/permissions changes: Accountable = __________ (e.g., Security Eng + TL) - Billing/pricing logic changes: Accountable = __________ (e.g., Payments TL) - Prompt/policy changes for LLM features: Accountable = __________ (e.g., ML Lead) - Incident comms to customers: Accountable = __________ (e.g., Incident Commander) Rule: Agents may propose/draft, but cannot be Accountable. 3) Risk Tiers (Choose your triggers) LOW: docs, comments, refactors, UI copy, non-prod scripts. MEDIUM: API changes behind flags, business logic, performance-sensitive paths. HIGH: auth, billing, PII/data access, infra, migrations, compliance flows. 4) Required Evidence by Tier LOW: - CI must pass (lint + unit tests) - Brief test plan (1–3 bullets) MEDIUM: - CI + integration tests - Feature flag + rollback steps - Observability note (metrics/logs to watch) HIGH: - Security review or checklist sign-off - Staging verification + load/perf check if applicable - Explicit rollback plan + on-call awareness - If LLM-facing: eval report link + pass threshold (set %) 5) Evaluation Gates for LLM Features Minimum requirements: - Golden set: at least 50 representative prompts/inputs - Adversarial set: at least 20 cases (injection, sensitive data, policy bypass) - Regression: new release must not reduce pass rate below ____% (suggest 95% for critical flows) - Monitoring: track refusal rate, hallucination reports, and user satisfaction weekly 6) Cost Controls (FinOps) - Monthly token budget per team: $_____ - Alert thresholds: 50% / 80% / 100% - Unit metric: “$ inference cost per shipped feature” or “$ per 1,000 requests” - Rule: new agents must declare expected cost and owner before production use 7) Security & Compliance Controls - Access: SSO required; RBAC enforced; least privilege. - Logging: audit logs enabled for agent actions and data access. - Data: define retention (e.g., 0/7/30 days); prohibit sending secrets/keys. - Vendors: require SOC 2 Type II (or plan), DPA, and clarity on training on customer data. 8) Operating Cadence Weekly: - Review metrics: lead time, change failure rate, defect escape, review load Monthly: - Cost review: budget vs actual; top workflows by spend Quarterly: - Revisit RACI, risk tiers, and eval thresholds Sign-off Engineering Lead: __________ Date: __________ Security Lead: __________ Date: __________ Product Lead: __________ Date: __________