HUMAN+AGENT OPERATING POLICY (TEMPLATE)

Purpose
Define how humans and AI agents collaborate to ship software safely, quickly, and accountably.

1) Definitions
- “Agent output”: code, text, plans, test cases, incident drafts, or customer responses generated by an AI system.
- “Accountable human”: the named role (not a team) responsible for the final decision and outcome.

2) Decision Rights (RACI Summary)
For each workflow below, fill the Accountable role:
- Merge to main: Accountable = __________ (e.g., Tech Lead)
- Production deploy: Accountable = __________ (e.g., On-call Engineer)
- Schema migration: Accountable = __________ (e.g., Staff Engineer)
- Auth/permissions changes: Accountable = __________ (e.g., Security Eng + TL)
- Billing/pricing logic changes: Accountable = __________ (e.g., Payments TL)
- Prompt/policy changes for LLM features: Accountable = __________ (e.g., ML Lead)
- Incident comms to customers: Accountable = __________ (e.g., Incident Commander)
Rule: Agents may propose/draft, but cannot be Accountable.

3) Risk Tiers (Choose your triggers)
LOW: docs, comments, refactors, UI copy, non-prod scripts.
MEDIUM: API changes behind flags, business logic, performance-sensitive paths.
HIGH: auth, billing, PII/data access, infra, migrations, compliance flows.

4) Required Evidence by Tier
LOW:
- CI must pass (lint + unit tests)
- Brief test plan (1–3 bullets)
MEDIUM:
- CI + integration tests
- Feature flag + rollback steps
- Observability note (metrics/logs to watch)
HIGH:
- Security review or checklist sign-off
- Staging verification + load/perf check if applicable
- Explicit rollback plan + on-call awareness
- If LLM-facing: eval report link + pass threshold (set %)

5) Evaluation Gates for LLM Features
Minimum requirements:
- Golden set: at least 50 representative prompts/inputs
- Adversarial set: at least 20 cases (injection, sensitive data, policy bypass)
- Regression: new release must not reduce pass rate below ____% (suggest 95% for critical flows)
- Monitoring: track refusal rate, hallucination reports, and user satisfaction weekly

6) Cost Controls (FinOps)
- Monthly token budget per team: $_____
- Alert thresholds: 50% / 80% / 100%
- Unit metric: “$ inference cost per shipped feature” or “$ per 1,000 requests”
- Rule: new agents must declare expected cost and owner before production use

7) Security & Compliance Controls
- Access: SSO required; RBAC enforced; least privilege.
- Logging: audit logs enabled for agent actions and data access.
- Data: define retention (e.g., 0/7/30 days); prohibit sending secrets/keys.
- Vendors: require SOC 2 Type II (or plan), DPA, and clarity on training on customer data.

8) Operating Cadence
Weekly:
- Review metrics: lead time, change failure rate, defect escape, review load
Monthly:
- Cost review: budget vs actual; top workflows by spend
Quarterly:
- Revisit RACI, risk tiers, and eval thresholds

Sign-off
Engineering Lead: __________ Date: __________
Security Lead: __________ Date: __________
Product Lead: __________ Date: __________