AGENT RACI + GUARDRAILS TEMPLATE (2026)

Purpose
Use this template to define: (1) which AI agents exist, (2) what they’re allowed to do, (3) who is accountable, and (4) what metrics prove the system is safe and improving.

1) Inventory (fill for each agent/workflow)
- Agent/workflow name:
- Owner (human DRI):
- Business function: (Engineering / Data / Support / Sales / Security)
- Primary tools it can call: (GitHub, CI, Terraform, Jira, Slack, DB read, etc.)
- Data it can access: (docs only / sanitized logs / customer PII / financials)
- Environments: (local, staging, prod)
- Tier: T1 Read-only / T2 Propose / T3 Assisted execution / T4 Autonomous
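
If the inventory is kept in code rather than a spreadsheet, one record per agent could look like the sketch below. The class name, field names, and the example agent are illustrative assumptions, not part of this template; the fields mirror the bullets above.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    """The four tiers from the inventory, ordered by autonomy."""
    T1_READ_ONLY = 1
    T2_PROPOSE = 2
    T3_ASSISTED = 3
    T4_AUTONOMOUS = 4


@dataclass
class AgentRecord:
    """One inventory entry (hypothetical schema)."""
    name: str
    owner: str                      # human DRI
    business_function: str
    tools: list[str] = field(default_factory=list)
    data_access: list[str] = field(default_factory=list)
    environments: list[str] = field(default_factory=list)
    tier: Tier = Tier.T1_READ_ONLY

    def missing_fields(self) -> list[str]:
        """Return the names of inventory fields that were left blank."""
        required = ("name", "owner", "business_function",
                    "tools", "data_access", "environments")
        return [f for f in required if not getattr(self, f)]


# Hypothetical example entry.
example = AgentRecord(
    name="pr-drafter",
    owner="jane.doe",
    business_function="Engineering",
    tools=["GitHub", "CI"],
    data_access=["docs only"],
    environments=["staging"],
    tier=Tier.T2_PROPOSE,
)
```

Calling `missing_fields()` on each record is one way to catch agents that were inventoried without an owner or without a declared data scope.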

2) Agent RACI (per workflow)
For each step in the workflow, assign the four RACI roles:
- R (Responsible): who does the work? (agent name)
- A (Accountable): who owns the outcome? (human DRI)
- C (Consulted): security, legal, SRE, product
- I (Informed): stakeholders, support, leadership
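
The RACI roles above can be checked mechanically. The sketch below assumes each step's assignment is a dict keyed by "R", "A", "C", "I", and it enforces the common convention of exactly one Accountable owner per step; that convention and the function name are assumptions, not requirements stated in this template.

```python
def validate_raci(step_name, raci, humans):
    """Return a list of problems with one workflow step's RACI assignment.

    raci   -- dict mapping "R"/"A"/"C"/"I" to lists of names (hypothetical shape)
    humans -- set of names known to be people rather than agents
    """
    problems = []
    responsible = raci.get("R", [])
    accountable = raci.get("A", [])

    if not responsible:
        problems.append(f"{step_name}: no Responsible party")
    if len(accountable) != 1:
        problems.append(f"{step_name}: needs exactly one Accountable owner")
    for name in accountable:
        # Accountability must rest with a human DRI, never with an agent.
        if name not in humans:
            problems.append(f"{step_name}: Accountable '{name}' is not a human DRI")
    return problems


# Hypothetical names for illustration.
humans = {"jane.doe", "sec-lead"}
good = {"R": ["pr-drafter"], "A": ["jane.doe"], "C": ["sec-lead"], "I": ["support"]}
bad = {"R": ["pr-drafter"], "A": ["pr-drafter"]}
```

An empty result means the step passes; any returned strings can be surfaced in review before the workflow is approved.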

3) Guardrails (minimum requirements by tier)
T1 Read-only:
- No write access; no customer PII unless approved.
- Logging: prompts/tool calls logged with redaction.

T2 Propose:
- May draft PRs, runbooks, tickets; cannot merge or deploy.
- Required: PR template filled (risk, tests, rollback).

T3 Assisted execution:
- May run actions ONLY after a human approves each one (approval click).
- Required: change window policy + audit log + kill switch.

T4 Autonomous (rare):
- Scope tightly (single service or internal tool).
- Required: canary + automatic rollback + on-call paging rules.
- Required approvals: Security + SRE sign-off in writing.
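
The tier requirements above can be expressed as a lookup table so that a missing control is detected before an agent is enabled. This is a minimal sketch: the control identifiers are made-up labels for the requirements listed above, and it treats each tier's list as standalone. Whether higher tiers also inherit lower-tier requirements is left to your organization.

```python
# Hypothetical control identifiers, one per "Required" item in the tiers above.
REQUIRED_CONTROLS = {
    "T1": {"redacted_logging"},
    "T2": {"pr_template"},
    "T3": {"human_approval", "change_window_policy", "audit_log", "kill_switch"},
    "T4": {"canary", "automatic_rollback", "oncall_paging_rules",
           "security_signoff", "sre_signoff"},
}


def missing_controls(tier, controls_in_place):
    """Return the required controls for `tier` that are not yet in place."""
    return sorted(REQUIRED_CONTROLS[tier] - set(controls_in_place))


def may_enable(tier, controls_in_place):
    """An agent may be enabled only when nothing is missing for its tier."""
    return not missing_controls(tier, controls_in_place)
```

For example, a T3 agent with only an audit log and a kill switch would report `change_window_policy` and `human_approval` as missing and stay disabled.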

4) Explicit “Never Do” List (edit for your org)
Agents at any tier must never:
- Modify IAM policies or auth flows.
- Run production data backfills touching PII.
- Change billing/entitlements.
- Disable security monitoring.
- Execute migrations outside change windows.
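
A deny list like the one above is easiest to enforce when every tool call is tagged with an action category before it runs. The sketch below assumes such tagging exists; the category strings are hypothetical labels for the items above.

```python
# Hypothetical action categories mirroring the "Never Do" items above.
NEVER_DO = frozenset({
    "iam_policy_change",
    "auth_flow_change",
    "prod_pii_backfill",
    "billing_change",
    "entitlement_change",
    "disable_security_monitoring",
})


def is_blocked(action_category, in_change_window=True):
    """Return True if an agent must not perform this action."""
    if action_category in NEVER_DO:
        return True
    # Migrations are blocked only when attempted outside a change window.
    if action_category == "migration" and not in_change_window:
        return True
    return False
```

The check is meant to run before any tier-specific approval logic, so that a human approval click cannot override the deny list.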

5) Logging & Audit Checklist
- Tool-call logs enabled (what, when, by which agent, with which credential).
- Redaction rules for secrets and PII.
- Retention policy (e.g., 30/90/180 days) aligned with compliance.
- Incident procedure: how to export logs for investigation.
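
A tool-call log entry covering the first two checklist items might be built as follows. This is a sketch under stated assumptions: the two regular expressions are deliberately simple illustrations and will not catch every secret or PII format, and the field names are hypothetical.

```python
import json
import re
from datetime import datetime, timezone

# Illustrative patterns only; real redaction rules need broader coverage.
SECRET = re.compile(r"(?i)\b(token|password|api_key)=\S+")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text):
    """Mask key=value secrets first, then email addresses."""
    text = SECRET.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
    return EMAIL.sub("[EMAIL]", text)


def tool_call_entry(agent, tool, credential_id, arguments):
    """Return one JSON log line: what, when, which agent, which credential."""
    return json.dumps({
        "when": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "tool": tool,
        "credential": credential_id,   # an identifier, never the secret itself
        "arguments": redact(arguments),
    })
```

Writing one JSON object per line keeps the log easy to filter by agent or credential when exporting it for an investigation.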

6) Weekly Operating Metrics (track and review)
- Change failure rate (% of deploys causing incident/rollback):
- MTTR (minutes):
- Human review time per shipped change (minutes):
- Lead time for changes (hours/days):
- Security exceptions/week (count):
- Agent utilization: % of PRs/tickets initiated by agent (optional):
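
The first two metrics above can be computed from deploy and incident records. The sketch below assumes deploys are recorded as booleans (True if the deploy caused an incident or rollback) and incidents as (detected, resolved) timestamp pairs; both shapes are assumptions for illustration.

```python
from datetime import datetime
from statistics import mean


def change_failure_rate(deploy_failed_flags):
    """Percent of deploys that caused an incident or rollback."""
    if not deploy_failed_flags:
        return 0.0
    return 100.0 * sum(deploy_failed_flags) / len(deploy_failed_flags)


def mttr_minutes(incidents):
    """Mean time to restore, in minutes, or None if there were no incidents."""
    if not incidents:
        return None
    return mean((resolved - detected).total_seconds() / 60
                for detected, resolved in incidents)


# Hypothetical week: four deploys, one failure; two incidents.
deploys = [False, True, False, False]
incidents = [
    (datetime(2026, 1, 5, 10, 0), datetime(2026, 1, 5, 10, 30)),
    (datetime(2026, 1, 7, 14, 0), datetime(2026, 1, 7, 15, 0)),
]
```

With this sample data the failure rate is 25% and MTTR is 45 minutes; tracking both weekly makes it visible whether agent-initiated changes are shifting either number.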

7) Rollout Plan (30 days)
Week 1: Inventory + tier classification; remove over-privileged tokens.
Week 2: Standard templates (PR/runbook) + required CI/policy checks.
Week 3: Incident drill (prompt injection + runaway automation scenario).
Week 4: Publish dashboard; assign owners; set quarterly audit date.

Sign-off
- Engineering leader:
- Security leader:
- SRE/Platform leader:
- Product leader (if user-facing impact):

End state: Agents are fast, but the organization is safe by design, because humans are always accountable, permissions are scoped, and metrics prove reliability.