Agentic RACI + Control Plane Checklist (2026)

Use this checklist to deploy agentic AI while preserving accountability, auditability, and operational safety. Designed for founders, engineering leaders, and GTM operators.

1) Define the workflow (one page)
- Workflow name + purpose (e.g., “Support Refund Agent”)
- Inputs (systems, data types, whether PII is involved)
- Outputs (emails, tickets, code changes, refunds, contract edits)
- Success metric (one primary: e.g., median time-to-resolution down 15%)
- Integrity metric (one primary: e.g., dispute rate stays <1.0%)

2) Assign Agentic RACI (per action type)

For EACH action the agent can take (e.g., “issue refund,” “merge PR,” “send outbound email”):
- E = Executor (agent/workflow name)
- A = Accountable Human (name + role; appears in their performance review)
- S = System Owner (tool owner; responsible for permissions, reliability, logging)
- R = Risk Owner (security/privacy/legal; defines thresholds and exceptions)
- C/I = Consulted/Informed (who is notified, and when)

3) Guardrails (must be explicit and testable)
- Permissioning: service account, least privilege, allowlist of tools/objects
- Approval policy: define thresholds (e.g., $ value, production access, PII)
- Budgets: daily token cap + monthly spend cap + auto-stop on loops
- Rate limits: concurrency cap + requests/min per integration
- Blast radius: max actions per run (e.g., max 10 emails, max 1 refund)

4) Logging + audit trail (minimum viable)
- Log every run with a unique request_id
- Store: prompt/template version, tool calls, diffs/side effects, outputs
- Retention: 90–365 days depending on regulation; redact PII where possible
- Access control: logs are restricted; provide an audit export path for compliance

5) Evaluation suite (treat prompts like production code)
- Build an offline dataset (100–500 real historical cases)
- Define pass/fail metrics (accuracy, policy adherence, tone, safety)
- Run evals on every template/workflow change; failures block promotion
- Track drift: weekly sample review of live runs vs. the offline benchmark

6) Incident response (agent-specific)
- Kill switch owner + on-call rotation
- Credential revocation/rotation procedure (goal: <60 minutes)
- Runbook for rollbacks (refund reversal, email apology, revert PR)
- Blameless post-incident review template includes:
  - What the agent did (trace)
  - Which guardrail failed or was missing
  - Corrective actions (policy, permissions, evals, training data)

7) Metrics dashboard (exec-ready)
- Leverage: cost per outcome, cycle time, volume handled
- Integrity: rollback/escalation rates, defect escape rate, dispute/churn deltas
- Trust: % of actions auto-executed, % requiring approval, override rate
- Review cadence: weekly for the first 8 weeks, then monthly

30–90 Day Rollout Targets
- Day 30: 2 workflows live with logs, approval gates, and budgets
- Day 60: eval suite + kill switch drill completed; integrity metrics stable
- Day 90: expanded autonomy only where integrity improves or holds steady

If you can’t name the Accountable Human for a decision, the workflow is not ready for autonomy.
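The section 3 guardrails (approval thresholds, blast-radius caps, default-deny allowlisting) can be expressed as a pre-action gate. This is a minimal sketch, not a reference implementation: the threshold values and names (REFUND_APPROVAL_USD, MAX_EMAILS_PER_RUN) are illustrative assumptions, not numbers from the checklist.

```python
# Sketch of a pre-action guardrail gate. Every threshold here is an
# illustrative assumption; set your own per the approval policy you define.
from dataclasses import dataclass

REFUND_APPROVAL_USD = 50.0   # assumed $ threshold requiring human approval
MAX_EMAILS_PER_RUN = 10      # blast-radius cap (section 3 example)
MAX_REFUNDS_PER_RUN = 1      # blast-radius cap (section 3 example)

@dataclass
class RunBudget:
    """Per-run action counters, reset at the start of each agent run."""
    emails_sent: int = 0
    refunds_issued: int = 0

def gate(action: str, amount_usd: float, touches_pii: bool, budget: RunBudget) -> str:
    """Return 'execute', 'needs_approval', or 'blocked' for a proposed action."""
    if action == "send_email":
        if budget.emails_sent >= MAX_EMAILS_PER_RUN:
            return "blocked"            # blast radius exceeded
        budget.emails_sent += 1
        return "execute"
    if action == "issue_refund":
        if budget.refunds_issued >= MAX_REFUNDS_PER_RUN:
            return "blocked"            # only one refund per run
        budget.refunds_issued += 1
        if amount_usd > REFUND_APPROVAL_USD or touches_pii:
            return "needs_approval"     # approval-policy threshold tripped
        return "execute"
    return "blocked"                    # default-deny: action not on allowlist
```

Note the default-deny at the end: any action not explicitly on the allowlist is blocked, which matches the least-privilege stance in section 3.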
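Section 5's rule that "failures block promotion" is mechanically simple: replay the offline dataset through the candidate workflow and refuse to promote below a pass-rate bar. A toy sketch, where the 0.95 threshold and the stub agent are assumptions for illustration:

```python
# Sketch of a promotion gate over an offline eval set (section 5).
# PASS_RATE_REQUIRED is an assumed bar, not a value from the checklist.
PASS_RATE_REQUIRED = 0.95

def run_evals(cases, agent_fn) -> float:
    """cases: list of (input, expected) pairs; agent_fn: workflow under test."""
    passed = sum(1 for inp, expected in cases if agent_fn(inp) == expected)
    return passed / len(cases)

def can_promote(cases, agent_fn) -> bool:
    """Failures block promotion: only promote at or above the pass-rate bar."""
    return run_evals(cases, agent_fn) >= PASS_RATE_REQUIRED

# Toy usage with a stub agent and two historical cases:
cases = [("refund $20", "execute"), ("refund $500", "needs_approval")]
stub = lambda inp: "needs_approval" if "$500" in inp else "execute"
print(can_promote(cases, stub))  # True: both cases pass
```

In practice the eval set should be the 100–500 real historical cases the checklist calls for, and this check should run in CI on every template or workflow change.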