AGENTIC DELEGATION PLAYBOOK (90 DAYS)

Goal
Stand up an “agentic operating model” where AI agents can propose or execute work with clear ownership, least-privilege permissions, measurable quality, and auditable trails.

Definitions
- Agent: Any system that can plan and take multi-step actions (e.g., writing code, updating tickets, replying to customers, executing runbooks).
- Agent Owner: Named human accountable for the agent’s outcomes, configuration, and access.
- Trust SLO: A measurable reliability/quality bar that must be met before expanding delegation.

PHASE 1 — BASELINE (Weeks 1–2)
1) Capture baseline metrics (weekly):
   - Engineering: lead time, deploy frequency, change failure rate, MTTR.
   - Support: containment rate, escalation rate, CSAT.
   - Security: time-to-triage, time-to-remediate.
2) Identify top 10 failure modes (e.g., auth regressions, policy mistakes, hallucinated citations).
3) Choose ONE pilot workflow with bounded risk (examples: flaky test repair, docs updates, Tier-1 macros).
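
As a rough illustration of step 1, the engineering metrics can be computed from deployment records along these lines. This is a minimal sketch: the Deploy record and its field names are assumptions, not part of any particular tool, and lead time is taken here as commit-to-production time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional

@dataclass
class Deploy:
    # Hypothetical record shape; adapt to whatever your CI/CD system exports.
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when the change reached production
    failed: bool                            # caused an incident or rollback
    restored_at: Optional[datetime] = None  # when service was restored, if it failed

def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600

def weekly_baseline(deploys: list) -> dict:
    """Summarize one week of deploys into the four engineering metrics."""
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        "deploy_count": len(deploys),  # deploy frequency for the week
        "mean_lead_time_h": mean(lead_times) if lead_times else 0.0,
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
        "mttr_h": mean(restores) if restores else 0.0,
    }
```

Running the same function separately over agent-authored and human-authored changes gives the segmented baseline that later phases compare against.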

PHASE 2 — GUARDRAILS (Weeks 3–5)
4) Create “Agent Role Cards” (one page each):
   - Purpose, allowed inputs, allowed tools, data sources, escalation triggers.
   - Write permissions (PR-only vs direct actions).
   - Budget limits (token/inference, timeouts, max actions per run).
5) Implement least privilege:
   - Read-only by default.
   - Production actions require time-bound access + approval.
6) Require audit artifacts:
   - Link to source context, tool calls, diffs, tests, and a short rationale.
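
Steps 4 to 6 could be enforced in code roughly as follows. This is a sketch under stated assumptions: RoleCard, Grant, authorize, and missing_audit_fields are hypothetical names, the default limit is a placeholder, and a real deployment would back the grant check with your IAM system rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class RoleCard:
    agent: str
    owner: str                      # named Agent Owner
    allowed_tools: frozenset        # tools listed on the card
    write_mode: str = "pr_only"     # "pr_only" or "direct", as recorded on the card
    max_actions_per_run: int = 20   # placeholder budget limit

@dataclass(frozen=True)
class Grant:
    agent: str
    tool: str
    approved_by: str                # human approver
    expires_at: datetime            # time-bound access

# Audit artifacts required by step 6 (field names are assumptions).
AUDIT_FIELDS = ("source_context", "tool_calls", "diff", "tests", "rationale")

def authorize(card, tool, production, actions_so_far, grants, now):
    """Return (allowed, reason) for one proposed tool call."""
    if tool not in card.allowed_tools:
        return False, "tool not on role card"
    if actions_so_far >= card.max_actions_per_run:
        return False, "action budget exhausted"
    if not production:
        return True, "within role card"
    # Production actions need an approved grant that has not expired.
    for g in grants:
        if g.agent == card.agent and g.tool == tool and g.approved_by and now < g.expires_at:
            return True, "approved by " + g.approved_by
    return False, "no valid time-bound approval"

def missing_audit_fields(record: dict) -> list:
    """List the required audit artifacts that are absent or empty."""
    return [f for f in AUDIT_FIELDS if not record.get(f)]
```

Denying by default and returning a reason string keeps every refusal explainable in the audit trail.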

PHASE 3 — EVALUATION (Weeks 6–8)
7) Build an evaluation set (50–200 cases):
   - Normal cases + edge cases + known failures.
   - Include security and compliance checks where relevant.
8) Run red-team prompts against your workflows (misleading inputs, ambiguous requests, policy violations).
9) Establish pass thresholds:
   - Every critical scenario must pass, with no exceptions (e.g., no unsafe refunds, no privilege escalation, no secrets exposure).
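
One way to turn step 9 into a gate is sketched below. EvalResult and gate are hypothetical names, and the 0.9 overall pass rate is an assumed placeholder; the playbook itself only fixes the rule that critical scenarios must pass.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    case_id: str
    critical: bool   # e.g. refund safety, privilege escalation, secrets exposure
    passed: bool

def gate(results, min_overall_pass_rate=0.9):
    """Return (ok, failed_critical_case_ids, overall_pass_rate).

    Any failed critical case blocks the gate regardless of the overall rate.
    An empty result set never passes.
    """
    critical_failures = [r.case_id for r in results if r.critical and not r.passed]
    pass_rate = sum(r.passed for r in results) / len(results) if results else 0.0
    ok = not critical_failures and pass_rate >= min_overall_pass_rate
    return ok, critical_failures, pass_rate
```

Returning the failed case IDs alongside the verdict makes it easy to feed them into the failure-mode list from Phase 1.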

PHASE 4 — DELEGATION (Weeks 9–11)
10) Roll out with canaries:
   - Limit to one service/repo/queue.
   - Limit to 1–5% of traffic or a small customer segment.
11) Add human gates:
   - PR review required.
   - Mandatory CI + security scans.
   - For support: an escalation policy and dollar limits for refunds/credits.
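
The canary limit in step 10 and the dollar limit in step 11 might look like this in code. This is a sketch, not a prescribed design: the salt string, the function names, and the 50.00 limit are illustrative assumptions.

```python
import hashlib
from decimal import Decimal

def in_canary(key: str, percent: float, salt: str = "agent-canary-v1") -> bool:
    """Deterministically place a ticket or customer ID in the canary slice.

    The same key always lands in the same bucket, so raising `percent`
    only adds keys to the canary and never removes any.
    """
    digest = hashlib.sha256((salt + ":" + key).encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000   # 0..9999
    return bucket < percent * 100                         # percent=5 -> buckets 0..499

def route_refund(amount: Decimal, agent_limit: Decimal = Decimal("50.00")) -> str:
    """Decide who may act on a refund or credit request."""
    if amount <= 0:
        return "reject"
    if amount <= agent_limit:
        return "agent_may_issue"
    return "escalate_to_human"
```

Hashing the key instead of sampling at random keeps a customer's experience stable across requests and makes canary membership reproducible during incident review.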

PHASE 5 — SCALE (Weeks 12–13)
12) Set and publish Trust SLOs, e.g.:
   - Agent PR rollback rate ≤2% within 48 hours.
   - Agent Tier-1 CSAT ≥90% of human baseline.
13) Expand delegation only after the Trust SLOs have been met for 30 days.
14) Write the internal playbook:
   - Role Cards, review standards (“review invariants”), incident procedures, and an ownership map.
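
The example Trust SLOs in step 12 and the expansion rule in step 13 can be checked mechanically. The sketch below assumes one SLO evaluation per day and reads "met for 30 days" as 30 consecutive passing days; both are interpretations, and the function names are hypothetical.

```python
def rollback_rate(merged: int, rolled_back_48h: int) -> float:
    """Share of merged agent PRs rolled back within 48 hours."""
    return rolled_back_48h / merged if merged else 0.0

def slo_met(merged: int, rolled_back_48h: int, agent_csat: float, human_csat: float) -> bool:
    """Check both example SLOs: rollback rate <= 2%, CSAT >= 90% of human baseline."""
    return (
        rollback_rate(merged, rolled_back_48h) <= 0.02
        and agent_csat >= 0.90 * human_csat
    )

def may_expand(daily_slo_met: list, window_days: int = 30) -> bool:
    """Allow expansion only if the most recent `window_days` checks all passed."""
    return len(daily_slo_met) >= window_days and all(daily_slo_met[-window_days:])
```

For example, an agent with 100 merged PRs, 1 rollback, and a CSAT of 4.2 against a human baseline of 4.5 meets both SLOs for that day; a single failing day resets the 30-day window under this reading.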

Weekly Operating Rhythm (ongoing)
- 30-min review: rollback/reopen rates, new failure modes, top agent wins.
- Update Role Cards and permissions based on observed risk.
- Add 5–10 new eval cases per week from real failures.

Deliverables Checklist
- [ ] Baseline dashboard with segmented metrics (agent vs human)
- [ ] Agent Role Cards + named owners
- [ ] IAM/permission map for each agent
- [ ] Audit log retention policy
- [ ] Evaluation set + red-team results
- [ ] Trust SLOs + expansion policy