AGENTIC DELEGATION PLAYBOOK (90 DAYS)

Goal
Stand up an “agentic operating model” where AI agents can propose or execute work with clear ownership, least-privilege permissions, measurable quality, and auditable trails.

Definitions
- Agent: Any system that can plan + take multi-step actions (code, tickets, replies, runbooks).
- Agent Owner: Named human accountable for the agent’s outcomes, configuration, and access.
- Trust SLO: A measurable reliability/quality bar that must be met before expanding delegation.

PHASE 1: BASELINE (Weeks 1–2)
1) Capture baseline metrics (weekly):
   - Engineering: lead time, deploy frequency, change failure rate, MTTR.
   - Support: containment rate, escalation rate, CSAT.
   - Security: time-to-triage, time-to-remediate.
2) Identify top 10 failure modes (e.g., auth regressions, policy mistakes, hallucinated citations).
3) Choose ONE pilot workflow with bounded risk (examples: flaky test repair, docs updates, Tier-1 macros).

PHASE 2: GUARDRAILS (Weeks 3–5)
4) Create “Agent Role Cards” (one page each):
   - Purpose, allowed inputs, allowed tools, data sources, escalation triggers.
   - Write permissions (PR-only vs direct actions).
   - Budget limits (token/inference, timeouts, max actions per run).
5) Implement least privilege:
   - Read-only by default.
   - Production actions require time-bound access + approval.
6) Require audit artifacts:
   - Link to source context, tool calls, diffs, tests, and a short rationale.

PHASE 3: EVALUATION (Weeks 6–8)
7) Build an evaluation set (50–200 cases):
   - Normal cases + edge cases + known failures.
   - Include security and compliance checks where relevant.
8) Run red-team prompts against your workflows (misleading inputs, ambiguous requests, policy violations).
9) Establish pass thresholds:
   - Critical scenarios must pass (e.g., no unsafe refunds, no privilege escalation, no secrets exposure).

PHASE 4: DELEGATION (Weeks 9–11)
10) Roll out with canaries:
   - Limit to one service/repo/queue.
   - Limit to 1–5% of traffic or a small customer segment.
11) Add human gates:
   - PR review required.
   - Mandatory CI + security scans.
   - For support, escalation policy + dollar limits for refunds/credits.

PHASE 5: SCALE (Weeks 12–13)
12) Set and publish Trust SLOs, e.g.:
   - Agent PR rollback rate ≤2% within 48 hours.
   - Agent Tier-1 CSAT ≥90% of human baseline.
13) Expand only when Trust SLO is met for 30 days.
14) Write the internal playbook:
   - Role Cards, review standards (“review invariants”), incident procedures, and ownership map.

Weekly Operating Rhythm (ongoing)
- 30-min review: rollback/reopen rates, new failure modes, top agent wins.
- Update Role Cards and permissions based on observed risk.
- Add 5–10 new eval cases per week from real failures.

Deliverables Checklist
- [ ] Baseline dashboard with segmented metrics (agent vs human)
- [ ] Agent Role Cards + named owners
- [ ] IAM/permission map for each agent
- [ ] Audit log retention policy
- [ ] Evaluation set + red-team results
- [ ] Trust SLOs + expansion policy
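
Example: Role Card permission check (illustrative)
The Role Card fields in step 4 and the least-privilege rules in step 5 could be encoded roughly as follows. This is a minimal sketch, not a prescribed implementation: `RoleCard`, `ProductionGrant`, `authorize`, and the tool names are hypothetical, and a real deployment would enforce these rules in the IAM layer as well as in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RoleCard:
    """Hypothetical machine-readable subset of an Agent Role Card (step 4)."""
    agent_name: str
    owner: str                      # named Agent Owner
    allowed_tools: FrozenSet[str]   # anything not listed is denied
    pr_only: bool = True            # PR-only vs direct actions
    max_actions_per_run: int = 20   # budget limit (illustrative value)


@dataclass(frozen=True)
class ProductionGrant:
    """Hypothetical time-bound, human-approved access grant (step 5)."""
    approved_by: str
    expires_at: datetime


def authorize(
    card: RoleCard,
    tool: str,
    is_production_write: bool,
    actions_so_far: int,
    grant: Optional[ProductionGrant] = None,
    now: Optional[datetime] = None,
) -> bool:
    """Return True only if the Role Card permits the requested action."""
    now = now or datetime.now(timezone.utc)
    if tool not in card.allowed_tools:
        return False  # tool is not on the card
    if actions_so_far >= card.max_actions_per_run:
        return False  # per-run action budget exhausted
    if not is_production_write:
        return True   # read or PR-level actions within the allowlist
    if card.pr_only:
        return False  # PR-only agents never act directly on production
    # Production actions need an approved grant that has not expired.
    return grant is not None and bool(grant.approved_by) and now < grant.expires_at
```

Denying by default keeps the check aligned with "read-only by default": an agent gains a capability only when its card and a current grant both allow it.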
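
Example: Trust SLO expansion gate (illustrative)
The expansion rule in steps 12 and 13 can be expressed as a small check. The sketch below uses the example thresholds from step 12; `DailyStats`, `slo_met`, and `may_expand` are hypothetical names. It also assumes that "met for 30 days" means each of the 30 most recent daily records passes, which the playbook does not specify; a team might reasonably evaluate a rolling 30-day aggregate instead.

```python
from dataclasses import dataclass
from typing import List

# Example thresholds from step 12; adjust to your published Trust SLOs.
MAX_ROLLBACK_RATE = 0.02   # agent PRs rolled back within 48 hours
MIN_CSAT_RATIO = 0.90      # agent Tier-1 CSAT relative to human baseline
REQUIRED_DAYS = 30         # step 13


@dataclass
class DailyStats:
    """Hypothetical per-day aggregate pulled from the baseline dashboard."""
    agent_prs_merged: int
    agent_prs_rolled_back_48h: int
    agent_csat: float
    human_baseline_csat: float


def slo_met(day: DailyStats) -> bool:
    """Check one day's numbers against both example Trust SLOs."""
    if day.agent_prs_merged == 0:
        rollback_rate = 0.0  # assumption: a day with no agent PRs passes
    else:
        rollback_rate = day.agent_prs_rolled_back_48h / day.agent_prs_merged
    csat_ok = day.agent_csat >= MIN_CSAT_RATIO * day.human_baseline_csat
    return rollback_rate <= MAX_ROLLBACK_RATE and csat_ok


def may_expand(history: List[DailyStats]) -> bool:
    """Allow expansion only if the last REQUIRED_DAYS records all pass."""
    if len(history) < REQUIRED_DAYS:
        return False
    return all(slo_met(day) for day in history[-REQUIRED_DAYS:])
```

With this reading, a single failing day inside the window blocks expansion until 30 passing days have accumulated again.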