AGENTIC OPS ROLLOUT KIT (90-DAY PLAN + READINESS CHECKLIST)

Purpose
Use this kit to ship one production agent workflow in 90 days with measurable ROI and bounded risk. This is designed for Seed–Series C startups that need leverage without compliance or reliability surprises.

STEP 0 — Choose the right workflow (Day 1–3)
Score 5 candidate workflows from 1–5 on each:
- Volume: how many times/week does it occur?
- Novelty: how variable are cases? (lower novelty is better)
- Risk: worst-case impact of a bad action?
- Measurability: can you define a success metric in one sentence?
- Data readiness: do you have clean sources of truth?
Pick the best “high-volume, low-novelty, low/medium-risk” option.

STEP 1 — Define success + guardrails (Day 4–7)
Write:
- Success metric (example: “Reduce median time-to-first-response from 4h to 2.5h in 60 days.”)
- Error budget (example: “Customer-impact errors ≤ 1 per 1,000 runs.”)
- Cost ceiling (example: “≤ $1.00 per successful resolution.”)
- Hard constraints: max runtime, max tool calls, max spend/run.

STEP 2 — Build your eval set (Week 2)
- Collect 200–500 representative historical cases.
- Label ground truth: correct action, correct escalation, and any prohibited actions.
- Create a “red team” subset (20–50 cases) with ambiguity, policy traps, and adversarial phrasing.

STEP 3 — Shadow mode (Weeks 3–4)
- Agent produces recommendations only; no execution.
- Log everything: inputs, retrieved sources (with doc versions), tool-call simulations, outputs.
- Track: agreement rate with humans, top failure categories, and average cost/run.

STEP 4 — Tooling + identity boundaries (Weeks 5–6)
- Create a dedicated agent service account.
- Implement least privilege: only the minimum read/write permissions.
- Separate “research” access (browsing/search) from “execution” tools.
- Add allowlists for external communications and destinations.

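
The Step 4 boundaries (dedicated identity, least privilege, research/execution separation, destination allowlists) can be sketched as a deny-by-default authorization check. This is a minimal illustration, not a reference implementation: the tool names, the `AgentIdentity` class, the `authorize` function, and the `example.com` domain are all hypothetical placeholders for whatever your stack actually exposes.

```python
from dataclasses import dataclass

# Hypothetical tool names; substitute the tools your agent actually exposes.
RESEARCH_TOOLS = frozenset({"web_search", "read_doc"})
EXECUTION_TOOLS = frozenset({"send_email", "update_ticket"})


@dataclass(frozen=True)
class AgentIdentity:
    """A dedicated service account with an explicit, minimal tool grant."""
    name: str
    granted_tools: frozenset
    allowed_email_domains: frozenset = frozenset()


def authorize(identity, tool, destination=None):
    """Return (allowed, reason) for a proposed tool call. Deny by default."""
    if tool not in RESEARCH_TOOLS | EXECUTION_TOOLS:
        return False, "unknown tool"
    if tool not in identity.granted_tools:
        return False, "tool not granted to this identity"
    if tool == "send_email":
        # Everything after the last "@" is treated as the destination domain.
        domain = (destination or "").rpartition("@")[2].lower()
        if domain not in identity.allowed_email_domains:
            return False, "destination not on allowlist"
    return True, "ok"


# Assumed setup: research and execution run under separate identities.
researcher = AgentIdentity("agent-research", RESEARCH_TOOLS)
executor = AgentIdentity(
    "agent-exec",
    EXECUTION_TOOLS,
    allowed_email_domains=frozenset({"example.com"}),
)

print(authorize(researcher, "send_email", "user@example.com"))
print(authorize(executor, "send_email", "user@example.com"))
print(authorize(executor, "send_email", "user@elsewhere.test"))
```

Keeping the research and execution identities separate means a prompt injection encountered while browsing cannot directly reach a tool that changes state; the execution identity only ever sees vetted inputs and allowlisted destinations.
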

STEP 5 — Ship with approvals (Weeks 7–8)
- Put approval gates on irreversible actions (money movement, external email, entitlement changes, deploy/merge).
- Build diff-based approvals: show what will change, sources used, and confidence flags.
- Add a kill switch and rollback plan.

STEP 6 — Operate it like a service (Weeks 9–12)
Weekly review dashboard:
- Completion rate
- Escalation rate
- Customer-impact errors per 1,000 runs
- Median time-to-resolution
- Cost per successful run
- Human minutes saved (estimate + spot-check)
Do a postmortem for every customer-impact incident; add one preventive control each time (policy rule, tool restriction, better retrieval source, or improved escalation).

PRODUCTION READINESS CHECKLIST (go/no-go)
- Quality: ≥ 90% success on 200+ eval cases
- Safety: approval gates on all irreversible actions
- Observability: 100% runs traced with tool-call logs
- Security: scoped identity + least privilege verified
- Cost: cost/run ≤ 20% of equivalent human cost
- Escalation: defined human owner + SLA + runbook

When this workflow is stable, reuse the scaffolding (auth, tracing, policy templates) to launch the next workflow in half the time.
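
The go/no-go checklist can be turned into an automated gate so the decision is reproducible from week to week. The sketch below is one possible encoding, using the thresholds stated in the checklist; the `ReadinessReport` field names, the `go_no_go` function, and the sample numbers are illustrative assumptions, not measured results.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReadinessReport:
    """Inputs to the go/no-go decision. All field names are hypothetical."""
    eval_cases: int
    eval_successes: int
    irreversible_actions: int
    irreversible_actions_gated: int
    runs_total: int
    runs_traced: int
    least_privilege_verified: bool
    cost_per_run: float
    human_cost_per_run: float
    escalation_owner: str
    escalation_sla_defined: bool
    runbook_exists: bool


def go_no_go(r):
    """Return (go, failed_checks) using the checklist thresholds."""
    checks = {
        # >= 90% success on 200+ cases; integer math avoids float edge cases.
        "quality": r.eval_cases >= 200
        and r.eval_successes * 10 >= r.eval_cases * 9,
        "safety": r.irreversible_actions_gated == r.irreversible_actions,
        "observability": r.runs_total > 0 and r.runs_traced == r.runs_total,
        "security": r.least_privilege_verified,
        "cost": r.human_cost_per_run > 0
        and r.cost_per_run <= 0.20 * r.human_cost_per_run,
        "escalation": bool(r.escalation_owner)
        and r.escalation_sla_defined
        and r.runbook_exists,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return not failed, failed


# Made-up sample values for illustration only.
report = ReadinessReport(
    eval_cases=250, eval_successes=231,
    irreversible_actions=4, irreversible_actions_gated=4,
    runs_total=1000, runs_traced=1000,
    least_privilege_verified=True,
    cost_per_run=0.80, human_cost_per_run=5.00,
    escalation_owner="support-oncall",
    escalation_sla_defined=True, runbook_exists=True,
)

print(go_no_go(report))                            # (True, [])
print(go_no_go(replace(report, runs_traced=990)))  # (False, ['observability'])
```

Returning the list of failed checks, rather than a bare boolean, gives the review meeting a concrete punch list when the answer is no-go.
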