AGENTIC OPS ROLLOUT KIT (90-DAY PLAN + READINESS CHECKLIST)

Purpose
Use this kit to ship one production agent workflow in 90 days with measurable ROI and bounded risk. It is designed for Seed–Series C startups that want leverage from automation without compliance or reliability surprises.

STEP 0 — Choose the right workflow (Day 1–3)
Score five candidate workflows from 1 to 5 on each dimension:
- Volume: how many times/week does it occur?
- Novelty: how variable are cases? (lower novelty is better)
- Risk: worst-case impact of a bad action?
- Measurability: can you define a success metric in one sentence?
- Data readiness: do you have clean sources of truth?
Pick the candidate that best fits the “high-volume, low-novelty, low/medium-risk” profile.
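
A minimal sketch of the scoring step, not a prescribed method. It assumes raw 1–5 scores per dimension, equal weights, and that novelty and risk count against a candidate (so they are inverted as 6 minus the score). The names Candidate and rank are hypothetical.

```python
from dataclasses import dataclass

# Dimensions where a HIGH raw score is bad and must be inverted (assumption).
INVERTED = {"novelty", "risk"}
DIMENSIONS = ("volume", "novelty", "risk", "measurability", "data_readiness")


@dataclass
class Candidate:
    name: str
    scores: dict  # dimension name -> raw score from 1 to 5

    def total(self) -> int:
        """Sum the five dimensions, inverting novelty and risk."""
        result = 0
        for dim in DIMENSIONS:
            value = self.scores[dim]
            if not 1 <= value <= 5:
                raise ValueError(f"{self.name}: {dim} must be 1-5, got {value}")
            result += (6 - value) if dim in INVERTED else value
        return result


def rank(candidates):
    """Return candidates sorted best-first by unweighted total."""
    return sorted(candidates, key=lambda c: c.total(), reverse=True)
```

If one dimension matters more to you (risk, for most teams), add weights rather than trusting the raw sum.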

STEP 1 — Define success + guardrails (Day 4–7)
Write:
- Success metric (example: “Reduce median time-to-first-response from 4h to 2.5h in 60 days.”)
- Error budget (example: “Customer-impact errors ≤ 1 per 1,000 runs.”)
- Cost ceiling (example: “≤ $1.00 per successful resolution.”)
- Hard constraints: max runtime, max tool calls, max spend/run.
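
Hard constraints only help if the runtime enforces them. The sketch below shows one way to do that, assuming the agent loop reports the duration and cost of each tool call; RunLimits, RunBudget, and LimitExceeded are hypothetical names, and the limit values are yours to set.

```python
from dataclasses import dataclass


@dataclass
class RunLimits:
    max_runtime_s: float
    max_tool_calls: int
    max_spend_usd: float


class LimitExceeded(RuntimeError):
    """Raised when a run breaches one of its hard constraints."""


@dataclass
class RunBudget:
    limits: RunLimits
    elapsed_s: float = 0.0
    tool_calls: int = 0
    spend_usd: float = 0.0

    def record_tool_call(self, duration_s: float, cost_usd: float) -> None:
        """Account for one tool call, then stop the run if any limit is breached."""
        self.elapsed_s += duration_s
        self.tool_calls += 1
        self.spend_usd += cost_usd
        if self.elapsed_s > self.limits.max_runtime_s:
            raise LimitExceeded("max runtime exceeded")
        if self.tool_calls > self.limits.max_tool_calls:
            raise LimitExceeded("max tool calls exceeded")
        if self.spend_usd > self.limits.max_spend_usd:
            raise LimitExceeded("max spend per run exceeded")
```

Treat a LimitExceeded as an escalation to a human, not as a retry trigger.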

STEP 2 — Build your eval set (Week 2)
- Collect 200–500 representative historical cases.
- Label ground truth: correct action, correct escalation, and any prohibited actions.
- Create a “red team” subset (20–50 cases) with ambiguity, policy traps, and adversarial phrasing.
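
One possible shape for a labeled eval case and its grader, as a sketch. The field names and the three-way result (prohibited / pass / fail) are assumptions; adapt them to your own labels.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    case_id: str
    input_text: str
    correct_action: str
    should_escalate: bool
    prohibited_actions: frozenset = frozenset()
    red_team: bool = False  # True for the adversarial subset


def grade(case: EvalCase, action: str, escalated: bool) -> str:
    """Return 'prohibited', 'pass', or 'fail' for one agent output.

    A prohibited action is reported separately because it is worse
    than an ordinary miss.
    """
    if action in case.prohibited_actions:
        return "prohibited"
    if action == case.correct_action and escalated == case.should_escalate:
        return "pass"
    return "fail"
```

Report the red-team subset separately from the main set so a strong overall score cannot hide failures on the hard cases.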

STEP 3 — Shadow mode (Weeks 3–4)
- Agent produces recommendations only; no execution.
- Log everything: inputs, retrieved sources (with doc versions), tool-call simulations, outputs.
- Track: agreement rate with humans, top failure categories, and average cost/run.
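
The three tracked numbers can be computed directly from the shadow-mode log. A minimal sketch, assuming each log record is a dict with the hypothetical keys agent_action, human_action, failure_category (None when there is no failure), and cost_usd:

```python
from collections import Counter


def shadow_summary(records):
    """Summarize shadow-mode logs: agreement, top failures, average cost."""
    if not records:
        raise ValueError("no shadow-mode records to summarize")
    agreed = sum(r["agent_action"] == r["human_action"] for r in records)
    failures = Counter(
        r["failure_category"] for r in records if r["failure_category"]
    )
    return {
        "agreement_rate": agreed / len(records),
        "top_failures": failures.most_common(3),
        "avg_cost_usd": sum(r["cost_usd"] for r in records) / len(records),
    }
```

Agreement with humans is a proxy, not ground truth: sample the disagreements by hand, because sometimes the human was the one who got it wrong.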

STEP 4 — Tooling + identity boundaries (Weeks 5–6)
- Create a dedicated agent service account.
- Implement least privilege: grant only the minimum read/write permissions the workflow needs.
- Separate “research” access (browsing/search) from “execution” tools.
- Add allowlists for external communications and destinations.
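
A sketch of an exact-match allowlist check for outbound email and HTTP destinations. The domains and hosts below are placeholders, and the function names are hypothetical; only urlsplit comes from the Python standard library.

```python
from urllib.parse import urlsplit

ALLOWED_EMAIL_DOMAINS = {"example.com"}  # placeholder values
ALLOWED_HOSTS = {"api.example.com"}      # placeholder values


def email_allowed(address: str) -> bool:
    """Allow only addresses whose domain is exactly on the allowlist."""
    local, sep, domain = address.rpartition("@")
    return bool(local) and bool(sep) and domain.lower() in ALLOWED_EMAIL_DOMAINS


def url_allowed(url: str) -> bool:
    """Allow only HTTPS URLs whose host is exactly on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in ALLOWED_HOSTS
```

Exact matching is deliberate: suffix or substring checks are easy to bypass with lookalike hosts such as api.example.com.evil.test.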

STEP 5 — Ship with approvals (Weeks 7–8)
- Put approval gates on irreversible actions (money movement, external email, entitlement changes, deploy/merge).
- Build diff-based approvals: show what will change, sources used, and confidence flags.
- Add a kill switch and rollback plan.
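
The gate, the diff, and the kill switch can all sit in one thin wrapper around execution. This is a sketch under assumptions: the action type names, the approved_by field, and the kill_switch_on flag are hypothetical, and a real system would read the flag from shared configuration.

```python
# Action types that always need a human approver (mirrors the list above).
IRREVERSIBLE = {"money_movement", "external_email", "entitlement_change", "deploy_merge"}


class KillSwitchEngaged(RuntimeError):
    pass


class ApprovalRequired(RuntimeError):
    pass


def describe_change(before: dict, after: dict) -> dict:
    """Return {field: (old, new)} for every field that would change."""
    keys = before.keys() | after.keys()
    return {
        k: (before.get(k), after.get(k))
        for k in sorted(keys)
        if before.get(k) != after.get(k)
    }


def execute(action_type, perform, *, kill_switch_on, approved_by=None):
    """Run `perform` only if the kill switch is off and, for irreversible
    action types, a human approver is recorded."""
    if kill_switch_on:
        raise KillSwitchEngaged("agent execution is disabled")
    if action_type in IRREVERSIBLE and not approved_by:
        raise ApprovalRequired(f"{action_type} needs human approval")
    return perform()
```

Show the output of describe_change to the approver alongside the sources used, so the approval is for a specific change rather than a general intent.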

STEP 6 — Operate it like a service (Weeks 9–12)
Weekly review dashboard:
- Completion rate
- Escalation rate
- Customer-impact errors per 1,000 runs
- Median time-to-resolution
- Cost per successful run
- Human minutes saved (estimate + spot-check)
Do a postmortem for every customer-impact incident; add one preventive control each time (policy rule, tool restriction, better retrieval source, or improved escalation).
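
A sketch of how the dashboard numbers might be derived from run records. It assumes each record is a dict with the hypothetical keys completed, escalated, customer_impact_error, resolution_minutes, and cost_usd, and it assumes a "successful run" means completed with no customer-impact error. Human minutes saved is left out because it depends on your own baseline estimate.

```python
from statistics import median


def weekly_metrics(runs):
    """Compute weekly review metrics from a list of run records."""
    total = len(runs)
    if total == 0:
        raise ValueError("no runs this week")
    successes = [
        r for r in runs if r["completed"] and not r["customer_impact_error"]
    ]
    errors = sum(r["customer_impact_error"] for r in runs)
    total_cost = sum(r["cost_usd"] for r in runs)
    return {
        "completion_rate": sum(r["completed"] for r in runs) / total,
        "escalation_rate": sum(r["escalated"] for r in runs) / total,
        "errors_per_1000_runs": 1000 * errors / total,
        # Median over successful runs only (assumption).
        "median_resolution_minutes": (
            median(r["resolution_minutes"] for r in successes) if successes else None
        ),
        # All spend, including failed runs, divided by successes (assumption).
        "cost_per_successful_run": total_cost / len(successes) if successes else None,
    }
```

Charging failed runs to the successful ones keeps the cost figure honest: a workflow that fails often looks expensive, as it should.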

PRODUCTION READINESS CHECKLIST (go/no-go)
- Quality: ≥ 90% success on 200+ eval cases
- Safety: approval gates on all irreversible actions
- Observability: 100% of runs traced, with tool-call logs
- Security: scoped identity + least privilege verified
- Cost: cost/run ≤ 20% of equivalent human cost
- Escalation: defined human owner + SLA + runbook
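
The checklist can be encoded as a single go/no-go function so the decision is reproducible. A minimal sketch: the thresholds come from the checklist above, while the input key names are hypothetical.

```python
def go_no_go(m):
    """Return (decision, failed_checks) for a dict of measured values."""
    checks = {
        "quality": m["eval_cases"] >= 200 and m["eval_success_rate"] >= 0.90,
        "safety": m["ungated_irreversible_actions"] == 0,
        "observability": m["traced_runs"] == m["total_runs"],
        "security": bool(m["least_privilege_verified"]),
        "cost": m["cost_per_run_usd"] <= 0.20 * m["human_cost_per_run_usd"],
        "escalation": all(
            m[k] for k in ("escalation_owner", "escalation_sla", "escalation_runbook")
        ),
    }
    failed = [name for name, ok in checks.items() if not ok]
    return ("go" if not failed else "no-go", failed)
```

Any single failed check yields no-go; the list of failed checks tells you what to fix before re-running the review.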

When this workflow is stable, reuse the scaffolding (auth, tracing, policy templates) to launch the next workflow faster; half the time is a reasonable target.