ICMD AgentOps 90-Day Launch Pack

Use this to ship one production-grade agent (internal or customer-facing) without creating a security or reliability debt bomb.

1) Scope & ROI (Days 0–15)
- Pick ONE workflow with clear volume and outcome (e.g., “resolve password reset tickets,” “summarize CI failures,” “draft renewal quotes”).
- Define success metrics: target % autonomy, accuracy threshold, max latency (p95), and max $ cost per task.
- Write a “permission contract” for the agent: allowed tools, allowed data classes, forbidden actions, and escalation rules.
- Establish baseline: current human handle time, error rate, and monthly volume. Convert to $ impact.

2) Architecture (Days 16–35)
- Orchestrator: choose state machine/workflow style (graph/steps) and ensure every step is logged.
- Tools: split read vs write tools; add schemas; enforce idempotency keys on side-effectful calls.
- Retrieval: implement provenance (source, timestamp, ACL); prefer small authoritative snippets over full-doc dumps.
- Budgets: hard caps on tool calls, wall time, and cost per run.

3) Security & Governance (Days 16–60)
- Least privilege: per-tool scopes; rotate credentials; never embed overprivileged API keys in prompts.
- Policy gate: evaluate each planned tool call (resource allowlist, user auth, data class, thresholds).
- Prompt injection posture: treat retrieved text as untrusted; block “instructions” from sources; red-team with poisoned docs.
- Data handling: log classification (PII/PCI/secrets/internal); redact outputs to external channels.

4) Evaluation (Days 36–60)
- Build an eval suite from real cases (100–500): include edge cases and failure examples.
- Define pass/fail criteria per case: correctness, citations/provenance, policy compliance, and budget compliance.
- Run evals in CI: compare prompt/model/retrieval changes; require approvals for regressions.

5) Pilot & Human-in-the-Loop (Days 61–75)
- Start read-only → propose-only → execute with approvals.
- Implement reviewer UI: show plan, retrieved sources, tool calls, and a one-click “escalate” path.
- Track: autonomy rate, reviewer override rate, top failure categories, and time saved.

6) Production Ops (Days 76–90)
- SLOs: define p95 latency, error rate, and policy-violation rate. Set alert thresholds.
- Runbooks: “tool down,” “rate limited,” “bad retrieval,” “policy block spike,” “cost spike.”
- Incident process: categorize by layer (tool/state/context/policy/reasoning) and write postmortems.
- Rollback: ensure actions can be reversed or corrected; store action logs for audits.

Go/No-Go Gate
- Go if: eval pass rate meets threshold on low-risk cases, no high-severity policy violations in pilot, and costs/latency are inside budgets.
- No-Go if: you can’t reproduce failures, can’t explain action provenance, or can’t enforce least privilege at the tool layer.

Operating cadence after launch
- Weekly: review top 10 failures + cost drivers.
- Monthly: refresh eval suite with new real cases; rotate credentials; re-run red-team prompts.
- Quarterly: expand scope by one tool or one workflow stage; re-assess SLOs and autonomy targets.
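
Appendix sketch: permission contract, policy gate, and run budgets

The permission contract (section 1), the budget caps and idempotency keys (section 2), and the policy gate (section 3) can be expressed as one small deny-by-default check that runs before every planned tool call. The following is a minimal sketch under stated assumptions, not a reference implementation: every name in it (PermissionContract, RunBudget, ToolCall, evaluate, record, and the tool and data-class names) is hypothetical, and user-auth checks are left out for brevity.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class PermissionContract:
    """Hypothetical encoding of the section 1 permission contract."""
    allowed_tools: frozenset
    allowed_data_classes: frozenset
    write_tools: frozenset       # side-effectful tools; must carry an idempotency key
    max_tool_calls: int          # hard cap per run
    max_wall_seconds: float      # hard cap per run
    max_cost_usd: float          # hard cap per run


@dataclass
class RunBudget:
    """Tracks what a single agent run has spent so far."""
    started_at: float = field(default_factory=time.monotonic)
    tool_calls: int = 0
    cost_usd: float = 0.0


@dataclass(frozen=True)
class ToolCall:
    """A planned tool call, as proposed by the agent (assumed shape)."""
    tool: str
    data_class: str
    estimated_cost_usd: float
    idempotency_key: Optional[str] = None


def evaluate(call: ToolCall, contract: PermissionContract,
             budget: RunBudget, now: Optional[float] = None) -> Tuple[bool, str]:
    """Return (allowed, reason). The first failing check wins."""
    now = time.monotonic() if now is None else now
    if call.tool not in contract.allowed_tools:
        return False, "tool not in allowlist"
    if call.data_class not in contract.allowed_data_classes:
        return False, "data class not permitted"
    if call.tool in contract.write_tools and not call.idempotency_key:
        return False, "write call missing idempotency key"
    if budget.tool_calls + 1 > contract.max_tool_calls:
        return False, "tool-call budget exceeded"
    if now - budget.started_at > contract.max_wall_seconds:
        return False, "wall-time budget exceeded"
    if budget.cost_usd + call.estimated_cost_usd > contract.max_cost_usd:
        return False, "cost budget exceeded"
    return True, "ok"


def record(call: ToolCall, budget: RunBudget) -> None:
    """Charge an executed call against the run budget."""
    budget.tool_calls += 1
    budget.cost_usd += call.estimated_cost_usd


# Example contract for a password-reset agent (all values are illustrative).
contract = PermissionContract(
    allowed_tools=frozenset({"lookup_user", "reset_password"}),
    allowed_data_classes=frozenset({"internal"}),
    write_tools=frozenset({"reset_password"}),
    max_tool_calls=2,
    max_wall_seconds=30.0,
    max_cost_usd=0.05,
)
budget = RunBudget()
print(evaluate(ToolCall("lookup_user", "internal", 0.01), contract, budget))
print(evaluate(ToolCall("reset_password", "internal", 0.01), contract, budget))
```

Returning a reason string with every decision is a deliberate choice here: logging it per call gives the pilot reviewer UI and the “policy block spike” runbook something concrete to aggregate. A denied call would normally route to the escalation rules in the permission contract rather than being retried.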