AGENTIC OPERATIONS PILOT PACK (30 DAYS)

Goal: Deploy one production-grade agent workflow that saves measurable time/cost without creating unacceptable risk.

1) PICK THE WORKFLOW (DAY 1–2)
- Choose a workflow that is (a) frequent, (b) standardized, (c) has bounded downside.
- Write a one-sentence “Definition of Done” (DoD). Example: “Ticket is correctly categorized, routed, and summarized with sources.”
- Identify the system of record (e.g., Zendesk, Salesforce, Jira). Decide what fields the agent may read vs. write.

2) DEFINE METRICS (DAY 1–3)
Track these from day one:
- Task success rate (% runs meeting DoD)
- Human review rate (% runs needing edits/approval)
- Time saved (minutes per run; estimate with before/after sampling)
- Cost per run ($; include model + infra)
- Incident rate (bad actions per 100 runs) and rollback time
Set targets for the pilot. Example: success ≥85%, cost ≤$0.25/run, time saved ≥5 min/run.

3) BUILD YOUR EVAL SET (DAY 3–7)
- Collect 50–200 real examples (tickets/leads/PRs). Redact PII if needed.
- Label outcomes: correct category, correct routing, correct summary, etc.
- Add 10–20 “nasty” cases (ambiguous, angry customer, policy edge cases).
- Store eval data with versioning so you can compare changes over time.

4) GUARDRAILS & GOVERNANCE (DAY 5–12)
Minimum controls before any write access:
- Separate agent identity (service account), least-privilege scopes
- Tool allowlist (explicitly enumerate allowed actions)
- Structured outputs (typed schema; reject free-form for critical fields)
- Trace logs for every run (prompt version, model, tool calls, outcomes)
- Kill switch (disable workflow quickly)
- Human approval for money/permissions/external comms

5) SHIP “DRAFT-ONLY” MODE (DAY 10–18)
- Agent reads systems + drafts actions (e.g., suggested routing + summary).
- Human approves/edits; capture edits as training/eval feedback.
- Review failures weekly; update prompts/tools; re-run eval suite each change.

6) GRADUATE TO LIMITED AUTOPILOT (DAY 18–30)
- Permit low-risk writes only (tags, internal notes, status fields).
- Keep high-risk actions behind approval thresholds (e.g., refunds >$500).
- Add anomaly detection: spike in cost/run, spike in failure rate, unusual tool calls.

7) HANDOFF & OWNERSHIP (BY DAY 30)
- Assign a DRI (directly responsible individual) for outcomes.
- Publish a one-page runbook: what it does, what it cannot do, how to disable, where logs live.
- Create a monthly review: metrics trend, incidents, cost, and next workflow candidates.

OUTPUTS YOU SHOULD HAVE AFTER 30 DAYS
- One workflow running in production with traces
- An eval set and a simple dashboard of the five core metrics
- A permission model (read/write boundaries) and a kill switch
- A clear ROI estimate (time saved, quality impact, and operating cost)

If you can’t measure success, limit blast radius, and explain actions with logs, you’re not ready for autonomy—keep the agent in draft-only mode until you are.