AGENTIC WORKFLOW LAUNCH CHECKLIST (90 DAYS)

Goal: Ship one production-grade agentic workflow with measurable ROI, bounded risk, and auditable controls.

1) Scope (Week 1)
- Pick ONE workflow with high repetition, a clear KPI, and low blast radius.
- Write the KPI with baseline and target (examples: “reduce support handle time from 14 min to 10 min”, “cut security questionnaire turnaround from 5 days to 2 days”).
- Define what the agent is NOT allowed to do (e.g., send messages, modify billing, deploy code).

2) Data & Access (Weeks 1–2)
- Inventory the data sources the workflow needs (KB, CRM, ticketing, docs, code).
- Decide the retrieval approach: governed sources only; no “write-everything memory” in v1.
- Create service accounts for tools/APIs; enforce least privilege.
- Define retention rules for logs (e.g., 30 days by default; longer only if needed).

3) Output Contract (Weeks 2–3)
- Specify a strict output schema (fields, required sections, format).
- Require citations (doc IDs/URLs + quoted spans) for factual claims.
- Define acceptable failure: the agent can say “I don’t know” and escalate.
- Create a rubric (0–2 or 0–5 scale) for accuracy, completeness, tone, and safety.

4) Controls (Weeks 3–4)
- Tool allowlist + JSON schema validation for every tool call.
- Hard limits: max steps, max tool calls, timeout, max tokens.
- Cost budget per run in USD; block/abort when exceeded.
- Policy gates: secret/PII detection; block external output if triggered.
- Human approval required for any customer-facing output in v1.

5) Evaluation Harness (Weeks 4–6)
- Build an eval set of 50–200 real cases (start with past tickets/requests).
- Add “hard cases”: edge conditions, ambiguous prompts, known failure modes.
- Establish pass thresholds (example: >=90% schema compliance, >=80% accuracy score).
- Run evals in CI on every change: prompt, retrieval, tool schema, model.

6) Observability (Weeks 5–7)
- Log inputs, plan, tool calls + tool outputs, final output, and cost per run.
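The per-run logging above can be sketched as one append-only JSON Lines record per run. This is a minimal sketch, not a prescribed implementation; the field names and the `runs.jsonl` path are illustrative assumptions.

```python
import json
import time
import uuid


def log_run(inputs, plan, tool_calls, final_output, cost_usd, path="runs.jsonl"):
    """Append one structured, auditable record per agent run (JSON Lines)."""
    record = {
        "run_id": str(uuid.uuid4()),   # stable ID so reviewers can reference a run
        "timestamp": time.time(),
        "inputs": inputs,              # what the user/system asked for
        "plan": plan,                  # the agent's stated plan
        "tool_calls": tool_calls,      # each call: name, args, output
        "final_output": final_output,
        "cost_usd": cost_usd,          # feeds cost-per-run tracking at launch
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record


# Example: one run with a single tool call.
rec = log_run(
    inputs={"ticket_id": "T-123", "question": "reset password"},
    plan="Search KB, then draft reply",
    tool_calls=[{"name": "kb_search", "args": {"q": "reset password"}, "output": "KB-42"}],
    final_output="Draft reply citing KB-42",
    cost_usd=0.012,
)
```

One record per run keeps weekly sampling simple: pick random lines from the file, plus every line a policy gate flagged.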
- Create a weekly sampling process (e.g., review 20 random runs plus all flagged runs).
- Define an SLO for the workflow (example: “85% of drafts accepted with minor edits”).

7) Launch (Weeks 7–10)
- Pilot with a small internal group or 5–10% of traffic.
- Keep a rollback switch (feature flag) and clear ownership (one DRI).
- Track KPI, acceptance rate, escalation rate, and cost per run daily.

8) Expand Autonomy (Weeks 10–13)
- Expand scope only after 2–3 consecutive weeks of meeting thresholds.
- Add new tools one at a time; update the eval set with their new failure modes.
- Consider limited autonomy for reversible actions (e.g., draft-only, propose-only).

Definition of Done (for v1)
- KPI improved vs. baseline.
- Audit logs exist for every run.
- Budgets/timeouts enforced.
- Evals run in CI with documented thresholds.
- Human approval gate for external actions.

Suggested weekly cadence
- 30 min “agent retro”: top failures, top wins, eval updates, policy updates, cost review.
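The hard limits from section 4 (tool allowlist, max steps, max tool calls, cost budget) can be sketched as a small run guard that the agent loop calls before each step and tool call. This is a sketch under assumptions: the class, tool names, and default limits are all illustrative, not a specific framework's API.

```python
class BudgetExceeded(Exception):
    """Raised when a run exceeds its step, tool-call, cost, or allowlist limits."""


class RunGuard:
    """Enforce per-run hard limits: max steps, max tool calls, USD cost budget."""

    def __init__(self, max_steps=20, max_tool_calls=10, budget_usd=0.50,
                 allowed_tools=("kb_search", "crm_lookup")):
        self.max_steps = max_steps
        self.max_tool_calls = max_tool_calls
        self.budget_usd = budget_usd
        self.allowed_tools = set(allowed_tools)
        self.steps = 0
        self.tool_calls = 0
        self.cost_usd = 0.0

    def check_step(self):
        """Call once per agent loop iteration; abort the run past the limit."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")

    def check_tool_call(self, tool_name, call_cost_usd):
        """Call before executing a tool: enforce allowlist, call count, budget."""
        if tool_name not in self.allowed_tools:
            raise BudgetExceeded(f"tool {tool_name!r} not on allowlist")
        self.tool_calls += 1
        self.cost_usd += call_cost_usd
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(f"tool-call limit {self.max_tool_calls} exceeded")
        if self.cost_usd > self.budget_usd:
            raise BudgetExceeded(f"cost budget ${self.budget_usd} exceeded")


# Example: a guard that blocks the third tool call.
guard = RunGuard(max_tool_calls=2, budget_usd=0.10)
guard.check_tool_call("kb_search", 0.04)
guard.check_tool_call("kb_search", 0.04)
try:
    guard.check_tool_call("kb_search", 0.04)
    blocked = False
except BudgetExceeded:
    blocked = True
```

Keeping the guard as one object per run also gives the audit log its numbers: steps, tool calls, and cost are all in one place when the run ends.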