AGENTIC PRODUCT PRODUCTION READINESS KIT (2026)

Goal: Ship one agent workflow into real customer usage with measurable reliability, controlled risk, and sustainable gross margin.

1) Define the Workflow (Scope Narrowly)
- Name the workflow (e.g., "Support ticket triage + draft reply").
- List allowed tools (max 3–5 to start).
- List disallowed tools explicitly.
- Define "write" vs. "read" actions; start with read-only or draft-only wherever possible.

2) Risk Tiers (Policy + Approvals)
Create three tiers and map every tool/action to one of them:
- LOW: reversible, internal-only, draft outputs (auto-run).
- MEDIUM: external comms or data writes that can be rolled back (requires user confirmation).
- HIGH: money movement, access changes, irreversible writes (requires approval plus a secondary control).
For each tier, define who can run it (RBAC), what logging is required, and what the rollback plan is.

3) Success Criteria (Golden Set)
Build 30–60 "golden tasks", for example:
- 10 easy, 15 typical, 5 ambiguous, 5 adversarial/prompt-injection, 5 tool-failure scenarios.
For each task, define pass/fail criteria: correctness, citations, formatting (JSON schema), and whether the tool calls were appropriate.

4) Telemetry (You Must Be Able to Answer Four Questions)
Instrument traces so you can answer:
- What did it do? (tool calls + outputs)
- Why did it do it? (prompt version, retrieved docs, routing decision)
- What did it cost? (tokens, tool fees, time)
- What happens if it's wrong? (risk tier + guardrail triggered)
Also track: P50/P95 latency, tokens per task, tool calls per task, escalation rate.

5) CPST Worksheet (Cost-Per-Successful-Task)
Track weekly:
- Avg token cost per task: $____
- Avg tool/API cost per task: $____
- Human review minutes per task: ____ minutes x $____/hour = $____
- Success rate (meets acceptance criteria): ____%
CPST = (token + tool + human) / success_rate
Set a target CPST based on pricing. Example: if a plan includes 1,000 tasks and costs $499/mo, revenue is roughly $0.50 per task, so CPST should stay well under $0.30–$0.40 to preserve gross margin.
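The CPST formula above can be sketched as a small calculation. All numbers below are hypothetical illustrations, not figures from the kit:

```python
def cpst(token_cost: float, tool_cost: float, review_minutes: float,
         hourly_rate: float, success_rate: float) -> float:
    """Cost-Per-Successful-Task: total per-task cost divided by success rate."""
    human_cost = review_minutes / 60 * hourly_rate   # review time priced per hour
    return (token_cost + tool_cost + human_cost) / success_rate

# Hypothetical inputs: $0.08 tokens, $0.04 tool fees,
# 1.5 review minutes at $40/hour, 85% success rate.
print(round(cpst(0.08, 0.04, 1.5, 40, 0.85), 2))  # prints 1.32
```

With these assumed numbers, human review dominates CPST; cutting review minutes per task is usually a bigger lever than trimming token spend.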
6) Launch Plan (30 Days)
- Week 1: Golden set + acceptance criteria.
- Week 2: Tracing + dashboards + kill switch + token limits.
- Week 3: Policy engine (allowlist, schemas, approvals) + incident runbook.
- Week 4: Design-partner rollout (5–10 users), weekly failure review, nightly regression eval.

Exit Criteria to Expand Beyond Design Partners
- Success rate: ____% on the golden set AND stable for 2 weeks.
- Escalation rate: under ____%.
- CPST: under $____.
- Security: SSO/RBAC working; audit export tested; tool scopes least-privilege.
- Kill switch tested in staging and production.
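The Week 3 policy engine (allowlist + approvals) combined with the risk tiers from section 2 can be sketched as a deny-by-default gate. The tool names and tier assignments below are illustrative assumptions, not part of the kit:

```python
from enum import Enum

class Tier(Enum):
    LOW = "auto-run"
    MEDIUM = "user confirmation"
    HIGH = "approval + secondary control"

# Hypothetical tool-to-tier allowlist; any tool not listed is denied.
TOOL_TIERS = {
    "search_tickets": Tier.LOW,
    "draft_reply": Tier.LOW,
    "send_reply": Tier.MEDIUM,
    "refund_payment": Tier.HIGH,
}

def gate(tool: str, confirmed: bool = False, approved: bool = False) -> bool:
    """Return True if a tool call may proceed under the tier policy."""
    tier = TOOL_TIERS.get(tool)
    if tier is None:
        return False                    # off the allowlist: deny by default
    if tier is Tier.LOW:
        return True                     # reversible, internal-only: auto-run
    if tier is Tier.MEDIUM:
        return confirmed                # needs user confirmation
    return confirmed and approved       # HIGH needs both controls
```

Denying unknown tools by default keeps the allowlist the single source of truth: adding a tool forces an explicit tier decision before the agent can ever call it.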