AGENTIC PRODUCT PRODUCTION READINESS KIT (2026)

Goal: Ship one agent workflow into real customer usage with measurable reliability, controlled risk, and sustainable gross margin.

1) Define the Workflow (Scope Narrowly)
- Name the workflow (e.g., "Support ticket triage + draft reply").
- List allowed tools (max 3–5 to start).
- List disallowed tools explicitly.
- Define "write" vs. "read" actions; start with read-only or draft-only wherever possible.

2) Risk Tiers (Policy + Approvals)
Create three tiers and map every tool/action to one of them:
- LOW: reversible, internal-only, draft outputs (auto-run).
- MEDIUM: external comms or data writes that can be rolled back (requires user confirmation).
- HIGH: money movement, access changes, irreversible writes (requires approval plus a secondary control).
For each tier, define who can run it (RBAC), what logging is required, and what the rollback plan is.

3) Success Criteria (Golden Set)
Build 30–60 "golden tasks", for example:
- 10 easy, 15 typical, 5 ambiguous, 5 adversarial/prompt-injection, 5 tool-failure scenarios.
For each task, define pass/fail criteria: correctness, citations, formatting (JSON schema), and whether the tool calls were appropriate.

4) Telemetry (You Must Be Able to Answer Four Questions)
Instrument traces so you can answer:
- What did it do? (tool calls + outputs)
- Why did it do it? (prompt version, retrieved docs, routing decision)
- What did it cost? (tokens, tool fees, time)
- What happens if it's wrong? (risk tier + guardrail triggered)
Also track: P50/P95 latency, tokens per task, tool calls per task, escalation rate.

5) CPST Worksheet (Cost-Per-Successful-Task)
Track weekly:
- Avg token cost per task: $____
- Avg tool/API cost per task: $____
- Human review minutes per task: ____ minutes x $____/hour = $____
- Success rate (meets acceptance criteria): ____%
CPST = (token + tool + human) / success_rate
Set a target CPST based on pricing. Example: if a plan includes 1,000 tasks and costs $499/mo, revenue is roughly $0.50 per task, so CPST should stay well under $0.30–$0.40 to preserve gross margin.
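The CPST formula above can be sketched as a small calculation. All numbers below are hypothetical illustrations, not figures from the kit:

```python
def cpst(token_cost: float, tool_cost: float, review_minutes: float,
         hourly_rate: float, success_rate: float) -> float:
    """Cost-Per-Successful-Task: total per-task cost divided by success rate."""
    human_cost = review_minutes / 60 * hourly_rate   # review time priced per hour
    return (token_cost + tool_cost + human_cost) / success_rate

# Hypothetical inputs: $0.08 tokens, $0.04 tool fees,
# 1.5 review minutes at $40/hour, 85% success rate.
print(round(cpst(0.08, 0.04, 1.5, 40, 0.85), 2))  # prints 1.32
```

With these assumed numbers, human review dominates CPST; cutting review minutes per task is usually a bigger lever than trimming token spend.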
6) Launch Plan (30 Days)
- Week 1: Golden set + acceptance criteria.
- Week 2: Tracing + dashboards + kill switch + token limits.
- Week 3: Policy engine (allowlist, schemas, approvals) + incident runbook.
- Week 4: Design-partner rollout (5–10 users), weekly failure review, nightly regression eval.

Exit Criteria to Expand Beyond Design Partners
- Success rate: ____% on the golden set AND stable for 2 weeks.
- Escalation rate: under ____%.
- CPST: under $____.
- Security: SSO/RBAC working; audit export tested; tool scopes least-privilege.
- Kill switch tested in staging and production.
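The Week 3 policy engine (allowlist + approvals) combined with the risk tiers from section 2 can be sketched as a deny-by-default gate. The tool names and tier assignments below are illustrative assumptions, not part of the kit:

```python
from enum import Enum

class Tier(Enum):
    LOW = "auto-run"
    MEDIUM = "user confirmation"
    HIGH = "approval + secondary control"

# Hypothetical tool-to-tier allowlist; any tool not listed is denied.
TOOL_TIERS = {
    "search_tickets": Tier.LOW,
    "draft_reply": Tier.LOW,
    "send_reply": Tier.MEDIUM,
    "refund_payment": Tier.HIGH,
}

def gate(tool: str, confirmed: bool = False, approved: bool = False) -> bool:
    """Return True if a tool call may proceed under the tier policy."""
    tier = TOOL_TIERS.get(tool)
    if tier is None:
        return False                    # off the allowlist: deny by default
    if tier is Tier.LOW:
        return True                     # reversible, internal-only: auto-run
    if tier is Tier.MEDIUM:
        return confirmed                # needs user confirmation
    return confirmed and approved       # HIGH needs both controls
```

Denying unknown tools by default keeps the allowlist the single source of truth: adding a tool forces an explicit tier decision before the agent can ever call it.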