Agent Startup Readiness Checklist (2026)

Use this checklist before you scale traffic, hire a sales team, or sign enterprise SLAs. Goal: prove your agent can deliver outcomes with bounded risk and profitable marginal cost.

1) Outcome definition (Day 1–2)
- Write one primary KPI (e.g., “% of tickets resolved without a human,” “invoices processed/day,” “claims closed/week”).
- Define what counts as a catastrophic error (e.g., wrong refund amount, wrong patient record, unauthorized account change) and set a target rate for it (aim for ~0%).
- Define the “unit of work” you will bill for (task, document, incident, resolution).

2) Cost model (Day 2–4)
- Instrument per-run cost: tokens + tool/API calls + retrieval + sandbox compute.
- Measure success-cost, not average cost: cost per successful completion (total spend, failed runs included, divided by successful completions) and 95th-percentile cost per run.
- Set a hard budget per run (e.g., $0.25 for SMB, $2.00 for enterprise) and enforce it with routing/fallback rules.
- Decide the escalation policy: what triggers human review, and the target escalation rate (e.g., <5% in steady state).
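
The cost-model bullets above can be sketched roughly as follows. This is a minimal illustration, assuming each run is logged with a total cost and a success flag; the `Run` type, its field names, the helper names, and the nearest-rank percentile method are assumptions made for the example, not a prescribed schema.

```python
import math
from dataclasses import dataclass


@dataclass
class Run:
    cost_usd: float   # tokens + tool/API calls + retrieval + sandbox compute
    succeeded: bool   # did the run complete its unit of work?


def cost_per_success(runs: list[Run]) -> float:
    """Total spend (failed runs included) divided by successful completions."""
    successes = sum(1 for r in runs if r.succeeded)
    if successes == 0:
        return math.inf
    return sum(r.cost_usd for r in runs) / successes


def p95_cost(runs: list[Run]) -> float:
    """95th-percentile cost per run, using the nearest-rank method."""
    if not runs:
        raise ValueError("need at least one run")
    costs = sorted(r.cost_usd for r in runs)
    rank = math.ceil(0.95 * len(costs))  # 1-based rank
    return costs[rank - 1]


def within_budget(spent_usd: float, next_step_usd: float, budget_usd: float) -> bool:
    """Hard per-run budget check, meant to be called before each paid step."""
    return spent_usd + next_step_usd <= budget_usd


# Hypothetical sample: three successes and one failed run.
runs = [Run(0.10, True), Run(0.20, True), Run(0.30, False), Run(0.40, True)]
print(cost_per_success(runs), p95_cost(runs))
```

When `within_budget` returns False, the routing/fallback rule decides what happens next (for example, a cheaper model or escalation to a human) rather than letting the run continue to spend.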

3) Reliability + evaluation (Day 4–7)
- Build an eval set of at least 200 cases: 50 “happy path,” 100 messy real-world variants, 50 adversarial or policy-edge cases.
- Track: task success rate, tool-call validity, policy violations, latency (p50/p95), and catastrophic errors.
- Add shadow mode (the agent runs on live inputs, but its outputs are logged rather than acted on) and a staged rollout (5%→25%→50%→100%).
- Add replay: store run traces (objective, tool calls, parameters, outputs) with redaction.
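
One way to implement the staged rollout above is deterministic bucketing, so an account admitted at 5% stays admitted at 25%, 50%, and 100%. The sketch below assumes rollout is decided per account; the function names and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib

ROLLOUT_STAGES = [5, 25, 50, 100]  # percent of traffic, as in the checklist


def bucket(account_id: str) -> int:
    """Map an account to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(account_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def agent_enabled(account_id: str, stage_percent: int) -> bool:
    """True if this account falls inside the current rollout stage."""
    return bucket(account_id) < stage_percent
```

Accounts outside the current stage can still run in shadow mode, which gives you comparison data before they are switched on.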

4) Guardrails + governance (Day 7–10)
- Implement tool allowlists per role, plus schema validation on every tool call.
- Add PII handling: redaction rules, retention policy, and a “no-train” default (customer data is not used for model training unless explicitly contracted).
- Add approvals for high-risk actions (refunds over threshold, account deletion, payments).
- Ensure idempotency for side effects (retries must not double-execute).
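
The guardrails above fit together in a single choke point that every tool call passes through. The following is a minimal sketch, not a production design: the role name, tool names, schemas, approval threshold, and in-memory result store are all hypothetical, and a real system would keep idempotency keys in durable storage.

```python
from typing import Any, Callable

# Hypothetical role -> allowed tools mapping.
ALLOWLIST: dict[str, set[str]] = {
    "support_agent": {"lookup_order", "issue_refund"},
}

# Hypothetical per-tool schemas: parameter name -> expected type.
SCHEMAS: dict[str, dict[str, type]] = {
    "lookup_order": {"order_id": str},
    "issue_refund": {"order_id": str, "amount_usd": float},
}

REFUND_APPROVAL_THRESHOLD_USD = 100.0  # assumed threshold for human approval

_completed: dict[str, Any] = {}  # idempotency key -> stored result (in-memory)


def validate_call(role: str, tool: str, params: dict[str, Any]) -> None:
    """Reject calls outside the role's allowlist or the tool's schema."""
    if tool not in ALLOWLIST.get(role, set()):
        raise PermissionError(f"{role} may not call {tool}")
    schema = SCHEMAS[tool]
    if set(params) != set(schema):
        raise ValueError(f"{tool} expects exactly {sorted(schema)}")
    for name, expected in schema.items():
        if not isinstance(params[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")


def needs_approval(tool: str, params: dict[str, Any]) -> bool:
    """High-risk rule: refunds over the threshold need a human."""
    return tool == "issue_refund" and params["amount_usd"] > REFUND_APPROVAL_THRESHOLD_USD


def execute(role: str, tool: str, params: dict[str, Any], idempotency_key: str,
            handler: Callable[..., Any], approved: bool = False) -> Any:
    """Validate, gate on approval, then run the side effect at most once per key."""
    validate_call(role, tool, params)
    if needs_approval(tool, params) and not approved:
        raise PermissionError("human approval required")
    if idempotency_key in _completed:
        return _completed[idempotency_key]  # retry: return the stored result
    result = handler(**params)
    _completed[idempotency_key] = result
    return result
```

Deriving the idempotency key from the unit of work (for example, run ID plus step number) means a retried step reuses the same key and cannot double-execute.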

5) Go-to-market readiness (Day 10–12)
- Create an ROI report the buyer can export: outcomes delivered, exceptions, savings, and cost.
- Define a 30–60 day pilot with a single workflow owner and explicit success criteria.
- Prepare procurement answers: data residency, sub-processors (model providers), SOC 2 timeline, incident response.

6) Pricing + packaging (Day 12–14)
- Pick a pricing metric aligned to value (per resolution, per document, per incident, % recovered).
- Create autonomy tiers: Suggest → Assist → Delegate → Autopilot, each with higher price and stronger controls.
- Add overage/limits that protect your margin (max runs/day, max cost/run, throttle behavior).
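
As an illustration of the max runs/day limit above, a per-customer meter might look like the sketch below. The `UsageMeter` class and its "run"/"throttle" return values are assumptions for the example; what "throttle" means in practice (queue, reject, or bill overage) is a packaging decision.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class UsageMeter:
    max_runs_per_day: int
    _day: Optional[date] = None
    _runs_today: int = 0

    def admit(self, today: date) -> str:
        """Return "run" or "throttle" for a new run request."""
        if today != self._day:  # first request of a new day: reset the counter
            self._day = today
            self._runs_today = 0
        if self._runs_today >= self.max_runs_per_day:
            return "throttle"
        self._runs_today += 1
        return "run"
```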

Exit criteria to scale:
- Unit economics: AI variable cost consistently within your target band (e.g., 10–20% of revenue for SMB, 5–15% for enterprise).
- Reliability: success rate at or above target and stable across two consecutive releases.
- Governance: audit logs + approvals + safe rollback available for every side-effecting action.
- GTM: at least 3 referenceable customers with measured outcomes and a clear expansion path.
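
The measurable exit criteria can be checked mechanically. The sketch below is one possible reading, with assumptions stated: "consistently" is taken to mean every reporting period is at or under the top of the band, and "stable" is taken to mean both of the last two releases meet the target. Governance is left out because it needs a manual audit rather than a metric. The function and parameter names are hypothetical.

```python
def exit_checks(cost_shares: list[float], max_cost_share: float,
                release_success_rates: list[float], target_success_rate: float,
                referenceable_customers: int) -> dict[str, bool]:
    """Evaluate the unit-economics, reliability, and GTM exit criteria."""
    last_two = release_success_rates[-2:]
    return {
        # AI variable cost / revenue, one value per reporting period.
        "unit_economics": bool(cost_shares)
        and all(s <= max_cost_share for s in cost_shares),
        # Both of the two most recent releases at or above target.
        "reliability": len(last_two) == 2
        and all(r >= target_success_rate for r in last_two),
        "gtm": referenceable_customers >= 3,
    }


# Hypothetical SMB example: cost share capped at 20% of revenue.
print(exit_checks([0.12, 0.15, 0.18], 0.20, [0.90, 0.93, 0.94], 0.92, 3))
```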