Agent Launch Readiness Checklist (2026)

Use this checklist before exposing an AI agent to customers or granting it access to privileged tools.

1) Scope and Success Definition
- Define one primary job-to-be-done (JTBD). Avoid “general assistant” scope.
- Specify what “success” means in measurable terms (e.g., “ticket resolved without human intervention,” “quote generated and approved”).
- Define hard failure outcomes and escalation paths (human handoff, create a ticket, block the action).

2) Tooling and Permissions (Least Privilege)
- Maintain an explicit allowlist of tools the agent can call.
- Use scoped, short-lived credentials (task-scoped tokens when possible).
- Add approval gates for high-risk actions (refunds, payouts, prod changes, user deletions).
- Ensure idempotency on write tools (retries should not duplicate actions).
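
The allowlist, approval gate, and idempotency checks above can be combined in one wrapper around tool calls. The following is a minimal sketch, not a reference implementation: the tool names, the `ToolGate` class, and the in-memory result store are all assumptions made for illustration, and a real system would persist idempotency keys outside the process.

```python
from dataclasses import dataclass, field

# Hypothetical tool names, for illustration only.
ALLOWED_TOOLS = {"lookup_order", "issue_refund"}
NEEDS_APPROVAL = {"issue_refund"}


@dataclass
class ToolGate:
    """Checks the allowlist, approval gate, and idempotency before a tool runs."""
    results: dict = field(default_factory=dict)  # idempotency key -> stored result
    executions: int = 0  # how many times a tool actually ran

    def call(self, tool, args, idempotency_key, approved=False):
        if tool not in ALLOWED_TOOLS:
            raise PermissionError(f"tool not on allowlist: {tool}")
        if tool in NEEDS_APPROVAL and not approved:
            # Nothing runs until a human approves the action.
            return {"status": "pending_approval", "tool": tool}
        if idempotency_key in self.results:
            # A retry with the same key returns the stored result.
            return self.results[idempotency_key]
        self.executions += 1  # stand-in for the real side effect
        result = {"status": "done", "tool": tool, "args": args}
        self.results[idempotency_key] = result
        return result
```

In this sketch a retried `issue_refund` with the same key runs once, and any tool missing from `ALLOWED_TOOLS` raises `PermissionError` before anything executes.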

3) Cost Controls
- Set a max budget per task (e.g., $0.10–$0.50 depending on margin).
- Cap max tool calls per run (e.g., 6) and max runtime (e.g., 45 seconds).
- Implement model routing (cheap default, expensive fallback) and caching for deterministic lookups.
- Track “cost per successful task” and “cost per escalated task.”
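
The per-run caps above can be enforced with a small budget object that every model or tool call charges against. This is a sketch under stated assumptions: `RunBudget`, `BudgetExceeded`, and `cost_per_outcome` are hypothetical names, the defaults reuse the example numbers from this section, and "cost per outcome" is computed here as the mean cost of runs that ended in that outcome, which is only one possible definition.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a run goes past one of its caps."""


@dataclass
class RunBudget:
    max_cost_usd: float = 0.25
    max_tool_calls: int = 6
    max_runtime_s: float = 45.0
    cost_usd: float = 0.0
    tool_calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, cost_usd, tool_call=False):
        """Record spend, then stop the run if any cap is exceeded."""
        self.cost_usd += cost_usd
        if tool_call:
            self.tool_calls += 1
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost")
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool calls")
        if time.monotonic() - self.started > self.max_runtime_s:
            raise BudgetExceeded("runtime")


def cost_per_outcome(runs):
    """runs: iterable of (outcome, cost_usd). Returns mean cost per outcome."""
    totals = {}
    for outcome, cost in runs:
        total, count = totals.get(outcome, (0.0, 0))
        totals[outcome] = (total + cost, count + 1)
    return {outcome: total / count for outcome, (total, count) in totals.items()}
```

Catching `BudgetExceeded` at the top of the agent loop is a natural place to trigger the escalation path rather than failing silently.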

4) Observability and Tracing
- Log every model request/response, tool call, tool arguments, latency, and cost with a correlation ID.
- Store traces in a searchable system and define a retention period (e.g., 30 days unless regulation requires otherwise).
- Redact sensitive data (PII/PCI/PHI) from logs and prompts where feasible.
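
A trace record that carries a correlation ID and redacts its payload might look like the sketch below. The field names and the `redact` and `trace_event` helpers are assumptions for illustration. The two regular expressions are deliberately simple and will both miss some sensitive values and match some harmless ones, so treat them as placeholders for a proper detection step.

```python
import json
import re
import time

# Illustrative patterns only: real PII/PCI detection needs more than two regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def redact(text):
    """Replace email addresses and card-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


def trace_event(correlation_id, kind, payload, latency_ms, cost_usd):
    """Build one JSON log line; the payload is redacted before it is stored."""
    record = {
        "correlation_id": correlation_id,  # same ID on every event of a run
        "kind": kind,                      # e.g. "model_request", "tool_call"
        "payload": redact(json.dumps(payload)),
        "latency_ms": latency_ms,
        "cost_usd": cost_usd,
        "ts": time.time(),
    }
    return json.dumps(record)
```

Generating the correlation ID once per run (for example with `uuid.uuid4()`) and passing it to every event keeps model requests and tool calls searchable as one trace.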

5) Evaluation (Behavior-Based)
- Create an initial replay set of at least 50 real cases; grow to 200+ before broad rollout.
- Add behavioral assertions (required steps, prohibited actions, correct tool usage), not just “final answer matches” checks.
- Track pass rate, major failure categories, and regressions after prompt/model/tool updates.
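
Behavioral assertions can be run directly against the recorded tool-call sequence of each replayed case. The sketch below assumes a trace is simply a list of tool names in call order; `check_trace`, `pass_rate`, and the tool names in the usage note are hypothetical.

```python
def check_trace(tool_calls, required_in_order, prohibited):
    """Return a list of failure messages; an empty list means the case passed."""
    failures = []
    for name in tool_calls:
        if name in prohibited:
            failures.append(f"prohibited tool called: {name}")
    # Required steps must appear in this order, though other calls may sit
    # between them. The shared iterator only moves forward, so a step that
    # appears too early counts as missing.
    remaining = iter(tool_calls)
    for step in required_in_order:
        if not any(name == step for name in remaining):
            failures.append(f"missing or out-of-order step: {step}")
    return failures


def pass_rate(results):
    """results: one failure list per replayed case."""
    return sum(1 for failures in results if not failures) / len(results)
```

For example, a refund case could require `["lookup_order", "issue_refund"]` in that order and prohibit `{"delete_user"}`. Grouping the failure messages by prefix gives a rough first cut of failure categories.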

6) Rollout Plan
- Use feature flags; start with internal users, then a 1–5% canary.
- Define alert thresholds: spikes in tool calls, p95 latency, budget overruns, policy violations.
- Provide a safe fallback (human handoff, “draft only” mode, read-only mode).
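
One common way to implement a percentage canary is to hash each user into a stable bucket, so the same user stays in or out of the canary across requests. This is a sketch, assuming string user IDs; the `in_canary` and `route` functions, the salt value, and the returned mode names are illustrative and not tied to any particular feature-flag product.

```python
import hashlib


def in_canary(user_id, percent, salt="agent-v1"):
    """Deterministically place user_id in the first `percent` of 10,000 buckets."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def route(user_id, is_internal, canary_percent, agent_enabled=True):
    """Pick a mode for this request; the flag acts as a kill switch."""
    if not agent_enabled:
        return "fallback"  # e.g. human handoff or read-only mode
    if is_internal or in_canary(user_id, canary_percent):
        return "agent"
    return "existing_flow"
```

Changing the salt reshuffles which users land in the canary, which can be useful when a new rollout should not reuse the previous cohort.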

7) Governance and Compliance
- Document data flows: what data enters prompts, where it’s stored, and who can access traces.
- Define retention and deletion policies aligned with customer requirements.
- Maintain an audit log for all privileged actions (who/what/when/why).
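
The who/what/when/why fields of an audit entry map naturally onto a small record type. The sketch below is one possible shape, with hypothetical field names and a hypothetical `audit_line` helper; it only builds the log line and leaves storage and access control to the surrounding system.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    actor: str      # who: the agent run or human approver
    action: str     # what: the privileged action taken
    reason: str     # why: the justification recorded at call time
    timestamp: str  # when: ISO 8601, UTC


def audit_line(actor, action, reason):
    """Build one JSON audit line, refusing entries with missing fields."""
    if not (actor and action and reason):
        raise ValueError("audit entries need actor, action, and reason")
    entry = AuditEntry(actor, action, reason, datetime.now(timezone.utc).isoformat())
    return json.dumps(asdict(entry), sort_keys=True)
```

Rejecting entries without a reason at write time is a design choice in this sketch: it keeps the "why" from becoming an optional field that is rarely filled in.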

Exit Criteria (Recommended)
- 90%+ pass rate on your replay suite for the targeted workflow.
- 0 critical policy violations during a full canary week.
- p95 cost and latency within targets and stable from day to day during the canary.
- Clear on-call ownership and a playbook for incidents (rollback, disable tools, rotate credentials).
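
The numeric exit criteria can be checked mechanically at the end of the canary. The sketch below is an assumption-laden example: `p95` uses the nearest-rank method, `ready_to_launch` and its parameter names are hypothetical, and the targets are inputs you supply. It does not cover variance or on-call ownership, which still need a human review.

```python
def p95(values):
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer form of ceil(0.95 * n)
    return ordered[rank - 1]


def ready_to_launch(pass_rate, critical_violations, costs_usd, latencies_s,
                    max_p95_cost_usd, max_p95_latency_s, min_pass_rate=0.90):
    """Return (overall, per-check results) for the numeric exit criteria."""
    checks = {
        "pass_rate": pass_rate >= min_pass_rate,
        "no_critical_violations": critical_violations == 0,
        "p95_cost": p95(costs_usd) <= max_p95_cost_usd,
        "p95_latency": p95(latencies_s) <= max_p95_latency_s,
    }
    return all(checks.values()), checks
```

Returning the per-check dictionary alongside the overall result makes it clear which criterion blocked a launch.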