PRODUCTION AGENT READINESS CHECKLIST (2026)

Use this checklist to decide whether an agent is ready to move from demo/pilot into real customer workflows.

1) DEFINE THE JOB + SUCCESS METRICS
- Target workflow defined as a “job to be done” (e.g., resolve tier-1 tickets; onboard vendors; reset MFA).
- Baseline captured: average handle time, backlog, SLA breach rate, escalation rate, error rate.
- Success metrics set in customer terms: cost-per-resolved-case, time-to-resolution, % deflection, CSAT delta.
- Clear scope boundary: what the agent will NOT do (at launch) and what requires approval.

2) ARCHITECTURE + CONTROLLED AUTONOMY
- Tool permissions are least-privilege by default (allowlisted tools + actions).
- High-risk actions are gated (payments, deletes, access grants, compliance filings).
- Timeouts, retries, and fallbacks exist for every tool call.
- Human escalation path is explicit and fast (ticket handoff, Slack/Teams ping, on-call rotation).

3) OBSERVABILITY + AUDIT TRAILS
- Every run produces a trace: prompts, retrieved docs, tool inputs/outputs, decision points.
- Replay is possible for debugging (at least for 7 days; longer if regulated).
- Logs redact PII/secrets; secrets are never stored in plaintext.
- You can answer “Why did the agent do this?” within minutes.

4) EVALUATIONS (EVEN IF SMALL)
- A “golden set” exists: 50–200 representative tasks with expected outputs or scoring criteria.
- Offline eval runs before any major change (prompt, model, retrieval, tool schema).
- Online canaries: start with 1–5% traffic; monitor success rate + escalation rate.
- Regression thresholds defined (e.g., block rollout if task success drops below 95%).

5) COST + MARGIN MANAGEMENT
- Track cost-per-successful-run (not just cost per 1k tokens).
- Identify top cost drivers: retries, long-context retrieval, verification passes.
- Implement routing where possible (cheap model for easy steps; strong model for hard steps).
- Pricing aligned to outcomes (per task / per resolution / share of savings), with usage caps.

6) SECURITY + COMPLIANCE BASICS
- SOC 2 plan (or equivalent) documented; customer security questionnaire prepared.
- Data boundaries documented: what is sent to model providers, what is retained, and for how long.
- Tenant isolation validated (no cross-customer leakage in retrieval or logs).
- Incident response playbook exists (who responds, how to disable risky tools, customer comms).

GO/NO-GO RULE (SIMPLE)
GO if: you can prove measurable ROI in 30–60 days, hit a defined success rate on a golden set, and provide traces + permission controls that a security team can understand.
NO-GO if: success depends on manual prompt babysitting, you can’t reproduce runs, or a single tool failure breaks the entire workflow.
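The least-privilege + gating items in section 2 can be sketched as a tiny authorization check. The tool names and the `authorize` function here are illustrative assumptions, not a real API; the key property is that unknown tools are denied by default and high-risk tools require an explicit approver:

```python
ALLOWED_TOOLS = {"search_kb", "draft_reply", "reset_mfa"}  # allowlist, not blocklist
GATED_TOOLS = {"reset_mfa"}  # high-risk: needs human approval before execution

def authorize(tool_name: str, *, approved_by: str | None = None) -> str:
    """Least-privilege check: deny anything not explicitly allowlisted,
    and hold gated (high-risk) tools until a named human approves."""
    if tool_name not in ALLOWED_TOOLS:
        return "deny"
    if tool_name in GATED_TOOLS and approved_by is None:
        return "needs_approval"
    return "allow"
```

The deny-by-default shape matters: adding a new tool requires a deliberate allowlist change, which is the review point a security team can audit.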
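Section 2's "timeouts, retries, and fallbacks for every tool call" can be one wrapper rather than per-tool logic. A minimal sketch, assuming tools are plain Python callables (real clients would also pass a request timeout to the underlying SDK and catch its specific error types):

```python
import time

def call_with_guardrails(tool, args, *, retries=2, backoff_s=0.5, fallback=None):
    """Call a tool with bounded retries and exponential backoff; if all
    attempts fail, route to a fallback (e.g., human escalation) instead of
    letting one tool failure break the whole workflow."""
    last_err = None
    for attempt in range(retries + 1):
        try:
            return tool(**args)
        except Exception as err:  # narrow this to your tool's error types
            last_err = err
            time.sleep(backoff_s * (2 ** attempt))
    if fallback is not None:
        return fallback(args, last_err)  # e.g., open a ticket for a human
    raise last_err
```

The fallback hook is what makes the escalation path explicit: a failed tool call becomes a handoff, not a dead run.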
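The log-redaction item in section 3 is usually pattern-based before anything is persisted. The patterns below (an "sk-" style API key and a 16-digit card number) are illustrative assumptions only; real deployments need a vetted pattern set for their own secret formats:

```python
import re

# Illustrative patterns; replace with your organization's secret/PII formats.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]+"),  # API-key-like tokens
    re.compile(r"\b\d{16}\b"),       # bare 16-digit card numbers
]

def redact(text: str) -> str:
    """Scrub known secret patterns from a log line before it is stored."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Redacting at write time (not read time) is the point: a trace you can replay for 7+ days must never have contained the secret at all.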
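The canary + regression-threshold checks in section 4 reduce to a simple gate. The 95% success floor matches the checklist's example; the escalation-rate ceiling and the function name are assumptions for illustration:

```python
def rollout_decision(successes: int, total: int, *, escalations: int = 0,
                     min_success_rate: float = 0.95,
                     max_escalation_rate: float = 0.10) -> str:
    """Decide whether a canary (e.g., 1-5% of traffic) may be promoted:
    block if task success drops below the regression threshold or if
    escalations to humans spike."""
    success_rate = successes / total
    escalation_rate = escalations / total
    if success_rate < min_success_rate or escalation_rate > max_escalation_rate:
        return "block"
    return "promote"
```

Running this same gate offline against the golden set before any prompt/model/retrieval/tool-schema change gives you the "offline eval before rollout" step with no extra machinery.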
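Cost-per-successful-run (section 5) is simple arithmetic, but it is easy to get wrong by dropping failed runs from the numerator. A sketch, assuming each run record is a (cost, succeeded) pair:

```python
def cost_per_successful_run(runs: list[tuple[float, bool]]) -> float:
    """runs: (cost_usd, succeeded) pairs. Failed runs still cost money,
    so all spend goes in the numerator; only successes count in the
    denominator. This is why retries are a top cost driver."""
    total_cost = sum(cost for cost, _ok in runs)
    successes = sum(1 for _cost, ok in runs if ok)
    if successes == 0:
        return float("inf")  # all spend, no outcomes
    return total_cost / successes
```

With outcome-based pricing (per resolution), this number versus your price per task is your margin; tracking it per workflow is what makes routing decisions (cheap model vs. strong model) measurable.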