Agent Production Readiness Checklist (2026)

Use this checklist to move from a working demo to a production agent that is reliable, cost-contained, and auditable.

1) Define the workflow boundary
- Name the single workflow (e.g., “triage support tickets,” “enrich inbound leads”).
- List the systems the agent may access (e.g., Zendesk, Jira, Salesforce, internal DB); everything else is out of bounds.
- Write a definition of “success” (what must be true at the end of a run).
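
A workflow boundary can be written down as data rather than prose. A minimal Python sketch, assuming a hypothetical `WorkflowSpec` type and a run whose final state is a plain dict; the system names and the success rule are illustrative:

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class WorkflowSpec:
    """One workflow, the systems it may touch, and what "success" means (hypothetical)."""
    name: str
    allowed_systems: frozenset
    # Predicate over the final run state: True only if every end-of-run condition holds.
    is_success: Callable[[Mapping[str, Any]], bool]

    def allows(self, system: str) -> bool:
        """Anything not explicitly listed is outside the workflow boundary."""
        return system in self.allowed_systems


def triage_succeeded(state: Mapping[str, Any]) -> bool:
    # Illustrative rule: a triaged ticket has both a priority and an assignee.
    return state.get("priority") is not None and state.get("assignee") is not None


TRIAGE = WorkflowSpec(
    name="triage support tickets",
    allowed_systems=frozenset({"zendesk", "jira"}),
    is_success=triage_succeeded,
)
```

Checking the success predicate at the end of every run gives the eval set and the metrics in steps 6 and 7 a single definition to agree on.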

2) Write an authority spec (permissions)
- Separate read-only tools from write tools.
- Identify “high-risk” actions (refunds, cancellations, deletions, PII exposure).
- Specify approval rules (human review required, dual control, or auto-approved).
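
The authority spec can live in code next to the agent, so it is reviewable and testable. A minimal sketch; `Permission`, `Approval`, and the tool/action names are hypothetical, and the single policy check (high-risk actions are never auto-approved) is one example rule, not a complete policy:

```python
from dataclasses import dataclass
from enum import Enum


class Approval(Enum):
    AUTO = 0   # auto-approved
    HUMAN = 1  # one human reviewer
    DUAL = 2   # dual control: two independent approvers


@dataclass(frozen=True)
class Permission:
    writes: bool
    high_risk: bool
    approval: Approval


# Hypothetical spec; tool and action names are illustrative only.
AUTHORITY = {
    ("zendesk", "get_ticket"): Permission(writes=False, high_risk=False, approval=Approval.AUTO),
    ("zendesk", "add_note"): Permission(writes=True, high_risk=False, approval=Approval.AUTO),
    # Read-only but high-risk: exports expose PII.
    ("crm", "export_contacts"): Permission(writes=False, high_risk=True, approval=Approval.HUMAN),
    ("billing", "issue_refund"): Permission(writes=True, high_risk=True, approval=Approval.DUAL),
}


def spec_violations(spec: dict) -> list:
    """List actions that break the rule "high-risk actions are never auto-approved"."""
    return [
        f"{tool}.{action}"
        for (tool, action), perm in spec.items()
        if perm.high_risk and perm.approval is Approval.AUTO
    ]


def write_actions(spec: dict) -> list:
    """The write side of the read/write split, e.g. for routing through stricter checks."""
    return sorted(key for key, perm in spec.items() if perm.writes)


def approvals_needed(tool: str, action: str, spec: dict = AUTHORITY) -> int:
    """Number of human approvals required; unlisted actions are denied outright."""
    perm = spec.get((tool, action))
    if perm is None:
        raise PermissionError(f"{tool}.{action} is not in the authority spec")
    return perm.approval.value
```

Running `spec_violations` in CI keeps a later edit from quietly auto-approving a refund.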

3) Build a tool gateway (don’t call SaaS APIs directly)
- Allowlist each tool and each method on it; deny anything not listed.
- Enforce tenant isolation and least privilege.
- Use short-lived scoped tokens; never place secrets in prompts.
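
A sketch of the gateway pattern in Python, assuming hypothetical names throughout (`ToolGateway`, per-tool handlers, a `tenant_id` argument on every call). The locally minted token stands in for whatever short-lived credential your identity provider issues; the point is that the agent only names a tool and method, and the credential never enters the prompt:

```python
import secrets
import time
from typing import Any, Callable, Optional

# A handler receives a short-lived scoped token plus the call arguments.
Handler = Callable[[str, dict], Any]


class ToolGateway:
    """Sits between the agent and SaaS APIs; the agent never holds credentials."""

    def __init__(self, allowlist: dict, ttl_s: float = 60.0):
        self.allowlist = allowlist  # {tool: {method, ...}}
        self.ttl_s = ttl_s
        self.handlers: dict = {}
        self._issued: dict = {}  # token -> (tenant_id, scope, expires_at)

    def register(self, tool: str, method: str, handler: Handler) -> None:
        self.handlers[(tool, method)] = handler

    def denial_reason(self, tenant_id: str, tool: str, method: str, args: dict) -> Optional[str]:
        """Return why a call is refused, or None if it may proceed."""
        if method not in self.allowlist.get(tool, set()):
            return f"{tool}.{method} is not allowlisted"
        if (tool, method) not in self.handlers:
            return f"no handler for {tool}.{method}"
        if args.get("tenant_id") != tenant_id:
            return "cross-tenant access denied"
        return None

    def token_valid(self, token: str, tenant_id: str, scope: str) -> bool:
        entry = self._issued.get(token)
        return (
            entry is not None
            and entry[0] == tenant_id
            and entry[1] == scope
            and time.monotonic() < entry[2]
        )

    def call(self, tenant_id: str, tool: str, method: str, args: dict) -> Any:
        reason = self.denial_reason(tenant_id, tool, method, args)
        if reason:
            raise PermissionError(reason)
        scope = f"{tool}:{method}"
        # Mint a token scoped to exactly one tenant and one tool method.
        token = secrets.token_urlsafe(16)
        self._issued[token] = (tenant_id, scope, time.monotonic() + self.ttl_s)
        try:
            return self.handlers[(tool, method)](token, args)
        finally:
            self._issued.pop(token, None)  # revoke as soon as the call returns
```

Denials come back as explicit reasons, which are worth logging as policy decisions.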

4) Enforce budgets in code
- Step budget: max tool calls per run (start with 6–10).
- Spend budget: max $/run, max tokens/run.
- Rate limits: per tool and per tenant, with exponential backoff and a capped number of retries.
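
Budgets only hold if the runtime enforces them. A minimal sketch, assuming a loop that calls `allow_step()` before each tool call and `record()` after it; `RunBudget`, `TokenBucket`, `limiter_for`, and `with_retries` are hypothetical helpers, and the default limits are placeholders:

```python
import random
import time
from dataclasses import dataclass


@dataclass
class RunBudget:
    """Per-run limits; the defaults are placeholders, not recommendations."""
    max_steps: int = 8
    max_usd: float = 0.50
    max_tokens: int = 50_000
    steps: int = 0
    usd: float = 0.0
    tokens: int = 0

    def allow_step(self) -> bool:
        """Check before each tool call; the run stops as soon as any limit is hit."""
        return (
            self.steps < self.max_steps
            and self.usd < self.max_usd
            and self.tokens < self.max_tokens
        )

    def record(self, usd: float, tokens: int) -> None:
        """Record the actual cost of the step that just ran."""
        self.steps += 1
        self.usd += usd
        self.tokens += tokens


class TokenBucket:
    """Rate limiter: `capacity` calls in a burst, refilled at `rate_per_s`."""

    def __init__(self, rate_per_s: float, capacity: int, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate_per_s, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


_limiters: dict = {}


def limiter_for(tool: str, tenant_id: str) -> TokenBucket:
    """One bucket per (tool, tenant) so a noisy tenant cannot starve the others."""
    key = (tool, tenant_id)
    if key not in _limiters:
        _limiters[key] = TokenBucket(rate_per_s=5.0, capacity=10)  # placeholder rates
    return _limiters[key]


def with_retries(fn, attempts: int = 3, base_delay_s: float = 0.5, sleep=time.sleep):
    """Retry transient failures with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2**attempt * random.uniform(0.5, 1.0))
```

Because the check happens before a step whose cost is not yet known, a run can overshoot the spend limit by at most one step. `TimeoutError` stands in for whatever transient errors your API clients raise, and each retry should also be recorded against the run budget.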

5) Implement trace logging and retention
- Log: user request, system prompt version, tool calls, tool responses, policy decisions, final output.
- Store traces immutably with a retention policy (commonly 30–180 days depending on compliance).
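
Storage-level immutability (write-once buckets, append-only tables) is the real control; a hash chain on top makes tampering detectable when traces are read back. A sketch, assuming one JSON-serializable trace per run; `Trace`, `past_retention`, and the event kind names are hypothetical:

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Optional

# Event kinds from the checklist; the names themselves are illustrative.
EVENT_KINDS = {"user_request", "tool_call", "tool_response", "policy_decision", "final_output"}


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class Trace:
    run_id: str
    prompt_version: str
    started_at: float = field(default_factory=time.time)
    events: list = field(default_factory=list)

    def append(self, kind: str, payload: dict) -> None:
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind {kind!r}")
        body = {
            "kind": kind,
            "payload": json.loads(json.dumps(payload)),  # snapshot; rejects non-JSON data
            # The chain starts from the prompt version, so that field is covered too.
            "prev": self.events[-1]["hash"] if self.events else self.prompt_version,
        }
        self.events.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; editing, reordering, or dropping an interior event breaks it."""
        prev = self.prompt_version
        for event in self.events:
            body = {"kind": event["kind"], "payload": event["payload"], "prev": event["prev"]}
            if event["prev"] != prev or _digest(body) != event["hash"]:
                return False
            prev = event["hash"]
        return True


def past_retention(trace: Trace, retention_days: int, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    return now - trace.started_at > retention_days * 86_400
```

The chain does not catch truncation of the tail; append-only storage is what prevents that.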

6) Create an evaluation set from real data
- Collect 50–200 historical cases for the workflow.
- Label outcomes: success, partial success, failure; note risk tier.
- Add adversarial and edge cases (timeouts, missing fields, ambiguous requests).
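
Keeping the eval set as labeled JSON Lines makes it easy to diff and version. A sketch of a loader that rejects bad labels, assuming a hypothetical record format (`case_id`, `request`, `expected_outcome`, `risk_tier`, optional `tags`) and three hypothetical risk tiers:

```python
import json
from collections import Counter
from dataclasses import dataclass

OUTCOMES = {"success", "partial_success", "failure"}
RISK_TIERS = {"low", "medium", "high"}  # hypothetical tier names


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    request: dict           # the historical input, replayed against the agent
    expected_outcome: str   # label assigned by a reviewer
    risk_tier: str
    tags: tuple = ()        # e.g. ("adversarial",), ("timeout",), ("missing_field",)


def load_cases(jsonl: str) -> list:
    """Parse one JSON object per line and reject malformed labels early."""
    cases = []
    for line_no, line in enumerate(jsonl.splitlines(), start=1):
        if not line.strip():
            continue
        raw = json.loads(line)
        case = EvalCase(
            case_id=raw["case_id"],
            request=raw["request"],
            expected_outcome=raw["expected_outcome"],
            risk_tier=raw["risk_tier"],
            tags=tuple(raw.get("tags", [])),
        )
        if case.expected_outcome not in OUTCOMES:
            raise ValueError(f"line {line_no}: unknown outcome {case.expected_outcome!r}")
        if case.risk_tier not in RISK_TIERS:
            raise ValueError(f"line {line_no}: unknown risk tier {case.risk_tier!r}")
        cases.append(case)
    return cases


def coverage(cases: list) -> dict:
    """Quick check that every tier, every outcome, and some edge cases are represented."""
    return {
        "total": len(cases),
        "by_tier": dict(Counter(c.risk_tier for c in cases)),
        "by_outcome": dict(Counter(c.expected_outcome for c in cases)),
        "edge_cases": sum(1 for c in cases if c.tags),
    }
```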

7) Choose metrics that map to operations
- Task success rate (segmented by risk tier).
- Cost per successful task: total spend (including retries, tool calls, and failed runs) divided by the number of successful tasks.
- Latency (median and P95).
- Escalation rate and “reason for escalation” taxonomy.
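
These metrics can be computed from per-run records. A minimal sketch, assuming each run's cost already includes retries and tool calls; `RunRecord` is hypothetical, and P95 uses the nearest-rank method (other percentile definitions give slightly different values on small samples):

```python
import math
import statistics
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    risk_tier: str
    succeeded: bool
    cost_usd: float      # total spend for the run, retries and tool calls included
    latency_s: float
    escalation_reason: Optional[str] = None  # None means the run was not escalated


def p95(values: list) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def summarize(runs: list) -> dict:
    if not runs:
        raise ValueError("no runs to summarize")
    by_tier = defaultdict(list)
    for run in runs:
        by_tier[run.risk_tier].append(run.succeeded)
    successes = sum(run.succeeded for run in runs)
    latencies = [run.latency_s for run in runs]
    reasons = [run.escalation_reason for run in runs if run.escalation_reason]
    return {
        "success_rate_by_tier": {t: sum(v) / len(v) for t, v in by_tier.items()},
        # Failed and escalated runs still cost money, so they count in the numerator.
        "cost_per_success": sum(r.cost_usd for r in runs) / successes if successes else math.inf,
        "latency_median_s": statistics.median(latencies),
        "latency_p95_s": p95(latencies),
        "escalation_rate": len(reasons) / len(runs),
        "escalation_reasons": Counter(reasons),  # the taxonomy, with counts
    }
```

Dividing total spend by successes, rather than averaging over successful runs only, charges failed and escalated runs to the work that did succeed.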

8) Ship in phases
- Phase A: shadow mode for 1–2 weeks (the agent runs and records what it would do; humans still take every action).
- Phase B: human-in-the-loop (agent drafts actions; humans approve).
- Phase C: limited autonomy (only low-risk actions) with tight monitoring.
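
The phases differ mainly in what happens to a proposed write. A sketch of that routing, plus a simple agreement score for shadow mode; the phase names, return strings, and the use of agreement as a promotion signal are assumptions, and any promotion threshold is yours to set:

```python
from enum import Enum


class Phase(Enum):
    SHADOW = "A"          # agent proposes, humans act
    HUMAN_IN_LOOP = "B"   # agent drafts, humans approve
    LIMITED = "C"         # low-risk actions run autonomously


def route_action(phase: Phase, high_risk: bool) -> str:
    """What happens to a write action the agent proposes, by rollout phase."""
    if phase is Phase.SHADOW:
        return "log_only"  # recorded for comparison; never executed
    if phase is Phase.HUMAN_IN_LOOP or high_risk:
        return "queue_for_approval"
    return "execute"


def shadow_agreement(pairs: list) -> float:
    """Share of shadow-mode cases where the agent proposed what the human actually did."""
    if not pairs:
        return 0.0
    return sum(agent == human for agent, human in pairs) / len(pairs)
```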

9) Add safety rails for write actions
- Require structured outputs and schema validation.
- Use confirmation prompts for irreversible actions.
- Add “undo” capability where possible (revert field changes, cancel queued actions).
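
A minimal sketch of the three rails for a hypothetical ticket store: exact-type validation of structured output against a per-action schema, a confirmation callback for irreversible actions, and an undo log for field changes. `SCHEMAS`, `TicketStore`, and `execute` are illustrative names; a JSON Schema validator could replace the hand-rolled `validate`:

```python
from typing import Callable

# Hypothetical action schemas: field name -> required Python type.
SCHEMAS = {
    "update_field": {"ticket_id": int, "field": str, "value": str},
    "delete_ticket": {"ticket_id": int},
}
IRREVERSIBLE = {"delete_ticket"}
_MISSING = object()  # marks "field did not exist before the change"


def validate(payload: dict, schema: dict) -> list:
    errors = [f"missing {k}" for k in schema if k not in payload]
    errors += [f"unexpected {k}" for k in payload if k not in schema]
    errors += [
        f"{k} must be {t.__name__}"
        for k, t in schema.items()
        if k in payload and type(payload[k]) is not t  # exact type: rejects True for int
    ]
    return errors


class TicketStore:
    """In-memory stand-in for a SaaS record store, with an undo log for field changes."""

    def __init__(self):
        self.records: dict = {}
        self.undo_log: list = []

    def update(self, ticket_id: int, field: str, value: str) -> None:
        record = self.records.setdefault(ticket_id, {})
        self.undo_log.append((ticket_id, field, record.get(field, _MISSING)))
        record[field] = value

    def undo_last(self) -> None:
        ticket_id, field, old = self.undo_log.pop()
        if old is _MISSING:
            del self.records[ticket_id][field]
        else:
            self.records[ticket_id][field] = old

    def delete(self, ticket_id: int) -> None:
        self.records.pop(ticket_id, None)


def execute(action: str, payload: dict, store: TicketStore,
            confirm: Callable[[str, dict], bool]) -> str:
    schema = SCHEMAS.get(action)
    if schema is None:
        return f"rejected: unknown action {action!r}"
    errors = validate(payload, schema)
    if errors:
        return "rejected: " + "; ".join(errors)
    if action in IRREVERSIBLE and not confirm(action, payload):
        return "rejected: not confirmed"
    if action == "update_field":
        store.update(payload["ticket_id"], payload["field"], payload["value"])
    else:
        store.delete(payload["ticket_id"])
    return "applied"
```

Deletion is gated by confirmation because it has no undo entry here; field updates are reversible, so they run without one.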

10) Operationalize ownership
- Assign an on-call owner for incidents.
- Create a rollback plan: model revert, prompt revert, tool disable switch.
- Review traces weekly; update the eval set monthly.
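
Rollback is faster when model and prompt versions are pinned together as one release. A sketch, assuming a hypothetical `ReleaseManager`; the tool kill switches are kept outside release history on purpose, so reverting the model or prompt never silently re-enables a disabled tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRelease:
    model: str            # pinned model identifier
    prompt_version: str   # pinned system prompt version


class ReleaseManager:
    """Tracks deployed releases so model and prompt can be reverted together."""

    def __init__(self, initial: AgentRelease):
        self._history = [initial]
        self.disabled_tools: set = set()  # kill switches, deliberately not versioned

    @property
    def current(self) -> AgentRelease:
        return self._history[-1]

    def deploy(self, release: AgentRelease) -> None:
        self._history.append(release)

    def rollback(self) -> AgentRelease:
        if len(self._history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self._history.pop()
        return self.current

    def disable_tool(self, tool: str) -> None:
        self.disabled_tools.add(tool)

    def tool_enabled(self, tool: str) -> bool:
        return tool not in self.disabled_tools
```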

If you can’t answer “what can this agent do, how much can it spend, and who approved its actions?” you’re not production-ready.