PRODUCTION AGENT READINESS CHECKLIST (2026)

Use this checklist before giving an agent write access to real systems.

1) SCOPE & ROI
- Workflow defined in one sentence (example: “Triage and route inbound support tickets for Product A”).
- Success metric chosen (e.g., +12% first-contact resolution OR -20% handle time within 60 days).
- Clear “stop conditions” documented: when the agent must escalate to a human.

2) ACTIONS, PERMISSIONS, AND THRESHOLDS
- Tool list is explicit (e.g., Zendesk, Salesforce, Stripe, Jira) and each tool has least-privilege scopes.
- High-impact actions have caps (e.g., refunds <= $25 auto; $26–$200 requires approval; >$200 blocked).
- Time-bound credentials in place (short-lived tokens; no shared admin keys).
- Rate limits and kill switch implemented (per-tenant and global).

3) TOOL CONTRACTS (NON-NEGOTIABLE)
- Every tool call has a schema (types, required fields, enums for reason codes).
- Validation runs before execution; invalid payloads return deterministic errors.
- Idempotency keys used for side-effecting actions (refunds, emails, record updates).

4) MEMORY DESIGN
- Tiered memory defined: short-term (per run), episodic (per case), semantic (facts with citations).
- Provenance required on stored facts (source system + record ID + timestamp).
- Data retention and deletion policy documented (PII handling, customer deletion requests).

5) EVALUATION & RELEASE GATING
- Golden set created (200–1,000 historical cases) with expected outcomes.
- Metrics tracked: trajectory success rate, tool-call error rate, policy violation rate, human takeover rate, cost per successful outcome.
- Regression gating in CI: releases blocked if key metrics drop beyond thresholds.

6) OBSERVABILITY & INCIDENT RESPONSE
- Full tracing per run: inputs, retrieval results, tool calls, outputs.
- Audit logs exportable for security/compliance.
- On-call runbook exists: common failures, rollback steps, kill switch location.

7) ROLLOUT PLAN
- Phase 1: shadow mode (recommendations only).
- Phase 2: assist mode (human approves actions).
- Phase 3: autopilot for low-risk actions only.
- A/B or staged rollout plan across tenants/queues with explicit exit criteria.

8) COST CONTROLS
- Reasoning budgets set per workflow (max $/case) and enforced in code.
- Model routing implemented (cheap model for classification; frontier model for complex reconciliation).
- Caching strategy defined (TTL, what can be cached, what cannot).

If you can’t check every item above, do not grant write permissions. Start in shadow/assist mode, tighten contracts, and build evals until you can prove safety and ROI.