AGENTIC OPS READINESS CHECKLIST (2026)

Use this checklist to move from an agent demo to a production agent that is safe, observable, and cost-bounded.

1) WORKFLOW SELECTION
- Define one narrow workflow with a measurable outcome (e.g., “close 30% of order-status tickets without human reply”).
- Write the success metric (quality), safety metric (policy violations), and efficiency metric (cost + latency).
- Identify the highest-risk action in the workflow (money movement, deletes, sensitive data) and mark it “approval required” for v1.

2) TOOLING CONTRACTS (TREAT TOOLS LIKE APIS)
- Inventory every system the agent will touch (Stripe, Salesforce, Jira, Zendesk, internal DBs).
- For each tool: define schema, versioning, rate limits, idempotency strategy, and error semantics.
- Implement read-only tools first; add write tools only after logging and rollback are in place.
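
As one way to make the contract items above concrete, the sketch below records a tool's schema version, rate limit, and retryable errors, and derives an idempotency key for write calls. Every name here (ToolContract, idempotency_key, should_retry, the "issue_refund" tool, the error codes) is a hypothetical illustration, not part of any vendor API.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolContract:
    """Hypothetical contract record; field names are illustrative only."""
    name: str
    schema_version: str
    read_only: bool
    max_calls_per_minute: int
    retryable_errors: tuple = ("timeout", "rate_limited")


def idempotency_key(contract: ToolContract, task_id: str, args: dict) -> str:
    """Derive a stable key so a retried write can be recognized as a duplicate."""
    payload = json.dumps(
        {
            "tool": contract.name,
            "version": contract.schema_version,
            "task": task_id,
            "args": args,
        },
        sort_keys=True,  # key order in args must not change the result
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_retry(contract: ToolContract, error_code: str) -> bool:
    """Retry only the errors the contract marks as transient."""
    return error_code in contract.retryable_errors


# Assumed example tool; the name and limits are made up for illustration.
refund_tool = ToolContract("issue_refund", "1.0.0", read_only=False,
                           max_calls_per_minute=30)
```

The key is computed from the tool name, schema version, task ID, and arguments, so the same logical call always maps to the same key while a schema bump produces a new one. Whether the downstream system honors such a key is specific to that system and should be verified per tool.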

3) IDENTITY, PERMISSIONS, AND POLICY
- Create a dedicated non-human identity (NHI) for each agent or workflow; do not share identities across agents.
- Enforce least privilege per tool and per environment (dev/staging/prod).
- Add a policy gate for high-risk actions (e.g., refunds > $100, any deletion, access to sensitive fields).
- Require short-lived credentials and explicit allowlists for tool endpoints.
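
A policy gate like the one described above can be sketched as a pure function that maps a proposed action to allow, require approval, or deny. The $100 refund threshold and the three high-risk categories come from the checklist; the Action fields, the tool names, and the SENSITIVE_FIELDS set are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Assumed values; replace with the fields and threshold your policy defines.
SENSITIVE_FIELDS = frozenset({"ssn", "card_number"})
REFUND_APPROVAL_THRESHOLD_USD = 100.0


@dataclass(frozen=True)
class Action:
    tool: str                       # tool endpoint the agent wants to call
    kind: str                       # e.g. "read", "refund", "delete"
    amount_usd: float = 0.0
    fields: frozenset = frozenset() # data fields the action would touch


def evaluate(action: Action, allowed_tools: frozenset) -> Decision:
    """Deny anything off the allowlist; route high-risk actions to a human."""
    if action.tool not in allowed_tools:
        return Decision.DENY
    if action.kind == "delete":
        return Decision.REQUIRE_APPROVAL
    if action.kind == "refund" and action.amount_usd > REFUND_APPROVAL_THRESHOLD_USD:
        return Decision.REQUIRE_APPROVAL
    if action.fields & SENSITIVE_FIELDS:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

Keeping the gate free of side effects makes it easy to unit test and to log every decision alongside the action that triggered it.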

4) COST AND LATENCY BUDGETS
- Set token budgets per task (avg and p95) and tool-call budgets (avg and p95).
- Add routing: small model for classify/extract; large model only for complex reasoning.
- Add caching where repeated queries occur (common policies, product docs, known issues).
- Define circuit breakers: when external APIs throttle or p95 latency spikes, switch to draft-only mode.
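
The budget, routing, and circuit-breaker items above could be combined roughly as follows. This is a minimal sketch: TaskBudget, route_model, choose_mode, the task kinds, and the mode names are hypothetical, and the p95 helper uses the nearest-rank method, which is only one of several percentile definitions.

```python
from dataclasses import dataclass


@dataclass
class TaskBudget:
    """Per-task hard limits; usage is only recorded when it fits the budget."""
    max_tokens: int
    max_tool_calls: int
    tokens_used: int = 0
    tool_calls_used: int = 0

    def charge(self, tokens: int = 0, tool_calls: int = 0) -> bool:
        """Return False (and record nothing) if the charge would exceed a limit."""
        if (self.tokens_used + tokens > self.max_tokens
                or self.tool_calls_used + tool_calls > self.max_tool_calls):
            return False
        self.tokens_used += tokens
        self.tool_calls_used += tool_calls
        return True


def route_model(task_kind: str) -> str:
    """Send cheap task kinds to a small model; everything else to a large one."""
    return "small" if task_kind in {"classify", "extract"} else "large"


def p95(samples: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def choose_mode(latencies_ms: list, throttled: bool, p95_limit_ms: float) -> str:
    """Fall back to draft-only when an API throttles or p95 latency spikes."""
    if throttled or (latencies_ms and p95(latencies_ms) > p95_limit_ms):
        return "draft_only"
    return "execute"
```

A caller would check charge() before each model or tool call and stop or escalate the task when it returns False, and would re-evaluate choose_mode() on a rolling window of recent latencies.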

5) EVALUATION BEFORE AUTONOMY
- Build a scenario suite with 200–1,000 real examples (anonymized) including edge cases and adversarial inputs.
- Add unit tests for prompt/tool schema changes; block deploys on regressions.
- Define required pass rates for low-risk actions (e.g., 95%+) before enabling autonomy.
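
One possible shape for the deploy gate described above: compute a pass rate per action type from the scenario suite, compare it with the previous baseline, and allow autonomy only when the required rate is met without a regression. The function names, the result format (a list of booleans per action), and the baseline dictionary are assumptions; the 95% default mirrors the example in the checklist.

```python
def pass_rate(results: list) -> float:
    """Fraction of scenarios that passed; an empty suite counts as 0.0."""
    if not results:
        return 0.0
    return sum(1 for ok in results if ok) / len(results)


def gate(results_by_action: dict, baseline: dict, required: float = 0.95) -> dict:
    """Per action: pass rate, whether it regressed, and whether autonomy is OK."""
    decisions = {}
    for action, results in results_by_action.items():
        rate = pass_rate(results)
        regressed = rate < baseline.get(action, 0.0)
        decisions[action] = {
            "pass_rate": rate,
            "regressed": regressed,
            "autonomy_ok": rate >= required and not regressed,
        }
    return decisions


def deploy_blocked(decisions: dict) -> bool:
    """Block the deploy if any action type regressed against its baseline."""
    return any(d["regressed"] for d in decisions.values())
```

In CI, the suite runner would produce results_by_action, and a non-zero exit when deploy_blocked() is true would stop the release.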

6) OBSERVABILITY AND AUDIT TRAILS
- Log: prompt version, model, tool schema versions, retrieved doc IDs/hashes, tool inputs/outputs, and final action.
- Enable trace visualization of the tool-call graph (one trace per task).
- Build dashboards: success rate, escalation rate, policy violations per 1k tasks, cost per task, p95 latency.
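
The logging fields listed above could be captured as one JSON line per task, as in this sketch. The function and key names are assumptions, as is the choice of SHA-256 for document hashes; adapt them to whatever your logging pipeline expects.

```python
import hashlib
import json
from datetime import datetime, timezone


def doc_hash(text: str) -> str:
    """Content hash so the exact retrieved document can be verified later."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audit_record(trace_id: str, prompt_version: str, model: str,
                 tool_schema_versions: dict, retrieved_docs: dict,
                 tool_calls: list, final_action: str) -> str:
    """Serialize one task's audit trail as a single JSON line."""
    record = {
        "trace_id": trace_id,  # one trace per task
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_version": prompt_version,
        "model": model,
        "tool_schema_versions": tool_schema_versions,
        "retrieved_docs": [
            {"id": doc_id, "sha256": doc_hash(text)}
            for doc_id, text in retrieved_docs.items()
        ],
        "tool_calls": tool_calls,  # each entry holds the tool's input and output
        "final_action": final_action,
    }
    return json.dumps(record, sort_keys=True)
```

Storing document hashes rather than full text keeps records small while still letting an auditor confirm which version of a document the agent saw. Tool inputs and outputs may contain sensitive data, so redaction before logging is worth considering.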

7) RELEASE AND INCIDENT PRACTICES
- Ship in stages: observe-only → draft-only → execute-with-approval → limited autonomy.
- Add a kill switch and a rollback plan for prompt/config/tool changes.
- Assign ownership: who is on-call, who triages failures, who approves policy changes.
- Run weekly reviews of top failure modes and update tools/policies before scaling.
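
The staged rollout and kill switch above can be expressed as a small state check, sketched here. The Stage names follow the checklist; the rule that limited autonomy lets only low-risk actions run without approval is an assumption, as are the function names.

```python
from enum import IntEnum


class Stage(IntEnum):
    OBSERVE_ONLY = 0
    DRAFT_ONLY = 1
    EXECUTE_WITH_APPROVAL = 2
    LIMITED_AUTONOMY = 3


def may_execute(stage: Stage, kill_switch_on: bool,
                human_approved: bool = False, low_risk: bool = False) -> bool:
    """Decide whether the agent may perform a write action right now."""
    if kill_switch_on:
        return False  # the kill switch overrides every stage
    if stage == Stage.LIMITED_AUTONOMY:
        # Assumption: only low-risk actions skip approval at this stage.
        return low_risk or human_approved
    if stage == Stage.EXECUTE_WITH_APPROVAL:
        return human_approved
    return False  # observe-only and draft-only never execute


def promote(stage: Stage) -> Stage:
    """Move one stage forward, stopping at limited autonomy."""
    return Stage(min(stage + 1, Stage.LIMITED_AUTONOMY))


def rollback(stage: Stage) -> Stage:
    """Move one stage back, stopping at observe-only."""
    return Stage(max(stage - 1, Stage.OBSERVE_ONLY))
```

Reading the kill switch from a config store that is checked on every action, rather than at process start, lets an on-call engineer stop execution without a redeploy.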

EXIT CRITERIA FOR “PRODUCTION READY”
- SLOs defined for success rate, p95 latency, and cost per task, each with alerting.
- Least-privilege identity + audited policy gates.
- Scenario suite runs on every prompt, tool-schema, and config change, and blocks deploys on regressions.
- Hard per-task budgets for tokens and tool calls.
- Documented escalation path and kill switch tested in staging.