Agent Ops Readiness Checklist (2026)

Use this checklist to move from “agent prototype” to a production-grade agent program. It’s designed for founders, engineering leaders, and operators.

1) Workflow Selection (Scope)
- Identify one workflow with a clear start/end and measurable outcome (e.g., ticket triage, refund eligibility, quote generation).
- Write a definition of DONE (what artifacts are produced, where they are stored, and what counts as success).
- Define stop conditions: max time, max tool calls, max retries, and when to hand off to a human.
- List the systems of record involved (Salesforce, Jira, Zendesk, ServiceNow, NetSuite, etc.).
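
The stop conditions above can be written down as data rather than left in a prompt. The following is a minimal sketch, not a prescribed implementation: the class names, field names, and default limits are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StopConditions:
    # Illustrative defaults; tune per workflow.
    max_seconds: float = 120.0
    max_tool_calls: int = 20
    max_retries: int = 3


@dataclass
class RunState:
    elapsed_seconds: float = 0.0
    tool_calls: int = 0
    retries: int = 0


def stop_reason(state: RunState, limits: StopConditions) -> Optional[str]:
    """Return why the run must stop, or None if it may continue."""
    if state.elapsed_seconds >= limits.max_seconds:
        return "max_time"
    if state.tool_calls >= limits.max_tool_calls:
        return "max_tool_calls"
    if state.retries >= limits.max_retries:
        return "max_retries"
    return None
```

An agent loop would call `stop_reason` before each step and hand off to a human whenever it returns a non-None value, passing the reason along so the handoff is explainable.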

2) Tooling & Permissions (Safety to Act)
- Create an explicit allowlist of tools/actions the agent may call.
- Enforce least privilege per tool: separate read vs write tokens; restrict fields and objects where possible.
- Add an approval gate for high-impact actions (closing tickets, sending external emails, changing CRM fields, initiating refunds).
- Validate tool inputs and outputs against a schema (reject malformed calls; enforce required fields).
- Ensure every action is auditable (who initiated, which agent version, which tools, timestamps, outcome).
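
One way to combine the allowlist, required-field check, approval gate, and audit record is a single authorization function that every tool call passes through. This is a sketch under stated assumptions: the tool names (`lookup_ticket`, `issue_refund`), the decision strings, and the in-memory `AUDIT_LOG` are hypothetical stand-ins, and the field check is far simpler than full schema validation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ToolPolicy:
    required_fields: frozenset
    needs_approval: bool = False  # True for high-impact writes


# Hypothetical tool names; anything absent from this map is denied.
ALLOWLIST = {
    "lookup_ticket": ToolPolicy(frozenset({"ticket_id"})),
    "issue_refund": ToolPolicy(frozenset({"order_id", "amount"}), needs_approval=True),
}

AUDIT_LOG = []  # stand-in for an append-only audit store


def authorize(tool, args, agent_version, initiated_by, approved=False):
    """Decide whether a tool call may run, and record the decision."""
    policy = ALLOWLIST.get(tool)
    if policy is None:
        decision = "denied:not_allowlisted"
    elif policy.required_fields - set(args):
        decision = "denied:missing_fields"
    elif policy.needs_approval and not approved:
        decision = "pending_approval"
    else:
        decision = "allowed"
    # Every decision is logged, including denials.
    AUDIT_LOG.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "initiated_by": initiated_by,
        "agent_version": agent_version,
        "tool": tool,
        "decision": decision,
    })
    return decision
```

Logging denials as well as approvals is a deliberate choice here: denied calls are often the earliest signal of prompt injection or a misconfigured agent.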

3) Data & Retrieval (Grounding)
- Document all knowledge sources and ownership (Confluence spaces, Drive folders, internal wikis).
- Implement permission-aware retrieval (the agent retrieves only content the requesting user is authorized to see).
- Add citation requirements for user-visible claims; log retrieved chunks/doc IDs.
- Establish a refresh cadence for indexed content (daily/weekly) and monitor stale-data incidents.
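
Permission-aware retrieval means filtering by access rights before ranking, not after generation. The sketch below illustrates the idea with hypothetical names (`Chunk`, `allowed_groups`, `retrieve`) and uses naive keyword overlap as a stand-in for a real search or vector index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read the source document


def retrieve(query, chunks, user_groups):
    """Return matching chunks the user may see, plus doc IDs for the log."""
    terms = set(query.lower().split())
    hits = []
    for chunk in chunks:
        # Apply the permission filter first so restricted text never
        # reaches the model context.
        if not (chunk.allowed_groups & user_groups):
            continue
        if terms & set(chunk.text.lower().split()):
            hits.append(chunk)
    cited_doc_ids = [c.doc_id for c in hits]
    return hits, cited_doc_ids
```

The returned `cited_doc_ids` list is what would be written to the run log and attached to user-visible claims as citations.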

4) Evaluation & Regression (Quality)
- Build a gold set: 50–200 real tasks with expected outcomes and edge cases.
- Track workflow success rate, policy violations per 1,000 runs, escalation rate, and cost per resolved task ($).
- Add adversarial tests: prompt injection attempts, conflicting instructions, tool misuse, and data exfiltration scenarios.
- Run evals in CI for every prompt/tool/schema change and for any model/provider upgrade.
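
The four metrics above can be computed from per-run eval results in a few lines. This is an illustrative sketch; the `RunResult` fields and the dictionary keys are assumptions, and a real harness would also record which gold-set task each run belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    succeeded: bool
    escalated: bool
    policy_violations: int
    cost_usd: float


def summarize(runs):
    """Aggregate eval runs into the checklist's four headline metrics."""
    n = len(runs)
    if n == 0:
        raise ValueError("no runs to summarize")
    resolved = sum(r.succeeded for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "success_rate": resolved / n,
        "violations_per_1000": 1000 * sum(r.policy_violations for r in runs) / n,
        "escalation_rate": sum(r.escalated for r in runs) / n,
        # Total spend divided by resolved tasks, so failed runs still count as cost.
        "cost_per_resolved": total_cost / resolved if resolved else float("inf"),
    }
```

A CI job could compare this summary against the stored baseline and fail the build when any metric regresses beyond an agreed tolerance.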

5) Observability & Cost Controls (Operations)
- Instrument tracing across each agent step: model calls, tool calls, retrieval, retries, and timeouts.
- Set budgets: max tokens per task, max $ per task, and alerts on spend anomalies.
- Monitor end-to-end P95 latency, with separate targets for interactive and async workflows.
- Implement circuit breakers: automatically degrade to read-only mode, or hand off to a human, after repeated failures.
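
Per-task budgets and a circuit breaker can both be small, explicit objects. The sketch below assumes hypothetical names and thresholds (`Budget`, `CircuitBreaker`, a threshold of three consecutive failures); it degrades to read-only mode, though a human handoff would slot into the same place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    # Illustrative limits; set per workflow.
    max_tokens: int = 50_000
    max_usd: float = 0.50


def over_budget(tokens_used, usd_spent, budget):
    """True once a task exceeds either its token or its dollar budget."""
    return tokens_used > budget.max_tokens or usd_spent > budget.max_usd


class CircuitBreaker:
    """Switches the agent to read-only after repeated consecutive failures."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, success):
        # Any success resets the count; failures accumulate.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

    @property
    def mode(self):
        if self.consecutive_failures >= self.failure_threshold:
            return "read_only"
        return "normal"
```

In this sketch a single success closes the breaker again; a stricter design might require a manual reset after it trips.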

6) Rollout Plan (Change Management)
- Start in shadow mode for 2–6 weeks: the agent produces outputs, but humans make every decision.
- Provide a correction UI so humans can edit outputs and label failures (feeds eval datasets).
- Ship staged autonomy: draft → suggest actions → execute low-risk → execute high-risk with approvals.
- Publish “agent release notes” and define an on-call/ownership model for incidents.
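
The staged-autonomy ladder above maps naturally onto an ordered enum plus one decision function. This is a sketch only: the level names, the two-value risk label, and the returned action strings are assumptions made for illustration.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    DRAFT = 1
    SUGGEST = 2
    EXECUTE_LOW_RISK = 3
    EXECUTE_HIGH_RISK_WITH_APPROVAL = 4


def decide(level, risk, approved=False):
    """Map an autonomy level and action risk ("low" or "high") to an outcome."""
    if risk not in ("low", "high"):
        raise ValueError(f"unknown risk label: {risk!r}")
    if level == Autonomy.DRAFT:
        return "draft_only"
    if level == Autonomy.SUGGEST:
        return "suggest"
    if risk == "low":
        return "execute"
    # High-risk action: only the top level may execute, and only with approval.
    if level >= Autonomy.EXECUTE_HIGH_RISK_WITH_APPROVAL:
        return "execute" if approved else "await_approval"
    return "suggest"
```

Keeping the level as configuration rather than code lets a rollout move one workflow up or down the ladder without a redeploy.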

Exit Criteria (Minimum to Launch)
- Documented permissions and approval flows.
- Audit logs for all tool writes.
- Gold set evals running in CI with baseline metrics.
- Budget limits and alerts for cost and latency.
- Clear handoff path and accountability when the agent is wrong.