Agent Ops Readiness Checklist (2026)

Use this checklist to move from “agent prototype” to a production-grade agent program. It’s designed for founders, engineering leaders, and operators.

1) Workflow Selection (Scope)
- Identify one workflow with a clear start/end and measurable outcome (e.g., ticket triage, refund eligibility, quote generation).
- Write a definition of DONE (what artifacts are produced, where they are stored, and what counts as success).
- Define stop conditions: max time, max tool calls, max retries, and when to hand off to a human.
- List the systems of record involved (Salesforce, Jira, Zendesk, ServiceNow, NetSuite, etc.).

2) Tooling & Permissions (Safety to Act)
- Create an explicit allowlist of tools/actions the agent may call.
- Enforce least privilege per tool: separate read vs write tokens; restrict fields and objects where possible.
- Add an approval gate for high-impact actions (closing tickets, sending external emails, changing CRM fields, initiating refunds).
- Implement schema validation for tool inputs/outputs (reject malformed calls; enforce required fields).
- Ensure every action is auditable (who initiated, which agent version, which tools, timestamps, outcome).

3) Data & Retrieval (Grounding)
- Document all knowledge sources and ownership (Confluence spaces, Drive folders, internal wikis).
- Implement permission-aware retrieval (users only see what they’re allowed to see).
- Add citation requirements for user-visible claims; log retrieved chunks/doc IDs.
- Establish a refresh cadence for indexed content (daily/weekly) and monitor stale data incidents.

4) Evaluation & Regression (Quality)
- Build a gold set: 50–200 real tasks with expected outcomes and edge cases.
- Track workflow success rate, policy violations per 1,000 runs, escalation rate, and cost per resolved task ($).
- Add adversarial tests: prompt injection attempts, conflicting instructions, tool misuse, and data exfiltration scenarios.
- Run evals in CI for every prompt/tool/schema change and for any model/provider upgrade.

5) Observability & Cost Controls (Operations)
- Instrument tracing across each agent step: model calls, tool calls, retrieval, retries, and timeouts.
- Set budgets: max tokens per task, max $ per task, and alerts on spend anomalies.
- Monitor P95 latency end-to-end; distinguish interactive vs async workflows.
- Implement circuit breakers: automatic degrade to read-only, or human handoff on repeated failures.

6) Rollout Plan (Change Management)
- Start in shadow mode for 2–6 weeks: agent produces outputs but humans decide.
- Provide a correction UI so humans can edit outputs and label failures (feeds eval datasets).
- Ship staged autonomy: draft → suggest actions → execute low-risk → execute high-risk with approvals.
- Publish “agent release notes” and define an on-call/ownership model for incidents.

Exit Criteria (Minimum to Launch)
- Documented permissions and approval flows.
- Audit logs for all tool writes.
- Gold set evals running in CI with baseline metrics.
- Budget limits and alerts for cost and latency.
- Clear handoff path and accountability when the agent is wrong.
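
Appendix: Example Guardrail Sketch

The stop conditions in section 1, the allowlist and approval gate in section 2, and the per-task budget in section 5 can be combined into one small guard that runs before every agent step. The Python below is a minimal sketch of that idea, not a reference implementation. The tool names, limit values, and the names `RunLimits`, `RunState`, `check_stop`, and `authorize` are all hypothetical and chosen only for illustration.

```python
import time
from dataclasses import dataclass, field

# Hypothetical tool names; replace with the tools your agent actually exposes.
ALLOWED_TOOLS = {"read_ticket", "draft_reply", "close_ticket", "issue_refund"}
NEEDS_APPROVAL = {"close_ticket", "issue_refund"}  # high-impact actions


@dataclass
class RunLimits:
    # Illustrative defaults only; tune per workflow.
    max_seconds: float = 120.0
    max_tool_calls: int = 20
    max_retries: int = 3
    max_cost_usd: float = 0.50


@dataclass
class RunState:
    started_at: float = field(default_factory=time.monotonic)
    tool_calls: int = 0
    retries: int = 0
    cost_usd: float = 0.0


def check_stop(state, limits, now=None):
    """Return the name of the limit that was reached, or None to continue."""
    now = time.monotonic() if now is None else now
    if now - state.started_at >= limits.max_seconds:
        return "max_time"
    if state.tool_calls >= limits.max_tool_calls:
        return "max_tool_calls"
    if state.retries >= limits.max_retries:
        return "max_retries"
    if state.cost_usd >= limits.max_cost_usd:
        return "max_cost"
    return None


def authorize(tool, approved_by=None):
    """Return 'deny', 'needs_approval', or 'allow' for a requested tool call."""
    if tool not in ALLOWED_TOOLS:
        return "deny"  # not on the explicit allowlist
    if tool in NEEDS_APPROVAL and approved_by is None:
        return "needs_approval"  # route to a human approver
    return "allow"
```

In this sketch, any non-`None` result from `check_stop` would trigger the human handoff path, and the result of every `authorize` call would be written to the audit log along with the agent version and timestamp.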