AgentOps Readiness Checklist (2026)

Use this checklist to ship one production agent workflow safely. Score each line as: Not Started / In Progress / Done.

1) Workflow Definition
- Define the workflow boundary in one sentence (input, output, and system of record).
- List “must-never-happen” failures (e.g., refund > $500, emailing wrong customer, deleting data).
- Define a human fallback path and the maximum time before escalation (e.g., 60 seconds).

2) Tooling & Permissions
- Inventory tools the agent can call; separate read-only tools from write tools.
- Implement an allowlist: only approved tools are callable in production.
- Enforce least privilege: scoped OAuth tokens, short-lived credentials, no raw API keys in prompts.
- Add schema validation (JSON Schema/OpenAPI) for every tool request and response.
- Add hard budgets: max tool calls per run, max runtime, and idempotency keys for write actions.

3) Observability & Audit
- Capture end-to-end traces per run: model calls, tool calls, retrieval steps, and policy decisions.
- Log prompt/version identifiers (prompt hash, model version, routing policy version).
- Store tool inputs/outputs with redaction; confirm PII is not retained in plaintext logs.
- Implement replay for incidents: reconstruct a run with the same inputs and versions.
- Set data retention (e.g., 7–30 days for traces; longer only for audit requirements).

4) Evaluation Gates
- Build an offline eval set from real historical cases (start with 200; grow to 1,000+).
- Define pass/fail metrics: success rate, policy compliance rate, and escalation rate.
- Add regression checks to CI/CD: block deploy if core metrics degrade beyond thresholds.
- Run canary testing (e.g., 5% traffic) and compare against baseline before ramp.

5) Reliability Engineering
- Fail closed by default: if parsing fails, tool errors occur, or confidence is low, escalate.
- Add timeouts and retries with backoff; ensure retries are safe (idempotent writes).
- Implement rate limits per tenant and per workflow to prevent runaway loops.
- Track p95 latency and set an explicit target (e.g., <8 seconds for a customer-facing action).

6) Cost & Unit Economics
- Measure cost per successful completion (CPSC) per workflow.
- Implement model routing: cheap model for triage/extraction; premium model for complex reasoning.
- Add caching for retrieval results and deterministic sub-steps.
- Set budget alerts (daily/weekly) and define an automatic “degrade mode” (e.g., switch to cheaper model).

7) Security & Compliance
- Run a threat model focused on tool abuse, data exfiltration, and prompt injection.
- Require approvals for high-risk actions (e.g., payouts, permission changes, production deploys).
- Perform quarterly access reviews for tools and service accounts used by agents.
- Prepare an audit packet: architecture diagram, logging/redaction approach, and incident response steps.

If you can check “Done” on at least 80% of these items for one workflow, you’re ready to scale to the next workflow with the same platform patterns.
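
The 80% rule can be checked mechanically. The following is a minimal Python sketch, assuming each item is recorded with one of the three statuses named in the introduction. The names `Status`, `readiness`, and `ready_to_scale`, along with the sample items, are hypothetical and only for illustration. It also assumes that "In Progress" does not count toward the 80%, which the checklist does not state explicitly.

```python
from enum import Enum


class Status(Enum):
    """The three scores the checklist allows for each item."""
    NOT_STARTED = "Not Started"
    IN_PROGRESS = "In Progress"
    DONE = "Done"


def readiness(items: dict[str, Status]) -> float:
    """Return the fraction of items marked Done (0.0 for an empty checklist)."""
    if not items:
        return 0.0
    done = sum(1 for status in items.values() if status is Status.DONE)
    return done / len(items)


def ready_to_scale(items: dict[str, Status], threshold: float = 0.8) -> bool:
    """Apply the 80% rule. Assumption: In Progress items count as not done."""
    return readiness(items) >= threshold


# Hypothetical scores for five items; a real review would score every item.
example = {
    "Workflow boundary defined": Status.DONE,
    "Tool allowlist enforced": Status.DONE,
    "End-to-end traces captured": Status.DONE,
    "Offline eval set built": Status.DONE,
    "Quarterly access review": Status.IN_PROGRESS,
}

if __name__ == "__main__":
    print(f"{readiness(example):.0%} done, ready: {ready_to_scale(example)}")
```

Keeping the threshold as a parameter lets a team raise the bar for higher-risk workflows without changing the scoring logic.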