Agent Control Plane Readiness Checklist (2026) Use this checklist to move from “agents in demos” to “agents in production.” Score each item 0/1 and total your score. Anything below 20/30 typically signals you’re still in pilot territory. 1) Inventory & Ownership - Do you have a registry of every agent in production (name, owner, purpose, risk tier)? - Is there a clear on-call/incident owner for each agent workflow? - Are agent prompts, tool schemas, and routing rules versioned in Git with code review? 2) Cost & Budgets - Per-run budgets: max steps, max tokens, max wall-clock time. - Per-run max cost in USD (even if approximate) with hard enforcement. - Weekly dashboards for spend by agent/version/model/tool. - Loop detection: stop on repeated tool calls or repeated “no-progress” states. 3) Routing & Fallbacks - Documented routing policy (small model by default, escalate only when needed). - Clear escalation triggers (low confidence, tool errors, high dollar thresholds, sensitive topics). - “Retrieval-first” and caching strategy to reduce repeated LLM calls. 4) Identity & Access Control - No shared static API keys for tools that mutate data. - Short-lived, scoped credentials (TTL <= 15 minutes) minted per run. - Actions execute on-behalf-of a principal (user/service) with traceable attribution. - Separation of read tools vs write tools with different policies. 5) Policy & Approvals - Deny-by-default tool access; allowlist per agent. - Policy engine evaluates tool calls (who, what action, what resource, what context). - Human approval for irreversible/high-risk actions (refunds, deletes, permission grants, prod deploys). 6) Observability & Audit - End-to-end traces: model chosen, tokens in/out, tool calls, latencies, retries, final output. - Central log retention policy and PII redaction rules. - Ability to replay a run (inputs + tool responses + decisions) for debugging. 7) Evaluation & Release Management - Golden-set regression tests for critical tasks (stored and versioned). - Automated evaluation (LLM-as-judge or rule-based checks) with thresholds. - Canary rollout and rollback plan by agent version. 8) Data & Safety - Clear rules for what data can be sent to model providers (PII/PHI/PCI handling). - Prompt injection defenses: tool-call confirmation, content filtering, and context isolation. - Incident response playbook for agent failures (who to notify, how to disable, how to remediate). Scoring Guidance - 0–10: Prototype. Treat outputs as untrusted; don’t allow write actions. - 11–20: Pilot. Limited scope; human approvals required for most actions. - 21–27: Production-ready. Add SLOs, canaries, and tighter policy coverage. - 28–30: Mature. Optimize cost/latency, expand safely to more workflows.