Agent Control Plane Readiness Checklist (2026)

Use this checklist to move from “agents in demos” to “agents in production.” Score each item 0/1 and total your score. Anything below 20/30 typically signals you’re still in pilot territory.

1) Inventory & Ownership
- Do you have a registry of every agent in production (name, owner, purpose, risk tier)?
- Is there a clear on-call/incident owner for each agent workflow?
- Are agent prompts, tool schemas, and routing rules versioned in Git with code review?

2) Cost & Budgets
- Per-run budgets: max steps, max tokens, max wall-clock time.
- Per-run max cost in USD (even if approximate) with hard enforcement.
- Weekly dashboards for spend by agent/version/model/tool.
- Loop detection: stop on repeated tool calls or repeated “no-progress” states.

3) Routing & Fallbacks
- Documented routing policy (small model by default, escalate only when needed).
- Clear escalation triggers (low confidence, tool errors, high dollar thresholds, sensitive topics).
- “Retrieval-first” and caching strategy to reduce repeated LLM calls.

4) Identity & Access Control
- No shared static API keys for tools that mutate data.
- Short-lived, scoped credentials (TTL <= 15 minutes) minted per run.
- Actions execute on-behalf-of a principal (user/service) with traceable attribution.
- Separation of read tools vs write tools with different policies.

5) Policy & Approvals
- Deny-by-default tool access; allowlist per agent.
- Policy engine evaluates tool calls (who, what action, what resource, what context).
- Human approval for irreversible/high-risk actions (refunds, deletes, permission grants, prod deploys).

6) Observability & Audit
- End-to-end traces: model chosen, tokens in/out, tool calls, latencies, retries, final output.
- Central log retention policy and PII redaction rules.
- Ability to replay a run (inputs + tool responses + decisions) for debugging.

7) Evaluation & Release Management
- Golden-set regression tests for critical tasks (stored and versioned).
- Automated evaluation (LLM-as-judge or rule-based checks) with thresholds.
- Canary rollout and rollback plan by agent version.

8) Data & Safety
- Clear rules for what data can be sent to model providers (PII/PHI/PCI handling).
- Prompt injection defenses: tool-call confirmation, content filtering, and context isolation.
- Incident response playbook for agent failures (who to notify, how to disable, how to remediate).

Scoring Guidance
- 0–10: Prototype. Treat outputs as untrusted; don’t allow write actions.
- 11–20: Pilot. Limited scope; human approvals required for most actions.
- 21–27: Production-ready. Add SLOs, canaries, and tighter policy coverage.
- 28–30: Mature. Optimize cost/latency, expand safely to more workflows.