Agent Runtime Readiness Checklist (v1)

Use this as a build plan for turning an “agent feature” into a production system. If you can’t check an item, assume you’re still in demo territory.

1) Tooling (Actions)
- List your agent’s tools (APIs it can call). For each tool, write a strict schema: required fields, allowed values, and expected error codes.
- Add server-side validation. The agent must not be able to pass arbitrary strings into sensitive parameters.
- Document tool failure behavior: retryable vs non-retryable, and what the agent should do next.

2) Permissioning (Blast Radius)
- No shared credentials. Use per-tenant or per-user auth where possible.
- Define scopes per tool (read vs write). Keep write scopes rare.
- Create an emergency revoke path that immediately stops tool usage for a tenant/user.

3) Human Gates (Irreversible Actions)
- Identify irreversible or high-risk actions (send email to customers, issue refunds, deploy, delete, change IAM).
- Implement approval gates in code, not prompts. The UI should show a clear “proposed action” diff.
- Log who approved, when, and what was executed.

4) Observability (Reproducibility)
- Store a trace per run: inputs, retrieved context identifiers, tool calls (with safe redaction), outputs, and errors.
- Add correlation IDs that connect UI events → backend traces → external system actions.
- Build a “replay” mode for debugging that re-runs the same steps with the same versions.

5) Evals in CI (Change Control)
- Create an eval suite that asserts properties, not vibes:
  * policy refusal cases
  * tool-call schema correctness
  * tenant boundary tests
  * prompt injection attempts
  * recovery behavior on tool errors
- Gate deployments on eval results. Version prompts, tool schemas, retrieval settings, and model identifiers.

6) Rollout (Containment)
- Ship behind feature flags per tenant.
- Start with read-only tools, then gated write tools.
- Add a kill switch that disables autonomous execution but keeps “draft/propose” mode.

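To make items 1, 3, and 6 concrete, here is a minimal sketch of a tool dispatcher that validates arguments server-side, holds high-risk tools until a human approves, and falls back to propose-only mode when a kill switch is on. Everything named here (ToolSpec, ToolCallRejected, execute, the issue_refund tool and its fields) is hypothetical and chosen for illustration; a real system would likely use a schema library and a persistent approval log rather than in-memory values.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


class ToolCallRejected(Exception):
    """Raised when a tool call fails server-side validation."""


@dataclass
class ToolSpec:
    name: str
    handler: Callable[..., Any]
    required: dict                                # field name -> expected Python type
    allowed: dict = field(default_factory=dict)   # field name -> set of allowed values
    needs_approval: bool = False                  # True for irreversible or high-risk actions


def validate_args(spec: ToolSpec, args: dict) -> None:
    """Reject unknown fields, missing fields, wrong types, and disallowed values."""
    unknown = set(args) - set(spec.required)
    if unknown:
        raise ToolCallRejected(f"unknown fields: {sorted(unknown)}")
    for name, expected_type in spec.required.items():
        if name not in args:
            raise ToolCallRejected(f"missing field: {name}")
        if not isinstance(args[name], expected_type):
            raise ToolCallRejected(f"wrong type for field: {name}")
        if name in spec.allowed and args[name] not in spec.allowed[name]:
            raise ToolCallRejected(f"value not allowed for field: {name}")


def execute(spec: ToolSpec, args: dict, approved_by: Optional[str] = None,
            kill_switch_on: bool = False) -> dict:
    """Validate first, then either return a proposal or run the tool."""
    validate_args(spec, args)
    # The gate lives in code: without an approver, gated tools only produce a proposal.
    if (spec.needs_approval or kill_switch_on) and approved_by is None:
        return {"status": "proposed", "tool": spec.name, "args": args}
    result = spec.handler(**args)
    return {"status": "executed", "tool": spec.name,
            "approved_by": approved_by, "result": result}


# Hypothetical high-risk tool used only for this example.
refund = ToolSpec(
    name="issue_refund",
    handler=lambda order_id, amount_cents, reason: f"refunded {amount_cents} on {order_id}",
    required={"order_id": str, "amount_cents": int, "reason": str},
    allowed={"reason": {"duplicate", "damaged", "goodwill"}},
    needs_approval=True,
)
```

Calling execute(refund, args) without approved_by returns a "proposed" record that a UI could render as the proposed-action diff; passing approved_by runs the handler and returns who approved it, which is the piece you would write to the approval log.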
Definition of “ready”: you can answer, for any customer incident, exactly what the agent saw, what it did, why it did it (via trace), and how you prevented it from doing worse.
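Answering those questions depends on the per-run trace from item 4. The sketch below shows one possible shape for such a record, with a correlation ID, pinned versions, and redaction of sensitive tool arguments. The class name RunTrace, the field names, and the SENSITIVE_KEYS list are assumptions made for illustration, not a prescribed format.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional

# Assumed list of argument names to redact; adjust to your own tools.
SENSITIVE_KEYS = {"password", "token", "api_key", "authorization"}


def redact(value):
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


@dataclass
class RunTrace:
    tenant_id: str
    user_input: str
    versions: dict  # e.g. prompt, tool schema, retrieval settings, model identifier
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    context_ids: list = field(default_factory=list)  # retrieved context identifiers
    tool_calls: list = field(default_factory=list)
    output: Optional[str] = None
    errors: list = field(default_factory=list)

    def record_tool_call(self, tool: str, args: dict, result=None, error=None) -> None:
        """Append one tool call, storing redacted arguments only."""
        self.tool_calls.append(
            {"tool": tool, "args": redact(args), "result": result, "error": error}
        )
        if error is not None:
            self.errors.append(error)

    def to_json(self) -> str:
        """Serialize the trace for storage; values must be JSON-serializable."""
        return json.dumps(asdict(self), sort_keys=True)


# Example: one run with a single tool call carrying a secret.
trace = RunTrace(tenant_id="t1", user_input="look up this customer",
                 versions={"model": "m-1", "prompt": "p-7"})
trace.record_tool_call("crm.lookup", {"email": "a@b.c", "api_key": "secret"},
                       result={"ok": True})
stored = trace.to_json()
```

Passing the same correlation_id to the UI event and to any external system call is what lets an incident be followed end to end, and the versions field is what a replay mode would need to re-run the same steps.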