Agent Runtime Readiness Checklist (v1)

Use this as a build plan for turning an “agent feature” into a production system. If you can’t check an item, assume you’re still in demo territory.

1) Tooling (Actions)
- List your agent’s tools (APIs it can call). For each tool, write a strict schema: required fields, allowed values, and expected error codes.
- Add server-side validation. The agent must not be able to pass arbitrary strings into sensitive parameters.
- Document tool failure behavior: retryable vs non-retryable, and what the agent should do next.

2) Permissioning (Blast Radius)
- No shared credentials. Use per-tenant or per-user auth where possible.
- Define scopes per tool (read vs write). Keep write scopes rare.
- Create an emergency revoke path that immediately stops tool usage for a tenant/user.

3) Human Gates (Irreversible Actions)
- Identify irreversible or high-risk actions (send email to customers, issue refunds, deploy, delete, change IAM).
- Implement approval gates in code, not prompts. The UI should show a clear “proposed action” diff.
- Log who approved, when, and what was executed.

4) Observability (Reproducibility)
- Store a trace per run: inputs, retrieved context identifiers, tool calls (with safe redaction), outputs, and errors.
- Add correlation IDs that connect UI events → backend traces → external system actions.
- Build a “replay” mode for debugging that re-runs the same steps with the same versions.

5) Evals in CI (Change Control)
- Create an eval suite that asserts properties, not vibes:
 * policy refusal cases
 * tool-call schema correctness
 * tenant boundary tests
 * prompt injection attempts
 * recovery behavior on tool errors
- Gate deployments on eval results. Version prompts, tool schemas, retrieval settings, and model identifiers.

6) Rollout (Containment)
- Ship behind feature flags per tenant.
- Start with read-only tools, then gated write tools.
- Add a kill switch that disables autonomous execution but keeps “draft/propose” mode.

Definition of “ready”: you can answer, for any customer incident, exactly what the agent saw, what it did, why it did it (via trace), and how you prevented it from doing worse.