Agent Production Readiness Checklist (2026)

Use this to move an agent from “works in a demo” to “safe to operate in production.” Focus on controls that are testable: identity, permissions, budgets, and auditability.

1) Identity (Non-human principal)
- Create a dedicated identity per agent in your IdP/IAM (Okta, Microsoft Entra ID, Google Cloud Identity, AWS IAM, etc.). No shared keys.
- Assign an explicit human owner (team + on-call rotation) and document purpose and scope.
- Prefer short-lived credentials (OIDC/workload identity) over long-lived API keys.
- Implement rotation and revocation. Run a drill to confirm that revoking the agent's credentials cuts off access within a target time you define.
- Ensure downstream systems show the agent identity in their own audit logs (GitHub, Jira, Slack, cloud provider).
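
The identity checks above can be sketched as a small policy function. This is a minimal illustration, not a real IdP integration: the `AgentCredential` record, the `credential_problems` helper, and the one-hour `MAX_LIFETIME` are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed policy value for this sketch; choose a limit that fits your environment.
MAX_LIFETIME = timedelta(hours=1)


@dataclass(frozen=True)
class AgentCredential:
    agent_id: str        # dedicated identity, one per agent
    owner: str           # owning team or on-call rotation
    issued_at: datetime
    expires_at: datetime


def credential_problems(cred, now, revoked_agent_ids):
    """Return a list of policy violations; an empty list means the credential passes."""
    problems = []
    if not cred.owner.strip():
        problems.append("no human owner")
    if cred.expires_at - cred.issued_at > MAX_LIFETIME:
        problems.append("lifetime exceeds policy")
    if now >= cred.expires_at:
        problems.append("expired")
    if cred.agent_id in revoked_agent_ids:
        problems.append("revoked")
    return problems


# Example: a 30-minute credential with an owner passes every check.
_now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
_example = AgentCredential(
    "agent-triage", "team-support",
    _now - timedelta(minutes=10), _now + timedelta(minutes=20),
)
assert credential_problems(_example, _now, set()) == []
```

A check like this could run in CI against an inventory of agent identities, so that an ownerless or long-lived credential fails a build instead of being discovered during an incident.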

2) Authorization (Tool and resource scoping)
- Default-deny tool access. Create an allowlist of tools/endpoints/resources.
- Separate read permissions from write permissions; keep write privileges minimal.
- Enforce hard rules in code that block forbidden actions (deleting data, modifying IAM, transferring funds) unless a human has explicitly approved them.
- Use existing protections: GitHub branch protection, required reviews, CI checks, database RBAC.
- Maintain versioned tool schemas so you can correlate behavior to a specific tool contract.
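
One way to express default-deny scoping in code is a policy object that every tool call must pass through. The sketch below is illustrative only: `ToolPolicy`, `ToolDenied`, and the tool names are hypothetical, and a real deployment would still rely on the downstream system's own permissions as the primary control.

```python
from dataclasses import dataclass


class ToolDenied(Exception):
    """Raised when a tool call falls outside the agent's policy."""


# Hypothetical action names, blocked unless a human has approved the call.
FORBIDDEN_ACTIONS = frozenset({"delete_data", "modify_iam", "transfer_funds"})


@dataclass(frozen=True)
class ToolPolicy:
    read_tools: frozenset = frozenset()
    write_tools: frozenset = frozenset()

    def check(self, tool, *, write, approved=False):
        """Raise ToolDenied unless the call is explicitly allowed."""
        if tool in FORBIDDEN_ACTIONS and not approved:
            raise ToolDenied(f"{tool} is forbidden without approval")
        allowed = self.write_tools if write else self.read_tools
        if tool not in allowed:
            # Default deny: anything not on the allowlist is rejected.
            raise ToolDenied(f"{tool} is not on the {'write' if write else 'read'} allowlist")


policy = ToolPolicy(
    read_tools=frozenset({"tracker.get_issue"}),
    write_tools=frozenset({"repo.open_pr"}),
)
policy.check("tracker.get_issue", write=False)  # allowed, returns None
```

Keeping read and write allowlists as separate sets means a tool granted for reading is never implicitly usable for writing.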

3) Budgets and quotas (Cost + blast radius)
- Cap model requests/tokens per agent and per time window.
- Cap tool calls per run and per day.
- Cap write actions per run (PRs opened, messages sent, tickets transitioned).
- Add timeouts and circuit breakers for external calls.
- Define safe failure behavior: when a quota is hit, the agent stops and escalates to its owner.
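
The per-run caps and the stop-and-escalate behavior can be sketched as a budget object that is charged before each tool call. The names `RunBudget`, `BudgetExceeded`, and `run_steps` are made up for this example, and the `escalate` callback stands in for whatever paging or ticketing path you use.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a run would exceed one of its caps."""


@dataclass
class RunBudget:
    max_tool_calls: int
    max_write_actions: int
    tool_calls: int = 0
    write_actions: int = 0

    def charge(self, *, write=False):
        """Count one tool call, refusing it if a cap has already been reached."""
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded("tool-call cap reached")
        if write and self.write_actions >= self.max_write_actions:
            raise BudgetExceeded("write-action cap reached")
        self.tool_calls += 1
        if write:
            self.write_actions += 1


def run_steps(steps, budget, escalate):
    """Execute (name, is_write) steps until done or until a cap is hit."""
    completed = []
    for name, is_write in steps:
        try:
            budget.charge(write=is_write)
        except BudgetExceeded as exc:
            # Safe failure: stop the run and hand off to a human.
            escalate(f"stopped before {name!r}: {exc}")
            break
        completed.append(name)
    return completed
```

Charging the budget before the call, not after it, means the capped action never executes; the run halts at the boundary instead of one step past it.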

4) Approvals (Human-in-the-loop where it matters)
- Define “protected actions” that require explicit human approval.
- Approval UI must show a diff/preview of side effects (e.g., patch, email body, CRM field changes).
- Log approver identity, timestamp, and what they approved.
- Make it easy to revoke an approval and to rerun the action once a new approval is granted.
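
A minimal sketch of an approval record follows, assuming the preview shown to the approver is available as a string. Binding the approval to a SHA-256 digest of that preview is one possible design, not a requirement stated above; `Approval`, `approve`, and `is_valid_for` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Approval:
    approver: str          # identity of the human who approved
    approved_at: datetime  # UTC timestamp of the decision
    preview_sha256: str    # digest of the exact diff/preview they saw


def _digest(preview):
    return hashlib.sha256(preview.encode("utf-8")).hexdigest()


def approve(approver, preview):
    """Record who approved what, and when."""
    return Approval(approver, datetime.now(timezone.utc), _digest(preview))


def is_valid_for(approval, preview):
    """An approval only covers the exact side effects that were previewed."""
    return approval.preview_sha256 == _digest(preview)
```

If the agent regenerates the patch or message after approval, the digest no longer matches and the action needs a fresh approval, which keeps the log entry honest about what was actually approved.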

5) Audit trail and forensic replay
- Every run gets a run_id tied to the triggering input (ticket ID/webhook/user request).
- Log: model name/provider, system instructions version, tool schema version.
- Log every tool call with arguments, response, timestamps, and errors.
- Record external object links/IDs created or modified (PR URL, Slack permalink, Jira issue key).
- Prove replay: using the logs alone, reconstruct the full chain from input → decisions → tool calls → side effects.
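
The logging items above can be sketched as a single JSON-serializable run record plus a function that rebuilds the chain of events from it. The field names and the helpers `new_run`, `record_tool_call`, and `replay` are assumptions for illustration; a production system would write these events to durable, append-only storage, not an in-memory dict.

```python
import json
import uuid
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc).isoformat()


def new_run(trigger, model, instructions_version, tool_schema_version):
    """Start a run record tied to its triggering input."""
    return {
        "run_id": str(uuid.uuid4()),
        "started_at": _now(),
        "trigger": trigger,
        "model": model,
        "instructions_version": instructions_version,
        "tool_schema_version": tool_schema_version,
        "events": [],
    }


def record_tool_call(run, tool, arguments, response=None, error=None, external_ref=None):
    """Append one tool call with its arguments, result, and any external object it touched."""
    run["events"].append({
        "at": _now(),
        "tool": tool,
        "arguments": arguments,
        "response": response,
        "error": error,
        "external_ref": external_ref,
    })


def replay(run):
    """Rebuild a human-readable chain of events from the record alone."""
    lines = [f"run {run['run_id']} triggered by {run['trigger']}"]
    for event in run["events"]:
        outcome = f"error: {event['error']}" if event["error"] else "ok"
        ref = f" -> {event['external_ref']}" if event["external_ref"] else ""
        args = json.dumps(event["arguments"], sort_keys=True)
        lines.append(f"{event['tool']}({args}) {outcome}{ref}")
    return lines
```

If `replay` cannot answer what happened for a given `run_id` without consulting anything outside the record, the audit trail is incomplete.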

6) Security and privacy hygiene
- Treat tool I/O as sensitive: apply access controls, retention rules, and redaction for secrets/PII.
- Scan for secret leakage (agent outputs and tool payloads).
- Run adversarial tests: prompt injection in docs/tickets, ambiguous instructions, malformed tool responses.
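
Redaction and leak scanning can share one pass over tool I/O. The sketch below uses two illustrative patterns only (the AWS access key ID prefix format and a loose email matcher); the `PATTERNS` table and `redact` function are hypothetical, and real coverage needs a maintained ruleset or a dedicated scanner.

```python
import re

# Illustrative patterns, far from exhaustive.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text):
    """Replace matches with labeled placeholders and report which patterns fired."""
    findings = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            findings.append(label)
    return text, findings
```

Applying `redact` before tool payloads reach the log store serves both purposes: the stored text is scrubbed, and a non-empty `findings` list is a signal worth alerting on.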

Definition of done: you can answer, precisely and quickly, “What can this agent do?”, “What did it do?”, and “How do we stop it?”