AGENT OPS READINESS CHECKLIST (v1)

Use this before you connect an LLM to tools that can change state (email send, ticket updates, CRM writes, payments, repo writes).

1) TOOL PERMISSIONS (CAPABILITIES)
- List every tool/action the agent can call (read vs write).
- Separate “draft” actions from “commit” actions (draft email vs send; draft reply vs close ticket).
- Use least-privilege credentials (scoped OAuth where possible; separate prod vs sandbox).
- Add a hard allowlist of tools and parameters. Don’t let the model invent tool names.

2) APPROVALS + HUMAN-IN-THE-LOOP
- Define which actions require approval (money movement, outbound communication, data deletion, permission changes).
- Make approval UX explicit: who approves, where it happens (Slack, email, web), and what context they see.
- Add “stop and ask” behavior as a first-class state, not an error.

3) AUDIT TRAIL (NON-NEGOTIABLE)
- For each run, log: trigger source, user/tenant, model version, prompt template version.
- Log tool calls (name, params), tool responses, and final outcomes.
- Store an immutable run record ID you can hand to customers during incidents.
- Ensure logs avoid leaking secrets (redact tokens, credentials, sensitive fields).

4) EVALS AS RELEASE GATES
- Create an eval set of real scenarios (start with ~25 cases your users care about).
- Tag cases by risk: unsafe action, policy violation, wrong tool call, grounding failure, cost blowup.
- Run a small suite on every PR and block merges on critical regressions.
- Run a larger suite nightly to catch drift from model/provider changes.

5) COST + LATENCY CONTROLS
- Set per-tenant budgets (daily/weekly) and per-workflow caps.
- Add caching for repeated retrieval/context where appropriate.
- Add guardrails on context size (document limits, truncation rules).
- Implement graceful degradation: smaller model, fewer tools, or “draft only” mode under budget pressure.
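
One way the budget and graceful-degradation items in section 5 could fit together is sketched below. This is a minimal illustration, not a prescribed design: the names `TenantBudget`, `Mode`, and `choose_mode`, and the 30% / 10% thresholds, are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FULL = "full"              # normal model, all allowlisted tools
    SMALL_MODEL = "small"      # cheaper model under budget pressure
    DRAFT_ONLY = "draft_only"  # drafts allowed, commit actions disabled
    BLOCKED = "blocked"        # cap or budget exceeded; refuse the run


@dataclass
class TenantBudget:
    """Hypothetical per-tenant daily budget tracker (in-memory only)."""
    daily_limit_usd: float
    spent_today_usd: float = 0.0

    def record(self, cost_usd: float) -> None:
        self.spent_today_usd += cost_usd

    def remaining_fraction(self) -> float:
        if self.daily_limit_usd <= 0:
            return 0.0
        return max(0.0, 1.0 - self.spent_today_usd / self.daily_limit_usd)


def choose_mode(budget: TenantBudget,
                workflow_cap_usd: float,
                estimated_cost_usd: float) -> Mode:
    """Pick an operating mode from the per-workflow cap and tenant budget."""
    # Per-workflow cap: a single run may not exceed its own ceiling.
    if estimated_cost_usd > workflow_cap_usd:
        return Mode.BLOCKED
    remaining = budget.remaining_fraction()
    if remaining <= 0.0:
        return Mode.BLOCKED
    if remaining < 0.10:       # assumed threshold
        return Mode.DRAFT_ONLY
    if remaining < 0.30:       # assumed threshold
        return Mode.SMALL_MODEL
    return Mode.FULL


if __name__ == "__main__":
    budget = TenantBudget(daily_limit_usd=10.0)
    budget.record(7.5)
    print(choose_mode(budget, workflow_cap_usd=1.0, estimated_cost_usd=0.5))
```

The point of returning a mode rather than raising an error is that the caller can keep serving the tenant in a reduced capacity instead of failing outright.
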

6) INCIDENT RESPONSE
- Define what constitutes an agent incident (unauthorized action, data exposure, spammy outputs, runaway spend).
- Build a kill switch: disable tool use per tenant or globally.
- Provide a rollback path: revert changes where possible or queue compensating actions.
- Prepare a customer-facing post-incident export: run IDs, actions taken, timestamps.

7) SECURITY BASICS THAT AGENTS BREAK IF YOU IGNORE THEM
- Treat untrusted text as hostile input (emails, web pages, tickets, documents).
- Prevent data exfiltration paths: restrict what the agent can retrieve and what it can send out.
- Keep secrets out of prompts and retrieval stores.
- Document your subprocessors and data-handling terms for procurement.

PASS/FAIL RULE
If you can’t answer: “What did the agent do, who approved it, and what data did it use?” you’re not ready to ship tool-using automation.
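
To make the pass/fail question concrete, here is a rough sketch of a run record that captures the fields named in section 3 and can answer all three parts of the question. Everything in it is an assumption for illustration: the `RunRecord` and `ToolCall` classes, the `SENSITIVE_KEYS` set, and the field names are hypothetical. It only redacts top-level parameter keys, and it does not address immutable storage, which would have to be handled by whatever log store you use.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Assumed names of sensitive parameters; adjust to your own tools.
SENSITIVE_KEYS = {"token", "password", "api_key", "authorization"}


def redact(params: dict) -> dict:
    """Mask top-level sensitive values before they reach the log."""
    return {k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v
            for k, v in params.items()}


@dataclass
class ToolCall:
    name: str
    params: dict
    response_summary: str
    approved_by: Optional[str]  # None: no approval recorded for this call
    timestamp: str


@dataclass
class RunRecord:
    trigger_source: str
    tenant: str
    model_version: str
    prompt_template_version: str
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    data_sources: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)

    def log_tool_call(self, name: str, params: dict,
                      response_summary: str,
                      approved_by: Optional[str] = None) -> None:
        self.tool_calls.append(ToolCall(
            name=name,
            params=redact(params),
            response_summary=response_summary,
            approved_by=approved_by,
            timestamp=datetime.now(timezone.utc).isoformat(),
        ))

    def answers(self) -> dict:
        """Answer: what did it do, who approved it, what data did it use?"""
        return {
            "run_id": self.run_id,
            "what_did_the_agent_do": [c.name for c in self.tool_calls],
            "who_approved_it": [
                {"action": c.name, "approved_by": c.approved_by}
                for c in self.tool_calls if c.approved_by
            ],
            "what_data_did_it_use": list(self.data_sources),
        }

    def export(self) -> str:
        """Serialize the full record, e.g. for a post-incident export."""
        return json.dumps(asdict(self), indent=2)
```

If `answers()` cannot be filled in for a given run, that run is evidence the system fails the rule above.
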