Deterministic LLM System Checklist (Action Apps)

Use this when your LLM can take actions: send messages, modify data, trigger workflows, touch infrastructure, or spend money. The goal is simple: the model is allowed to suggest; the system decides.

1) Define “allowed actions” as tools, not prose
- List every action as a named tool (e.g., send_email, create_ticket, update_crm_record).
- For each tool, document required inputs, forbidden inputs, and side effects.
- Put tools behind an allowlist; deny by default.

2) Add typed contracts
- Define a JSON Schema (or equivalent) per tool call.
- Reject unknown fields (in JSON Schema, set "additionalProperties": false).
- Version contracts (tool_v1, tool_v2) so that changes made while iterating do not silently break existing behavior.

3) Enforce invariants outside the model
Write 5–15 hard rules that must never be violated. Examples:
- Never email external domains without explicit approval.
- Never execute arbitrary SQL; only parameterized queries.
- Never delete records; soft-delete only.
- Never access documents outside the user’s permissions.
Implement these checks in code. Prompts can restate them, but prompts don’t enforce.

4) Make state explicit and server-side
- Store workflow state in a database or durable workflow engine.
- Never rely on the model to “remember” whether an action happened.
- Use idempotency keys for tool calls so retries don’t duplicate side effects.

5) Add an audit trail you can hand to security
For each request, record:
- Model name/version (as reported by the provider)
- System prompt + user input (with redaction rules)
- Retrieved context (document IDs, chunk IDs, URLs) when using RAG
- Tool calls (name, arguments) and tool results
- Final user-visible output
Set retention and redaction policies (especially for PII).

6) Build an eval gate before you scale
- Create a small “golden set” of real tasks and edge cases.
- Run automated evals on every prompt/model/retrieval change.
- Track regressions by category (format errors, wrong tool choice, policy violation).
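The eval gate in item 6 could be sketched roughly as follows. This is a minimal illustration, not a prescribed harness: GoldenCase, classify, eval_gate, and stub_agent are hypothetical names, the output shape ({"tool": ..., "args": ...}) is an assumption, and stub_agent stands in for a real model call.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GoldenCase:
    """One golden-set task: a prompt plus the tool call we expect (assumed shape)."""
    prompt: str
    expected_tool: str
    required_args: frozenset


def classify(case: GoldenCase, output: object) -> Optional[str]:
    """Return a failure category for one case, or None if the case passed."""
    if (not isinstance(output, dict)
            or "tool" not in output
            or not isinstance(output.get("args"), dict)):
        return "format_error"
    if output["tool"] != case.expected_tool:
        return "wrong_tool_choice"
    if not case.required_args <= set(output["args"]):
        return "format_error"
    return None


def eval_gate(cases, run_agent: Callable[[str], object], max_failures: int = 0):
    """Run every golden case; return (passed, failure counts by category)."""
    failures = Counter()
    for case in cases:
        category = classify(case, run_agent(case.prompt))
        if category is not None:
            failures[category] += 1
    return sum(failures.values()) <= max_failures, failures


def stub_agent(prompt: str) -> dict:
    """Stand-in for a real model call (assumption for illustration only)."""
    if "ticket" in prompt:
        return {"tool": "create_ticket", "args": {"title": prompt}}
    return {"tool": "send_email", "args": {"to": "a@example.com"}}


GOLDEN_SET = [
    GoldenCase("open a ticket for the login bug", "create_ticket", frozenset({"title"})),
    GoldenCase("email the weekly report", "send_email", frozenset({"to", "body"})),
    GoldenCase("update the CRM record for Acme", "update_crm_record", frozenset({"record_id"})),
]

passed, failures = eval_gate(GOLDEN_SET, stub_agent)
print(passed, dict(failures))
```

With the stub above, the gate fails: one case is missing a required argument and one picks the wrong tool. A "policy violation" category could be added by running the invariant checks from item 3 inside classify; in CI, a failing gate would block the prompt, model, or retrieval change from shipping.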
7) Add human gates for high-risk transitions
- Identify actions that require approval (refunds, outbound email to customers, production changes).
- Insert an approval step with a clear diff: what will change, who will be affected.
- Log approver identity.

8) Plan for provider/model drift
- Treat model upgrades like dependency upgrades.
- Pin versions where possible; test before switching.
- Keep a rollback plan (previous model + previous prompt + previous retrieval settings).

If you only do two things this week: (1) enforce schemas on tool calls, and (2) write invariants in code. That alone eliminates a large class of “agent went rogue” failures.
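Those two priorities, schema enforcement and invariants in code, can be sketched together as below. This is a minimal illustration under stated assumptions: TOOL_CONTRACTS, INTERNAL_DOMAINS, ToolCallRejected, and authorize are hypothetical names, and the hand-rolled field check stands in for a real JSON Schema validator only to keep the example dependency-free.

```python
from typing import Any

# Hypothetical contract registry: versioned tool name -> {field: expected type}.
# Any tool not listed here is denied by default (the allowlist).
TOOL_CONTRACTS: dict[str, dict[str, type]] = {
    "send_email_v1": {"to": str, "subject": str, "body": str},
    "create_ticket_v1": {"title": str, "priority": int},
}

# Assumed internal domain for the "no external email without approval" rule.
INTERNAL_DOMAINS = {"example.com"}


class ToolCallRejected(Exception):
    """Raised when a proposed tool call fails its contract or an invariant."""


def validate_contract(tool: str, args: dict[str, Any]) -> None:
    contract = TOOL_CONTRACTS.get(tool)
    if contract is None:
        raise ToolCallRejected(f"tool not on allowlist: {tool}")
    unknown = set(args) - set(contract)
    if unknown:  # same effect as "additionalProperties": false
        raise ToolCallRejected(f"unknown fields: {sorted(unknown)}")
    missing = set(contract) - set(args)
    if missing:
        raise ToolCallRejected(f"missing fields: {sorted(missing)}")
    for field, expected in contract.items():
        # Exact type match, so a bool is not accepted where an int is required.
        if type(args[field]) is not expected:
            raise ToolCallRejected(f"wrong type for field: {field}")


def check_invariants(tool: str, args: dict[str, Any], approved: bool) -> None:
    if tool == "send_email_v1":
        domain = args["to"].rsplit("@", 1)[-1].lower()
        if domain not in INTERNAL_DOMAINS and not approved:
            raise ToolCallRejected("external email requires explicit approval")


def authorize(tool: str, args: dict[str, Any], approved: bool = False) -> bool:
    """The model proposes (tool, args); this function decides."""
    validate_contract(tool, args)
    check_invariants(tool, args, approved)
    return True
```

The executor would call authorize on every proposed tool call and run the tool only if no exception is raised. The approved flag is assumed to come from a human approval step recorded server-side, never from model output.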