Transactional AI Build Checklist (v1)

Goal: convert an LLM-powered feature from “helpful text” into a production workflow that can safely perform side effects (writes) with auditability and replay.

1) Define the transaction boundary
- Write down the exact side effects the system is allowed to perform (create ticket, refund, change setting, send email).
- For each side effect, define a single owner service (the executor). The LLM never writes directly.
- Decide the highest-risk action; make it require explicit approval from a human or a policy gate.

2) Model output must be structured
- Define a schema for every tool call payload (JSON Schema, Pydantic, Zod, or protobuf; pick one).
- Reject any payload that fails validation. Do not “fix it in the prompt.”
- Store the validated payload as the “intent record” before executing.

3) Identity and permissions
- Identify the acting principal for each action (end user, admin, service account).
- Use scoped tokens (OAuth where applicable). Ensure scopes map to business permissions.
- Never let the model expand scope. Permission checks live in code.

4) Idempotency and retries
- Require an idempotency key for every mutation.
- Ensure the executor de-duplicates server-side.
- Define retry behavior per tool: which calls are safe to retry, which need compensation.

5) Durable logs and traceability
- Generate correlation IDs per workflow.
- Log: input (sanitized), model version, prompt/template version, retrieved sources, validated tool payloads, tool responses, final outcome.
- Make logs queryable by customer/account for support and compliance.

6) Retrieval governance (if using RAG)
- Enforce entitlements at retrieval time (per-user/per-group), not just per-tenant.
- Track provenance: which documents/chunks were used.
- Treat retrieved text as untrusted input; never let it directly dictate tool execution.

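As a rough illustration of the retrieval rules in step 6, the Python sketch below filters candidate chunks by tenant and by group entitlement before anything reaches the model, and returns a provenance list alongside the visible chunks. The `Chunk` and `Principal` types, their field names, and the `retrieve` function are hypothetical names chosen for this example; a real system would more likely push the same filter into the search or vector-store query than filter in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical shape of a retrievable chunk.
    chunk_id: str
    doc_id: str
    tenant_id: str
    allowed_groups: frozenset  # groups entitled to read this chunk
    text: str


@dataclass(frozen=True)
class Principal:
    # Hypothetical shape of the acting principal.
    user_id: str
    tenant_id: str
    groups: frozenset


def retrieve(principal, candidates):
    """Return (visible_chunks, provenance) for this principal.

    A chunk is visible only if it belongs to the principal's tenant AND
    shares at least one group with the principal (per-group, not just
    per-tenant, entitlement).
    """
    visible = [
        c for c in candidates
        if c.tenant_id == principal.tenant_id
        and c.allowed_groups & principal.groups
    ]
    # Provenance: which documents/chunks were actually used.
    provenance = [{"doc_id": c.doc_id, "chunk_id": c.chunk_id} for c in visible]
    return visible, provenance
```

The returned text is still untrusted input: under this sketch it would only be passed to the model as context, and any tool call the model proposes would go through the same schema validation and permission checks as every other payload.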
7) Human override and UX
- For high-risk actions, provide a review screen that shows: proposed action, inputs, and expected side effect.
- Provide an undo or compensation path where possible.
- Make failures legible: show which validation gate failed and how to fix it.

8) Pre-launch tests
- Create a small eval set of real workflow cases (including edge cases and ambiguous inputs).
- Add adversarial tests: prompt injection in retrieved docs, malformed tool payloads, partial outages.
- Run chaos-style tests on retries to confirm idempotency.

Definition of done: the workflow can be replayed from the intent record, produces the same side effect exactly once, and can be explained end-to-end via logs.
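To make the definition of done concrete, here is a minimal Python sketch that ties steps 2 and 4 together: validate the payload, store the intent record, execute through a de-duplicating executor, then replay from the record. Everything named here (`validate_refund`, `RefundExecutor`, `handle_tool_call`, `replay`, the refund payload fields, and the in-memory stores) is an assumption made for illustration. A production version would use a real schema library and durable storage, and would need to make the de-duplication check and the write atomic.

```python
import hashlib
import json
from dataclasses import dataclass, field


class ValidationError(Exception):
    pass


def validate_refund(payload):
    """Reject any payload that does not match the expected shape exactly."""
    if not isinstance(payload, dict) or set(payload) != {"action", "order_id", "amount_cents"}:
        raise ValidationError("unexpected or missing fields")
    if payload["action"] != "refund":
        raise ValidationError("unknown action")
    if not isinstance(payload["order_id"], str) or not payload["order_id"]:
        raise ValidationError("order_id must be a non-empty string")
    amount = payload["amount_cents"]
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ValidationError("amount_cents must be a positive integer")
    return dict(payload)


def idempotency_key(workflow_id, payload):
    """Derive a stable key from the workflow ID and the canonical payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{workflow_id}:{canonical}".encode()).hexdigest()


@dataclass
class RefundExecutor:
    """Single owner of the refund side effect; de-duplicates by key."""
    results: dict = field(default_factory=dict)
    refunds_issued: list = field(default_factory=list)

    def execute(self, key, payload):
        if key in self.results:
            # Seen before: return the stored result, perform no new write.
            return self.results[key]
        self.refunds_issued.append((payload["order_id"], payload["amount_cents"]))
        result = {"status": "refunded", "order_id": payload["order_id"]}
        self.results[key] = result
        return result


def handle_tool_call(workflow_id, raw_payload, executor, intent_log):
    payload = validate_refund(raw_payload)  # invalid payloads stop here
    key = idempotency_key(workflow_id, payload)
    record = {"workflow_id": workflow_id, "idempotency_key": key, "payload": payload}
    intent_log.append(record)  # intent record is stored before executing
    return executor.execute(key, payload)


def replay(record, executor):
    """Re-run a stored intent; the executor guarantees at most one side effect."""
    return executor.execute(record["idempotency_key"], record["payload"])
```

A usage pass under these assumptions: call `handle_tool_call("wf-1", {...}, executor, intent_log)` once, then `replay(intent_log[0], executor)`. Both calls return the same result and `executor.refunds_issued` holds a single entry, which is the behavior the chaos-style retry tests in step 8 are meant to confirm.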