COMPOUND AI SHIPPING CHECKLIST (2026)

Use this as a pre-launch and ongoing operations checklist for production AI systems (RAG, agents, copilots).

1) WORKFLOW & RISK DEFINITION
- Name the workflow (e.g., “support resolution”, “refund handling”, “PR generation”).
- Define success as a business metric: containment rate, AHT reduction, PR acceptance rate, time-to-resolution.
- Classify risk tier: LOW (drafting), MED (user-facing advice), HIGH (money movement, auth, legal commitments).
- Define allowed actions per tier. For HIGH risk, require a verifier plus structured outputs.

2) ROUTING & BUDGETS (COST + LATENCY)
- Implement an intent classifier/router (can be a small model) with explicit fallback rules.
- Set budgets: max tool calls, max tokens, max wall-clock time, and max retries.
- Track CPST (cost per successful task). Establish a target (e.g., <$0.01 for common actions).
- Add caching for stable outputs (policy summaries, common troubleshooting steps) with TTL and invalidation rules.

3) PRIVATE DATA PLANE (GOVERNED CONTEXT)
- Inventory data sources (docs, CRM, tickets, code, BI). Assign freshness SLAs per source.
- Implement permissioned retrieval: filter by identity and document-level/row-level ACLs BEFORE the model sees context.
- Log every retrieval: request_id, document IDs, chunk IDs, timestamps, and user identity.
- Set retention and deletion policies: how long prompts, outputs, and embeddings are stored.

4) TOOLING SAFETY (AGENTS)
- Create a tool allowlist per intent. Deny by default.
- Require structured outputs for actions (JSON Schema) and validate before execution.
- Add a verifier step for HIGH-risk actions (a second model or deterministic checks).
- Implement “safe failure states”: an explicit “cannot complete” response plus an escalation path.

5) EVALS & RELEASE ENGINEERING
- Build a golden set (50–500 examples) covering normal and edge cases.
- Add adversarial cases: prompt injection, data exfiltration attempts, permission bypass attempts.
- Define pass/fail gates: groundedness, citation correctness, refusal correctness, tool safety.
- Use shadow traffic (1–5%) before ramping; define rollback criteria (e.g., +2% escalation rate, +20% latency).

6) OBSERVABILITY & ON-CALL
- Instrument traces across retrieval, model calls, tool calls, and final outputs.
- Monitor P50/P95 latency, tool error rate, misroute rate, refusal rate, and cost per outcome.
- Establish an on-call playbook: how to disable a tool, flip a routing rule, or revert a prompt.

7) SECURITY & COMPLIANCE READINESS
- Document the data flow: where data transits, where it is stored, and who can access logs.
- Provide customer-facing controls: data residency (if applicable), retention settings, audit export.
- Run regular reviews with Security and Legal for HIGH-risk workflows.

If you can’t answer “What did the system retrieve, which tools did it call, and why did it act?”, you’re not ready for production.
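Several of the items above are concrete enough to sketch in code. The budget bullet in section 2 (max tool calls, max tokens, max wall-clock time) might look like the following; the class name, limits, and `charge` method are illustrative assumptions, not a prescribed implementation.

```python
import time
from dataclasses import dataclass, field

# Hypothetical per-request budget from section 2; the default numbers are illustrative.
@dataclass
class Budget:
    max_tool_calls: int = 5
    max_tokens: int = 8000
    max_seconds: float = 30.0
    started_at: float = field(default_factory=time.monotonic)
    tool_calls: int = 0
    tokens: int = 0

    def charge(self, tool_calls: int = 0, tokens: int = 0) -> bool:
        """Record usage; return False once any budget is exhausted,
        at which point the caller must stop and enter a safe failure state."""
        self.tool_calls += tool_calls
        self.tokens += tokens
        return (self.tool_calls <= self.max_tool_calls
                and self.tokens <= self.max_tokens
                and time.monotonic() - self.started_at <= self.max_seconds)

b = Budget(max_tool_calls=2)
b.charge(tool_calls=1, tokens=300)   # within budget -> True
b.charge(tool_calls=2, tokens=300)   # third tool call exceeds the cap -> False
```

The agent loop calls `charge` before each step, so a runaway chain of tool calls trips the cap instead of burning cost.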
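The permissioned-retrieval and retrieval-logging bullets in section 3 can be sketched as follows; the corpus, role names, and log fields are hypothetical, and ranking by the query is elided to keep the permission check visible.

```python
import time
import uuid

# Hypothetical in-memory corpus; each chunk carries a document-level ACL (set of roles).
CHUNKS = [
    {"doc_id": "policy-7", "chunk_id": "policy-7#2", "acl": {"support", "admin"},
     "text": "Refund window is 30 days."},
    {"doc_id": "fin-1", "chunk_id": "fin-1#0", "acl": {"finance"},
     "text": "Q3 revenue draft."},
]

RETRIEVAL_LOG = []

def retrieve(query: str, user_id: str, user_roles: set):
    """Filter by ACL BEFORE the model sees context, and log every retrieval."""
    visible = [c for c in CHUNKS if c["acl"] & user_roles]  # permission check first
    # (real relevance ranking by `query` elided; this sketch returns all permitted chunks)
    RETRIEVAL_LOG.append({
        "request_id": str(uuid.uuid4()),
        "user_id": user_id,
        "doc_ids": [c["doc_id"] for c in visible],
        "chunk_ids": [c["chunk_id"] for c in visible],
        "ts": time.time(),
    })
    return visible

ctx = retrieve("refund window", user_id="agent-42", user_roles={"support"})
```

Because filtering happens before context assembly, a prompt-injected request cannot surface documents the caller was never entitled to see, and the log answers “what did the system retrieve?” after the fact.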
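The “validate before execution” and deny-by-default allowlist items in section 4 can be sketched as follows; the intents, tool names, and field specs are hypothetical, and a production system would use a full JSON Schema validator (e.g., the `jsonschema` library) rather than this minimal type check.

```python
import json

# Hypothetical per-intent allowlist: deny by default -- any tool not listed is rejected.
TOOL_ALLOWLIST = {
    "refund_handling": {"lookup_order", "issue_refund"},
    "support_resolution": {"lookup_order", "search_kb"},
}

# Minimal required-field/type spec per tool (a stand-in for a full JSON Schema).
TOOL_SCHEMAS = {
    "issue_refund": {"order_id": str, "amount_cents": int},
    "lookup_order": {"order_id": str},
    "search_kb": {"query": str},
}

def validate_action(intent: str, raw_output: str):
    """Parse a model's structured output and validate it before any side effect.
    Returns (action, None) on success or (None, reason) as a safe failure state."""
    try:
        action = json.loads(raw_output)
    except json.JSONDecodeError:
        return None, "unparseable output"  # never execute what you could not parse
    tool = action.get("tool")
    if tool not in TOOL_ALLOWLIST.get(intent, set()):
        return None, f"tool {tool!r} not allowed for intent {intent!r}"
    args = action.get("args", {})
    for name, ftype in TOOL_SCHEMAS[tool].items():
        if not isinstance(args.get(name), ftype):
            return None, f"bad or missing field {name!r}"
    return action, None

action, err = validate_action(
    "refund_handling",
    '{"tool": "issue_refund", "args": {"order_id": "A1", "amount_cents": 500}}',
)
```

A rejection here should route to the escalation path from section 4 rather than being retried blindly; for HIGH-risk tools, the verifier step runs after this check and before execution.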