Context Contract & CI Checklist (LLM Apps)

Use this to turn “retrieval + prompt” into a testable, auditable context pipeline.

1) Define the Context Contract (write it down)
- Allowed sources: list systems (published docs site, product database, ticket macros, policy PDFs). Name an owner per source.
- Forbidden sources: list internal drafts, stale wikis, personal drives, unrestricted Slack channels, etc.
- Precedence rules: what wins in conflicts (e.g., published policy > internal wiki > chat transcripts).
- Freshness rules: define “effective date” requirements and expiration behavior for policy-like content.
- Output requirements: citations must include source ID + revision/date; refusal is acceptable when context is missing.

2) Make a Context Bundle Artifact
- Store a structured bundle (JSON) per response: query, retrieval strategy, index name/version, filters, snippet IDs, snippet revisions, tool calls.
- Ensure replay: you can reconstruct the exact bundle later.

3) Build a Golden Set
- Collect real questions from tickets, sales calls, internal Slack.
- For each question, tag expected source(s) of truth and known “wrong but tempting” sources.

4) Add Context Assertions (before grading answers)
- Inclusion: retrieved snippets must include at least one acceptable source for the question.
- Exclusion: forbidden sources must never appear in the bundle.
- Permissions: run the same query under least-privilege user identities; verify retrieval respects ACLs.
- Conflict tests: add paired docs with contradictory statements; verify precedence rules.

5) CI Gates
- On every change to indexing, retrieval config, prompts, or tool permissions:
  - Run golden set retrieval tests.
  - Fail the build on regressions in inclusion/exclusion/permissions.
  - Store diffs of context bundles (what changed and why).

6) Operationalize Ownership
- Assign an on-call rotation for the context layer (not just the model endpoint).
- Create a “source owner” escalation path for disputes and doc conflicts.
- Add a change trigger: when a policy doc changes, it triggers re-indexing and a targeted eval run.

7) Incident Drill (monthly)
- Pick a high-stakes query (refunds, retention, entitlements).
- Require citations with revisions.
- Re-run after 24 hours; explain any bundle changes.
- Test access control with a restricted user.

If you can do replay + permissions + conflict resolution, you’re past the demo stage. If you can’t, stop tuning chunk sizes and start enforcing the contract.
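
Example: a minimal sketch of a context bundle (section 2) with inclusion, exclusion, and precedence checks (section 4), written in Python. Every name here is hypothetical and chosen for illustration: the source labels, the PRECEDENCE ranking, the Snippet and ContextBundle fields, and the helper functions. Treat it as one possible starting shape under those assumptions, not a reference implementation; permission (ACL) tests are left out because they depend on your retrieval stack.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Set

# Assumed contract values; replace with the sources your own contract names.
# Lower rank wins a conflict (published policy > internal wiki > chat transcripts).
PRECEDENCE = {"published_policy": 0, "internal_wiki": 1, "chat_transcript": 2}
FORBIDDEN_SOURCES = {"internal_draft", "personal_drive"}


@dataclass(frozen=True)
class Snippet:
    snippet_id: str
    source: str    # source system label, e.g. "published_policy"
    revision: str  # revision or date, as required for citations


@dataclass
class ContextBundle:
    query: str
    index_version: str
    snippets: List[Snippet] = field(default_factory=list)


def bundle_to_json(bundle: ContextBundle) -> str:
    """Serialize with sorted keys so stored bundles diff cleanly between runs."""
    return json.dumps(asdict(bundle), sort_keys=True)


def check_inclusion(bundle: ContextBundle, acceptable: Set[str]) -> bool:
    """True if at least one snippet comes from an acceptable source."""
    return any(s.source in acceptable for s in bundle.snippets)


def check_exclusion(bundle: ContextBundle) -> List[str]:
    """Return IDs of snippets from forbidden sources; an empty list means pass."""
    return [s.snippet_id for s in bundle.snippets if s.source in FORBIDDEN_SOURCES]


def winning_snippet(bundle: ContextBundle) -> Optional[Snippet]:
    """Pick the snippet whose source ranks highest under PRECEDENCE, if any."""
    ranked = [s for s in bundle.snippets if s.source in PRECEDENCE]
    if not ranked:
        return None
    return min(ranked, key=lambda s: PRECEDENCE[s.source])


def assert_context(bundle: ContextBundle, acceptable: Set[str]) -> None:
    """Raise AssertionError on a contract violation, so a CI test run fails."""
    leaked = check_exclusion(bundle)
    assert not leaked, f"forbidden sources in bundle: {leaked}"
    assert check_inclusion(bundle, acceptable), "no acceptable source retrieved"


# One golden-set entry: the published policy is the expected source of truth,
# and the internal wiki is the "wrong but tempting" source.
bundle = ContextBundle(
    query="What is the refund window?",
    index_version="docs-v12",
    snippets=[
        Snippet("wiki-88", "internal_wiki", "2024-01-10"),
        Snippet("pol-7", "published_policy", "rev-3"),
    ],
)
assert_context(bundle, acceptable={"published_policy"})
print(winning_snippet(bundle).snippet_id)  # prints "pol-7"
```

Running the assertions before any answer grading keeps retrieval regressions separate from model regressions. Storing the output of bundle_to_json per response gives CI something concrete to diff.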