Context Contract & CI Checklist (LLM Apps)

Use this checklist to turn “retrieval + prompt” into a testable, auditable context pipeline.

1) Define the Context Contract (write it down)
- Allowed sources: list systems (published docs site, product database, ticket macros, policy PDFs). Name an owner per source.
- Forbidden sources: list internal drafts, stale wikis, personal drives, unrestricted Slack channels, etc.
- Precedence rules: what wins in conflicts (e.g., published policy > internal wiki > chat transcripts).
- Freshness rules: require an effective date on policy-like content and define what happens when it expires (e.g., drop it from retrieval or flag it in the answer).
- Output requirements: citations must include source ID + revision/date; refusal is acceptable when context is missing.
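
One way to make the contract checkable is to keep it as data rather than prose. The sketch below is illustrative only: the class names, the source IDs, the numeric precedence field, and the max_age_days freshness rule are all assumptions, not part of any particular framework.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Source:
    source_id: str
    owner: str          # who to escalate to when this source is wrong
    precedence: int     # lower number wins when sources conflict


@dataclass(frozen=True)
class ContextContract:
    allowed: Tuple[Source, ...]
    forbidden: FrozenSet[str]
    max_age_days: int   # hypothetical freshness rule for policy-like content

    def is_allowed(self, source_id: str) -> bool:
        # Forbidden always wins, and unknown sources are rejected by default.
        if source_id in self.forbidden:
            return False
        return any(s.source_id == source_id for s in self.allowed)

    def winner(self, a: str, b: str) -> str:
        """Return whichever of two allowed sources takes precedence."""
        rank = {s.source_id: s.precedence for s in self.allowed}
        return a if rank[a] <= rank[b] else b

    def is_fresh(self, effective: date, today: date) -> bool:
        """True if the content is already in effect and not yet expired."""
        age = (today - effective).days
        return 0 <= age <= self.max_age_days


# Example contract; every name here is a placeholder.
CONTRACT = ContextContract(
    allowed=(
        Source("published_policy", owner="policy-team", precedence=0),
        Source("internal_wiki", owner="support-ops", precedence=1),
        Source("chat_transcripts", owner="support-ops", precedence=2),
    ),
    forbidden=frozenset({"internal_drafts", "personal_drives"}),
    max_age_days=365,
)
```

Keeping the contract in version control next to the retrieval config means a change to precedence or to the forbidden list shows up in review like any other code change.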

2) Make a Context Bundle Artifact
- Store a structured bundle (JSON) per response: query, retrieval strategy, index name/version, filters, snippet IDs, snippet revisions, tool calls.
- Ensure replay: you must be able to reconstruct the exact bundle later from what was stored.
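
A minimal sketch of such a bundle, assuming snippets arrive as dicts with "id" and "revision" keys (a hypothetical shape). A hash over the canonical JSON gives a cheap replay check: re-run retrieval, rebuild the bundle, and compare hashes.

```python
import hashlib
import json


def bundle_hash(bundle):
    """SHA-256 over canonical JSON, ignoring any stored hash field."""
    body = {k: v for k, v in bundle.items() if k != "bundle_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_bundle(query, strategy, index_name, index_version,
                filters, snippets, tool_calls):
    """Build the per-response artifact. Field names are illustrative."""
    bundle = {
        "query": query,
        "retrieval_strategy": strategy,
        "index": {"name": index_name, "version": index_version},
        "filters": filters,
        # Store IDs and revisions only; text can be re-fetched by revision.
        "snippets": [{"id": s["id"], "revision": s["revision"]}
                     for s in snippets],
        "tool_calls": tool_calls,
    }
    bundle["bundle_hash"] = bundle_hash(bundle)
    return bundle


def verify_replay(stored, replayed):
    """True if a reconstructed bundle matches the stored one exactly."""
    return bundle_hash(replayed) == stored["bundle_hash"]
```

Snippet order is part of the hash here, on the assumption that ranking order affects the prompt. If it does not in your pipeline, sort the snippets before hashing.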

3) Build a Golden Set
- Collect real questions from tickets, sales calls, and internal Slack.
- For each question, tag expected source(s) of truth and known “wrong but tempting” sources.
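
A golden-set entry can be as small as the record below. This is a sketch with assumed field names; the problems() check only guards against entries that would make later tests meaningless.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class GoldenCase:
    question: str
    origin: str                      # e.g. "ticket", "sales_call", "slack"
    expected_sources: Set[str]       # source(s) of truth for this question
    tempting_wrong_sources: Set[str] = field(default_factory=set)

    def problems(self) -> List[str]:
        """Return a list of reasons this entry is not usable as a test."""
        found = []
        if not self.question.strip():
            found.append("empty question")
        if not self.expected_sources:
            found.append("no expected source of truth")
        overlap = self.expected_sources & self.tempting_wrong_sources
        if overlap:
            found.append(
                f"source tagged both expected and wrong: {sorted(overlap)}"
            )
        return found
```

For example, GoldenCase("What is the refund window?", "ticket", {"published_policy"}, {"internal_wiki"}) would pass with no problems, while an entry with no expected source would be rejected before it reaches CI.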

4) Add Context Assertions (before grading answers)
- Inclusion: retrieved snippets must include at least one acceptable source for the question.
- Exclusion: forbidden sources must never appear in the bundle.
- Permissions: run the same query under least-privilege user identities; verify retrieval respects ACLs.
- Conflict tests: add paired docs with contradictory statements; verify that the higher-precedence source wins.
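
The four assertions above can be sketched as plain functions over a retrieved bundle. The snippet shape (dicts with "id", "source", and "acl" keys) and the group-based ACL model are assumptions for illustration; substitute whatever your retriever actually returns.

```python
def check_inclusion(snippets, acceptable_sources):
    """Pass if at least one snippet comes from an acceptable source."""
    return any(s["source"] in acceptable_sources for s in snippets)


def check_exclusion(snippets, forbidden_sources):
    """Return IDs of snippets from forbidden sources (empty list = pass)."""
    return [s["id"] for s in snippets if s["source"] in forbidden_sources]


def check_permissions(snippets, user_groups):
    """Return IDs of snippets the user should not have been able to see.

    A snippet is visible only if its ACL shares a group with the user.
    """
    groups = set(user_groups)
    return [s["id"] for s in snippets if not (set(s["acl"]) & groups)]


def resolve_conflict(snippets, precedence):
    """Pick the snippet whose source appears earliest in the precedence list."""
    return min(snippets, key=lambda s: precedence.index(s["source"]))
```

A conflict test then plants two contradictory snippets and asserts that resolve_conflict returns the one from the higher-precedence source, for example published policy over an internal wiki page.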

5) CI Gates
- On every change to indexing, retrieval config, prompts, or tool permissions:
 - Run golden set retrieval tests.
 - Fail the build on regressions in inclusion/exclusion/permissions.
 - Store diffs of context bundles (what changed and why).
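
A minimal sketch of the gate and the bundle diff, assuming each test result is a dict with a "case" ID and one boolean per check, and each bundle has a "snippets" list of {"id", "revision"} dicts. Both shapes are hypothetical.

```python
CHECKS = ("inclusion", "exclusion", "permissions")


def diff_bundles(old, new):
    """Summarize which snippets were added, removed, or changed revision."""
    old_snips = {s["id"]: s["revision"] for s in old["snippets"]}
    new_snips = {s["id"]: s["revision"] for s in new["snippets"]}
    shared = old_snips.keys() & new_snips.keys()
    return {
        "added": sorted(new_snips.keys() - old_snips.keys()),
        "removed": sorted(old_snips.keys() - new_snips.keys()),
        "revised": sorted(i for i in shared if old_snips[i] != new_snips[i]),
    }


def gate(results):
    """Return a process exit code: 0 if every check passed, 1 otherwise."""
    failures = [(r["case"], check)
                for r in results
                for check in CHECKS
                if not r[check]]
    for case, check in failures:
        print(f"FAIL {case}: {check}")
    return 1 if failures else 0


# In a CI script, end with: sys.exit(gate(results))
```

The diff covers the "what changed" half; the "why" usually has to come from the commit or change request that triggered the run, so store a reference to it alongside the diff.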

6) Operationalize Ownership
- Assign an on-call rotation for the context layer (not just the model endpoint).
- Create a “source owner” escalation path for disputes and doc conflicts.
- Add a change trigger: a policy doc change kicks off re-indexing and a targeted eval run.
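
The change trigger might look like the handler below. It is a sketch: reindex and run_eval are hypothetical callables you would supply, and golden cases are assumed to be dicts with "expected_sources" and optional "tempting_wrong_sources" keys.

```python
def on_source_changed(source_id, golden_cases, reindex, run_eval):
    """Re-index one source, then evaluate only the cases that touch it."""
    reindex(source_id)
    targeted = [
        case for case in golden_cases
        if source_id in case["expected_sources"]
        or source_id in case.get("tempting_wrong_sources", ())
    ]
    return run_eval(targeted)
```

Cases that list the changed source as "wrong but tempting" are included on purpose: an edit to a stale wiki page can make it rank higher and start displacing the source of truth.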

7) Incident Drill (monthly)
- Pick a high-stakes query (refunds, retention, entitlements).
- Require citations with revisions.
- Re-run after 24 hours; explain any bundle changes.
- Test access control with a restricted user.

If you can do replay + permissions + conflict resolution, you’re past the demo stage. If you can’t, stop tuning chunk sizes and start enforcing the contract.