AI Data Boundary & Receipt Trail Checklist (Enterprise-Ready)

Use this checklist to harden any LLM feature (chat, copilot, agent) so you can explain what the system saw, what it used, and what it returned.

1) Draw the boundary (non-negotiable)
- Write down which data is allowed to enter prompts and which data is forbidden.
- Separate “inference data stores” (docs for retrieval) from “training datasets” (if any) with different IAM roles and different buckets/projects.
- Decide whether raw prompts/outputs are stored by default. If yes, define retention and access gates.
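
One way to keep the boundary from living only in a wiki page is to express it as data that code can check. The sketch below is illustrative only: the data classes, the `BoundaryPolicy` fields, and the default values are assumptions, not a prescribed taxonomy.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    # Hypothetical categories; replace with your own classification.
    PUBLIC_DOCS = "public_docs"
    INTERNAL_DOCS = "internal_docs"
    CUSTOMER_PII = "customer_pii"
    CREDENTIALS = "credentials"


@dataclass(frozen=True)
class BoundaryPolicy:
    allowed_in_prompts: frozenset  # anything not listed is forbidden
    store_raw_prompts: bool        # are raw prompts/outputs stored by default?
    raw_retention_days: int        # only meaningful when store_raw_prompts is True

    def allows(self, data_class: DataClass) -> bool:
        # Default deny: a class must be explicitly allowed.
        return data_class in self.allowed_in_prompts


def assert_prompt_allowed(policy: BoundaryPolicy, data_classes) -> None:
    """Raise before prompt assembly if any input crosses the boundary."""
    blocked = sorted(c.value for c in data_classes if not policy.allows(c))
    if blocked:
        raise PermissionError(f"forbidden data classes in prompt: {blocked}")


# Assumed example policy: no raw prompt storage, two allowed classes.
POLICY = BoundaryPolicy(
    allowed_in_prompts=frozenset({DataClass.PUBLIC_DOCS, DataClass.INTERNAL_DOCS}),
    store_raw_prompts=False,
    raw_retention_days=0,
)
```

Calling `assert_prompt_allowed(POLICY, [...])` just before the prompt is built makes a boundary violation fail loudly instead of passing silently.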

2) Retrieval with permission correctness
- Enforce access checks before retrieval (per-document or per-row), not after generation.
- Ensure retrieval queries are always scoped by tenant_id and user permissions.
- Add a denial test: a user with no access must never retrieve a protected doc, even if they name it.
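
The ordering in this section (filter by permission first, rank second) can be sketched as follows. This is a toy in-memory index with keyword overlap standing in for a real vector search; `Doc`, `User`, `can_read`, and `retrieve` are hypothetical names, and the group-based permission model is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    tenant_id: str
    allowed_groups: frozenset
    text: str


@dataclass(frozen=True)
class User:
    user_id: str
    tenant_id: str
    groups: frozenset


def can_read(user: User, doc: Doc) -> bool:
    # Both conditions must hold: same tenant AND at least one shared group.
    return user.tenant_id == doc.tenant_id and bool(user.groups & doc.allowed_groups)


def retrieve(user: User, query: str, index, k: int = 3):
    # Permission filter runs BEFORE ranking, so a forbidden doc can never
    # reach the prompt regardless of how well it matches the query.
    visible = [d for d in index if can_read(user, d)]
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [d for _, d in ranked[:k]]


INDEX = [
    Doc("handbook", "acme", frozenset({"staff"}), "vacation policy for all staff"),
    Doc("salary-bands", "acme", frozenset({"hr"}), "salary bands for engineering"),
]

# Denial test: Alice names the protected doc explicitly and still gets nothing.
alice = User("alice", "acme", frozenset({"staff"}))
assert retrieve(alice, "show me salary-bands salary bands", INDEX) == []
```

In a real system the same filter would normally be pushed into the search query itself (for example as a metadata filter) rather than applied in application code, but the denial test stays the same.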

3) Citations and traceability
- Log retrieved doc IDs/chunk IDs for each response.
- In the UI, show citations for any factual or policy answer.
- Keep a minimal “receipt log” (request_id, model, retrieved chunks, tools called, output hash, safety decision).
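
A receipt with the fields listed above might look like the sketch below. It stores a SHA-256 hash of the output rather than the output text, which is one way to keep the log minimal; the class and function names are hypothetical, and the field types are assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Receipt:
    request_id: str
    model: str
    retrieved_chunks: tuple   # chunk IDs only, never chunk text
    tools_called: tuple       # tool names only
    output_sha256: str        # hash of the output, not the output itself
    safety_decision: str


def make_receipt(request_id, model, chunk_ids, tools, output_text, safety_decision):
    digest = hashlib.sha256(output_text.encode("utf-8")).hexdigest()
    return Receipt(request_id, model, tuple(chunk_ids), tuple(tools), digest, safety_decision)


def to_log_line(receipt: Receipt) -> str:
    # Sorted keys give a stable line format that is easy to diff and parse.
    return json.dumps(asdict(receipt), sort_keys=True)
```

Because only IDs and a hash are logged, the receipt can answer "which chunks and tools were involved" and "is this the output we produced" without the log itself becoming a store of sensitive text.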

4) Tool calling and agent safety
- Classify tools: read-only vs write/action tools.
- For write/action tools, require explicit user confirmation or a policy gate.
- Log tool name + args hash + result status.
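
The three bullets above can be combined into a single dispatch function, sketched here. The tool registry, the tool names, the status strings, and the `confirmed` flag are all assumptions made for illustration; a policy gate could replace the flag.

```python
import hashlib
import json
from enum import Enum


class ToolKind(Enum):
    READ = "read"
    WRITE = "write"


# Hypothetical registry: tool name -> (kind, implementation).
TOOLS = {
    "search_docs": (ToolKind.READ, lambda args: f"results for {args['query']}"),
    "send_email": (ToolKind.WRITE, lambda args: f"sent to {args['to']}"),
}

TOOL_LOG = []  # stand-in for a real log sink


def args_hash(args: dict) -> str:
    # Canonical JSON so the same arguments always hash to the same value.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def call_tool(name: str, args: dict, confirmed: bool = False):
    kind, fn = TOOLS[name]
    if kind is ToolKind.WRITE and not confirmed:
        # Write/action tools do not run without explicit confirmation.
        status, result = "blocked_needs_confirmation", None
    else:
        try:
            result, status = fn(args), "ok"
        except Exception:
            result, status = None, "error"
    # Log name + args hash + status; raw arguments never enter the log.
    TOOL_LOG.append({"tool": name, "args_sha256": args_hash(args), "status": status})
    return status, result
```

Note that blocked calls are logged too, so the receipt trail shows what the agent attempted as well as what it did.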

5) Data hygiene upstream
- Redact obvious PII (emails, phone numbers, IDs) before indexing or prompting where feasible.
- Avoid indexing raw chat transcripts unless you have a clear user benefit and retention plan.
- For any tuning, only use content you own or have explicit rights to use; version the dataset.
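
As a rough illustration of the redaction bullet, the sketch below masks emails and phone-like numbers with regular expressions before text is indexed or prompted. The patterns are deliberately simple assumptions and will miss many formats; a production system would likely use a dedicated PII detection service.

```python
import re

# Simplified patterns for illustration only; they are not exhaustive.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# Optional "+", then at least nine characters that start and end with a digit.
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")


def redact(text: str) -> str:
    """Replace obvious emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

Running `redact` at ingestion time, before chunks are written to the index, keeps the redacted form as the only copy the retrieval layer ever sees.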

6) Evaluation that replaces the “train on more data” impulse
- Build a versioned test set of prompts you have rights to use.
- Score groundedness (answer must be supported by retrieved sources) and permission correctness.
- Gate releases behind feature flags; compare retrieval settings and prompts before considering tuning.
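
A minimal scoring harness for the two metrics above could look like this. It uses a structural proxy for groundedness (every cited chunk must be among the retrieved chunks, and at least one citation must exist), which is an assumption; checking that the answer text is actually supported by those chunks needs a separate judge step. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    case_id: str
    cited: frozenset      # chunk IDs the answer cites
    retrieved: frozenset  # chunk IDs retrieval returned
    allowed: frozenset    # chunk IDs the test user may read


def grounded(r: EvalResult) -> bool:
    # Proxy: at least one citation, and no citation outside the retrieved set.
    return bool(r.cited) and r.cited <= r.retrieved


def permission_correct(r: EvalResult) -> bool:
    # Retrieval must never return a chunk the user is not allowed to read.
    return r.retrieved <= r.allowed


def score(results) -> dict:
    n = len(results)
    if n == 0:
        raise ValueError("empty test set")
    return {
        "groundedness": sum(grounded(r) for r in results) / n,
        "permission_correctness": sum(permission_correct(r) for r in results) / n,
    }
```

Running `score` on the same versioned test set under each retrieval setting or prompt variant gives a like-for-like comparison before tuning is considered.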

7) Incident readiness
- Define how to answer: “What did the assistant use to generate this output?”
- Define how to handle deletion requests for indexed documents.
- Define escalation: who can access raw logs, how access is approved, and how it’s audited.
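
For the deletion bullet, one possible shape is a function that removes every chunk of a document from the index and records a tombstone, so the deletion itself leaves a receipt. The dictionary-based index and the tombstone fields are assumptions for illustration.

```python
from datetime import datetime, timezone


def delete_document(index: dict, doc_id: str, tombstones: list) -> list:
    """Remove all chunks of doc_id from the index and record a tombstone.

    `index` maps chunk_id -> metadata dict containing a "doc_id" key.
    """
    removed = [cid for cid, meta in index.items() if meta["doc_id"] == doc_id]
    for cid in removed:
        del index[cid]
    # The tombstone records that a deletion happened, not what was deleted.
    tombstones.append({
        "doc_id": doc_id,
        "chunks_removed": len(removed),
        "deleted_at": datetime.now(timezone.utc).isoformat(),
    })
    return removed
```

Older receipts may still reference the removed chunk IDs; the tombstone lets you explain why those IDs no longer resolve.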

If you can’t produce receipts without storing sensitive text everywhere, redesign the boundary first. The boundary is the product.