AI AUDIT TRAIL READINESS CHECKLIST (STARTUP EDITION)

Use this as a build spec and a procurement-prep doc. If you can’t answer an item quickly, treat closing that gap as product work.

1) IDENTITY + ACCESS
- SSO supported (SAML or OIDC) and documented.
- SCIM supported for provisioning/deprovisioning.
- No shared “god key” for agent actions; use per-tenant, per-tool scoped credentials.
- Role-based access control documented (who can view prompts/logs, who can change policies, who can approve actions).
- Admin actions are logged (policy changes, connector changes, role changes).
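
A minimal sketch of the last two items (documented roles, logged admin actions), assuming an in-memory log. The role names, permission names, and the `authorize` helper are hypothetical, not a prescribed model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-permission map; names are illustrative only.
ROLE_PERMISSIONS = {
    "viewer": {"view_logs"},
    "policy_admin": {"view_logs", "change_policy"},
    "approver": {"view_logs", "approve_action"},
}


@dataclass
class AdminAuditLog:
    """In-memory stand-in for a durable audit store."""
    entries: list = field(default_factory=list)

    def record(self, actor, action, allowed):
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "allowed": allowed,
        })


def authorize(role, actor, action, log):
    """Return True if the role grants the action; log the attempt either way."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    log.record(actor, action, allowed)
    return allowed
```

Denied attempts are logged as well as granted ones, since both tend to matter when a buyer asks who tried to change a policy.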

2) DATA BOUNDARIES + RETENTION
- Clear statement of what you store: prompts, retrieved context, outputs, tool-call arguments, embeddings.
- Customer-configurable retention for AI logs (at minimum, a short-retention mode and an extended-retention mode).
- Redaction controls (mask secrets, tokens, credentials; allow customers to add patterns).
- Access controls for sensitive logs (restricted by default; break-glass access is audited).
- Documented DPA/security posture that matches your deployment model (SaaS vs VPC vs on-prem).
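
The redaction item can be sketched as a pass over text before it is stored. The two built-in patterns and the `redact` function are assumptions for illustration; a real pattern set would need to be broader and tested against your own data.

```python
import re

# Illustrative built-in patterns, not a complete secret-detection set.
DEFAULT_PATTERNS = [
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),
    re.compile(r"(?i)(api[_-]?key|password|secret)\s*[=:]\s*\S+"),
]


def redact(text, customer_patterns=()):
    """Mask matches of default and customer-supplied regexes before storage."""
    patterns = DEFAULT_PATTERNS + [re.compile(p) for p in customer_patterns]
    for pattern in patterns:
        text = pattern.sub("[REDACTED]", text)
    return text
```

For example, `redact("ticket ACME-4821 opened", [r"ACME-\d+"])` masks a customer-defined ticket ID format alongside the defaults.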

3) PROVENANCE (CAN YOU REPLAY?)
- Every AI response has a request_id and is traceable to: actor, session, tenant.
- Retrieval is attributable: source system + document/object identifiers + versions/ETags where available.
- Model is attributable: provider + model name + configuration hash + system prompt version.
- Tool actions are attributable: tool name + approval status + external IDs created/modified.
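
One way to produce the "configuration hash" in the model item is to hash a canonical JSON form of the settings, so the same configuration always yields the same value. The field names and the `model_attribution` helper below are assumptions.

```python
import hashlib
import json


def config_hash(config):
    """Hash a model configuration deterministically (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def model_attribution(provider, model, config, system_prompt_version):
    """Build the model portion of a provenance record (hypothetical field names)."""
    return {
        "provider": provider,
        "model": model,
        "config_hash": config_hash(config),
        "system_prompt_version": system_prompt_version,
    }
```

Storing the hash on every response lets you group responses by exact configuration without copying the full settings into each log line.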

4) CHANGE CONTROL
- Version prompts and safety settings as release artifacts.
- Track model/provider changes (including silent upstream model updates where detectable).
- Maintain deployment history per tenant.
- Rollback plan exists and is tested for: prompt config, tool permissions, connector config.
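
Per-tenant deployment history and rollback can be as simple as an append-only list. This is a sketch under that assumption; the `TenantDeployments` class and the version labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TenantDeployments:
    """Append-only history of config versions deployed for one tenant."""
    history: list = field(default_factory=list)

    def deploy(self, version):
        self.history.append(version)

    @property
    def current(self):
        return self.history[-1] if self.history else None

    def rollback(self):
        """Redeploy the entry before the current one; the rollback is itself recorded."""
        if len(self.history) < 2:
            raise ValueError("no earlier version to roll back to")
        previous = self.history[-2]
        self.history.append(previous)
        return previous
```

Because the sketch only looks one entry behind, calling `rollback()` twice in a row returns to the version that was just rolled back. A real system would likely roll back to an explicitly named version.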

5) EVALUATION EVIDENCE
- Maintain a small, explicit eval suite aligned to your product’s real failure modes (not generic trivia).
- Run evals before promoting changes to production behavior.
- Store eval results with the build/deployment they correspond to.
- Have a process for customer-reported bad outputs: triage, reproduction, fix, verification.
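
The second and third items (run evals before promotion, store results with the build) can be sketched as a gate that returns a storable record. The case format, the `generate` callable, and the `min_pass_rate` threshold are assumptions, not a standard.

```python
def run_evals(cases, generate):
    """Run each case through `generate` and apply that case's check function."""
    results = []
    for case in cases:
        output = generate(case["input"])
        results.append({"name": case["name"], "passed": bool(case["check"](output))})
    return results


def gate(results, build_id, min_pass_rate=1.0):
    """Return a record, keyed by build, saying whether the build may be promoted."""
    passed = sum(r["passed"] for r in results)
    rate = passed / len(results) if results else 0.0
    return {
        "build_id": build_id,
        "pass_rate": rate,
        "promote": rate >= min_pass_rate,
        "results": results,
    }
```

An empty suite yields a pass rate of 0.0, so a build with no evals is blocked rather than waved through.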

6) AGENTS + HIGH-IMPACT ACTIONS
- Tool allowlist exists; default deny for tools that cause irreversible side effects.
- Human approval is available for high-impact actions (email send, ticket close, code merge, payment initiation).
- Approvals are logged: who approved, when, what they approved.
- Rate limits and spending/usage caps exist per tenant.
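
A minimal sketch of the allowlist and approval items, assuming a static policy table and an in-memory approval log. The tool names, the `requires_approval` flag, and `authorize_tool_call` are hypothetical; rate limits and spending caps are not covered here.

```python
from datetime import datetime, timezone

# Hypothetical policy table: any tool not listed is denied.
TOOL_POLICY = {
    "search_docs": {"requires_approval": False},
    "send_email": {"requires_approval": True},
}

approval_log = []


def authorize_tool_call(tool, args, approver=None):
    """Default deny: unknown tools are rejected; high-impact tools need an approver."""
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        return "denied"
    if policy["requires_approval"]:
        if approver is None:
            return "pending_approval"
        # Record who approved, when, and exactly what they approved.
        approval_log.append({
            "approver": approver,
            "approved_at": datetime.now(timezone.utc).isoformat(),
            "tool": tool,
            "args": args,
        })
    return "allowed"
```

Logging the arguments alongside the approver is what makes "what they approved" answerable later, since the tool name alone rarely is.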

7) EXPORT + INTEGRATION
- Logs are structured events with named fields (not free-text strings).
- Export path exists (webhook, API, or standard telemetry like OpenTelemetry) and is documented.
- Customers can correlate your request_id with their systems (headers, metadata fields).
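
As a sketch of the correlation item: each exported event can carry your request_id next to whatever identifier the customer sent. The `X-Correlation-Id` header name and both helper functions are assumptions; use the header your customers actually send.

```python
import json
import uuid


def build_export_event(event_type, tenant_id, payload, incoming_headers):
    """Build one structured event, carrying over the customer's correlation ID if sent."""
    return {
        "request_id": str(uuid.uuid4()),
        "event_type": event_type,
        "tenant_id": tenant_id,
        # Assumed header name; None when the customer did not send one.
        "customer_correlation_id": incoming_headers.get("X-Correlation-Id"),
        "payload": payload,
    }


def to_json_line(event):
    """Serialize as one JSON object per line, suitable for a webhook body or log file."""
    return json.dumps(event, sort_keys=True)
```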

8) INCIDENT READINESS
- You can produce an incident timeline from logs within hours, not days.
- You can answer: who accessed what data, which model ran, what tools were called, and what changed.
- Status page and customer communication plan exist.
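
If events are structured and timestamped, the incident timeline in the first item reduces to a filter and a sort. This sketch assumes each event is a dict with an ISO 8601 `at` field and a `tenant_id`; both field names are hypothetical.

```python
from datetime import datetime


def incident_timeline(events, tenant_id, start, end):
    """Return one tenant's events inside [start, end], ordered by timestamp."""
    selected = [
        e for e in events
        if e["tenant_id"] == tenant_id
        and start <= datetime.fromisoformat(e["at"]) <= end
    ]
    return sorted(selected, key=lambda e: datetime.fromisoformat(e["at"]))
```

The timestamps and the window bounds need to agree on timezone awareness, so storing every timestamp with an explicit UTC offset avoids comparison errors.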

If you do only one thing this week: define your canonical “AI event” schema (request → retrieval → model call → tool call → outcome) and make it exportable.
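
A starting point for that schema, assuming one event per stage, all tied together by a shared request_id. The `AIEvent` fields and the `Stage` values are one possible shape, not a standard.

```python
from dataclasses import dataclass, field, asdict
from enum import Enum


class Stage(str, Enum):
    REQUEST = "request"
    RETRIEVAL = "retrieval"
    MODEL_CALL = "model_call"
    TOOL_CALL = "tool_call"
    OUTCOME = "outcome"


@dataclass
class AIEvent:
    request_id: str   # shared by every event in one request chain
    tenant_id: str
    actor: str
    stage: Stage
    sequence: int     # position within the chain
    detail: dict = field(default_factory=dict)  # stage-specific fields

    def to_dict(self):
        """Return a plain dict that can be serialized for export."""
        data = asdict(self)
        data["stage"] = self.stage.value
        return data
```

Keeping stage-specific data in `detail` lets the common fields stay stable while each stage evolves on its own.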