AI AUDIT TRAIL READINESS CHECKLIST (STARTUP EDITION)

Use this as a build spec and a procurement-prep doc. If you can’t answer an item quickly, treat it as product work.

1) IDENTITY + ACCESS
- SSO supported (SAML or OIDC) and documented.
- SCIM supported for provisioning/deprovisioning.
- No shared “god key” for agent actions; use per-tenant, per-tool scoped credentials.
- Role-based access control documented (who can view prompts/logs, who can change policies, who can approve actions).
- Admin actions are logged (policy changes, connector changes, role changes).

2) DATA BOUNDARIES + RETENTION
- Clear statement of what you store: prompts, retrieved context, outputs, tool-call arguments, embeddings.
- Customer-configurable retention for AI logs (at least: short retention and extended retention modes).
- Redaction controls (mask secrets, tokens, credentials; allow customers to add patterns).
- Access controls for sensitive logs (default restrict; break-glass access is audited).
- Documented DPA/security posture that matches your deployment model (SaaS vs VPC vs on-prem).

3) PROVENANCE (CAN YOU REPLAY?)
- Every AI response has a request_id and is traceable to: actor, session, tenant.
- Retrieval is attributable: source system + document/object identifiers + versions/ETags where available.
- Model is attributable: provider + model name + configuration hash + system prompt version.
- Tool actions are attributable: tool name + approval status + external IDs created/modified.

4) CHANGE CONTROL
- Version prompts and safety settings as release artifacts.
- Track model/provider changes (including silent upstream model updates where detectable).
- Maintain deployment history per tenant.
- Rollback plan exists and is tested for: prompt config, tool permissions, connector config.

5) EVALUATION EVIDENCE
- Maintain a small, explicit eval suite aligned to your product’s real failure modes (not generic trivia).
- Run evals before promoting changes to production behavior.
- Store eval results with the build/deployment they correspond to.
- Have a process for customer-reported bad outputs: triage, reproduction, fix, verification.

6) AGENTS + HIGH-IMPACT ACTIONS
- Tool allowlist exists; default deny for tools that cause irreversible side effects.
- Human approval is available for high-impact actions (email send, ticket close, code merge, payment initiation).
- Approvals are logged: who approved, when, what they approved.
- Rate limits and spending/usage caps exist per tenant.

7) EXPORT + INTEGRATION
- Logs are structured events (not just text strings).
- Export path exists (webhook, API, or standard telemetry like OpenTelemetry) and is documented.
- Customers can correlate your request_id with their systems (headers, metadata fields).

8) INCIDENT READINESS
- You can produce an incident timeline from logs within hours, not days.
- You can answer: who accessed what data, which model ran, what tools were called, and what changed.
- Status page and customer communication plan exist.

If you do only one thing this week: define your canonical “AI event” schema (request → retrieval → model call → tool call → outcome) and make it exportable.
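
As a starting point for that schema, here is a minimal sketch in Python. It is one possible shape, not a standard: every class and field name below (AIEvent, RetrievalRecord, ModelCallRecord, ToolCallRecord, config_hash, and so on) is a hypothetical choice that mirrors the provenance items in section 3, and the sample values are made up. Adjust the fields to whatever your product actually records.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class RetrievalRecord:
    # Which source system and object the context came from.
    source_system: str
    object_id: str
    version: Optional[str] = None  # version or ETag, where available


@dataclass
class ModelCallRecord:
    provider: str
    model_name: str
    config_hash: str            # hash of the model configuration (hypothetical field)
    system_prompt_version: str


@dataclass
class ToolCallRecord:
    tool_name: str
    approval_status: str        # e.g. "approved", "denied", "not_required"
    approved_by: Optional[str] = None
    external_ids: List[str] = field(default_factory=list)  # IDs created/modified


@dataclass
class AIEvent:
    # request -> retrieval -> model call -> tool call -> outcome, in one record.
    tenant: str
    actor: str
    session: str
    retrievals: List[RetrievalRecord] = field(default_factory=list)
    model_call: Optional[ModelCallRecord] = None
    tool_calls: List[ToolCallRecord] = field(default_factory=list)
    outcome: str = "unknown"
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Structured export: nested dataclasses become nested JSON objects.
        return json.dumps(asdict(self), sort_keys=True)


# Example with made-up values.
event = AIEvent(
    tenant="tenant-123",
    actor="user-42",
    session="sess-1",
    retrievals=[RetrievalRecord("wiki", "doc-9", "etag-abc")],
    model_call=ModelCallRecord(
        "example-provider", "example-model", "cfg-hash", "prompt-v3"
    ),
    tool_calls=[ToolCallRecord("send_email", "approved", "user-7", ["msg-555"])],
    outcome="completed",
)
exported = event.to_json()
```

The exported string is plain JSON, so it can be posted to a webhook, returned from an API, or attached to a telemetry record. Keeping request_id at the top level is what lets a customer correlate the event with their own systems.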