Audited AI Feature Spec (Template)

1) Feature boundary (one paragraph)
- What user problem does this solve?
- What is explicitly out of scope?
- What does “success” look like in the UI (not model metrics)?

2) Allowed actions (must be enumerable)
List every action the AI can take. Write them as verbs:
- Read actions (safe): search_docs, get_account_status, list_recent_deploys…
- Write actions (risky): create_ticket, refund_invoice, change_role…
If you can’t enumerate actions, stop: you’re building a chat surface, not a product.

3) Tool contracts (one per action)
For each tool:
- Name + description
- Parameters schema (reject additionalProperties)
- Auth model: which end-user permission is required?
- Idempotency: what key prevents duplicates?
- Side effects: what external system changes?
- Dry-run support: yes/no
- Failure handling: what error codes are returned?

4) Retrieval plan (thin by default)
- Sources of truth (systems): e.g., Confluence, Google Drive, GitHub, Jira
- Retrieval unit: doc IDs + versions (preferred) vs chunks
- Filters: tenant, ACL, doc type, recency, lifecycle state
- Keyword search requirement: exact IDs, error codes, names
- Embeddings usage: only where semantic recall is required
- Citation rules: every factual claim must cite doc ID + section/page

5) Safety gates
- Which actions require human approval?
- Which actions are read-only forever?
- Rate limits and anomaly detection (per user/tenant)
- Prompt-injection handling: treat retrieved text as untrusted input

6) Observability (what you must log)
- User/tenant identifiers
- Retrieval query + filters applied
- Retrieved doc IDs/versions and snippet boundaries
- Tool calls: name, args, auth scope, results, side-effect IDs
- Model outputs + schema validation errors
- UI-level outcomes: user accepted/rejected, edited, escalated

7) Evaluation (practical, repeatable)
- Build a small set of real tasks (10–30) from support/ops tickets.
- For each task: required sources, expected tool calls, unacceptable actions.
- Track regressions: missing citations, wrong-tool selection, permission denials.

8) Launch checklist
- Read-only mode in production first
- Human approval for first write actions
- Clear UI for citations and “I don’t know” states
- Rollback plan: feature flag + tool disable switch

Use this template on one workflow before expanding. If it can’t be audited, it can’t be trusted.
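
Example: a tool contract with gates (sketch)

The tool-contract and safety-gate sections (3 and 5) can be illustrated with a minimal Python sketch. It is one possible shape under stated assumptions, not a prescribed implementation. The names ToolContract, ToolRunner, ToolError, the error codes, and the "billing:refund" permission string are all hypothetical. Only refund_invoice comes from the template's action list. The sketch rejects unknown parameters, checks the end-user permission, requires an idempotency key and approval for write actions, supports dry-run, and logs every call, including failures.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


class ToolError(Exception):
    """Failure with a stable error code (the codes here are illustrative)."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


@dataclass
class ToolContract:
    name: str
    description: str
    params: dict[str, type]      # parameter name -> required Python type
    required_permission: str     # end-user permission needed to call the tool
    writes: bool                 # True for risky write actions
    handler: Callable[[dict[str, Any]], Any]


@dataclass
class ToolRunner:
    audit_log: list[dict[str, Any]] = field(default_factory=list)
    _seen: dict[str, Any] = field(default_factory=dict)  # idempotency cache

    def call(self, tool: ToolContract, args: dict[str, Any], *,
             user_permissions: set[str], idempotency_key: str | None = None,
             dry_run: bool = False, approved: bool = False) -> Any:
        entry = {"tool": tool.name, "args": dict(args), "dry_run": dry_run,
                 "idempotency_key": idempotency_key}
        try:
            result = self._execute(tool, args, user_permissions,
                                   idempotency_key, dry_run, approved)
            entry["outcome"], entry["result"] = "ok", result
            return result
        except ToolError as err:
            entry["outcome"] = err.code
            raise
        finally:
            self.audit_log.append(entry)  # log successes and failures alike

    def _execute(self, tool, args, user_permissions, idempotency_key,
                 dry_run, approved):
        # Schema check: reject unknown and missing parameters, then types.
        unknown = set(args) - set(tool.params)
        missing = set(tool.params) - set(args)
        if unknown or missing:
            raise ToolError("invalid_args",
                            f"unknown={sorted(unknown)} missing={sorted(missing)}")
        for name, expected in tool.params.items():
            if not isinstance(args[name], expected):
                raise ToolError("invalid_args", f"{name} must be {expected.__name__}")
        # Auth check against the end user's permissions, not the model's.
        if tool.required_permission not in user_permissions:
            raise ToolError("permission_denied", tool.required_permission)
        if not tool.writes:
            return tool.handler(args)
        if dry_run:  # preview only, no side effect
            return {"dry_run": True, "would_call": tool.name, "args": dict(args)}
        if idempotency_key is None:
            raise ToolError("missing_idempotency_key", tool.name)
        cache_key = f"{tool.name}:{idempotency_key}"
        if cache_key in self._seen:  # replay: return the earlier result
            return self._seen[cache_key]
        if not approved:
            raise ToolError("approval_required", tool.name)
        result = tool.handler(args)
        self._seen[cache_key] = result
        return result


# Hypothetical write tool; the handler stands in for a real billing system.
refunds_issued: list[str] = []


def _refund(args: dict[str, Any]) -> dict[str, str]:
    refunds_issued.append(args["invoice_id"])
    return {"refund_id": f"rf_{len(refunds_issued)}"}


refund_invoice = ToolContract(
    name="refund_invoice",
    description="Refund a single invoice in full or in part.",
    params={"invoice_id": str, "amount_cents": int},
    required_permission="billing:refund",
    writes=True,
    handler=_refund,
)

runner = ToolRunner()
preview = runner.call(refund_invoice,
                      {"invoice_id": "inv_1", "amount_cents": 500},
                      user_permissions={"billing:refund"}, dry_run=True)
print(preview)
```

In this sketch a dry-run skips the approval and idempotency checks because it changes nothing. A real deployment might still want approval before showing a preview, and would keep the idempotency cache in durable storage instead of in memory.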