AGENTIC FEATURE SPEC CHECKLIST (APPROVALS + AUDIT-FIRST)

Use this to spec one agentic capability (e.g., “close Jira tickets,” “draft and send invoices,” “open PRs,” “triage support”). If a section can’t be answered, that’s a product risk.

1) USER VALUE + BOUNDARY
- The single workflow this agent improves:
- The exact systems it touches (e.g., Jira, GitHub, Google Workspace, Salesforce):
- Hard boundary: what it will not do (explicitly list):
- Primary user persona and their authority (role):

2) ACTION CONTRACT (STRUCTURED)
- Define a typed action schema the model must output (JSON):
  - Required fields (IDs, targets, action types)
  - Optional fields (notes, confidence, rationale)
- Define validation rules (what gets rejected):
  - Missing IDs, ambiguous targets, large fan-out, policy violations
- Define idempotency strategy for writes (idempotency key per action):

3) APPROVAL UX
- What the user sees before execution:
  - Diff / draft / plan (choose one)
  - Affected objects list (records, repos, recipients)
- Approval modes:
  - Single approve, approve-per-step, or batch approve
- Clear “cancel” and “edit” paths:
- Default stance: propose-first unless risk is demonstrably low

4) PERMISSIONS + IDENTITY
- Auth method (OAuth, SSO, API token) and minimal scopes:
- Map agent actions to user identity (who is the actor of record?):
- RBAC rules: who can approve which actions:
- Handling of missing permissions (actionable refusal copy + request flow):

5) SAFETY GUARDRAILS
- Allowlist/denylist (domains, repos, project keys, recipients):
- Rate limits and fan-out caps (qualitative is fine):
- Escalation triggers:
  - Sensitive labels (security, customer-impacting)
  - Large blast radius
  - Low confidence / ambiguous match
- Rollback plan (how to revert each write):

6) OBSERVABILITY + AUDIT
- Audit log fields:
  - Requester, approver, timestamp, tool calls, inputs/outputs, result
- User-visible activity timeline:
- Admin export requirements (if any):
- Error taxonomy and user-facing error messages:

7) FAILURE MODES (WRITE THEM DOWN)
- Partial failure: what happens if step 2 fails after step 1 succeeded?
- Retries: what is safe to retry automatically, and what must be re-approved?
- Timeouts: how long before the job is marked “stuck,” and what then?

8) QA + LAUNCH
- Test plan:
  - Golden-path fixtures (known objects)
  - Adversarial inputs (ambiguous names, conflicting instructions)
  - Permission-denied scenarios
- Rollout plan:
  - Start with propose-only, then gated execution
  - Feature flag and kill switch
- Support playbook:
  - How to diagnose a bad action from logs
  - How to respond to user disputes (“Why did it do this?”)

9) SUCCESS CRITERIA (NON-NUMERIC IS OK)
- What observable behavior means it’s working:
  - Faster approvals, fewer back-and-forth edits, fewer manual drafts
- What behavior means it’s unsafe:
  - Unreviewable actions, unclear ownership, missing receipts

If you only implement two things, make them (1) a review/approval surface and (2) an audit log users can actually read. Everything else stacks on top.
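
EXAMPLE: ACTION CONTRACT SKETCH (SECTION 2)

A minimal Python sketch of what section 2 asks for: a typed action parsed from the model's JSON, validation that rejects the listed cases, and one idempotency key per action. Every name here is hypothetical and not part of the checklist: the field names (action_type, target_ids, requested_by), the allowlisted action types, and the fan-out cap of 25 are placeholders to replace with your own policy.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical policy values; replace with your own.
ALLOWED_ACTION_TYPES = {"close_ticket", "add_comment"}
MAX_FAN_OUT = 25


@dataclass(frozen=True)
class ProposedAction:
    # Required fields (IDs, targets, action types)
    action_type: str
    target_ids: Tuple[str, ...]
    requested_by: str
    # Optional fields (notes, confidence)
    notes: str = ""
    confidence: Optional[float] = None


def parse_action(raw_json: str) -> ProposedAction:
    """Build a typed action from model output; missing keys become empty values."""
    data = json.loads(raw_json)
    return ProposedAction(
        action_type=data.get("action_type", ""),
        target_ids=tuple(data.get("target_ids", [])),
        requested_by=data.get("requested_by", ""),
        notes=data.get("notes", ""),
        confidence=data.get("confidence"),
    )


def validate(action: ProposedAction) -> List[str]:
    """Return rejection reasons; an empty list means the action may be proposed."""
    errors = []
    if action.action_type not in ALLOWED_ACTION_TYPES:
        errors.append("policy violation: action type is not allowlisted")
    if not action.target_ids:
        errors.append("missing IDs: no targets given")
    if any(not target.strip() for target in action.target_ids):
        errors.append("ambiguous target: blank ID")
    if len(action.target_ids) > MAX_FAN_OUT:
        errors.append("large fan-out: too many targets in one action")
    if not action.requested_by:
        errors.append("missing IDs: no requester")
    return errors


def idempotency_key(action: ProposedAction) -> str:
    """Derive a stable key from the fields that define the write itself."""
    payload = json.dumps(
        {
            "type": action.action_type,
            "targets": sorted(action.target_ids),
            "by": action.requested_by,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

One design choice worth stating: the key in this sketch ignores notes and confidence and sorts the targets, so a retry that only rewords its rationale or reorders targets maps to the same key. Whether that is the right equivalence for your writes is a product decision, not a given.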