AGENTIC FEATURE SPEC CHECKLIST (APPROVALS + AUDIT-FIRST)

Use this to spec one agentic capability (e.g., “close Jira tickets,” “draft and send invoices,” “open PRs,” “triage support”). If a section can’t be answered, that’s a product risk.

1) USER VALUE + BOUNDARY
- The single workflow this agent improves:
- The exact systems it touches (e.g., Jira, GitHub, Google Workspace, Salesforce):
- Hard boundary: what it will not do (explicitly list):
- Primary user persona and their authority (role):

2) ACTION CONTRACT (STRUCTURED)
- Define a typed action schema the model must output (JSON):
 - Required fields (IDs, targets, action types)
 - Optional fields (notes, confidence, rationale)
- Define validation rules (what gets rejected):
 - Missing IDs, ambiguous targets, large fan-out, policy violations
- Define an idempotency strategy for writes (e.g., one idempotency key per action):
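
One way to make the contract concrete is a small parser that rejects anything outside the schema and derives an idempotency key for each accepted action. This is a minimal sketch, not a prescribed design: the field names (action_type, target_ids, notes, confidence), the allowed action types, the fan-out limit of 10, and the request_id input are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration; a real spec defines its own.
ALLOWED_ACTION_TYPES = {"close_ticket", "add_comment"}
MAX_FAN_OUT = 10


@dataclass(frozen=True)
class Action:
    action_type: str                    # required
    target_ids: tuple                   # required: explicit object IDs
    notes: Optional[str] = None         # optional
    confidence: Optional[float] = None  # optional


def parse_action(raw: str) -> Action:
    """Parse model output; raise ValueError for anything the contract rejects."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("action must be a JSON object")
    action_type = data.get("action_type")
    if action_type not in ALLOWED_ACTION_TYPES:
        raise ValueError(f"unknown or disallowed action_type: {action_type!r}")
    target_ids = data.get("target_ids")
    if not isinstance(target_ids, list) or not target_ids:
        raise ValueError("missing target_ids")
    if not all(isinstance(t, str) and t for t in target_ids):
        raise ValueError("every target ID must be a non-empty string")
    if len(target_ids) > MAX_FAN_OUT:
        raise ValueError(f"fan-out {len(target_ids)} exceeds cap {MAX_FAN_OUT}")
    return Action(action_type, tuple(target_ids),
                  data.get("notes"), data.get("confidence"))


def idempotency_key(request_id: str, action: Action) -> str:
    """Same request + same action gives the same key, so a retry is detectable."""
    payload = json.dumps(
        {"request_id": request_id,
         "action_type": action.action_type,
         "target_ids": sorted(action.target_ids)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The key deliberately ignores optional fields such as notes, so a retried action with reworded rationale still maps to the same write. Whether that is the right choice depends on the system being written to.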

3) APPROVAL UX
- What the user sees before execution:
 - Diff / draft / plan (choose one)
 - Affected objects list (records, repos, recipients)
- Approval modes:
 - Single approve, approve-per-step, or batch approve
- Clear “cancel” and “edit” paths:
- Default stance: propose-first unless risk is demonstrably low
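
The propose-first stance can be sketched as a small state machine in which nothing executes without approval and any edit sends the proposal back for review. The class, status names, and the run callback below are hypothetical, chosen only to illustrate the cancel and edit paths.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    CANCELLED = "cancelled"
    EXECUTED = "executed"


@dataclass
class Proposal:
    summary: str             # the diff, draft, or plan shown to the user
    affected_objects: list   # records, repos, or recipients touched
    status: Status = Status.PROPOSED

    def approve(self) -> None:
        if self.status is not Status.PROPOSED:
            raise RuntimeError(f"cannot approve from {self.status.value}")
        self.status = Status.APPROVED

    def edit(self, summary: str, affected_objects: list) -> None:
        if self.status in (Status.CANCELLED, Status.EXECUTED):
            raise RuntimeError(f"cannot edit from {self.status.value}")
        self.summary = summary
        self.affected_objects = affected_objects
        self.status = Status.PROPOSED  # an edit invalidates any prior approval

    def cancel(self) -> None:
        if self.status is Status.EXECUTED:
            raise RuntimeError("already executed; use the rollback plan instead")
        self.status = Status.CANCELLED

    def execute(self, run):
        # Propose-first: nothing runs unless the current version was approved.
        if self.status is not Status.APPROVED:
            raise PermissionError("proposal has not been approved")
        result = run(self)
        self.status = Status.EXECUTED
        return result
```

Approve-per-step or batch approval could be layered on by holding one Proposal per step or per batch; this sketch covers only the single-approve mode.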

4) PERMISSIONS + IDENTITY
- Auth method (OAuth, SSO, API token) and minimal scopes:
- Map agent actions to user identity (who is the actor of record?):
- RBAC rules: who can approve which actions:
- Handling of missing permissions (actionable refusal copy + request flow):
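
A sketch of an RBAC approval check that returns actionable refusal copy rather than a bare denial. The roles, action types, and message wording are invented for illustration; the real mapping would come from the identity system named above.

```python
from dataclasses import dataclass

# Hypothetical role-to-action mapping; not taken from any real system.
ROLE_CAN_APPROVE = {
    "admin": {"close_ticket", "send_invoice"},
    "member": {"close_ticket"},
}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    message: str


def check_approval(role: str, action_type: str) -> Decision:
    """Decide whether a role may approve an action, with a next step on refusal."""
    if action_type in ROLE_CAN_APPROVE.get(role, set()):
        return Decision(True, "ok")
    approvers = sorted(
        r for r, actions in ROLE_CAN_APPROVE.items() if action_type in actions
    )
    if approvers:
        hint = "Ask someone with role " + " or ".join(approvers) + " to approve."
    else:
        hint = "No role is currently able to approve this action."
    return Decision(
        False, f"Your role ({role}) cannot approve {action_type}. {hint}"
    )
```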

5) SAFETY GUARDRAILS
- Allowlist/denylist (domains, repos, project keys, recipients):
- Rate limits and fan-out caps (qualitative is fine):
- Escalation triggers:
 - Sensitive labels (security, customer-impacting)
 - Large blast radius
 - Low confidence / ambiguous match
- Rollback plan (how to revert each write):
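
The allowlist and escalation triggers above could be combined into a single pre-execution check that returns allow, escalate, or deny along with the reasons. The project keys, label names, fan-out cap, and confidence floor below are placeholder assumptions, not recommended values.

```python
# Placeholder policy values for illustration only.
ALLOWED_PROJECT_KEYS = {"SUP", "OPS"}
SENSITIVE_LABELS = {"security", "customer-impacting"}
FAN_OUT_CAP = 20
MIN_CONFIDENCE = 0.8


def evaluate(project_key, labels, target_count, confidence):
    """Return (verdict, reasons); verdict is 'allow', 'escalate', or 'deny'."""
    # The allowlist is a hard stop: no human approval can override it here.
    if project_key not in ALLOWED_PROJECT_KEYS:
        return ("deny", [f"project {project_key} is not on the allowlist"])

    # Escalation triggers route the action to a human instead of blocking it.
    reasons = []
    if SENSITIVE_LABELS & set(labels):
        reasons.append("sensitive label")
    if target_count > FAN_OUT_CAP:
        reasons.append("large blast radius")
    if confidence < MIN_CONFIDENCE:
        reasons.append("low confidence")
    return ("escalate", reasons) if reasons else ("allow", [])
```

Returning the reasons alongside the verdict lets the approval surface tell the reviewer why the action was escalated.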

6) OBSERVABILITY + AUDIT
- Audit log fields:
 - Requester, approver, timestamp, tool calls, inputs/outputs, result
- User-visible activity timeline:
- Admin export requirements (if any):
- Error taxonomy and user-facing error messages:
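
As an illustration of the audit log fields, here is a minimal record type serialized as one JSON object per line. The structure is an assumption; only the field names come from the list above, and the tool name used in the usage note is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    # ISO 8601 timestamp in UTC, so records from different hosts compare cleanly.
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AuditRecord:
    requester: str    # who asked for the action
    approver: str     # who approved it (actor of record for the approval)
    tool_calls: list  # names of the tools invoked, in order
    inputs: dict      # what was sent to the tools
    outputs: dict     # what came back
    result: str       # e.g. "success" or an error category
    timestamp: str = field(default_factory=_utc_now)


def to_log_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line (append-only log format)."""
    return json.dumps(asdict(record), sort_keys=True)
```

For example, AuditRecord(requester="alice", approver="bob", tool_calls=["jira.close_issue"], inputs={"issue": "PROJ-1"}, outputs={"status": "closed"}, result="success") yields one line that can feed both the user-visible timeline and an admin export.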

7) FAILURE MODES (WRITE THEM DOWN)
- Partial failure: what happens if step 2 fails after step 1 succeeded?
- Retries: what is safe to retry automatically, and what must be re-approved?
- Timeouts: how long before the job is marked “stuck,” and what then?
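
One possible shape for these answers, sketched under assumptions: each step declares whether it is safe to retry, the runner halts at the first step that exhausts its attempts, and the report lists what already succeeded so the rollback plan knows what to revert. The Step type, the retry count, and the timeout helper are illustrative, not a required design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[], None]
    safe_to_retry: bool = False  # e.g. reads or idempotent writes


def run_steps(steps, max_retries=2):
    """Run steps in order; halt on the first step that cannot complete."""
    completed = []
    for step in steps:
        # Steps not marked safe get exactly one attempt.
        attempts = 1 + (max_retries if step.safe_to_retry else 0)
        for _ in range(attempts):
            try:
                step.run()
                completed.append(step.name)
                break
            except Exception as exc:
                last_error = exc
        else:
            # Every attempt failed: report what already succeeded.
            return {
                "status": "partial_failure" if completed else "failed",
                "completed": completed,
                "failed_step": step.name,
                "error": str(last_error),
            }
    return {"status": "ok", "completed": completed, "failed_step": None}


def is_stuck(started_at: float, now: float, timeout_seconds: float) -> bool:
    """A job is 'stuck' once it has run longer than the agreed timeout."""
    return now - started_at > timeout_seconds
```

A partial_failure result is the case the first question asks about: the spec still has to say whether the completed steps are rolled back, left in place, or surfaced to the user for a decision.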

8) QA + LAUNCH
- Test plan:
 - Golden-path fixtures (known objects)
 - Adversarial inputs (ambiguous names, conflicting instructions)
 - Permission-denied scenarios
- Rollout plan:
 - Start with propose-only, then gated execution
 - Feature flag and kill switch
- Support playbook:
 - How to diagnose a bad action from logs
 - How to respond to user disputes (“Why did it do this?”)
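
The rollout plan above (propose-only first, then gated execution, with a kill switch) can be reduced to a single gate that every write passes through. The stage names and function signature here are assumptions for illustration; how the flag and kill switch are stored is left open.

```python
from enum import Enum


class RolloutStage(Enum):
    PROPOSE_ONLY = "propose_only"        # agent may draft, never write
    GATED_EXECUTION = "gated_execution"  # agent may write after approval


def may_execute(stage: RolloutStage, kill_switch_on: bool, approved: bool) -> bool:
    """Single gate checked immediately before any write."""
    if kill_switch_on:
        return False  # the kill switch overrides everything else
    if stage is RolloutStage.PROPOSE_ONLY:
        return False  # proposals only, regardless of approval
    return approved
```

Checking the kill switch first means an operator can stop all writes without needing to know which rollout stage or approval state a given job is in.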

9) SUCCESS CRITERIA (NON-NUMERIC IS OK)
- What observable behavior means it’s working:
 - Faster approvals, fewer back-and-forth edits, fewer manual drafts
- What behavior means it’s unsafe:
 - Unreviewable actions, unclear ownership, missing receipts

If you only implement two things, make them (1) a review/approval surface and (2) an audit log users can actually read. Everything else builds on those two.