AGENTIC WORKFLOW SPEC TEMPLATE (PRD + SAFETY + TELEMETRY)

1) Workflow name
- Short, operational name (verb + object): “Issue refund”, “Provision access”, “Turn support thread into Jira bug”.

2) User + context
- Primary user role(s):
- Systems involved (systems of record):
- Where the workflow starts (UI surface): in-app, Slack, Teams, email intake, admin console.

3) Problem statement (1 paragraph)
- Describe the current manual steps and the pain (tab switching, missing context, approvals).

4) Definition of DONE (verifiable)
- Terminal success state(s): (example: “Refund transaction exists AND account note created AND customer email drafted”)
- Terminal failure states: auth failure, missing required field, policy violation, external outage.
- Read-after-write verification: what exact record(s) you will fetch to confirm completion.

5) Plan + steps (structured)
- Step list the agent will execute (max 5–10 steps).
- For each step: tool name, required inputs, outputs, and what can go wrong.

6) Tool contracts (constrained tools)
For each tool:
- Tool purpose (single responsibility):
- Allowed actions (be specific):
- Input schema (fields + types):
- Output schema:
- Idempotency key strategy (how to avoid double-writes):
- Rate limits / retries:

7) Permissions + identity
- Acting identity: per-user OAuth token vs service account (and why).
- Least-privilege scopes required:
- Role checks inside your app (who can run it):

8) Human-in-the-loop design
- What requires explicit approval:
- What can run automatically:
- Diff view: what you show before committing changes (fields, docs, tickets).
- Undo/rollback path:

9) Guardrails (policy)
- Disallowed actions (hard blocks):
- Data boundaries (what data the agent must not access):
- Safe fallbacks: create draft, open ticket, escalate to human.

10) Observability + telemetry (events)
Must log:
- task_started, task_completed (with terminal state)
- tool_called (tool name, params hash), tool_succeeded, tool_failed (error class)
- approval_requested, approval_granted/denied
- user_intervened (edited plan, corrected field, took over)
- undo_executed

11) QA plan
- Golden-path test cases:
- Failure-mode tests (expired token, missing field, duplicate records, rate limit):
- Security review checklist: scopes validated, logs scrubbed, secrets handling.

12) Rollout
- Gating: internal-only → beta cohort → default on/off
- Kill switch: how to disable tool writes instantly
- Support playbook: what support needs to diagnose failures (log pointers, correlation IDs)
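
Added example (not part of the original template): one way to combine the section 6 idempotency key with the section 10 tool_called / tool_succeeded / tool_failed events, so a retried step does not double-write and logs carry a keyed params hash instead of raw values. ToolRunner, issue_refund, HASH_KEY and the in-memory LEDGER are hypothetical names and constructed sample data.

```python
import hashlib
import hmac
import json

HASH_KEY = b"constructed-demo-key"  # assumption: a real deployment loads this from a secret store
LEDGER = []  # constructed stand-in for the system of record

def params_hash(params):
    # Canonical JSON so field order never changes the hash; keyed (HMAC) so
    # low-entropy values such as amounts cannot be guessed back from the logs.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(HASH_KEY, canonical, hashlib.sha256).hexdigest()[:16]

def idempotency_key(task_id, step, tool, params):
    return f"{task_id}:{step}:{tool}:{params_hash(params)}"

class ToolRunner:
    def __init__(self, tool_name, fn):
        self.tool_name, self.fn = tool_name, fn
        self.results = {}  # idempotency key -> stored result
        self.events = []   # telemetry sink (hypothetical; normally a log pipeline)

    def _emit(self, event, **fields):
        self.events.append({"event": event, "tool": self.tool_name, **fields})

    def call(self, task_id, step, params):
        key = idempotency_key(task_id, step, self.tool_name, params)
        if key in self.results:  # retry of a completed step: no second write
            return self.results[key]
        self._emit("tool_called", task_id=task_id, params_hash=params_hash(params))
        try:
            result = self.fn(**params)
        except Exception as exc:
            # Log the error class only; exception messages can echo parameter values.
            self._emit("tool_failed", task_id=task_id, error_class=type(exc).__name__)
            raise
        self.results[key] = result
        self._emit("tool_succeeded", task_id=task_id)
        return result

def issue_refund(order_id, amount_cents, email):  # hypothetical write tool
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    LEDGER.append((order_id, amount_cents))
    return {"refund_id": f"rf_{len(LEDGER)}"}

if __name__ == "__main__":
    runner = ToolRunner("issue_refund", issue_refund)
    p = {"order_id": "o-1001", "amount_cents": 2500, "email": "pat@example.com"}  # constructed
    print(runner.call("task-1", 3, p), runner.call("task-1", 3, p), len(LEDGER))
    print(runner.events)
```

The in-memory results dict stands in for a durable store: the key only prevents double-writes if it survives process restarts or is passed through to the downstream API.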