AGENT SURFACE SPEC (ONE-PAGE TEMPLATE)

1) Workflow target
- Name the exact workflow (not a persona): e.g., “Draft a support reply from an existing thread,” “Stage a PR that fixes a failing test,” “Propose calendar moves for conflicts.”
- Define the artifact the user will end up with: email draft, PR diff, updated record set, document patch.

2) Allowed actions (capability boundary)
- List the tools/APIs the agent may call.
- For each tool, list the allowed verbs (read/search/write/delete) and scope (workspace/project/folder).
- Explicitly list forbidden actions (e.g., “No sending emails,” “No deleting records,” “No production deploys”).
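A minimal sketch of how this boundary might be enforced at call time, assuming Python 3.9+. The names `ToolPolicy` and `CapabilityBoundary`, the `email` tool, and the `workspace:acme` scope string are hypothetical. The design choice it illustrates: the explicit deny-list is checked first, and any tool not on the list fails closed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolPolicy:
    verbs: frozenset[str]   # e.g. read/search/write/delete
    scopes: frozenset[str]  # e.g. a workspace, project, or folder identifier


@dataclass
class CapabilityBoundary:
    tools: dict[str, ToolPolicy]
    forbidden: set[tuple[str, str]] = field(default_factory=set)  # (tool, verb) deny-list

    def is_allowed(self, tool: str, verb: str, scope: str) -> bool:
        # Explicit denials win, even if the verb is later added to the allow-list.
        if (tool, verb) in self.forbidden:
            return False
        policy = self.tools.get(tool)
        if policy is None:  # tools not on the list are denied by default
            return False
        return verb in policy.verbs and scope in policy.scopes


# Hypothetical boundary for the support-reply workflow: drafting allowed, sending forbidden.
boundary = CapabilityBoundary(
    tools={"email": ToolPolicy(frozenset({"read", "search", "write"}),
                               frozenset({"workspace:acme"}))},
    forbidden={("email", "send")},
)
```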

3) Permission model
- What identity does the agent run as (user-delegated OAuth, service account, bot user)?
- What are the least-privilege scopes required?
- Approval gates:
 - Which actions require explicit approval every time?
 - Which actions may run automatically, but only after explicit opt-in?
- Add a kill switch: a workspace admin can disable agent actions and revoke the agent’s tokens.
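A sketch of the approval-gate decision, assuming the two gate types above plus the admin kill switch. The action names and the `GATES` table are hypothetical placeholders; unknown actions deliberately fall back to requiring approval.

```python
from enum import Enum


class Gate(Enum):
    ALWAYS_APPROVE = "always_approve"        # explicit approval every time
    AUTO_AFTER_OPT_IN = "auto_after_opt_in"  # may auto-run once opted in


# Hypothetical gate table; real assignments come from this section of the spec.
GATES = {
    "calendar.move_event": Gate.AUTO_AFTER_OPT_IN,
    "docs.apply_patch": Gate.ALWAYS_APPROVE,
}


def decide(action: str, *, opted_in: set[str], kill_switch_on: bool) -> str:
    """Return "blocked", "run", or "needs_approval" for a proposed action."""
    if kill_switch_on:
        return "blocked"  # the admin kill switch overrides every other rule
    gate = GATES.get(action, Gate.ALWAYS_APPROVE)  # unknown actions fail closed
    if gate is Gate.AUTO_AFTER_OPT_IN and action in opted_in:
        return "run"
    return "needs_approval"
```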

4) UX contract (what the user sees)
- Plan UI: show steps before execution.
- Preview UI: by default, the result is staged for review, not applied.
- Diff UI: show what will change (text diff, record diff, file list).
- One-click controls: Approve, Edit, Rerun, Undo, Stop.
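For text artifacts, the Diff UI can be backed by Python’s standard `difflib.unified_diff`. This sketch only renders the diff of a staged change and applies nothing; the function name `preview_diff` is hypothetical, and record diffs or file lists would need their own renderers.

```python
import difflib


def preview_diff(current: str, staged: str, name: str = "document") -> str:
    """Render a unified diff between the current and staged text (nothing is applied)."""
    lines = difflib.unified_diff(
        current.splitlines(keepends=True),
        staged.splitlines(keepends=True),
        fromfile=f"{name} (current)",
        tofile=f"{name} (staged)",
    )
    return "".join(lines)


diff = preview_diff("Hi Sam,\nThanks!\n", "Hi Sam,\nThanks for waiting!\n", "reply.txt")
```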

5) Observability & audit trail
- Record an append-only event for each run:
 - user/workspace, intent, tools called, inputs, outputs, artifacts created, approvals.
- Expose a user-facing trace for debugging (“what it did” timeline).
- Export path: how admins can retrieve logs for compliance/support.
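A minimal in-memory sketch of an append-only run log with a per-run trace and a JSON Lines export. The class and field names are hypothetical. The hash chaining (each event stores the previous event’s SHA-256) is one optional way to make tampering detectable; production storage would more likely be a write-once store than a Python list.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    """Events can be appended, traced, and exported; the API offers no edit or delete."""
    _events: list[dict] = field(default_factory=list)

    def append(self, run_id: str, kind: str, **data) -> dict:
        prev = self._events[-1]["hash"] if self._events else ""
        event = {"run_id": run_id, "ts": time.time(), "kind": kind, "data": data, "prev": prev}
        event["hash"] = _digest(event)  # hash covers every field, including "prev"
        self._events.append(event)
        return event

    def trace(self, run_id: str) -> list[dict]:
        """User-facing "what it did" timeline for one run."""
        return [e for e in self._events if e["run_id"] == run_id]

    def export_jsonl(self) -> str:
        """Admin export path for compliance/support."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self._events)

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered event breaks it."""
        prev = ""
        for e in self._events:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


log = AuditLog()
log.append("run-1", "tool_call", user="u1", tool="email.search")
log.append("run-1", "approval", approved_by="u1")
log.append("run-2", "tool_call", user="u2", tool="crm.read")
```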

6) Failure handling
- Define when the agent stops and what stopping looks like: e.g., a missing permission, low confidence, or an ambiguous entity.
- Clarification UX: ask one crisp question or offer 2–4 selectable options; otherwise, halt.
- Safe fallback: produce a draft/plan instead of executing.
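One way to encode these stop rules as a single decision function, assuming the agent can report whether it has permission, a confidence score, and the candidate entities it matched. The 0.8 threshold, the outcome names, and the choice to halt when there are more than four candidates are illustrative assumptions, not part of the template.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    kind: str                   # "execute", "clarify", or "fallback_draft"
    message: str = ""
    options: tuple[str, ...] = ()


def next_step(has_permission: bool, confidence: float, candidates: list[str],
              threshold: float = 0.8) -> Outcome:
    # Missing permission: never execute; return a draft/plan instead.
    if not has_permission:
        return Outcome("fallback_draft", "Missing permission; here is a draft instead.")
    # No match: ask one crisp question.
    if not candidates:
        return Outcome("clarify", "I couldn't find a match. Which record should I use?")
    # Ambiguous entity: offer 2-4 selectable options; with more, halt.
    if len(candidates) > 1:
        if len(candidates) <= 4:
            return Outcome("clarify", "Which one did you mean?", tuple(candidates))
        return Outcome("fallback_draft", "Too many matches; halting with a draft.")
    # Low confidence: stage a plan rather than acting.
    if confidence < threshold:
        return Outcome("fallback_draft", "Low confidence; here is a plan to review.")
    return Outcome("execute")
```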

7) Reversibility
- Define the undo story for every write:
 - rollback, restore point, revert commit, record history.
- Define the maximum blast radius of a single run (cap the number of writes, or require approval past a threshold).
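A sketch combining both rules: every write registers its own undo step, and the run refuses writes past a cap unless it was approved beyond the threshold. `Run`, `BlastRadiusExceeded`, `update`, and the in-memory `records` dict are hypothetical stand-ins for real tool calls and restore points.

```python
from typing import Callable


class BlastRadiusExceeded(Exception):
    pass


class Run:
    """Applies writes one at a time, recording an undo step for each."""

    def __init__(self, max_writes: int, approved_over_cap: bool = False):
        self.max_writes = max_writes
        self.approved_over_cap = approved_over_cap
        self._undo: list[Callable[[], object]] = []

    def write(self, do: Callable[[], object], undo: Callable[[], object]) -> None:
        # Refuse the write *before* applying it if it would exceed the cap.
        if len(self._undo) >= self.max_writes and not self.approved_over_cap:
            raise BlastRadiusExceeded(f"run would exceed {self.max_writes} writes")
        do()
        self._undo.append(undo)

    def rollback(self) -> None:
        # Undo in reverse order so later writes are reverted before earlier ones.
        while self._undo:
            self._undo.pop()()


records = {"a": 1, "b": 2, "c": 3}
run = Run(max_writes=2)


def update(key: str, value: int) -> None:
    old = records[key]  # capture the prior value as this write's restore point
    run.write(lambda: records.__setitem__(key, value),
              lambda: records.__setitem__(key, old))


update("a", 10)
update("b", 20)
try:
    update("c", 30)  # third write exceeds the cap and is refused
except BlastRadiusExceeded:
    pass
run.rollback()  # restores "a" and "b"; "c" was never touched
```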

8) Shipping checks (before GA)
- Runbook: how support diagnoses a bad run using traces.
- Abuse tests: prompt injection attempts, malicious inputs, cross-tenant access checks.
- Regression set: a small, versioned set of real scenarios you replay before releases.
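A minimal replay harness for a versioned regression set, assuming scenarios are stored as JSON and the agent can be called as a function from input to artifact. Exact string comparison is a simplifying assumption: agent output is often nondeterministic, so real checks may need to compare structured artifacts or normalize text first. The scenario format and `stub_agent` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    name: str
    user_input: str
    expected: str


def load_scenarios(raw: str) -> tuple[str, list[Scenario]]:
    """Parse a versioned regression set; the version is pinned alongside the results."""
    doc = json.loads(raw)
    return doc["version"], [Scenario(**s) for s in doc["scenarios"]]


def replay(agent: Callable[[str], str], scenarios: list[Scenario]) -> list[str]:
    """Run every scenario through the agent; return the names whose artifact changed."""
    return [s.name for s in scenarios if agent(s.user_input) != s.expected]


RAW = json.dumps({"version": "v1", "scenarios": [
    {"name": "fix-typo", "user_input": "helo", "expected": "hello"},
    {"name": "no-op", "user_input": "ok", "expected": "ok"},
]})
version, scenarios = load_scenarios(RAW)


def stub_agent(text: str) -> str:
    return text.replace("helo", "hello")


failures = replay(stub_agent, scenarios)  # an empty list means the release can proceed
```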

Use this template as a gate: if a section is blank, you’re not shipping an agent surface—you’re shipping a demo.