AGENT SURFACE SPEC (ONE-PAGE TEMPLATE)

1) Workflow target
- Name the exact workflow (not a persona): e.g., “Draft a support reply from an existing thread,” “Stage a PR that fixes a failing test,” “Propose calendar moves for conflicts.”
- Define the artifact the user will end up with: email draft, PR diff, updated record set, document patch.

2) Allowed actions (capability boundary)
- List the tools/APIs the agent may call.
- For each tool, list the allowed verbs (read/search/write/delete) and scope (workspace/project/folder).
- Define explicit forbidden actions (e.g., “No sending emails,” “No deleting records,” “No production deploys”).

3) Permission model
- What identity does the agent run as (user-delegated OAuth, service account, bot user)?
- What are the least-privilege scopes required?
- Approval gates:
  - Which actions require explicit approval every time?
  - Which actions can run automatically only after opt-in?
- Add a kill switch: a workspace admin can disable agent actions and revoke tokens.

4) UX contract (what the user sees)
- Plan UI: show steps before execution.
- Preview UI: staged result is the default.
- Diff UI: show what will change (text diff, record diff, file list).
- One-click controls: Approve, Edit, Rerun, Undo, Stop.

5) Observability & audit trail
- Record an append-only event for each run:
  - user/workspace, intent, tools called, inputs, outputs, artifacts created, approvals.
- Expose a user-facing trace for debugging (“what it did” timeline).
- Export path: how admins can retrieve logs for compliance/support.

6) Failure handling
- Define the conditions under which the agent stops: missing permission, low confidence, ambiguous entity.
- Clarification UX: one crisp question or 2–4 selectable options; otherwise halt.
- Safe fallback: produce a draft/plan instead of executing.

7) Reversibility
- Define the undo story for every write:
  - rollback, restore point, revert commit, record history.
- Define the maximum blast radius of a single run (cap the number of writes or require approval past a threshold).

8) Shipping checks (before GA)
- Runbook: how support diagnoses a bad run using traces.
- Abuse tests: prompt injection attempts, malicious inputs, cross-tenant access checks.
- Regression set: a small, versioned set of real scenarios you replay before releases.

Use this template as a gate: if a section is blank, you’re not shipping an agent surface; you’re shipping a demo.
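
As one illustration of how sections 2, 3, 5, and 7 might fit together in code, here is a minimal Python sketch of a per-run policy check. Every name in it (Action, Policy, Run, Decision, max_writes_per_run, and the example tools "crm" and "email") is hypothetical and chosen for illustration only; it assumes an in-memory list stands in for the append-only audit log and that approval arrives as a boolean from the UI. Treat it as a starting point under those assumptions, not a reference implementation.

```python
from dataclasses import dataclass, field
from enum import Enum

WRITE_VERBS = {"write", "delete"}  # verbs that count toward the blast-radius cap


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    tool: str   # hypothetical tool name, e.g. "crm"
    verb: str   # one of read/search/write/delete
    scope: str  # e.g. "workspace:acme"


@dataclass
class Policy:
    allowed: set[tuple[str, str]]  # (tool, verb) pairs the agent may use
    forbidden: set[tuple[str, str]] = field(default_factory=set)
    always_approve: set[tuple[str, str]] = field(default_factory=set)
    max_writes_per_run: int = 5    # writes past this cap need approval
    kill_switch: bool = False      # admin override: deny everything


@dataclass
class Run:
    policy: Policy
    writes: int = 0
    audit_log: list[dict] = field(default_factory=list)  # only ever appended to

    def _decide(self, action: Action, approved: bool) -> Decision:
        key = (action.tool, action.verb)
        if self.policy.kill_switch:
            return Decision.DENY
        if key in self.policy.forbidden or key not in self.policy.allowed:
            return Decision.DENY
        if approved:
            return Decision.ALLOW
        if key in self.policy.always_approve:
            return Decision.NEEDS_APPROVAL
        over_cap = self.writes >= self.policy.max_writes_per_run
        if action.verb in WRITE_VERBS and over_cap:
            return Decision.NEEDS_APPROVAL
        return Decision.ALLOW

    def check(self, action: Action, approved: bool = False) -> Decision:
        decision = self._decide(action, approved)
        if decision is Decision.ALLOW and action.verb in WRITE_VERBS:
            self.writes += 1
        # Record every decision, including denials, for the audit trail.
        self.audit_log.append({
            "tool": action.tool,
            "verb": action.verb,
            "scope": action.scope,
            "approved": approved,
            "decision": decision.value,
        })
        return decision


if __name__ == "__main__":
    policy = Policy(
        allowed={("crm", "read"), ("crm", "write")},
        forbidden={("crm", "delete")},
        max_writes_per_run=1,
    )
    run = Run(policy)
    for verb in ("read", "write", "write", "delete"):
        print(verb, run.check(Action("crm", verb, "workspace:acme")).value)
```

In this sketch, forbidden and unlisted actions are denied even when approved, so approval can never widen the capability boundary; it only unlocks gated actions inside it. A real system would also need durable, tamper-evident log storage and token revocation behind the kill switch, which are out of scope here.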