MVAI (Minimum Viable Agent Interface) Checklist Use this checklist to turn an “agent demo” into a product operators can trust. The goal: bounded autonomy, clear accountability, and fast recovery when things go wrong. 1) Task Definition (pick one workflow) - Name the workflow as a verb + artifact (e.g., “Create Jira tickets,” “Draft refund response,” “Reconcile invoices”). - Define “done” in observable terms (artifact exists, status set, notification sent). - Define what the agent is NOT allowed to do (explicit exclusions reduce risk). 2) Identity + Permissions - Every run is tied to a real user and workspace/tenant. - Tools use least-privilege scopes (separate read vs write; separate prod vs sandbox). - Keys/tokens are stored in a real secret manager and can be rotated. - Provide an access review screen: who connected what, which scopes, last used. 3) Run Ledger (receipts) - Log: inputs, model/tool decisions, tool calls (args + results), timestamps. - Make logs easy to export for audit and incident review. - Include authorship labels: suggested-by-agent vs executed-by-user vs auto-executed. 4) Approval Gates (policy) - Add a “pending actions” queue for anything high-impact. - Define default policies (examples): * always require approval for external messages * always require approval for refunds/credits * block permission changes unless explicitly allowed - Make approvals attributable (who approved, when). 5) Reversibility - Prefer idempotent operations and idempotency keys where possible. - Implement rollback (undo) for common actions, or compensating actions if undo isn’t possible. - Provide a “dry run” mode that shows intended side effects without executing. 6) Escalation + Stop Conditions - Define stop conditions (missing data, conflicting sources, tool errors, low confidence). - When stopping, the agent asks structured questions (not a vague “I’m stuck”). - Provide a clean handoff payload a human can take over with. 7) Testing + Change Control - Maintain a small suite of golden tasks with expected tool calls. - Version prompts/policies/tool schemas; keep release notes for behavior changes. - Add a kill switch to disable auto-execution per tenant. Shipping rule: if a user can’t answer “what happened?” and “how do I undo it?” from one screen, you’re not ready to scale autonomy.