MVAI (Minimum Viable Agent Interface) Checklist

Use this checklist to turn an “agent demo” into a product operators can trust. The goal: bounded autonomy, clear accountability, and fast recovery when things go wrong.

1) Task Definition (pick one workflow)
- Name the workflow as a verb + artifact (e.g., “Create Jira tickets,” “Draft refund response,” “Reconcile invoices”).
- Define “done” in observable terms (artifact exists, status set, notification sent).
- Define what the agent is NOT allowed to do (explicit exclusions reduce risk).

2) Identity + Permissions
- Every run is tied to a real user and workspace/tenant.
- Tools use least-privilege scopes (separate read vs write; separate prod vs sandbox).
- Keys/tokens are stored in a real secret manager and can be rotated.
- Provide an access review screen: who connected what, which scopes, last used.

3) Run Ledger (receipts)
- Log: inputs, model/tool decisions, tool calls (args + results), timestamps.
- Make logs easy to export for audit and incident review.
- Include authorship labels: suggested-by-agent vs executed-by-user vs auto-executed.

4) Approval Gates (policy)
- Add a “pending actions” queue for anything high-impact.
- Define default policies (examples):
  * always require approval for external messages
  * always require approval for refunds/credits
  * block permission changes unless explicitly allowed
- Make approvals attributable (who approved, when).

5) Reversibility
- Prefer idempotent operations and idempotency keys where possible.
- Implement rollback (undo) for common actions, or compensating actions if undo isn’t possible.
- Provide a “dry run” mode that shows intended side effects without executing.

6) Escalation + Stop Conditions
- Define stop conditions (missing data, conflicting sources, tool errors, low confidence).
- When stopping, the agent asks structured questions (not a vague “I’m stuck”).
- Provide a clean handoff payload a human can take over with.

7) Testing + Change Control
- Maintain a small suite of golden tasks with expected tool calls.
- Version prompts/policies/tool schemas; keep release notes for behavior changes.
- Add a kill switch to disable auto-execution per tenant.

Shipping rule: if a user can’t answer “what happened?” and “how do I undo it?” from one screen, you’re not ready to scale autonomy.
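
An added example (not part of the original checklist) showing how the section 4 default policies, the section 3 authorship labels, section 5 idempotency keys and the section 7 per-tenant kill switch can meet in one gate. The tool names, the Gate class and its fields are hypothetical, and the gate only decides and records; it calls no real tool. Routing explicitly allowed permission changes to the approval queue instead of auto-executing them is an assumption of this example.

```python
import hashlib
import json
import time

# Hypothetical tool names standing in for the section 4 default policies.
REQUIRE_APPROVAL = {"send_external_message", "issue_refund", "issue_credit"}
BLOCKED_UNLESS_ALLOWED = {"change_permissions"}

class Gate:
    def __init__(self, explicitly_allowed=(), auto_exec_disabled_tenants=()):
        self.allowed = set(explicitly_allowed)
        self.kill_switch = set(auto_exec_disabled_tenants)  # section 7
        self.ledger = []   # run ledger: one receipt per submitted action
        self.pending = {}  # idempotency key -> receipt awaiting approval
        self.seen = set()  # idempotency keys already submitted

    def submit(self, tenant, user, tool, args):
        """Decide and record one proposed tool call; returns (decision, key)."""
        # Same tenant + tool + args yields the same key, so a retry is a no-op.
        raw = json.dumps([tenant, tool, args], sort_keys=True).encode()
        key = hashlib.sha256(raw).hexdigest()[:16]
        if key in self.seen:
            return "duplicate", key
        self.seen.add(key)
        if tool in BLOCKED_UNLESS_ALLOWED and tool not in self.allowed:
            decision, label = "blocked", "suggested-by-agent"
        elif (tool in REQUIRE_APPROVAL or tool in BLOCKED_UNLESS_ALLOWED
              or tenant in self.kill_switch):
            # Assumption: allowed permission changes still need a human.
            decision, label = "pending", "suggested-by-agent"
        else:
            decision, label = "executed", "auto-executed"
        entry = {"key": key, "tenant": tenant, "user": user, "tool": tool,
                 "args": args, "decision": decision, "authorship": label,
                 "ts": time.time()}
        self.ledger.append(entry)
        if decision == "pending":
            self.pending[key] = entry
        return decision, key

    def approve(self, key, approver):
        """Attributable approval: who approved and when go on the receipt."""
        entry = self.pending.pop(key)
        entry.update(decision="executed", authorship="executed-by-user",
                     approved_by=approver, approved_at=time.time())
        return entry
```

Every submitted action leaves a receipt in `ledger`, including blocked ones, so an incident review can see what the agent attempted as well as what ran.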