AGENTIC UX LAUNCH CHECKLIST (2026)

Goal: Ship one agentic workflow that improves an outcome metric (activation, retention, support cost) without creating unacceptable risk or margin bleed.

1) Pick the right first workflow
- High frequency: users do it weekly/daily (routing, enrichment, setup, reporting).
- Bounded blast radius: errors are reversible or limited in scope.
- Clear “done”: define completion criteria in 1–2 sentences.
- Data availability: required inputs exist in-product or via a stable integration.

2) Define the task contract (non-negotiable)
- Inputs: required fields + acceptable defaults.
- Outputs: what objects change (records, settings, messages).
- Forbidden actions: export, delete, admin assignment, billing changes (unless explicitly allowed).
- Risk tiers: low/medium/high for each tool call.
- Idempotency: how you prevent duplicate actions on retries.
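
One way to make the contract enforceable is to check every proposed tool call against it before execution. The sketch below is illustrative only: `ToolCall`, `TaskContract`, the action names, and the hash-based idempotency key are all assumptions, not a prescribed design.

```python
import hashlib
import json
from dataclasses import dataclass

# Assumed names for the forbidden actions listed above.
FORBIDDEN_ACTIONS = {"export", "delete", "admin_assignment", "billing_change"}


@dataclass(frozen=True)
class ToolCall:
    action: str
    risk: str    # "low", "medium", or "high"
    params: dict


def idempotency_key(task_id: str, call: ToolCall) -> str:
    """Same task + action + params always yields the same key, so a retry is detectable."""
    payload = json.dumps(
        {"task": task_id, "action": call.action, "params": call.params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TaskContract:
    def __init__(self, explicitly_allowed=frozenset()):
        # Forbidden actions that this workflow has been explicitly allowed to use.
        self.explicitly_allowed = explicitly_allowed
        self._seen_keys = set()

    def check(self, task_id: str, call: ToolCall) -> str:
        """Return "forbidden", "duplicate", or "ok" for a proposed tool call."""
        if call.action in FORBIDDEN_ACTIONS and call.action not in self.explicitly_allowed:
            return "forbidden"
        key = idempotency_key(task_id, call)
        if key in self._seen_keys:
            return "duplicate"
        self._seen_keys.add(key)
        return "ok"
```

In a real system the seen-key set would live in durable storage shared by all workers; the in-memory set here only shows the shape of the check.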

3) Build tool contracts before prompts
- Strict JSON schemas for tool inputs/outputs.
- Timeouts on every tool call; retries with capped exponential backoff, limited to calls that are safe to repeat.
- Validation: reject malformed or ambiguous tool parameters.
- Sandboxes/test modes for external systems (payments, email, CRM).
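
A minimal sketch of the validation and retry items, using only the standard library. The `assign_owner` schema and both function names are hypothetical; a production setup would more likely use a full JSON Schema validator than this hand-rolled type check.

```python
import random
import time

# Hypothetical schema for an "assign_owner" tool: field name -> required type.
ASSIGN_OWNER_SCHEMA = {"record_id": str, "owner_id": str}


def validate_params(params: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the parameters are acceptable."""
    errors = []
    for name, expected_type in schema.items():
        if name not in params:
            errors.append(f"missing field: {name}")
        elif not isinstance(params[name], expected_type):
            errors.append(f"wrong type for {name}")
    for name in params:
        if name not in schema:
            errors.append(f"unexpected field: {name}")
    return errors


def call_with_retries(fn, max_attempts=3, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Retry fn on TimeoutError with capped exponential backoff plus random jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the timeout to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Rejecting unexpected fields, not just missing ones, is the part that catches ambiguous parameters. The `sleep` argument is injectable so tests can run without waiting.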

4) Design the delegation ladder (UX + permissions)
- Draft mode: agent proposes; user executes.
- Guided mode: agent executes with confirmations on high-risk steps.
- Autopilot mode: agent executes within guardrails; user reviews logs.
- Per-capability autonomy: allow “auto-tag” but “ask before closing,” etc.
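
The ladder can be reduced to a single decision function. This is a sketch under stated assumptions: the `Mode` and `Decision` names, the string risk tiers, and the rule that a per-capability override always wins over the mode are illustrative choices, not the only reasonable ones.

```python
from enum import Enum


class Mode(Enum):
    DRAFT = "draft"
    GUIDED = "guided"
    AUTOPILOT = "autopilot"


class Decision(Enum):
    PROPOSE = "propose"   # agent proposes; user executes
    CONFIRM = "confirm"   # agent executes after the user confirms
    EXECUTE = "execute"   # agent executes; user reviews logs


def decide(mode: Mode, risk: str, capability: str, overrides=None) -> Decision:
    """Pick how a step is handled given the mode, its risk tier, and any overrides."""
    overrides = overrides or {}
    # Per-capability autonomy: an explicit setting takes precedence (assumption).
    if capability in overrides:
        return overrides[capability]
    if mode is Mode.DRAFT:
        return Decision.PROPOSE
    if mode is Mode.GUIDED:
        return Decision.CONFIRM if risk == "high" else Decision.EXECUTE
    return Decision.EXECUTE  # autopilot: guardrails are expressed as overrides
```

For example, `decide(Mode.AUTOPILOT, "high", "close_ticket", {"close_ticket": Decision.CONFIRM})` returns `Decision.CONFIRM`, which matches the "ask before closing" case.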

5) Ship trust primitives on day one
- Proof panel: what it did, why, evidence/inputs used, and exceptions.
- Audit log: prompt context, tool calls, timestamps, actor, outcome.
- Undo/rollback: one-click revert for reversible changes.
- Kill switch: disable agent per tenant and globally.
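
A rough sketch of how the audit log and kill switch might wrap every tool execution. `AuditEntry`, `KillSwitch`, `run_tool`, and the outcome strings are hypothetical names; a real log would be append-only and persisted, not a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    tenant_id: str
    actor: str             # e.g. "agent" or a user id
    tool: str
    params: dict
    outcome: str
    prompt_context: str = ""
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class KillSwitch:
    def __init__(self):
        self.global_off = False
        self.tenants_off = set()

    def agent_enabled(self, tenant_id: str) -> bool:
        return not self.global_off and tenant_id not in self.tenants_off


def run_tool(switch, log, tenant_id, tool, params, execute, prompt_context=""):
    """Execute a tool call only if the agent is enabled; log every outcome."""
    def record(outcome):
        log.append(AuditEntry(tenant_id, "agent", tool, params, outcome, prompt_context))

    if not switch.agent_enabled(tenant_id):
        record("blocked_by_kill_switch")
        return None
    try:
        result = execute(params)
    except Exception:
        record("error")
        raise
    record("ok")
    return result
```

Checking the switch at execution time, and not only when a task starts, means a disable takes effect on in-flight work as well.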

6) Instrument the right metrics
- Task success rate (weekly trend).
- Human intervention rate (HIR) by step.
- Action rollback rate.
- Time-to-value (TTV) delta vs. baseline.
- Cost per successful outcome (CPSO): total inference + retrieval + tool execution + human review cost, divided by the number of successful outcomes.
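
The rate metrics above can be computed from per-task records along these lines. The `TaskRun` fields are assumptions about what gets logged, and this sketch charges the cost of failed runs to CPSO, which is one reasonable reading of "cost per successful outcome". HIR is computed per task here; breaking it down by step would need step-level records.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    succeeded: bool
    human_intervened: bool
    rolled_back: bool
    inference_cost: float
    retrieval_cost: float
    tool_cost: float
    review_cost: float


def summarize(runs: list) -> dict:
    """Aggregate success rate, HIR, rollback rate, and CPSO over a batch of runs."""
    if not runs:
        raise ValueError("need at least one run")
    total = len(runs)
    successes = sum(r.succeeded for r in runs)
    # Spend includes failed runs: their cost is carried by the successful ones.
    spend = sum(
        r.inference_cost + r.retrieval_cost + r.tool_cost + r.review_cost
        for r in runs
    )
    return {
        "success_rate": successes / total,
        "hir": sum(r.human_intervened for r in runs) / total,
        "rollback_rate": sum(r.rolled_back for r in runs) / total,
        "cpso": spend / successes if successes else float("inf"),
    }
```

With four runs costing 1.50 each and three successes, total spend is 6.00 and CPSO is 2.00, not 1.50: the failed run still has to be paid for.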

7) Evaluation and regression testing
- Create 50–200 “golden tasks” that represent real usage.
- Replay the golden tasks nightly, and again after any prompt/model/tool change.
- Track success/failure reasons (schema errors, permissions, tool timeouts, wrong decisions).
- Establish release gates (e.g., must maintain a ≥95% pass rate on golden tasks).
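
A release gate over replay results can be very small. In this sketch the result tuple shape and the `evaluate_gate` name are assumptions; the 0.95 default mirrors the example threshold in the checklist.

```python
from collections import Counter


def evaluate_gate(results: list, threshold: float = 0.95) -> dict:
    """results: list of (task_id, passed, failure_reason), reason is None on a pass."""
    if not results:
        raise ValueError("no golden-task results to evaluate")
    passed = sum(1 for _, ok, _ in results if ok)
    pass_rate = passed / len(results)
    # Tally why tasks failed so regressions can be triaged by cause.
    reasons = Counter(reason for _, ok, reason in results if not ok)
    return {
        "pass_rate": pass_rate,
        "release_allowed": pass_rate >= threshold,
        "failure_reasons": dict(reasons),
    }
```

Wiring this into CI so that `release_allowed` being false blocks the deploy turns the gate from a dashboard number into an actual gate.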

8) Rollout plan
- Internal dogfood (1–2 weeks) with verbose logging.
- Beta with 5–20 design partners; weekly review of failures.
- Gradual enablement by tenant, role, and capability.
- Clear user education: what the agent can do, cannot do, and how to revoke autonomy.
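
Gradual enablement by tenant, role, and capability amounts to a three-way allowlist check. The `Rollout` class and the example identifiers below are hypothetical; most teams would back this with an existing feature-flag system.

```python
from dataclasses import dataclass, field


@dataclass
class Rollout:
    tenants: set = field(default_factory=set)
    roles: set = field(default_factory=set)
    capabilities: set = field(default_factory=set)

    def enabled(self, tenant_id: str, role: str, capability: str) -> bool:
        """The agent capability is on only if all three dimensions are enabled."""
        return (
            tenant_id in self.tenants
            and role in self.roles
            and capability in self.capabilities
        )


# Example: a beta limited to one design partner, admins only, tagging only.
beta = Rollout(tenants={"acme"}, roles={"admin"}, capabilities={"auto_tag"})
```

Starting from empty sets means everything is off by default, and widening the rollout is an explicit addition on one dimension at a time.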

9) Economics guardrails
- Model routing: cheap model for classification, premium for high-stakes steps.
- Caching for repeated retrieval and stable context.
- Rate limits per tenant to avoid runaway costs.
- Alerts when CPSO spikes or success rate dips.
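
A minimal sketch of two of these guardrails: routing by step type and risk, and a per-tenant spend cap standing in for a rate limit. The tier names, the `route_model` rule, and `TenantBudget` are illustrative assumptions; no real model names or prices are implied.

```python
from collections import defaultdict


def route_model(step_kind: str, risk: str) -> str:
    """Return an assumed tier name: premium only for high-stakes steps."""
    if risk == "high":
        return "premium"
    # Classification and other low/medium-risk steps go to the cheap tier.
    return "cheap"


class TenantBudget:
    """Caps spend per tenant within one window (e.g. a day); reset() starts a new one."""

    def __init__(self, limit_per_window: float):
        self.limit = limit_per_window
        self.spent = defaultdict(float)

    def try_spend(self, tenant_id: str, cost: float) -> bool:
        """Record the cost and return True, or return False if it would exceed the cap."""
        if self.spent[tenant_id] + cost > self.limit:
            return False
        self.spent[tenant_id] += cost
        return True

    def reset(self) -> None:
        self.spent.clear()
```

A refused `try_spend` is also a natural place to raise the cost alert, since it means a tenant has hit the ceiling you budgeted for.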

Definition of “ready to scale”
- Task success rate holds within the workflow's target range across consecutive weekly readings.
- HIR below your autonomy threshold (commonly <15% for autopilot).
- Rollback rate low and falling (<2% steady state).
- CPSO fits gross margin assumptions for your pricing tier.
- Security review artifacts ready: permissions model, audit logs, data retention, SOC 2/ISO posture (as applicable).
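
The numeric criteria can be checked mechanically; the trend and security items still need human review. In this sketch the function name and metric keys are assumptions, the HIR and rollback defaults mirror the figures above, and the success target and CPSO ceiling must be supplied per workflow.

```python
def ready_to_scale(metrics: dict, success_target: float, max_cpso: float,
                   hir_limit: float = 0.15, rollback_limit: float = 0.02):
    """Return (ready, failed_checks) for the numeric scale-readiness criteria."""
    checks = {
        "success_rate": metrics["success_rate"] >= success_target,
        "hir": metrics["hir"] < hir_limit,
        "rollback_rate": metrics["rollback_rate"] < rollback_limit,
        "cpso": metrics["cpso"] <= max_cpso,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return len(failed) == 0, failed
```

Returning the list of failed checks, not just a boolean, tells the team which criterion to work on next.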