AGENTIC FEATURE LAUNCH CHECKLIST (2026) 1) ROLE CARD (PRODUCT SPEC) - Agent name + purpose in one sentence (e.g., “Support Triage Teammate: classify, route, and draft responses for inbound tickets”). - Primary workflow: define start/end states and what “done” means. - In-scope tasks (max 3–5) and explicitly out-of-scope tasks. - Authority ladder: Read-only → Draft → Execute w/ approval → Constrained autonomy. - Escalation rules: confidence threshold, missing data, policy conflicts, customer tier, $ amount thresholds. - Success metrics: coverage (% cases handled), precision (correct when acting), time-to-resolution, override rate, incident rate. 2) PERMISSIONS + GOVERNANCE - Identity & access: SSO (Okta/Entra), SCIM, role-based permissions per action. - Data boundaries: what objects/fields can be read/written; tenant isolation guarantees. - Retention: log retention policy, customer-configurable options, deletion/DSR workflow. - Auditability: every action logged with who/what/when, sources used, and rollback path. - Safety controls: kill switch per workspace; rate limits; blast-radius caps (max actions/day). 3) TOOLING + RELIABILITY - Tools are idempotent (retries won’t duplicate refunds/emails/updates). - Tool schemas versioned; breaking changes gated behind tests. - Sandbox mode available; “dry run” outputs include a diff of planned changes. - Undo/rollback implemented for every write action. 4) EVALUATION (BEFORE GA) - Golden set: 200–1,000 real cases with expected outputs; include edge/adversarial examples. - Nightly regression: run against prompt updates, model/provider changes, and tool schema changes. - Metrics dashboard: coverage, precision, p50/p95 latency, cost per task (p50/p95), policy violations. - Human review rubric for sampled outputs (tone, correctness, policy compliance). 5) COST + UNIT ECONOMICS - Define budget per task (e.g., triage $0.05, complex billing $0.25). - Routing policy: small model default; escalate to larger model only on low confidence/high value. - Caching + dedupe plan (retrieval cache, context compression, retry limits). - Metering spec: what counts as an “agent action,” what’s billable, and how customers see usage. 6) LAUNCH PLAN - Shadow mode (1–2 weeks): drafts + logs only; measure precision/coverage. - Approve-to-act beta: limited cohort; batch approvals; track approval fatigue. - Constrained autonomy: confidence thresholds + spend caps + alerts. - Incident playbook: disable switch, rollback steps, customer comms template, postmortem process. 7) PRICING + PACKAGING - Choose a measurable unit tied to value (tickets triaged, invoices reconciled, leads qualified). - Include predictability: monthly included quota + overage, or hard caps with admin controls. - Provide an “AI Usage” page: actions, costs, models used, and who triggered runs. GA EXIT CRITERIA (SUGGESTED) - Precision meets target for the workflow (e.g., ≥90% on golden set) AND policy violations near zero. - Undo/rollback works end-to-end; audit logs complete. - p95 cost per task within budget and gross margin model validated with real usage. - Security review complete: retention, access controls, subprocessors, and audit export verified.