AGENTIC FEATURE PRODUCTION READINESS CHECKLIST (2026)

Use this checklist to take an agentic feature from prototype to production safely. It’s designed for founders, PMs, and tech leads shipping workflows that execute actions (not just generate text).

1) WORKFLOW CONTRACT (PRODUCT)
- Define the job: exact start trigger, inputs, and “done” state.
- List allowed tools/actions (e.g., create ticket, update CRM field) and explicitly banned actions.
- Define success metrics: task success rate, correctness/acceptance rate, policy violation rate, median time-to-complete.
- Define failure behavior: fallback to copilot, open a human review task, or no-op with explanation.

2) BUDGETS & UNIT ECONOMICS (FINANCE + ENG)
- Set hard limits per run: max model calls, max tokens, max wall-clock time.
- Implement “fast vs thorough” modes or an auto-router (small model by default, frontier model on low confidence).
- Track cost-per-outcome (e.g., $/resolved ticket) and gross margin at the workflow level.
- Add tenant-level usage caps and alerting for budget anomalies.

3) SAFETY & GOVERNANCE (SECURITY)
- Treat the agent as a non-human identity: separate service account, least-privilege scopes.
- Policy-gate tool calls (allow/deny rules) and require approvals for high-impact actions.
- Add secrets hygiene: short-lived tokens, rotation, no long-lived omnipotent credentials.
- Data minimization: redact PII, store references not raw docs where possible, configurable retention.

4) OBSERVABILITY & AUDITABILITY (PLATFORM)
- Implement an agent “flight recorder”: state, retrieved context IDs, tool calls/args, tool responses, policies evaluated, final actions.
- Correlate every run with a trace/log ID in your standard observability stack.
- Provide an admin/operator console: search runs, view diffs, export logs.

5) EVALUATIONS & RELEASE QUALITY (ML/ENG)
- Create an offline regression suite from real tasks; include adversarial/edge cases.
- Gate releases on thresholds (e.g., JSON validity, policy violation rate).
- Run canaries on a small cohort; compare outcome rates and costs vs baseline.

6) HUMAN-IN-THE-LOOP UX (PRODUCT)
- Use structured change plans and diff previews; avoid vague prose.
- Provide one-click rollback for any mutation; document non-reversible actions.
- Add an escalation path: when confidence is low, require approval or switch to suggestion mode.

7) INCIDENT READINESS (OPS)
- Add a kill switch to disable autonomous execution instantly.
- Create runbooks for: runaway costs, tool outage, policy bypass attempt, customer report of incorrect actions.
- Prepare customer communications templates with clear timelines and remediation steps.

If you can’t answer “What did the agent do, why did it do it, what did it cost, and how do we undo it?” you’re not ready for production.