PRODUCTION AGENT READINESS PACK (PARR) — 2026

Use this pack to move an agent from prototype to production in a way that security, SRE, and finance will approve.

1) SCOPE & SUCCESS METRICS (fill this in first)
- Agent name/version:
- Primary workflow (one sentence):
- Explicit out-of-scope actions (at least 5):
- Target KPI (choose 1–2): cost per ticket, time-to-resolution, backlog reduction %, conversion uplift, MTTA/MTTR reduction
- Baseline numbers (last 30 days):
- Launch target: e.g., 15% TTR reduction in 45 days

2) PERMISSIONS MANIFEST (treat like a service account)
- Environments: dev / staging / prod
- Tool allowlist: list each tool/API the agent can call
- For each tool: read vs write, allowed resources, rate limits
- Secrets: where stored (Vault/KMS), rotation policy, break-glass procedure
- Data access: which PII fields are allowed; redaction rules

3) POLICY GATES (deterministic)
Define “high-risk actions” and the gate type:
- Human approval required (examples): refunds > $500; any pricing plan downgrade; sending emails to customers; production writes
- Policy engine approval required (examples): access to customer records; exporting reports; creating admin users
- Always denied (examples): changing auth roles; disabling logging; bulk export of PII

4) EVAL SUITE (minimum viable)
- Golden set: 50–200 real cases with expected outcomes
- Adversarial set: at least 20 prompt-injection or data-exfiltration attempts
- Regression cadence: on every model change, tool schema change, or retrieval index update
- Pass/fail thresholds:
  * Schema validity: 99%+
  * Policy compliance: 100%
  * Task success (workflow-specific): set a target, e.g., 85%+

5) OBSERVABILITY & ALERTS
Log (structured): agent_id, session_id, model, retrieved doc IDs, tool names, tool errors, policy decisions, latency, tokens, cost estimate.
Do NOT log: raw PII, full document bodies, secrets.
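The log allowlist in section 5 can be enforced mechanically rather than by convention. The sketch below is one minimal way to do it, assuming a JSON log pipeline; the field names follow the list above, while the helper name and extra `ts` field are illustrative assumptions.

```python
# Minimal sketch: serialize only allowlisted fields so raw PII, full
# document bodies, and secrets are dropped before reaching the log pipeline.
# Field names follow section 5; log_record and "ts" are assumptions.
import json
import time

LOG_FIELDS = {
    "agent_id", "session_id", "model", "retrieved_doc_ids",
    "tool_names", "tool_errors", "policy_decisions",
    "latency_ms", "tokens", "cost_estimate_usd",
}

def log_record(**fields) -> str:
    """Keep allowlisted keys only; anything unexpected is silently dropped."""
    record = {k: v for k, v in fields.items() if k in LOG_FIELDS}
    record["ts"] = time.time()  # timestamp added at serialization time
    return json.dumps(record, sort_keys=True)
```

An allowlist (rather than a denylist of known-sensitive fields) fails safe: a new field added upstream stays out of the logs until someone deliberately reviews and admits it.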
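The deterministic gates in section 3 are meant to live in code, not prompts. A minimal sketch of that decision function, assuming illustrative action names, a $500 refund threshold from the examples above, and a hypothetical `Gate` enum:

```python
# Minimal sketch of deterministic policy gates (section 3). Action names,
# sets, and thresholds are illustrative assumptions; a real deployment
# should load them from a security-reviewed config, not hardcode them.
from enum import Enum

class Gate(Enum):
    ALLOW = "allow"
    HUMAN_APPROVAL = "human_approval"
    POLICY_ENGINE = "policy_engine"
    DENY = "deny"

ALWAYS_DENIED = {"change_auth_role", "disable_logging", "bulk_export_pii"}
HUMAN_APPROVAL_ACTIONS = {"plan_downgrade", "send_customer_email", "prod_write"}
POLICY_ENGINE_ACTIONS = {"read_customer_record", "export_report", "create_admin_user"}

def gate_for(action: str, params: dict) -> Gate:
    """Deterministic gate decision, evaluated in code before any tool call."""
    if action in ALWAYS_DENIED:
        return Gate.DENY          # denials are checked first, unconditionally
    if action == "refund" and params.get("amount_usd", 0) > 500:
        return Gate.HUMAN_APPROVAL
    if action in HUMAN_APPROVAL_ACTIONS:
        return Gate.HUMAN_APPROVAL
    if action in POLICY_ENGINE_ACTIONS:
        return Gate.POLICY_ENGINE
    return Gate.ALLOW
```

For example, `gate_for("refund", {"amount_usd": 750})` returns `Gate.HUMAN_APPROVAL` while a $100 refund passes through. Because the function is pure and deterministic, it can be unit-tested in the regression suite from section 4.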
Alerts (start with 5):
- Spend per run exceeds cap
- Tool-call error rate spikes (e.g., >3% over 10 minutes)
- Policy denials spike (could indicate prompt attack or tool misuse)
- Retry loop detected (same step repeated N times)
- Outcome drift (e.g., resolution category mix changes by >20% week-over-week)

6) FAIL-SAFES & ROLLBACK
- Circuit breakers: external API timeouts, max retries, backoff strategy
- Safe fallback: “propose mode” or handoff to human queue
- Kill switch: who can trigger; where it lives; expected time to disable
- Rollback plan: pin previous model/tool versions; replay last 24h of eval suite

7) LAUNCH CHECKLIST (GO/NO-GO)
GO only if:
- Permissions are least-privilege and reviewed by Security
- Policy gates enforced in code (not prompts)
- Regression + adversarial evals passing thresholds
- Oncall runbook exists and SRE has dashboards/alerts
- Spend caps tested in staging with simulated load
- Clear user comms: what the agent can/can’t do + how to report errors

8) LIGHTWEIGHT ONCALL RUNBOOK TEMPLATE
- Symptoms: what users report
- Immediate checks: dashboards, recent deploys, tool error rates, policy denial spikes
- Triage steps: disable writes → switch to propose mode → disable agent
- Escalations: Security (data), SRE (availability/spend), Product (user impact)
- Postmortem trigger: any unauthorized action, PII exposure, customer-impacting incorrect change, spend anomaly >2x daily cap
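The circuit-breaker bullet in section 6 (timeouts, max retries, backoff) can be sketched as a wrapper around each tool call. The class name, thresholds, and cooldown below are illustrative assumptions and would be tuned per tool in the permissions manifest:

```python
# Minimal sketch: retry with exponential backoff plus a circuit breaker
# (section 6). ToolBreaker, thresholds, and delays are assumptions; when
# the breaker opens, route to propose mode or the human queue instead.
import time

class CircuitOpen(Exception):
    """Raised when the breaker is open and calls are short-circuited."""

class ToolBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, max_retries=2, base_delay=0.1, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                raise CircuitOpen("breaker open; fall back to propose mode")
            self.opened_at = None  # half-open: allow a single probe call
        for attempt in range(max_retries + 1):
            try:
                result = fn(*args, **kwargs)
                self.failures = 0  # success resets the failure count
                return result
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                    raise CircuitOpen("failure threshold hit; breaker tripped")
                if attempt == max_retries:
                    raise  # retries exhausted; surface the original error
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```

The breaker converts repeated tool failures into a single, loud `CircuitOpen` signal, which pairs naturally with the "retry loop detected" alert and the safe-fallback path above.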