PRODUCTION AGENT READINESS PACK (PARR) — 2026

Use this pack to move an agent from prototype to production in a way that security, SRE, and finance will approve.

1) SCOPE & SUCCESS METRICS (fill this in first)
- Agent name/version:
- Primary workflow (one sentence):
- Explicit out-of-scope actions (at least 5):
- Target KPI (choose 1–2): cost per ticket, time-to-resolution, backlog reduction %, conversion uplift, MTTA/MTTR reduction
- Baseline numbers (last 30 days):
- Launch target: e.g., 15% TTR reduction in 45 days

2) PERMISSIONS MANIFEST (treat like a service account)
- Environments: dev / staging / prod
- Tool allowlist: list each tool/API the agent can call
- For each tool: read vs write, allowed resources, rate limits
- Secrets: where stored (Vault/KMS), rotation policy, break-glass procedure
- Data access: which PII fields are allowed; redaction rules

3) POLICY GATES (deterministic)
Define “high-risk actions” and the gate type:
- Human approval required (examples): refunds > $500; any pricing plan downgrade; sending emails to customers; production writes
- Policy engine approval required (examples): access to customer records; exporting reports; creating admin users
- Always denied (examples): changing auth roles; disabling logging; bulk export of PII

4) EVAL SUITE (minimum viable)
- Golden set: 50–200 real cases with expected outcomes
- Adversarial set: at least 20 prompt-injection or data-exfiltration attempts
- Regression cadence: on every model change, tool schema change, or retrieval index update
- Pass/fail thresholds:
  * Schema validity: 99%+
  * Policy compliance: 100%
  * Task success (workflow-specific): set a target, e.g., 85%+

5) OBSERVABILITY & ALERTS
Log (structured): agent_id, session_id, model, retrieved doc IDs, tool names, tool errors, policy decisions, latency, tokens, cost estimate.
Do NOT log: raw PII, full document bodies, secrets.
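The log allowlist in section 5 can be enforced mechanically rather than by convention. The sketch below is one minimal way to do it, assuming a JSON log pipeline; the field names follow the list above, while the helper name and extra `ts` field are illustrative assumptions.

```python
# Minimal sketch: serialize only allowlisted fields so raw PII, full
# document bodies, and secrets are dropped before reaching the log pipeline.
# Field names follow section 5; log_record and "ts" are assumptions.
import json
import time

LOG_FIELDS = {
    "agent_id", "session_id", "model", "retrieved_doc_ids",
    "tool_names", "tool_errors", "policy_decisions",
    "latency_ms", "tokens", "cost_estimate_usd",
}

def log_record(**fields) -> str:
    """Keep allowlisted keys only; anything unexpected is silently dropped."""
    record = {k: v for k, v in fields.items() if k in LOG_FIELDS}
    record["ts"] = time.time()  # timestamp added at serialization time
    return json.dumps(record, sort_keys=True)
```

An allowlist (rather than a denylist of known-sensitive fields) fails safe: a new field added upstream stays out of the logs until someone deliberately reviews and admits it.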
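The deterministic gates in section 3 are meant to live in code, not prompts. A minimal sketch of that decision function, assuming illustrative action names, a $500 refund threshold from the examples above, and a hypothetical `Gate` enum:

```python
# Minimal sketch of deterministic policy gates (section 3). Action names,
# sets, and thresholds are illustrative assumptions; a real deployment
# should load them from a security-reviewed config, not hardcode them.
from enum import Enum

class Gate(Enum):
    ALLOW = "allow"
    HUMAN_APPROVAL = "human_approval"
    POLICY_ENGINE = "policy_engine"
    DENY = "deny"

ALWAYS_DENIED = {"change_auth_role", "disable_logging", "bulk_export_pii"}
HUMAN_APPROVAL_ACTIONS = {"plan_downgrade", "send_customer_email", "prod_write"}
POLICY_ENGINE_ACTIONS = {"read_customer_record", "export_report", "create_admin_user"}

def gate_for(action: str, params: dict) -> Gate:
    """Deterministic gate decision, evaluated in code before any tool call."""
    if action in ALWAYS_DENIED:
        return Gate.DENY          # denials are checked first, unconditionally
    if action == "refund" and params.get("amount_usd", 0) > 500:
        return Gate.HUMAN_APPROVAL
    if action in HUMAN_APPROVAL_ACTIONS:
        return Gate.HUMAN_APPROVAL
    if action in POLICY_ENGINE_ACTIONS:
        return Gate.POLICY_ENGINE
    return Gate.ALLOW
```

For example, `gate_for("refund", {"amount_usd": 750})` returns `Gate.HUMAN_APPROVAL` while a $100 refund passes through. Because the function is pure and deterministic, it can be unit-tested in the regression suite from section 4.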
Alerts (start with 5):
- Spend per run exceeds cap
- Tool-call error rate spikes (e.g., >3% over 10 minutes)
- Policy denials spike (could indicate prompt attack or tool misuse)
- Retry loop detected (same step repeated N times)
- Outcome drift (e.g., resolution category mix changes by >20% week-over-week)

6) FAIL-SAFES & ROLLBACK
- Circuit breakers: external API timeouts, max retries, backoff strategy
- Safe fallback: “propose mode” or handoff to human queue
- Kill switch: who can trigger; where it lives; expected time to disable
- Rollback plan: pin previous model/tool versions; replay last 24h of eval suite

7) LAUNCH CHECKLIST (GO/NO-GO)
GO only if:
- Permissions are least-privilege and reviewed by Security
- Policy gates enforced in code (not prompts)
- Regression + adversarial evals passing thresholds
- Oncall runbook exists and SRE has dashboards/alerts
- Spend caps tested in staging with simulated load
- Clear user comms: what the agent can/can’t do + how to report errors

8) LIGHTWEIGHT ONCALL RUNBOOK TEMPLATE
- Symptoms: what users report
- Immediate checks: dashboards, recent deploys, tool error rates, policy denial spikes
- Triage steps: disable writes → switch to propose mode → disable agent
- Escalations: Security (data), SRE (availability/spend), Product (user impact)
- Postmortem trigger: any unauthorized action, PII exposure, customer-impacting incorrect change, spend anomaly >2x daily cap
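The circuit-breaker bullet in section 6 (timeouts, max retries, backoff) can be sketched as a wrapper around each tool call. The class name, thresholds, and cooldown below are illustrative assumptions and would be tuned per tool in the permissions manifest:

```python
# Minimal sketch: retry with exponential backoff plus a circuit breaker
# (section 6). ToolBreaker, thresholds, and delays are assumptions; when
# the breaker opens, route to propose mode or the human queue instead.
import time

class CircuitOpen(Exception):
    """Raised when the breaker is open and calls are short-circuited."""

class ToolBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, max_retries=2, base_delay=0.1, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                raise CircuitOpen("breaker open; fall back to propose mode")
            self.opened_at = None  # half-open: allow a single probe call
        for attempt in range(max_retries + 1):
            try:
                result = fn(*args, **kwargs)
                self.failures = 0  # success resets the failure count
                return result
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                    raise CircuitOpen("failure threshold hit; breaker tripped")
                if attempt == max_retries:
                    raise  # retries exhausted; surface the original error
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```

The breaker converts repeated tool failures into a single, loud `CircuitOpen` signal, which pairs naturally with the "retry loop detected" alert and the safe-fallback path above.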