AI-NATIVE OPERATING SYSTEM (AI-NOS) — ONE-PAGER TEMPLATE

Use this for every LLM workflow that reaches customers, touches money, or writes to production systems. Keep it to one page. If you can’t, the system is too vague.

1) WORKFLOW NAME + BUSINESS PURPOSE
- Name:
- User impact (what changes for the user?):
- Criticality: (Low / Medium / High)
- What would be “unacceptable” harm? (e.g., wrong refund, data exposure, legal risk)

2) DECISION RIGHTS (WRITE NAMES)
- Product owner (owns success criteria):
- Engineering owner (owns implementation + reliability):
- Security/privacy approver (data + access rules):
- Who can approve a model swap?
- Who can expand permissions/tool access?

3) AUTHORITY LEVEL (WHAT THE MODEL IS ALLOWED TO DO)
- Read access (systems, datasets):
- Write access (systems, fields):
- External actions (email, Slack/Teams, ticket updates, payments):
- Explicitly forbidden actions:

4) INPUT/OUTPUT CONTRACT
- Allowed input data (and prohibited data):
- Redaction rules (what must be removed before prompt):
- Required output format (prefer structured JSON + schema):
- Fallback output on failure (safe default):

5) EVALS (DEFINITION OF “GOOD”)
- What tasks are tested?
- Pass/fail thresholds (qualitative is fine if you can’t quantify yet):
- Test set ownership + refresh cadence:
- What triggers blocking a deploy?

6) OBSERVABILITY + AUDIT
- What is logged (prompt, retrieved context, tool calls, model version, output)?
- Where are logs stored and who can access them?
- Retention period requirements:
- User reporting path for bad outputs:

7) COST CONTROLS
- Budget owner:
- Guardrails: rate limits, max tokens, caching strategy, routing policy
- What prevents infinite loops / repeated tool calls?
- What happens if costs spike (automatic downgrade, disable feature, paging)?

8) SAFETY + SECURITY CONTROLS
- Prompt injection defenses (where enforced):
- Data exfiltration prevention points:
- Human review requirements (which actions require approval?):
- Abuse handling (malicious users, jailbreak attempts):

9) ROLLBACK / KILL SWITCHES (MUST BE REAL)
- Quick disable method (feature flag / config):
- Safer-mode behavior (read-only, no tools, simpler model):
- Escalation path (who is paged / notified):

10) LAUNCH PLAN
- Limited rollout audience:
- Monitoring dashboard link(s):
- First post-launch review date:

If you can’t name owners, define rollback, and describe eval gates, you’re not shipping a product—you’re gambling with production.
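What sections 4, 6, 7 and 9 look like once they are enforced in code rather than written on the page, as an added example that is not part of the template: a guarded run loop for a constructed refund workflow with input redaction, a strict output contract with a safe fallback, a cap on repeated tool calls, a kill switch and a no-tools safer mode, and an ordered log of what happened. The model, the tool-request JSON shape and the redaction patterns are assumptions for illustration.

```python
import json
import re
from dataclasses import dataclass, field

SAFE_FALLBACK = {"status": "needs_human_review", "refund_cents": 0}  # section 4: safe default
ALLOWED_STATUS = {"approve", "deny", "needs_human_review"}
# Section 4 redaction rules (constructed): mask card-like digit runs and e-mails before prompting.
REDACT = [(re.compile(r"\b\d{13,19}\b"), "[CARD]"),
          (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]")]

@dataclass
class Guard:
    enabled: bool = True                     # section 9: quick-disable feature flag
    safer_mode: bool = False                 # section 9: read-only, no tools
    max_tool_calls: int = 3                  # section 7: cap on repeated tool calls
    log: list = field(default_factory=list)  # section 6: ordered audit trail

def redact(text: str) -> str:
    for pattern, mask in REDACT:
        text = pattern.sub(mask, text)
    return text

def validate(out) -> bool:
    """Section 4 output contract: exact keys, allowed status, non-negative int amount."""
    if not isinstance(out, dict) or set(out) != {"status", "refund_cents"}:
        return False
    amt = out["refund_cents"]
    return out["status"] in ALLOWED_STATUS and type(amt) is int and amt >= 0

def run_workflow(model, tools: dict, guard: Guard, user_text: str) -> dict:
    """Guarded run (assumed protocol): model(prompt, history) returns JSON text, either a tool request {"tool": name, "arg": value} or a final answer."""
    if not guard.enabled:
        guard.log.append("disabled"); return dict(SAFE_FALLBACK)
    prompt, history, calls = redact(user_text), [], 0
    while True:
        raw = model(prompt, history)
        guard.log.append(("model", raw))
        try:
            req = json.loads(raw)
        except ValueError:
            req = None
        if isinstance(req, dict) and "tool" in req:          # model asked for a tool
            calls += 1
            if guard.safer_mode or calls > guard.max_tool_calls or req["tool"] not in tools:
                guard.log.append("blocked_tool"); return dict(SAFE_FALLBACK)
            history.append(tools[req["tool"]](req.get("arg")))
            continue
        if not validate(req):
            guard.log.append("bad_output"); return dict(SAFE_FALLBACK)
        return req
```

Every exit path returns an output that satisfies the contract, so callers downstream never have to handle free text, and the log shows which control fired.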