DIFF-FIRST PRODUCT SPEC TEMPLATE (2026)

Goal
Design an AI feature whose value persists even if you swap model providers (OpenAI/Anthropic/Google/open-weight) and whose core output is an auditable decision trail.

1) Define the Artifact
- Primary artifact type: (ticket / PR / contract clause / incident report / sales email / knowledge base article)
- Where it lives today: (Jira, GitHub, Google Docs, Salesforce, ServiceNow, internal app)
- Who owns it: (role + team)
- What “done” means for the artifact (explicit acceptance criteria)

2) Define the Diff Surface
- What the model proposes (structured fields, not just free text):
 - e.g., Summary, Recommended action, Risk flags, Citations, Next steps
- What humans routinely edit today (list the top 5 edit types)
- What must never be auto-approved (hard boundary)
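
One way to make the diff surface concrete is to model the proposal as structured fields and compute a field-level diff between the model's draft and the human's edit. The sketch below is illustrative only: the `Proposal` class, the `field_diff` helper, and the sample values are assumptions, with field names borrowed from the example list above.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class Proposal:
    # Hypothetical structured output; fields mirror the example list above.
    summary: str
    recommended_action: str
    risk_flags: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)


def field_diff(draft: Proposal, edited: Proposal) -> dict[str, dict]:
    """Return only the fields the human changed, with before/after values."""
    before, after = asdict(draft), asdict(edited)
    return {
        name: {"before": before[name], "after": after[name]}
        for name in before
        if before[name] != after[name]
    }


draft = Proposal(summary="Refund approved", recommended_action="Issue refund")
edited = Proposal(
    summary="Refund approved",
    recommended_action="Issue partial refund",
    risk_flags=["high_value"],
)
changes = field_diff(draft, edited)
```

Counting which field names appear most often in `changes` across many artifacts is one way to fill in the "top 5 edit types" list.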

3) Decision Events (Immutable Log)
For each AI interaction, log events such as:
- DRAFT_CREATED (model output + metadata)
- EDIT_APPLIED (user diff)
- CITATION_ADDED/REMOVED (source pointer)
- APPROVED / REJECTED (decision + reason)
- ESCALATED (to whom + why)
Each event should include: org_id, actor_id, artifact_id, timestamp, model_provider, model_name, prompt version identifier, input references, output snapshot, diff snapshot.
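
A minimal sketch of such a log, assuming an in-memory append-only list. The `DecisionEvent` and `EventLog` names are hypothetical, and the hash chain (each event stores the digest of the previous one) is an added assumption, shown as one possible way to make tampering detectable; the template itself does not prescribe it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionEvent:
    # Fields follow the list above; prev_hash is an assumption for chaining.
    event_type: str
    org_id: str
    actor_id: str
    artifact_id: str
    timestamp: str
    model_provider: str
    model_name: str
    prompt_version: str
    input_refs: tuple
    output_snapshot: str
    diff_snapshot: str
    prev_hash: str

    def digest(self) -> str:
        """SHA-256 over a canonical JSON encoding of the event."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EventLog:
    def __init__(self) -> None:
        self._events = []

    @property
    def events(self) -> tuple:
        return tuple(self._events)

    def append(self, **fields) -> DecisionEvent:
        # Link the new event to the digest of the previous one.
        prev = self._events[-1].digest() if self._events else ""
        event = DecisionEvent(
            timestamp=datetime.now(timezone.utc).isoformat(),
            prev_hash=prev,
            **fields,
        )
        self._events.append(event)
        return event

    def verify(self) -> bool:
        """Check that every event points at the digest of its predecessor."""
        prev = ""
        for event in self._events:
            if event.prev_hash != prev:
                return False
            prev = event.digest()
        return True


log = EventLog()
common = dict(
    org_id="org-1",
    actor_id="user-7",
    artifact_id="TICKET-42",
    model_provider="example-provider",  # placeholder values
    model_name="example-model",
    prompt_version="triage-v3",
    input_refs=("doc:123",),
)
log.append(event_type="DRAFT_CREATED", output_snapshot="Draft text",
           diff_snapshot="", **common)
log.append(event_type="EDIT_APPLIED", output_snapshot="Edited text",
           diff_snapshot="-Draft text\n+Edited text", **common)
```

A production log would live in durable storage with write-once permissions; the in-memory list here only illustrates the event shape.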

4) Provenance & Sources
- Allowed source systems (explicit list)
- How citations are represented (URL, doc ID, CRM field path, log query)
- What happens when a source is missing or the user lacks permission to read it
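
The three questions above could be encoded roughly as follows. This is a sketch under stated assumptions: the `Citation` shape, the `ALLOWED_SYSTEMS` values, and the in-memory sets standing in for a source index and a permission check are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SourceStatus(Enum):
    OK = "ok"
    MISSING = "missing"
    FORBIDDEN = "forbidden"


# Hypothetical explicit allow-list of source systems.
ALLOWED_SYSTEMS = {"gdocs", "salesforce", "logs"}


@dataclass(frozen=True)
class Citation:
    system: str   # which source system the pointer refers to
    locator: str  # URL, doc ID, CRM field path, or log query


def check_citation(citation: Citation, known: set, readable: set) -> SourceStatus:
    """Classify a citation; sources outside the allow-list are rejected outright."""
    if citation.system not in ALLOWED_SYSTEMS:
        raise ValueError(f"source system not allowed: {citation.system}")
    key = (citation.system, citation.locator)
    if key not in known:
        return SourceStatus.MISSING
    if key not in readable:
        return SourceStatus.FORBIDDEN
    return SourceStatus.OK


known = {("gdocs", "doc-123"), ("salesforce", "Account.Tier")}
readable = {("gdocs", "doc-123")}

ok = check_citation(Citation("gdocs", "doc-123"), known, readable)
forbidden = check_citation(Citation("salesforce", "Account.Tier"), known, readable)
missing = check_citation(Citation("logs", "error AND svc=api"), known, readable)
```

Returning an explicit status, rather than silently dropping the citation, lets the reviewer see that a claim lost its source.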

5) Policy Layer
- Organization rules that can be expressed as checks (examples):
 - banned claims/phrases
 - required disclaimers
 - approval required above a threshold (risk, dollars, customer tier)
 - PII handling constraints
- Where policy is enforced: before generation, after generation, or before approval
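
As an illustration, the example rules above can be written as plain checks that return a list of violations. Everything specific here is assumed for the sketch: the banned phrases, the disclaimer text, the dollar threshold, and the deliberately naive email pattern standing in for real PII detection.

```python
import re
from dataclasses import dataclass


@dataclass
class Draft:
    text: str
    amount_usd: float


# Hypothetical organization rules; real values would come from config.
BANNED_PHRASES = ("guaranteed", "risk-free")
REQUIRED_DISCLAIMER = "This is not legal advice."
APPROVAL_THRESHOLD_USD = 10_000
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # naive PII stand-in


def run_policy_checks(draft: Draft) -> list:
    """Return a list of violation codes; an empty list means the draft passes."""
    violations = []
    lowered = draft.text.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            violations.append(f"banned_phrase:{phrase}")
    if REQUIRED_DISCLAIMER not in draft.text:
        violations.append("missing_disclaimer")
    if draft.amount_usd > APPROVAL_THRESHOLD_USD:
        violations.append("approval_required:amount")
    if EMAIL_PATTERN.search(draft.text):
        violations.append("pii:email")
    return violations


risky = Draft("Returns are guaranteed. Contact bob@example.com", 25_000)
clean = Draft("Terms apply. This is not legal advice.", 500)
risky_violations = run_policy_checks(risky)
clean_violations = run_policy_checks(clean)
```

The same function can run after generation and again before approval, so a human edit cannot reintroduce a banned phrase unnoticed.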

6) Evaluation Plan (Workflow-Embedded)
Define 3 signals that are naturally produced after work completes:
- outcome signal (win/loss, reopen, audit finding, incident severity correction)
- time signal (cycle time, handoff count)
- quality signal (review comments, rejection reasons)
Attach evaluation to those moments, not to a separate dashboard.
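
A sketch of what "attached to the moment" could look like for a ticket workflow: the three signals are derived when the ticket closes, from data the workflow already produces. The `ClosedTicket` shape and the `signals_on_close` hook are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ClosedTicket:
    # Hypothetical record available at the moment a ticket closes.
    artifact_id: str
    opened_at: datetime
    closed_at: datetime
    reopened: bool
    handoff_count: int
    rejection_reasons: list = field(default_factory=list)


def signals_on_close(ticket: ClosedTicket) -> dict:
    """Derive outcome, time, and quality signals from the closed ticket."""
    elapsed = ticket.closed_at - ticket.opened_at
    return {
        "artifact_id": ticket.artifact_id,
        "outcome_reopened": ticket.reopened,                  # outcome signal
        "cycle_time_hours": elapsed.total_seconds() / 3600,   # time signal
        "handoff_count": ticket.handoff_count,                # time signal
        "rejection_reasons": list(ticket.rejection_reasons),  # quality signal
    }


ticket = ClosedTicket(
    artifact_id="TICKET-42",
    opened_at=datetime(2026, 1, 5, 9, 0),
    closed_at=datetime(2026, 1, 5, 15, 30),
    reopened=False,
    handoff_count=2,
    rejection_reasons=["missing citation"],
)
signals = signals_on_close(ticket)
```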

7) Model Portability Requirements
- Must support at least two providers OR a router layer (e.g., LiteLLM/OpenRouter-style abstraction)
- Prompts/tools are versioned and tested
- No core business logic may depend on a feature that only one provider offers
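
One possible shape for these requirements, sketched with stub providers rather than real SDK calls: business logic depends only on a small interface, and prompts are looked up by version. `DraftModel`, the two stub classes, and the `PROMPTS` registry are assumptions for illustration, not the API of any actual router library.

```python
from typing import Protocol


class DraftModel(Protocol):
    """The only surface business logic is allowed to depend on."""
    provider: str

    def generate(self, prompt: str) -> str: ...


class StubProviderA:
    provider = "stub-a"

    def generate(self, prompt: str) -> str:
        # A real adapter would call the provider's SDK here.
        return f"[a] {prompt}"


class StubProviderB:
    provider = "stub-b"

    def generate(self, prompt: str) -> str:
        return f"[b] {prompt}"


# Hypothetical versioned prompt registry.
PROMPTS = {
    "triage-v3": "Summarize the ticket and recommend an action:\n{ticket}",
}


def draft_ticket(model: DraftModel, prompt_version: str, ticket: str) -> dict:
    """Provider-agnostic drafting step; records what is needed for replay."""
    prompt = PROMPTS[prompt_version].format(ticket=ticket)
    return {
        "provider": model.provider,
        "prompt_version": prompt_version,
        "output": model.generate(prompt),
    }


result_a = draft_ticket(StubProviderA(), "triage-v3", "Login fails on mobile")
result_b = draft_ticket(StubProviderB(), "triage-v3", "Login fails on mobile")
```

Running the same versioned prompt against both adapters in a test suite is one way to check the "at least two providers" requirement continuously.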

8) Pricing Unit (Not Tokens)
Choose a billing unit that matches the artifact:
- per resolved ticket, per reviewed contract, per merged PR with checks, per processed case
Document: who owns the budget and what internal KPI it maps to.

Definition of Done
You can reproduce any approved output, show the exact human diff and approval path, and swap models without changing what your customer pays for.