MODEL CONTRACT TEMPLATE (CONTRACT.md)

Scope
- Feature/workflow name:
- User roles involved:
- Side effects (emails sent, tickets created, money moved, data written):

1) Allowed Capabilities (Tools)
For each tool:
- Tool name:
- Purpose:
- Required parameters (and types):
- Allowed targets (e.g., domain allowlist, project allowlist):
- Rate limits / budgets (timeouts, max retries, max calls per request):
- Idempotency key strategy (what prevents duplicate actions):

2) Forbidden Actions
Write “must never” statements that are testable.
- Must never perform action X without user confirmation.
- Must never call tool Y for role Z.
- Must never include or infer sensitive data fields (list them).

3) Data Handling & Provenance
- Input data sources the model may see:
- Retrieval sources allowed (systems, indexes, collections):
- Freshness rule (qualitative if you can’t quantify):
- Citation rule (what outputs must include):
- Memory rule (what can be stored, where, and how it can be deleted):

4) Refusal & Escalation
Define triggers:
- When to refuse (policy violations, missing sources, unsafe requests):
- When to ask clarifying questions (missing required params, ambiguous intent):
- When to escalate to a human queue (high-impact actions, low confidence, policy conflicts):

5) Enforcement Points (Code)
List the exact places you enforce the contract:
- Server-side tool permission checks (not in the prompt)
- JSON Schema validation for tool calls
- Retriever filters and metadata propagation
- Confirmation gate for high-impact actions
- Kill switch flags (per tool, per tenant)

6) Evals (Release Gates)
Maintain an eval suite folder with:
- regression_cases.jsonl (sanitized incidents turned into tests)
- adversarial_cases.jsonl (jailbreak + tool misuse attempts)
- rag_provenance_cases.jsonl (requires correct citation or refusal)

Gating rule:
- CI must fail if the model violates a “must never” statement in any eval case.
- CI must fail if schema validation fails on any tool call.
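
The gating rule in section 6 could be wired into CI along the following lines. This is a minimal sketch, not a prescribed implementation: it assumes your eval runner writes one JSON object per line with the hypothetical fields `id`, `kind`, `violated`, and `schema_valid`, none of which are defined by this template. Missing fields are treated as failures so the gate fails closed.

```python
import json


def gate(results: list[dict]) -> list[str]:
    """Return the reasons the build must fail (empty list means the gate passes)."""
    problems = []
    for r in results:
        case_id = r.get("id", "<unknown>")
        # Assumed field: "violated" is True when the model did a forbidden thing.
        # A missing value counts as a violation (fail closed).
        if r.get("kind") == "must_never" and r.get("violated", True):
            problems.append(f"{case_id}: forbidden action occurred")
        # Assumed field: "schema_valid" is True when every tool call validated.
        if not r.get("schema_valid", False):
            problems.append(f"{case_id}: tool call failed schema validation")
    return problems


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def main(paths: list[str]) -> int:
    """Return the process exit code: 0 to allow deployment, 1 to block it."""
    if not paths:
        print("CONTRACT VIOLATION: no eval results supplied")
        return 1
    problems = [p for path in paths for p in gate(load_jsonl(path))]
    for p in problems:
        print("CONTRACT VIOLATION:", p)
    return 1 if problems else 0


# In a CI step, call: sys.exit(main(sys.argv[1:]))
```

Returning a non-zero exit code is what lets the CI system block the deployment step; the script itself has no knowledge of the pipeline.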

Rollout Checklist
- Add tracing for tool calls (with redaction)
- Verify log retention and access control
- Enable staged rollout / feature flag
- Define incident response: who disables what, and how fast

If you can’t point to a failing CI run that blocks deployment when this contract is violated, the contract isn’t real.
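
For the "who disables what" item and the server-side checks listed in section 5, a rough sketch of a single enforcement point may help. It assumes an in-memory policy table; the names `Enforcer`, `ToolPolicy`, `ContractViolation`, and the `send_refund` tool are hypothetical illustrations, and a real deployment would likely load policies and kill switch flags from configuration or a feature-flag service.

```python
from dataclasses import dataclass, field


class ContractViolation(Exception):
    """Raised server-side when a tool call falls outside the contract."""


@dataclass
class ToolPolicy:
    allowed_roles: set[str]    # roles permitted to call this tool
    high_impact: bool = False  # True means explicit user confirmation is required


@dataclass
class Enforcer:
    policies: dict[str, ToolPolicy]
    # Kill switches as (tool, tenant) pairs; tenant "*" disables the tool everywhere.
    disabled: set[tuple[str, str]] = field(default_factory=set)

    def check(self, tool: str, role: str, tenant: str, confirmed: bool = False) -> None:
        """Raise ContractViolation unless this call is allowed; run before executing the tool."""
        policy = self.policies.get(tool)
        if policy is None:
            raise ContractViolation(f"{tool}: not listed in the contract")
        if (tool, tenant) in self.disabled or (tool, "*") in self.disabled:
            raise ContractViolation(f"{tool}: disabled by kill switch")
        if role not in policy.allowed_roles:
            raise ContractViolation(f"{tool}: role {role!r} is not allowed")
        if policy.high_impact and not confirmed:
            raise ContractViolation(f"{tool}: user confirmation required")


# Example: a high-impact tool limited to one role.
enforcer = Enforcer({"send_refund": ToolPolicy({"support_agent"}, high_impact=True)})
enforcer.check("send_refund", "support_agent", "tenant-a", confirmed=True)  # allowed
enforcer.disabled.add(("send_refund", "*"))  # incident response: disable for all tenants
```

Because the check runs in server code rather than in the prompt, a jailbroken model output still cannot reach the tool, and flipping a kill switch takes effect on the next call.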