Tool-Native AI Shipping Checklist (MCP + Contracts)

Goal: In 2 weeks, ship one AI workflow that is model-replaceable because tools are standardized, permissioned, and observable.

Scope choice (Day 1)
- Pick ONE workflow with clear success criteria (example: “create a PR from an issue,” “summarize and post a Slack update,” “answer on-call runbook questions with citations”).
- Identify the system of record (GitHub/Jira/Linear/ServiceNow/etc.) and the owner team.
- Define a hard boundary: what the assistant is allowed to change vs read-only.

Tool contract design (Days 2–3)
- Write tool specs with strict input/output schemas. Avoid catch-all fields (such as “notes”) and fields whose meaning changes depending on whether they are present.
- Add idempotency where the tool changes state (idempotency key or equivalent).
- Define an error taxonomy: user-fixable (4xx), retryable (timeouts/5xx), and “stop and escalate.”
- Require receipts: every mutating tool returns a URL/ID (ticket link, PR link, message permalink).
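
The contract items above can be sketched in a few lines. This is a minimal illustration, not a prescribed design: the names `CreateTicketInput`, `Receipt`, `ToolError`, and `TicketTool`, the in-memory store, and the `tickets.example.invalid` URL are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    USER_FIXABLE = "user_fixable"  # caller must change the input (4xx-style)
    RETRYABLE = "retryable"        # timeouts and 5xx-style failures
    ESCALATE = "escalate"          # stop and hand off to a human


@dataclass(frozen=True)
class CreateTicketInput:
    # Every field is required and has exactly one meaning.
    title: str
    body: str
    idempotency_key: str


@dataclass(frozen=True)
class Receipt:
    ticket_id: str
    url: str


class ToolError(Exception):
    def __init__(self, kind: ErrorKind, message: str):
        super().__init__(message)
        self.kind = kind


class TicketTool:
    """Hypothetical mutating tool backed by an in-memory store."""

    def __init__(self):
        self._by_key = {}  # idempotency_key -> Receipt

    def create_ticket(self, req: CreateTicketInput) -> Receipt:
        if not req.title.strip():
            raise ToolError(ErrorKind.USER_FIXABLE, "title must not be empty")
        # A repeated key returns the original receipt instead of a new ticket.
        if req.idempotency_key in self._by_key:
            return self._by_key[req.idempotency_key]
        ticket_id = f"TICKET-{len(self._by_key) + 1}"
        receipt = Receipt(ticket_id, f"https://tickets.example.invalid/{ticket_id}")
        self._by_key[req.idempotency_key] = receipt
        return receipt
```

Calling `create_ticket` twice with the same `idempotency_key` yields the same `Receipt`, so a retried call cannot create a duplicate ticket.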

MCP server build (Days 4–6)
- Implement the tools behind an MCP server inside your security boundary.
- Put auth in front (OAuth scopes or your internal identity system). Enforce least privilege.
- Log every call: user, scope, tool name, request ID, latency, outcome, and the returned receipt.
- Redact sensitive fields in logs (tokens, secrets, message bodies as needed).
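
One way to cover the logging and redaction items is a wrapper that every tool call passes through. The sketch below is transport-agnostic and does not use any MCP SDK; `call_with_audit`, `redact`, the `SENSITIVE_KEYS` set, and the `sink` callback are hypothetical names, and the receipt is assumed to be JSON-serializable (for example a URL string).

```python
import json
import time
import uuid

# Assumed list of field names to hide; adjust to your own tool schemas.
SENSITIVE_KEYS = {"token", "secret", "password", "authorization", "message_body"}


def redact(value):
    """Recursively replace the values of sensitive keys before logging."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def call_with_audit(user, scope, tool_name, tool_fn, params, sink):
    """Run tool_fn(**params) and emit one JSON audit line, even on failure."""
    request_id = str(uuid.uuid4())
    start = time.monotonic()
    outcome, receipt = "error", None
    try:
        receipt = tool_fn(**params)
        outcome = "ok"
        return receipt
    finally:
        # The finally block runs on success and on exceptions alike.
        sink(json.dumps({
            "request_id": request_id,
            "user": user,
            "scope": scope,
            "tool": tool_name,
            "params": redact(params),
            "latency_ms": round((time.monotonic() - start) * 1000, 2),
            "outcome": outcome,
            "receipt": receipt,
        }))
```

Passing `print` or a logger method as `sink` gives one structured line per call; a failed call is still logged, with `outcome` set to `"error"` and no receipt.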

Assistant runtime (Days 7–9)
- Connect TWO model providers to the same MCP tool layer (example: OpenAI + Anthropic, or Anthropic + an open-weight model).
- Add routing rules: cheap model for classification/summarization; stronger model for multi-step planning.
- Implement timeouts and retries at the tool layer, not in prompt text.
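
As a rough sketch of the routing and retry items, the code below keeps both in plain code rather than in prompt text. The model names in `ROUTES`, the `RetryableError` class, and `call_with_retries` are placeholders invented for the example; the wrapped function is assumed to enforce its own per-call timeout (for instance through its HTTP client) and to raise `TimeoutError` when it expires.

```python
import time

# Hypothetical routing table: task type -> model configuration name.
ROUTES = {
    "classify": "small-model",
    "summarize": "small-model",
    "plan": "large-model",
}


def pick_model(task_type: str) -> str:
    # Unknown task types fall back to the stronger model.
    return ROUTES.get(task_type, "large-model")


class RetryableError(Exception):
    """Raised by a tool for 5xx-style failures that are safe to retry."""


def call_with_retries(fn, *, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying retryable failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except (RetryableError, TimeoutError):
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

Only errors classified as retryable are retried; anything else propagates immediately, which keeps user-fixable and escalation cases out of the retry loop. Retrying a mutating tool this way is only safe when the tool is idempotent.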

Verification & evals (Days 10–12)
- Create a small eval set of real tasks (20–50 is fine) with expected tool outcomes.
- Test for: correct tool selection, correct parameters, successful receipt creation, and safe failure.
- Add a “no receipt = no claim” rule in the UI: the assistant may report an action as done only when it can show the link or ID for it.
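
The checks above fit in a very small harness. The following is a sketch under stated assumptions: `EvalCase`, `ToolCall`, `grade`, `pass_rate`, and `claim_text` are hypothetical names, the assistant is modeled as a function from a task string to a single tool call (or `None` when it declines to act), and "safe failure" is reduced to "no tool call was made."

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalCase:
    task: str
    expected_tool: Optional[str]   # None means the assistant should take no action
    expected_params: dict


@dataclass
class ToolCall:
    tool: str
    params: dict
    receipt: Optional[str] = None  # URL or ID returned by a mutating tool


def grade(case: EvalCase, call: Optional[ToolCall]) -> dict:
    """Return one boolean per check for a single eval case."""
    if case.expected_tool is None:
        return {"safe_failure": call is None}
    if call is None:
        return {"tool": False, "params": False, "receipt": False}
    return {
        "tool": call.tool == case.expected_tool,
        "params": call.params == case.expected_params,
        "receipt": bool(call.receipt),
    }


def pass_rate(cases, run_assistant) -> float:
    """Fraction of cases where every check passes."""
    results = [grade(case, run_assistant(case.task)) for case in cases]
    return sum(all(r.values()) for r in results) / len(results)


def claim_text(action: str, call: ToolCall) -> str:
    """No receipt = no claim: only report success with a link or ID."""
    if not call.receipt:
        return f"{action}: not confirmed (no receipt returned)"
    return f"{action}: {call.receipt}"
```

Running `pass_rate` once per model provider against the same cases gives a direct comparison of how each model drives the shared tool layer.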

Release guardrails (Days 13–14)
- Add a human escalation path with full context attached (inputs, tool attempts, errors).
- Add a kill switch for each tool (disable mutating actions quickly).
- Pin versions: MCP server version, tool schemas, and model configuration.
- Write an incident playbook: what to do if the assistant starts making wrong calls.
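
The kill switch and version pinning can start out very simple. The sketch below is illustrative only: `KillSwitch`, `ToolDisabled`, `check_pins`, and every value in `PINNED` are made up for the example, and a real deployment would likely keep the disabled set in shared configuration rather than in process memory.

```python
class ToolDisabled(Exception):
    """Raised when a call reaches a tool that has been switched off."""


class KillSwitch:
    def __init__(self):
        self._disabled = set()

    def disable(self, tool_name: str) -> None:
        self._disabled.add(tool_name)

    def enable(self, tool_name: str) -> None:
        self._disabled.discard(tool_name)

    def call(self, tool_name: str, fn, *args, **kwargs):
        # Check on every call so that a disable takes effect immediately.
        if tool_name in self._disabled:
            raise ToolDisabled(f"{tool_name} is disabled")
        return fn(*args, **kwargs)


# Hypothetical pinned release manifest; all values are placeholders.
PINNED = {
    "mcp_server_version": "1.0.0",
    "tool_schema_version": "2024-01",
    "model_config": "router-v1",
}


def check_pins(running: dict) -> list:
    """Return the names of pinned items that differ from what is running."""
    return sorted(key for key, value in PINNED.items() if running.get(key) != value)
```

During an incident, `disable` stops one mutating tool while read-only tools keep working, and a non-empty result from `check_pins` is a quick first signal that something changed underneath the assistant.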

Done criteria
- The same workflow works with two different model providers.
- Every state change has a receipt.
- You can answer: who did what, through which tool, with which permission, and when.
- Swapping models does not require rewriting tool code, only changing configuration and prompts.