Tool-Native AI Shipping Checklist (MCP + Contracts)

Goal: In 2 weeks, ship one AI workflow that is model-replaceable because tools are standardized, permissioned, and observable.

Scope choice (Day 1)
- Pick ONE workflow with clear success criteria (example: “create a PR from an issue,” “summarize and post a Slack update,” “answer on-call runbook questions with citations”).
- Identify the system of record (GitHub/Jira/Linear/ServiceNow/etc.) and the owner team.
- Define a hard boundary: what the assistant is allowed to change vs read-only.

Tool contract design (Days 2–3)
- Write tool specs with strict input/output schemas. No ambiguous fields (“notes”), no optional meaning.
- Add idempotency where the tool changes state (idempotency key or equivalent).
- Define error taxonomy: user-fixable (4xx), retryable (timeouts/5xx), and “stop and escalate.”
- Require receipts: every mutating tool returns a URL/ID (ticket link, PR link, message permalink).

MCP server build (Days 4–6)
- Implement the tools behind an MCP server inside your security boundary.
- Put auth in front (OAuth scopes or your internal identity system). Enforce least privilege.
- Log every call: user, scope, tool name, request ID, latency, outcome, and the returned receipt.
- Redact sensitive fields in logs (tokens, secrets, message bodies as needed).

Assistant runtime (Days 7–9)
- Connect TWO model providers to the same MCP tool layer (example: OpenAI + Anthropic, or Anthropic + an open-weight model).
- Add routing rules: cheap model for classification/summarization; stronger model for multi-step planning.
- Implement timeouts and retries at the tool layer, not in prompt text.

Verification & evals (Days 10–12)
- Create a small eval set of real tasks (20–50 is fine) with expected tool outcomes.
- Test for: correct tool selection, correct parameters, successful receipt creation, and safe failure.
- Add a “no receipt = no claim” rule in UI: show links/IDs for any action.

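The tool-contract items above (strict schemas, idempotency, error taxonomy, receipts) can be sketched in a few lines of Python. This is a minimal illustration, not an MCP SDK implementation: `CreateTicketInput`, `Receipt`, `TicketTool`, `ToolError`, `ErrorKind`, and the `tickets.example.invalid` URL are all hypothetical names, and the in-memory dictionary stands in for a real system of record.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class ErrorKind(Enum):
    USER_FIXABLE = "user_fixable"   # 4xx-style: the caller must change the input
    RETRYABLE = "retryable"         # timeouts / 5xx-style: safe to retry
    ESCALATE = "stop_and_escalate"  # stop and hand off to a human


class ToolError(Exception):
    """Error carrying a taxonomy kind so the runtime can decide what to do."""

    def __init__(self, kind: ErrorKind, message: str) -> None:
        super().__init__(message)
        self.kind = kind


@dataclass(frozen=True)
class CreateTicketInput:
    """Strict input schema: every field is required and has one meaning."""
    title: str
    body: str
    idempotency_key: str

    def validate(self) -> None:
        if not self.title.strip():
            raise ToolError(ErrorKind.USER_FIXABLE, "title must not be empty")
        if not self.idempotency_key:
            raise ToolError(ErrorKind.USER_FIXABLE, "idempotency_key is required")


@dataclass(frozen=True)
class Receipt:
    """Proof of a state change: an ID plus a link a human can open."""
    ticket_id: str
    url: str


class TicketTool:
    """Hypothetical mutating tool backed by an in-memory store."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Receipt] = {}

    def create_ticket(self, req: CreateTicketInput) -> Receipt:
        req.validate()
        # Idempotency: a repeated key returns the original receipt
        # instead of creating a second ticket.
        existing = self._by_key.get(req.idempotency_key)
        if existing is not None:
            return existing
        ticket_id = f"TCK-{len(self._by_key) + 1}"
        receipt = Receipt(ticket_id, f"https://tickets.example.invalid/{ticket_id}")
        self._by_key[req.idempotency_key] = receipt
        return receipt
```

Because the receipt is part of the return type, an eval or UI layer can enforce “no receipt = no claim” by checking for a `Receipt` rather than parsing model output.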
Release guardrails (Days 13–14)
- Add a human escalation path with full context attached (inputs, tool attempts, errors).
- Add a kill switch for each tool (disable mutating actions quickly).
- Pin versions: MCP server version, tool schemas, and model configuration.
- Write an incident playbook: what to do if the assistant starts making wrong calls.

Done criteria
- The same workflow works with two different model providers.
- Every state change has a receipt.
- You can answer: who did what, through which tool, with which permission, and when.
- Swapping models does not require rewriting tool code, only configuration and prompts.
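The logging, redaction, and kill-switch items in this checklist can be combined in one wrapper around every tool call. The sketch below is one possible shape under stated assumptions: `call_tool`, `DISABLED_TOOLS`, `SENSITIVE_FIELDS`, `AUDIT_LOG`, and `ToolDisabled` are hypothetical names, the log is an in-memory list rather than a real sink, and tools are assumed to return a dictionary with a `receipt` key.

```python
import time
import uuid
from typing import Any, Callable, Dict, List, Set

DISABLED_TOOLS: Set[str] = set()        # kill switch: names of tools turned off
SENSITIVE_FIELDS = {"token", "secret", "message_body"}
AUDIT_LOG: List[Dict[str, Any]] = []    # stand-in for a real log sink


class ToolDisabled(Exception):
    """Raised when a tool has been switched off by an operator."""


def redact(args: Dict[str, Any]) -> Dict[str, Any]:
    """Replace sensitive values before they reach the log."""
    return {k: "[REDACTED]" if k in SENSITIVE_FIELDS else v for k, v in args.items()}


def call_tool(user: str, scope: str, tool_name: str,
              fn: Callable[..., Dict[str, Any]],
              args: Dict[str, Any]) -> Dict[str, Any]:
    """Run one tool call and record who did what, with which permission, and when."""
    record: Dict[str, Any] = {
        "request_id": str(uuid.uuid4()),
        "user": user,
        "scope": scope,
        "tool": tool_name,
        "args": redact(args),
        "timestamp": time.time(),
        "outcome": None,
        "receipt": None,
        "latency_ms": None,
    }
    start = time.monotonic()
    try:
        if tool_name in DISABLED_TOOLS:
            raise ToolDisabled(f"{tool_name} is disabled")
        result = fn(**args)
        record["outcome"] = "ok"
        record["receipt"] = result.get("receipt")
        return result
    except Exception as exc:
        record["outcome"] = f"error:{type(exc).__name__}"
        raise
    finally:
        # The record is written on success and on failure alike.
        record["latency_ms"] = (time.monotonic() - start) * 1000
        AUDIT_LOG.append(record)
```

Disabling a tool is then a single operation (`DISABLED_TOOLS.add("post_message")`), and the blocked attempt still lands in the audit log, which helps when reconstructing an incident.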