LOCAL-FIRST AI SHIPPING CHECKLIST (ONE-WEEK SPIKE PLAN)

Goal
Prove (or falsify) that a real user workflow can run locally with acceptable latency, predictable behavior, and a clear privacy boundary. End the week with a go/no-go decision and a concrete deployment pattern (pure local, local-first with cloud fallback, or cloud-first with local helpers).

Day 1: Pick the smallest valuable job
- Choose ONE job with a measurable outcome (not “chat”). Examples: extract structured fields, classify, summarize to a fixed schema, rewrite text with constraints.
- Define success output as a JSON schema or a deterministic format.
- Define your hard budgets: acceptable latency on mid-tier hardware, memory ceiling, and offline behavior.

Day 2: Select a runtime you can ship
- Apple: Core ML. Android: TensorFlow Lite. Cross-platform/desktop: ONNX Runtime or llama.cpp.
- Decide packaging: bundle model in app vs download on first run.
- Decide update path: app release vs separate model artifact.

Day 3: Implement local inference end-to-end
- Build the full local call path: input → preprocessing → inference → postprocessing → schema validation.
- Add strict output checks. If output fails schema validation, don’t “best effort” it; treat it as a failure.
- Instrument only non-sensitive signals by default: timing, validation pass/fail, fallback triggered.

Day 4: Add cloud fallback (minimal context)
- Implement a fallback route triggered by: timeout, low confidence, or validation failure.
- Ensure fallback sends the smallest possible context. Prefer redaction/minimization over raw dumps.
- Add user-visible controls if sensitive data could leave device.

Day 5: Safety and permissions pass
- List every tool/action the model can trigger (file read, send message, create ticket, run command).
- Default to read-only. Require explicit user confirmation for irreversible actions.
- Sandbox tools. Log actions locally for audit.
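
Sketch for Days 3 and 4: one way the local call path, strict validation, and minimal-context fallback could fit together. Everything named here is hypothetical and chosen for illustration: the REQUIRED_FIELDS schema, the run_job function, the 800 ms budget, and the local_model, cloud_model, and minimize callables, which stand in for whatever runtime and redaction step you pick. The "timeout" is a post-hoc budget check rather than true cancellation, and the low-confidence trigger is left out because it depends on the runtime.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"vendor": str, "total": float, "currency": str}


def validate(raw: str) -> Optional[dict]:
    """Return the parsed object only if it matches the schema exactly."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or set(obj) != set(REQUIRED_FIELDS):
        return None
    for key, expected in REQUIRED_FIELDS.items():
        if not isinstance(obj[key], expected):
            return None  # strict: no coercion, no "best effort" repair
    return obj


@dataclass
class Result:
    output: Optional[dict]
    source: str                     # "local", "cloud", or "failed"
    latency_ms: float
    fallback_reason: Optional[str]  # non-sensitive signal, safe to log


def run_job(
    text: str,
    local_model: Callable[[str], str],
    cloud_model: Optional[Callable[[str], str]],
    minimize: Callable[[str], str],
    latency_budget_ms: float = 800.0,  # assumed budget, set your own on Day 1
) -> Result:
    start = time.perf_counter()
    parsed = validate(local_model(text))
    elapsed_ms = (time.perf_counter() - start) * 1000

    reason = None
    if elapsed_ms > latency_budget_ms:
        reason = "timeout"
    elif parsed is None:
        reason = "validation_failure"

    if reason is None:
        return Result(parsed, "local", elapsed_ms, None)
    if cloud_model is None:  # offline or fallback disabled by the user
        return Result(None, "failed", elapsed_ms, reason)

    # Only the minimized/redacted context leaves the device.
    parsed = validate(cloud_model(minimize(text)))
    total_ms = (time.perf_counter() - start) * 1000
    source = "cloud" if parsed is not None else "failed"
    return Result(parsed, source, total_ms, reason)
```

The Result fields carry only timing, source, and fallback reason, which matches the Day 3 rule of instrumenting non-sensitive signals by default. Passing cloud_model=None is one simple way to model offline behavior or a user who has switched fallback off.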

Day 6: Hardware variance test
- Test on at least two device classes (e.g., older laptop + newer laptop, mid Android + flagship).
- Verify thermals and background load (video call, many browser tabs) don’t break UX.
- Add degrade mode: smaller context, simpler prompts, reduced features, or forced fallback.

Day 7: Decide and write the spec
- Choose the deployment pattern:
  (A) Pure local
  (B) Local-first + cloud fallback
  (C) Cloud-first + local helpers
- Write the data boundary policy: what never leaves the device, what might, and how users control it.
- Write the release plan: artifact signing, rollback strategy, and model version pinning.

Go/No-Go Criteria (use as-is)

GO if:
- The workflow meets your latency budget locally for common cases.
- Output is consistently valid against your schema.
- Fallback improves quality without sending unnecessary data.
- You have an update/rollback path for the model artifact.

NO-GO if:
- You need cloud calls for the common case.
- Output cannot be constrained to safe, verifiable formats.
- You can’t explain (in one paragraph) what data leaves the device and why.
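
Sketch: the measurable parts of the criteria above could be checked mechanically from the week's test runs. This is a minimal illustration, not a prescribed method. The SpikeMeasurements fields, the decide function, and the thresholds (95th-percentile latency, 98% valid outputs, 20% fallback rate) are all assumptions to replace with your own budgets. Whether fallback "improves quality without sending unnecessary data" still needs human review and is not modeled here. The run lists are assumed to be non-empty.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpikeMeasurements:
    local_latencies_ms: List[float]  # one entry per common-case run
    schema_valid: List[bool]         # did the local output pass validation?
    fallback_used: List[bool]        # did the run need the cloud route?
    latency_budget_ms: float
    has_rollback_path: bool
    data_boundary_documented: bool   # the one-paragraph explanation exists


def nearest_rank(values: List[float], fraction: float) -> float:
    """Nearest-rank percentile: smallest value covering `fraction` of runs."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def decide(
    m: SpikeMeasurements,
    min_valid_rate: float = 0.98,     # assumed threshold
    max_fallback_rate: float = 0.20,  # assumed threshold
    latency_fraction: float = 0.95,   # assumed: judge latency at p95
) -> Tuple[str, List[str]]:
    blockers = []
    if nearest_rank(m.local_latencies_ms, latency_fraction) > m.latency_budget_ms:
        blockers.append("local latency exceeds budget for common cases")
    if sum(m.schema_valid) / len(m.schema_valid) < min_valid_rate:
        blockers.append("output is not consistently valid against the schema")
    if sum(m.fallback_used) / len(m.fallback_used) > max_fallback_rate:
        blockers.append("cloud calls are needed for the common case")
    if not m.has_rollback_path:
        blockers.append("no update/rollback path for the model artifact")
    if not m.data_boundary_documented:
        blockers.append("cannot explain what data leaves the device and why")
    return ("GO" if not blockers else "NO-GO", blockers)
```

Returning the list of blockers alongside the verdict gives the Day 7 spec a concrete record of why the decision went the way it did.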