LOCAL-FIRST AI SHIPPING CHECKLIST (ONE-WEEK SPIKE PLAN)

Goal
Prove (or falsify) that a real user workflow can run locally with acceptable latency, predictable behavior, and a clear privacy boundary. End the week with a go/no-go decision and a concrete deployment pattern (pure local, local-first with cloud fallback, or cloud-first with local helpers).

Day 1 — Pick the smallest valuable job
- Choose ONE job with a measurable outcome (not “chat”). Examples: extract structured fields, classify, summarize to a fixed schema, rewrite text with constraints.
- Define success output as a JSON schema or a deterministic format.
- Define your hard budgets: acceptable latency on mid-tier hardware, memory ceiling, and offline behavior.
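
One way to pin down the Day 1 outputs is to write the schema and budgets as code before any model work starts. The sketch below assumes a hypothetical invoice-extraction job; the field names, the Budgets type, and every number are placeholders, not recommendations.

```python
from dataclasses import dataclass

# Hypothetical job: extract invoice fields. Field names are illustrative only.
REQUIRED_FIELDS = {"vendor": str, "total_cents": int, "currency": str}


@dataclass(frozen=True)
class Budgets:
    max_latency_ms: int   # acceptable latency on mid-tier hardware
    max_memory_mb: int    # memory ceiling for model plus runtime
    works_offline: bool   # must the job succeed with no network?


def validate_output(obj: object) -> list[str]:
    """Return a list of problems; an empty list means the output is valid."""
    if not isinstance(obj, dict):
        return ["output is not an object"]
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in obj:
            problems.append(f"missing field: {name}")
        elif type(obj[name]) is not expected:  # exact type: bool is not int
            problems.append(f"wrong type for {name}")
    for name in obj:
        if name not in REQUIRED_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems


# Placeholder numbers; replace with budgets measured for your own workflow.
BUDGETS = Budgets(max_latency_ms=1500, max_memory_mb=2048, works_offline=True)
```

A hand-rolled check like this keeps the spike dependency-free; a full JSON Schema validator is a reasonable swap once the schema grows.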

Day 2 — Select a runtime you can ship
- Apple: Core ML. Android: TensorFlow Lite (since renamed LiteRT). Cross-platform/desktop: ONNX Runtime or llama.cpp.
- Decide packaging: bundle model in app vs download on first run.
- Decide update path: app release vs separate model artifact.

Day 3 — Implement local inference end-to-end
- Build the full local call path: input → preprocessing → inference → postprocessing → schema validation.
- Add strict output checks. If output fails schema validation, do not attempt a best-effort repair; treat it as a failure.
- Instrument only non-sensitive signals by default: timing, validation pass/fail, fallback triggered.
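
The Day 3 call path can be sketched as a single function. This is a minimal illustration, assuming the model emits JSON text; run_local, infer, and validate are hypothetical names, and the real runtime call is injected as infer so the sketch stays runtime-agnostic.

```python
import json
import time
from typing import Callable, Optional


def run_local(text: str,
              infer: Callable[[str], str],
              validate: Callable[[object], bool]) -> tuple[Optional[dict], dict]:
    """Run input -> preprocessing -> inference -> postprocessing -> validation.

    Returns (result, metrics). result is None on any failure; no repair is tried.
    metrics holds only non-sensitive signals: no input or output text.
    """
    start = time.perf_counter()
    prompt = text.strip()            # preprocessing (placeholder step)
    raw = infer(prompt)              # inference (model call is injected)
    try:
        parsed = json.loads(raw)     # postprocessing: parse the model output
    except json.JSONDecodeError:
        parsed = None
    valid = parsed is not None and bool(validate(parsed))
    metrics = {
        "latency_ms": round((time.perf_counter() - start) * 1000, 1),
        "validation_passed": valid,
    }
    return (parsed if valid else None), metrics
```

Returning None instead of a partially parsed object keeps the failure visible to whatever decides on fallback.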

Day 4 — Add cloud fallback (minimal context)
- Implement a fallback route triggered by: timeout, low confidence, or validation failure.
- Ensure fallback sends the smallest possible context. Prefer redaction/minimization over raw dumps.
- Add user-visible controls if sensitive data could leave device.
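
A sketch of the Day 4 routing, under stated assumptions: the local result carries a hypothetical "confidence" field, the timeout is approximated by comparing measured latency to the budget (a real implementation would cancel the local call), and minimize only redacts email addresses, so it is not a complete PII filter. All names and thresholds are illustrative.

```python
import re
from typing import Callable, Optional

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def minimize(text: str, max_chars: int = 500) -> str:
    """Redact email addresses, then truncate. Illustrative only."""
    return EMAIL.sub("[email]", text)[:max_chars]


def fallback_reason(result: Optional[dict], latency_ms: float,
                    budget_ms: float, min_confidence: float) -> Optional[str]:
    """Return why fallback is needed, or None if the local result stands."""
    if latency_ms > budget_ms:
        return "timeout"
    if result is None:
        return "validation_failure"
    # A missing confidence value is treated as low confidence.
    if result.get("confidence", 0.0) < min_confidence:
        return "low_confidence"
    return None


def route(text: str, local_result: Optional[dict], latency_ms: float,
          cloud_call: Callable[[str], dict], user_allows_cloud: bool,
          budget_ms: float = 1500.0, min_confidence: float = 0.7):
    """Return (result, path). Cloud is used only with user consent."""
    reason = fallback_reason(local_result, latency_ms, budget_ms, min_confidence)
    if reason is None:
        return local_result, "local"
    if not user_allows_cloud:
        return None, "failed_no_consent"
    # Only the minimized context leaves the device.
    return cloud_call(minimize(text)), "cloud:" + reason
```

The returned path string doubles as the "fallback triggered" signal and contains no user content.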

Day 5 — Safety and permissions pass
- List every tool/action the model can trigger (file read, send message, create ticket, run command).
- Default to read-only. Require explicit user confirmation for irreversible actions.
- Sandbox tools. Log actions locally for audit.
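
The permission rules above can be sketched as a small tool gate. Tool, call_tool, and the outcome strings are hypothetical names; the in-memory list stands in for a local audit file, and sandboxing itself is out of scope for this sketch.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    irreversible: bool = False   # read-only tools keep the default


def call_tool(tools: dict[str, Tool], name: str, arg: str,
              confirm: Callable[[str], bool], audit_log: list[str]) -> str:
    """Run a registered tool; irreversible tools need explicit confirmation."""
    if name not in tools:
        outcome = "rejected_unknown_tool"
    elif tools[name].irreversible and not confirm(f"Allow {name}?"):
        outcome = "denied_by_user"
    else:
        outcome = "executed"
    # Every attempt is logged locally, including refused ones.
    audit_log.append(json.dumps({"ts": time.time(), "tool": name, "outcome": outcome}))
    if outcome != "executed":
        raise PermissionError(outcome)
    return tools[name].run(arg)
```

Anything the model requests that is not in the registry is refused, which keeps the Day 5 tool list and the enforced tool list identical.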

Day 6 — Hardware variance test
- Test on at least two device classes (e.g., older laptop + newer laptop, mid Android + flagship).
- Verify that thermal throttling and background load (video call, many browser tabs) don’t break UX.
- Add degrade mode: smaller context, simpler prompts, reduced features, or forced fallback.
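
One possible shape for a degrade mode is a ladder of modes chosen from recently measured latency. The mode names, token counts, and the "twice the budget" threshold below are assumptions for illustration; pick thresholds from your own Day 6 measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    max_context_tokens: int
    force_fallback: bool = False


# Ordered from full quality to most degraded. All numbers are placeholders.
MODES = [
    Mode("full", 4096),
    Mode("reduced", 1024),
    Mode("fallback_only", 0, force_fallback=True),
]


def pick_mode(recent_latencies_ms: list[float], budget_ms: float) -> Mode:
    """Pick a mode from the (upper) median of recent local latencies.

    Within budget -> full; up to 2x budget -> reduced; beyond -> fallback_only.
    """
    if not recent_latencies_ms:
        return MODES[0]
    ordered = sorted(recent_latencies_ms)
    median = ordered[len(ordered) // 2]
    if median <= budget_ms:
        return MODES[0]
    if median <= 2 * budget_ms:
        return MODES[1]
    return MODES[2]
```

Using a median rather than the latest sample avoids flapping between modes when a single run is slow.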

Day 7 — Decide and write the spec
- Choose the deployment pattern:
 (A) Pure local
 (B) Local-first + cloud fallback
 (C) Cloud-first + local helpers
- Write the data boundary policy: what never leaves the device, what might, and how users control it.
- Write the release plan: artifact signing, rollback strategy, and model version pinning.
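
Version pinning and rollback can be illustrated with a digest check. Note the limits of this sketch: it verifies a pinned SHA-256 digest, which is weaker than verifying a cryptographic signature, and the PINNED record, version string, and function names are hypothetical.

```python
import hashlib

# Hypothetical pin shipped with the app release; values are illustrative.
PINNED = {
    "version": "2024.1",
    "sha256": hashlib.sha256(b"example model bytes").hexdigest(),
}


def verify_artifact(data: bytes, version: str, pinned: dict) -> bool:
    """Accept the artifact only if version and SHA-256 digest both match the pin."""
    digest = hashlib.sha256(data).hexdigest()
    return version == pinned["version"] and digest == pinned["sha256"]


def choose_model(candidate: bytes, candidate_version: str,
                 last_good: bytes, pinned: dict) -> bytes:
    """Rollback: keep the last known-good artifact if the candidate fails the check."""
    if verify_artifact(candidate, candidate_version, pinned):
        return candidate
    return last_good
```

Keeping the last known-good artifact on disk until the new one verifies is what makes the rollback cheap.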

Go/No-Go Criteria (use as-is)
GO if:
- The workflow meets your latency budget locally for common cases.
- Output is consistently valid against your schema.
- Fallback improves quality without sending unnecessary data.
- You have an update/rollback path for the model artifact.

NO-GO if:
- You need cloud calls for the common case.
- Output cannot be constrained to safe, verifiable formats.
- You can’t explain (in one paragraph) what data leaves the device and why.
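
To make the decision repeatable, the criteria above can be encoded as a small check over the week's measurements. The thresholds (90% of runs within budget, 99% valid outputs, at most 10% fallback) are assumptions chosen for illustration, and SpikeResults and decide are hypothetical names. The criterion that fallback improves quality needs a human judgment and is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpikeResults:
    latencies_ms: list[float]          # local runs on common-case inputs
    validation_passes: int
    validation_total: int
    fallback_rate: float               # share of common-case runs that needed cloud
    has_rollback_path: bool
    has_data_boundary_paragraph: bool


def decide(r: SpikeResults, budget_ms: float,
           min_within_budget: float = 0.9,
           min_valid_rate: float = 0.99,
           max_fallback_rate: float = 0.1) -> tuple[str, list[str]]:
    """Return ("GO" or "NO-GO", reasons). Any failed criterion means NO-GO."""
    reasons = []
    within = sum(1 for x in r.latencies_ms if x <= budget_ms)
    if not r.latencies_ms or within / len(r.latencies_ms) < min_within_budget:
        reasons.append("latency budget missed")
    if r.validation_total == 0 or r.validation_passes / r.validation_total < min_valid_rate:
        reasons.append("output not consistently valid")
    if r.fallback_rate > max_fallback_rate:
        reasons.append("cloud needed for the common case")
    if not r.has_rollback_path:
        reasons.append("no update/rollback path")
    if not r.has_data_boundary_paragraph:
        reasons.append("data boundary not explained")
    return ("GO" if not reasons else "NO-GO"), reasons
```

Returning the list of reasons alongside the verdict gives the Day 7 spec something concrete to cite.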