AI CONTROL PLANE STARTER CHECKLIST (STARTUPS)

Goal: make LLM features measurable, routable, and governable across providers (OpenAI, Anthropic, Google, AWS Bedrock, and/or open models).

1) ONE FRONT DOOR
- Create a single internal API for all LLM calls (even if it just proxies today).
- Standardize request metadata: tenant_id, user_id (or hashed), intent tag, data_class (public/internal/restricted), and “requires_json” boolean.
- Record model/provider used and a request_id that propagates through your system.

2) ROUTING RULES (MINIMUM VIABLE)
- Define default models per intent: “fast” and “best.”
- Implement fallback rules for: invalid JSON/schema, tool-call failure, safety refusal, rate-limit, timeout.
- Add a kill switch per provider/model (feature flag style).

3) EVALUATIONS YOU CAN RUN WEEKLY
- Pick ONE workflow (extraction, support draft, RAG answer, or agent tool use).
- Create a small golden set of real examples you’re allowed to store.
- Define pass/fail checks that are mechanical where possible:
 * JSON parse + schema validation
 * presence of citations
 * tool call matches allowed schema
 * banned content checks
- Track results by model version and prompt version.

4) OBSERVABILITY + STORAGE
- Capture traces (prompt, retrieved docs IDs, tool calls, outputs) with redaction.
- Store only what you need; set retention by data class.
- Make it searchable by request_id and tenant_id for support.

5) COST AND ABUSE CONTROLS
- Enforce maximum context size and maximum retries.
- Add per-tenant budgets or rate limits for agentic flows.
- Log token usage/cost fields if your provider returns them; otherwise log request size proxies.

6) SECURITY + PROCUREMENT READINESS
- Document where data flows (provider, region, retention).
- Confirm and document provider training/data-use terms for your integration.
- Ensure secrets are in a real secrets manager (e.g., HashiCorp Vault, cloud-native).
- Restrict who can view raw prompts/outputs in logs.

7) RELEASE DISCIPLINE
- Version prompts like code.
- Gate prompt/routing changes behind eval runs.
- Keep a rollback path: previous prompt + previous routing config.

Definition of done: you can switch providers for a workflow by changing routing config, and you can prove quality didn’t regress using your eval suite.