ICMD 2026 STARTUP EVALUATION SCORECARD

Purpose: Use this scorecard to evaluate early-stage startups (Seed to Series B) across AI agents, climate tech, and developer tools. The goal is to predict “time-to-trust”: how quickly a skeptical enterprise can adopt the product safely, measure ROI, and expand usage.

How to use:
- Score each category 1–5 (1 = weak/unproven, 5 = strong/proven).
- Sum the five categories for a total score out of 25.
- A “watchlist” company typically scores 18+ or has a single 5/5 in a category that creates a wedge (e.g., distribution or compliance).
(Sketch A in the appendix illustrates this arithmetic.)

1) WORKFLOW FIT (1–5)
Ask:
- What is the single best workflow? Who owns it (role + department)?
- Is there a clear budget line (support ops, security ops, legal ops, platform engineering, energy/plant ops)?
- What is the measurable KPI improvement target?
Strong evidence:
- 2–3 customer case studies with quantified outcomes (e.g., 15% handle-time reduction; 30% fewer P1 incidents; 10% energy cost reduction).

2) TRUST STACK: SECURITY + GOVERNANCE (1–5)
Ask:
- SOC 2 plan (Type I / Type II dates)? Data retention and deletion?
- Audit logs: are tool calls recorded with inputs/outputs and identities? (Sketch B in the appendix shows an illustrative record.)
- Permissions: least privilege, scoped tokens, approval gates for destructive actions?
Strong evidence:
- SOC 2 Type I done, Type II scheduled within 6–9 months; documented incident response process.

3) RELIABILITY + EVALUATIONS (1–5)
Ask:
- What evals exist today (offline tests, regression tests, canaries)?
- How do they roll back a bad prompt/model/tool change?
- How do they detect failure modes (hallucinations, policy violations, tool errors)?
Strong evidence:
- Automated eval suite run on every change; canary deployments; tracing/observability built in. (Sketch C in the appendix shows a minimal regression gate.)

4) UNIT ECONOMICS PATH (1–5)
Ask:
- What is the cost per task/run? Who pays variable costs (model calls, infra, energy)?
- For software: is 70%+ gross margin plausible at scale? (Sketch D in the appendix works one example.)
- For climate/hardware: is the solution financeable (warranties, O&M plan, performance guarantees)?
Strong evidence:
- Clear margin model with levers (caching, batching, routing); for hardware, named finance/deployment partners.

5) DISTRIBUTION WEDGE + MOAT TRAJECTORY (1–5)
Ask:
- What makes adoption compound (integrations, proprietary data, switching costs)?
- Is there a channel: ecosystems, marketplaces, developer-led growth, regulated buyers?
- What becomes harder to copy after 12–18 months?
Strong evidence:
- Deep integrations into systems of record; proprietary datasets or process-data rights; expansion playbook.

INTERPRETING SCORES
- 22–25: Category leader potential; likely to become a platform.
- 18–21: Strong watchlist; validate one missing piece (often compliance or economics).
- 14–17: Promising but risky; needs clearer workflow fit or distribution.
- <14: Likely a demo without a durable adoption path.

Bonus checks (quick red-flag filter):
- “We’ll add compliance later” for enterprise workflows.
- No rollback/evals for agent systems.
- Economics depend only on model costs falling.
- Heavy custom services required for every deployment.

Outcome: You should be able to summarize the company in one sentence: “They reduce X cost/risk by Y% for Z buyer, and they’re trusted because of A (security), B (reliability), and C (distribution wedge).”
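
APPENDIX: ILLUSTRATIVE SKETCHES
The sketches below make the arithmetic and checks above concrete. They are illustrative Python, not tooling that ships with this scorecard; any function, field name, or number not defined in the scorecard itself is a placeholder assumption.

A) SCORING ARITHMETIC
A minimal sketch of the “How to use” and “Interpreting scores” rules. The category names and thresholds come from this scorecard; the function and variable names are illustrative.

    # Minimal sketch of the scoring arithmetic; names are illustrative.
    CATEGORIES = [
        "workflow_fit",
        "trust_stack",
        "reliability_evals",
        "unit_economics",
        "distribution_wedge",
    ]

    def interpret(scores):
        """Sum five 1-5 category scores and map the total to a band."""
        assert set(scores) == set(CATEGORIES), "score all five categories"
        assert all(1 <= s <= 5 for s in scores.values()), "scores must be 1-5"
        total = sum(scores.values())
        if total >= 22:
            band = "Category leader potential; likely to become a platform."
        elif total >= 18:
            band = "Strong watchlist; validate one missing piece."
        elif total >= 14:
            band = "Promising but risky; needs clearer workflow fit or distribution."
        else:
            band = "Likely a demo without a durable adoption path."
        # A single 5/5 wedge category can still earn a watchlist look
        # below the 18+ threshold (see "How to use" above).
        if total < 18 and any(s == 5 for s in scores.values()):
            band += " Watchlist candidate via a single 5/5 wedge."
        return f"{total}/25 - {band}"

    print(interpret({
        "workflow_fit": 4, "trust_stack": 3, "reliability_evals": 4,
        "unit_economics": 3, "distribution_wedge": 5,
    }))  # 19/25 - Strong watchlist; validate one missing piece.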
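
B) AUDIT-LOG RECORD SHAPE (TRUST STACK)
An illustrative shape for the audit-log check in section 2: every tool call recorded with inputs, outputs, and identities, plus any approval gate. Field names and values are assumptions, not a standard.

    # One record per tool call; all field names are illustrative assumptions.
    audit_record = {
        "timestamp": "2026-03-02T10:31:07Z",
        "actor": {"user": "jdoe@example.com", "agent_run_id": "run_8f3a"},
        "tool": "crm.update_ticket",              # which tool was called
        "inputs": {"ticket_id": "T-4821", "status": "resolved"},
        "output": {"ok": True},
        "approval": {                             # gate for destructive actions
            "required": False,
            "approved_by": None,
        },
    }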
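
C) REGRESSION-EVAL GATE (RELIABILITY)
A minimal sketch of “automated eval suite run on every change” from section 3: golden cases checked before a prompt/model/tool change ships. run_agent and the cases are hypothetical stand-ins for the system under test.

    # Golden cases and run_agent are hypothetical stand-ins; a real suite
    # would run in CI on every prompt/model/tool change and block the
    # deploy (or trigger rollback) when the gate fails.
    GOLDEN_CASES = [
        {"input": "reset my password", "must_contain": "verification"},
        {"input": "delete my account", "must_contain": "approval"},
    ]

    def run_agent(prompt):
        raise NotImplementedError("stand-in for the system under test")

    def regression_gate(threshold=1.0):
        """Pass only if the required fraction of golden cases succeed."""
        passed = 0
        for case in GOLDEN_CASES:
            try:
                out = run_agent(case["input"])
                if case["must_contain"] in out.lower():
                    passed += 1
            except Exception:
                pass  # tool errors and crashes count as failures
        return passed / len(GOLDEN_CASES) >= threshold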
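
D) PER-TASK MARGIN MODEL (UNIT ECONOMICS)
A worked example for the 70%+ gross-margin question in section 4, showing caching as one margin lever. All prices and costs are placeholder assumptions, not benchmarks.

    def gross_margin(price_per_task, model_cost, infra_cost, cache_hit_rate=0.0):
        """Gross margin per task; caching skips a fraction of model calls."""
        variable_cost = model_cost * (1 - cache_hit_rate) + infra_cost
        return (price_per_task - variable_cost) / price_per_task

    # $0.10 price, $0.04 in model calls, $0.01 infra -> 50% margin.
    print(f"{gross_margin(0.10, 0.04, 0.01):.0%}")        # 50%
    # A 60% cache hit rate lifts the same task past the 70% bar.
    print(f"{gross_margin(0.10, 0.04, 0.01, 0.6):.0%}")   # 74%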