ICMD 2026 STARTUP EVALUATION SCORECARD

Purpose: Use this scorecard to evaluate early-stage startups (Seed to Series B) across AI agents, climate tech, and developer tools. The goal is to predict “time-to-trust”: how quickly a skeptical enterprise can adopt the product safely, measure ROI, and expand usage.

How to use:
- Score each category 1–5 (1 = weak/unproven, 5 = strong/proven).
- Sum the five categories for a total score out of 25.
- A “watchlist” company typically scores 18+ or has a single 5/5 in a category that creates a wedge (e.g., distribution or compliance).
(Sketch A in the appendix illustrates this arithmetic.)

1) WORKFLOW FIT (1–5)
Ask:
- What is the single best workflow? Who owns it (role + department)?
- Is there a clear budget line (support ops, security ops, legal ops, platform engineering, energy/plant ops)?
- What is the measurable KPI improvement target?
Strong evidence:
- 2–3 customer case studies with quantified outcomes (e.g., 15% handle-time reduction; 30% fewer P1 incidents; 10% energy cost reduction).

2) TRUST STACK: SECURITY + GOVERNANCE (1–5)
Ask:
- SOC 2 plan (Type I / Type II dates)? Data retention and deletion?
- Audit logs: are tool calls recorded with inputs/outputs and identities? (Sketch B in the appendix shows an illustrative record.)
- Permissions: least privilege, scoped tokens, approval gates for destructive actions?
Strong evidence:
- SOC 2 Type I done, Type II scheduled within 6–9 months; documented incident response process.

3) RELIABILITY + EVALUATIONS (1–5)
Ask:
- What evals exist today (offline tests, regression tests, canaries)?
- How do they roll back a bad prompt/model/tool change?
- How do they detect failure modes (hallucinations, policy violations, tool errors)?
Strong evidence:
- Automated eval suite run on every change; canary deployments; tracing/observability built in. (Sketch C in the appendix shows a minimal regression gate.)

4) UNIT ECONOMICS PATH (1–5)
Ask:
- What is the cost per task/run? Who pays variable costs (model calls, infra, energy)?
- For software: is 70%+ gross margin plausible at scale? (Sketch D in the appendix works one example.)
- For climate/hardware: is the solution financeable (warranties, O&M plan, performance guarantees)?
Strong evidence:
- Clear margin model with levers (caching, batching, routing); for hardware, named finance/deployment partners.

5) DISTRIBUTION WEDGE + MOAT TRAJECTORY (1–5)
Ask:
- What makes adoption compound (integrations, proprietary data, switching costs)?
- Is there a channel: ecosystems, marketplaces, developer-led growth, regulated buyers?
- What becomes harder to copy after 12–18 months?
Strong evidence:
- Deep integrations into systems of record; proprietary datasets or process-data rights; expansion playbook.

INTERPRETING SCORES
- 22–25: Category leader potential; likely to become a platform.
- 18–21: Strong watchlist; validate one missing piece (often compliance or economics).
- 14–17: Promising but risky; needs clearer workflow fit or distribution.
- <14: Likely a demo without a durable adoption path.

Bonus checks (quick red-flag filter):
- “We’ll add compliance later” for enterprise workflows.
- No rollback/evals for agent systems.
- Economics depend only on model costs falling.
- Heavy custom services required for every deployment.

Outcome: You should be able to summarize the company in one sentence: “They reduce X cost/risk by Y% for Z buyer, and they’re trusted because of A (security), B (reliability), and C (distribution wedge).”
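
APPENDIX: ILLUSTRATIVE SKETCHES
The sketches below make the arithmetic and checks above concrete. They are illustrative Python, not tooling that ships with this scorecard; any function, field name, or number not defined in the scorecard itself is a placeholder assumption.

A) SCORING ARITHMETIC
A minimal sketch of the “How to use” and “Interpreting scores” rules. The category names and thresholds come from this scorecard; the function and variable names are illustrative.

    # Minimal sketch of the scoring arithmetic; names are illustrative.
    CATEGORIES = [
        "workflow_fit",
        "trust_stack",
        "reliability_evals",
        "unit_economics",
        "distribution_wedge",
    ]

    def interpret(scores):
        """Sum five 1-5 category scores and map the total to a band."""
        assert set(scores) == set(CATEGORIES), "score all five categories"
        assert all(1 <= s <= 5 for s in scores.values()), "scores must be 1-5"
        total = sum(scores.values())
        if total >= 22:
            band = "Category leader potential; likely to become a platform."
        elif total >= 18:
            band = "Strong watchlist; validate one missing piece."
        elif total >= 14:
            band = "Promising but risky; needs clearer workflow fit or distribution."
        else:
            band = "Likely a demo without a durable adoption path."
        # A single 5/5 wedge category can still earn a watchlist look
        # below the 18+ threshold (see "How to use" above).
        if total < 18 and any(s == 5 for s in scores.values()):
            band += " Watchlist candidate via a single 5/5 wedge."
        return f"{total}/25 - {band}"

    print(interpret({
        "workflow_fit": 4, "trust_stack": 3, "reliability_evals": 4,
        "unit_economics": 3, "distribution_wedge": 5,
    }))  # 19/25 - Strong watchlist; validate one missing piece.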
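
B) AUDIT-LOG RECORD SHAPE (TRUST STACK)
An illustrative shape for the audit-log check in section 2: every tool call recorded with inputs, outputs, and identities, plus any approval gate. Field names and values are assumptions, not a standard.

    # One record per tool call; all field names are illustrative assumptions.
    audit_record = {
        "timestamp": "2026-03-02T10:31:07Z",
        "actor": {"user": "jdoe@example.com", "agent_run_id": "run_8f3a"},
        "tool": "crm.update_ticket",              # which tool was called
        "inputs": {"ticket_id": "T-4821", "status": "resolved"},
        "output": {"ok": True},
        "approval": {                             # gate for destructive actions
            "required": False,
            "approved_by": None,
        },
    }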
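
C) REGRESSION-EVAL GATE (RELIABILITY)
A minimal sketch of “automated eval suite run on every change” from section 3: golden cases checked before a prompt/model/tool change ships. run_agent and the cases are hypothetical stand-ins for the system under test.

    # Golden cases and run_agent are hypothetical stand-ins; a real suite
    # would run in CI on every prompt/model/tool change and block the
    # deploy (or trigger rollback) when the gate fails.
    GOLDEN_CASES = [
        {"input": "reset my password", "must_contain": "verification"},
        {"input": "delete my account", "must_contain": "approval"},
    ]

    def run_agent(prompt):
        raise NotImplementedError("stand-in for the system under test")

    def regression_gate(threshold=1.0):
        """Pass only if the required fraction of golden cases succeed."""
        passed = 0
        for case in GOLDEN_CASES:
            try:
                out = run_agent(case["input"])
                if case["must_contain"] in out.lower():
                    passed += 1
            except Exception:
                pass  # tool errors and crashes count as failures
        return passed / len(GOLDEN_CASES) >= threshold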
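
D) PER-TASK MARGIN MODEL (UNIT ECONOMICS)
A worked example for the 70%+ gross-margin question in section 4, showing caching as one margin lever. All prices and costs are placeholder assumptions, not benchmarks.

    def gross_margin(price_per_task, model_cost, infra_cost, cache_hit_rate=0.0):
        """Gross margin per task; caching skips a fraction of model calls."""
        variable_cost = model_cost * (1 - cache_hit_rate) + infra_cost
        return (price_per_task - variable_cost) / price_per_task

    # $0.10 price, $0.04 in model calls, $0.01 infra -> 50% margin.
    print(f"{gross_margin(0.10, 0.04, 0.01):.0%}")        # 50%
    # A 60% cache hit rate lifts the same task past the 70% bar.
    print(f"{gross_margin(0.10, 0.04, 0.01, 0.6):.0%}")   # 74%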