The most expensive mistake in software in 2026 isn’t “we shipped the wrong thing.” It’s “we let the model decide what the thing is.”
Teams are using ChatGPT, Claude, Gemini, and GitHub Copilot as if they’re high-output interns: draft a PRD, sketch an architecture, generate tickets, write tests, open a PR, repeat. The velocity looks real. The quality feels fine—until the product becomes a pile of plausible features that don’t cohere, don’t meet regulatory constraints, don’t respect platform rules, and don’t match how users actually behave.
Leadership’s job has shifted. When the cost of producing code drops, the value moves to the constraints: what you will not build, how you’ll measure “good,” what must be true before anything ships, and which risks you’re willing to own in public.
The new bottleneck isn’t engineers. It’s coherence.
AI tools are great at producing local correctness: a function that compiles, a UI that looks reasonable, a query that returns something. They are bad at global coherence: a product that behaves consistently across surfaces, an onboarding path that doesn’t contradict your pricing, a permissions model that won’t explode during an audit, a support workflow that doesn’t create its own incident queue.
Leaders keep asking, “How do we get more output from engineering?” Wrong question. Your systems already generate output. The question is: “How do we prevent output that increases future work?”
The answer isn’t a motivational speech about craftsmanship. It’s written constraints. In the AI era, the spec is the product—because the spec is what your models read, what your humans follow, and what your organization uses to argue about reality.
Stop treating PRDs as paperwork. Start treating them as executable constraints.
The PRD died once already, in the agile era, when teams confused “working software” with “no thinking.” AI resurrected the PRD—but in a more dangerous form: an auto-generated document that looks complete and isn’t.
A useful spec in 2026 is less narrative and more constraint system. It reads like a contract between product, engineering, design, legal, security, and support. It reduces ambiguity where ambiguity is expensive (permissions, data retention, pricing, support boundaries). It preserves ambiguity where ambiguity is productive (visual exploration, copy, experiment variants).
If you want a north star, steal from Amazon’s long-standing practice of writing a press release and FAQ before building (the “PR/FAQ”). The point isn’t the format; it’s the discipline of committing to a customer-facing story and then forcing every requirement to support it.
Key Takeaway
If your spec can’t be used to reject work, it’s not a spec. It’s a vibe.
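One way to make a spec usable for rejecting work is to write its non-negotiables as data and check every proposed change against them. The sketch below is hypothetical. `ConstraintSpec`, `Proposal`, and the specific fields (a retention limit, allowed data classes, a flag requirement) are illustrative assumptions, not a standard format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConstraintSpec:
    """Hypothetical non-negotiables for one feature, written before any code."""
    max_retention_days: int
    allowed_data_classes: frozenset[str]
    must_ship_behind_flag: bool = True


@dataclass
class Proposal:
    """A proposed change (ticket, PR, or generated draft) described in spec terms."""
    title: str
    retention_days: int
    data_classes: set[str] = field(default_factory=set)
    behind_flag: bool = False


def violations(spec: ConstraintSpec, proposal: Proposal) -> list[str]:
    """Return every reason the spec rejects the proposal; empty means acceptable."""
    reasons = []
    if proposal.retention_days > spec.max_retention_days:
        reasons.append(
            f"retains data {proposal.retention_days}d; spec allows {spec.max_retention_days}d"
        )
    disallowed = proposal.data_classes - spec.allowed_data_classes
    if disallowed:
        reasons.append(f"uses disallowed data classes: {sorted(disallowed)}")
    if spec.must_ship_behind_flag and not proposal.behind_flag:
        reasons.append("not behind a feature flag")
    return reasons


spec = ConstraintSpec(
    max_retention_days=30,
    allowed_data_classes=frozenset({"account_metadata", "usage_events"}),
)
draft = Proposal(
    title="AI-drafted: keep raw chat transcripts for analytics",
    retention_days=365,
    data_classes={"usage_events", "chat_transcripts"},
)
for reason in violations(spec, draft):
    print("REJECT:", reason)
```

The code matters less than the property it demonstrates: each rule is specific enough that a plausible-looking draft can fail it, with a stated reason.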
What “AI PRDs” get wrong
Generated PRDs tend to overfit to what the model has seen: generic user stories, overbroad scopes, and “non-functional requirements” copied from templates. They also under-specify the sharp edges: failure modes, rollout controls, abuse cases, regulatory obligations, and how support actually handles broken states.
Leaders should assume any auto-generated PRD is missing the only parts that matter.
Table 1: Comparison of spec artifacts in AI-heavy product teams
| Artifact | Best for | Failure mode | Where it lives |
|---|---|---|---|
| Amazon-style PR/FAQ | Forcing customer-visible clarity early | Marketing story replaces hard constraints | Doc (internal wiki / doc tool) |
| One-page “constraint spec” | Guardrails: permissions, data, rollout, SLOs, legal | Too thin on UX flow; becomes policy-only | Repo + doc, linked to tickets |
| RFC (engineering-led) | Technical decisions, tradeoffs, interfaces | Optimizes for elegance over outcomes | Repo (Markdown) + review comments |
| OpenAPI / JSON Schema | Contract-first API and validation | Teams ship schema-compliant nonsense | Repo (versioned) |
| Prototype-first (Figma) | UX convergence, interaction clarity | Ignores data lifecycle, security, ops reality | Design tool + linked tickets |
Leadership move: own the “policy surface area” before you ship features
Every serious product is a set of policies disguised as UI. Who can see what. Who can export what. What gets logged. What gets retained. What gets deleted. What happens when an employee leaves. What counts as “admin.” Which actions require re-authentication. What happens on a chargeback. How appeals work. How you respond to a subpoena. Engineers implement these policies. Leaders decide them—whether they admit it or not.
If you don’t decide them explicitly, they get decided implicitly: by whichever model wrote the draft ticket, by whichever engineer merged first, or by whichever support agent creates the least painful workaround.
This matters more because regulatory pressure isn’t going away. The EU AI Act is now real law. GDPR enforcement never stopped. US states kept passing privacy laws. Your “AI feature” might be a data governance feature wearing a nicer outfit.
"You build it, you run it." (Werner Vogels, Amazon CTO, in a 2006 ACM Queue interview; the principle was later popularized by the DevOps movement)
That idea aged well. In 2026, it extends to “you spec it, you own it.” If leadership signs off on a vague spec, leadership is signing up for a vague incident.
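Deciding the policy surface explicitly can start as a single table that every code path consults, where any action nobody has decided on is denied rather than left to whoever merges first. This is a minimal sketch. The role names, action names, and the 15-minute re-authentication window are hypothetical, and a real system would load the table from reviewed configuration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REAUTH = "reauth"  # allowed only after the user re-authenticates


@dataclass(frozen=True)
class Rule:
    roles: frozenset[str]
    requires_reauth: bool = False


# Hypothetical policy table: each entry is a decision someone signed off on.
POLICY = {
    "view_report": Rule(frozenset({"member", "admin"})),
    "export_all_users": Rule(frozenset({"admin"}), requires_reauth=True),
    "impersonate_user": Rule(frozenset({"support_lead"}), requires_reauth=True),
}

REAUTH_WINDOW_SECONDS = 15 * 60  # assumption: how recent a re-auth must be


def authorize(role: str, action: str, seconds_since_reauth: Optional[float]) -> Decision:
    rule = POLICY.get(action)
    if rule is None or role not in rule.roles:
        return Decision.DENY  # default deny: undecided actions are not allowed
    if rule.requires_reauth and (
        seconds_since_reauth is None or seconds_since_reauth > REAUTH_WINDOW_SECONDS
    ):
        return Decision.REAUTH
    return Decision.ALLOW


print(authorize("admin", "export_all_users", None).value)  # reauth
print(authorize("admin", "export_all_users", 60).value)    # allow
print(authorize("admin", "delete_org", 0).value)           # deny
```

The missing `delete_org` entry is the point: an undecided policy fails closed and shows up as a question for leadership, not as a default someone invented.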
AI makes code review less important than change review
Classic engineering leadership behavior: obsess over code review quality, style consistency, and cleverness. That made sense when writing code was the expensive act. Now the expensive act is changing the system in a way that breaks user trust, compliance posture, or operational stability.
AI will happily produce a clean diff that introduces a privacy regression, an authorization bypass, or an irreversible data migration. Your leaders need to move attention up a level: from “is this code good?” to “is this change acceptable?”
The diff isn’t the unit of risk; the capability is
A small change can create a large capability: bulk export, admin impersonation, silent background sync, token minting, cross-tenant search. These are not “features,” they’re power. Power needs policy and audit.
Teams that get this right tend to formalize a review layer that looks more like a product-security-legal triage than a style check. Not bureaucracy for its own sake—just honest acknowledgment that capability changes can be existential.
- Capability inventory: a living list of actions that change data visibility, data movement, or monetary outcomes.
- Approval rules: which roles must sign off on which capabilities (e.g., security for export, finance for billing changes).
- Default deny: new endpoints/features ship behind flags with explicit enablement paths.
- Audit hooks: what gets logged and where; treat logs as a product surface.
- Rollback story: if it breaks, what’s the first safe state? “Revert” is not a strategy for data changes.
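The capability inventory and approval rules can start as a small, reviewed data structure: each capability maps to the roles that must sign off, and a check blocks a change until those sign-offs exist. This is a hypothetical sketch. The capability names, the approver roles, and the assumption that each change declares which capabilities it touches are all illustrative.

```python
# Hypothetical capability inventory: actions that change data visibility,
# data movement, or monetary outcomes, mapped to required approver roles.
CAPABILITY_APPROVERS: dict[str, set[str]] = {
    "bulk_export": {"security"},
    "admin_impersonation": {"security", "legal"},
    "cross_tenant_search": {"security", "privacy"},
    "billing_change": {"finance"},
}


def missing_signoffs(touched: set[str], approvals: set[str]) -> dict[str, set[str]]:
    """For each capability a change touches, list approver roles not yet signed off."""
    missing = {}
    for capability in sorted(touched):
        if capability not in CAPABILITY_APPROVERS:
            # Unknown capability: block until it is reviewed and added to the inventory.
            missing[capability] = {"inventory_review"}
            continue
        outstanding = CAPABILITY_APPROVERS[capability] - approvals
        if outstanding:
            missing[capability] = outstanding
    return missing


change = {"bulk_export", "admin_impersonation"}
print(missing_signoffs(change, approvals={"security"}))
# {'admin_impersonation': {'legal'}}
```

Treating an uninventoried capability as blocked is the same default-deny stance as the flag rule: new power has to be named before it can ship.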
A practical pattern: “spec-to-flag-to-log”
This is a boring chain that prevents exciting disasters: spec the capability, ship it behind a flag, instrument logs before rollout. It forces you to write down intent, control exposure, and create visibility.
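```python
# Example: feature-flagged rollout with an audit trail (conceptual).
# Assumes `flags`, `create_export_job`, and `audit_log` come from your own
# config, job, and logging services, and that flags are evaluated server-side.

def request_bulk_export(user_id, org_id):
    # Default deny: the capability exists only where the flag is on for this org.
    if not flags.enabled("bulk_export_v1", org_id):
        raise PermissionError("Feature not enabled")

    export = create_export_job(user_id, org_id)
    # Record who requested what, for which org, as soon as the job exists.
    audit_log.write(
        action="bulk_export_requested",
        actor=user_id,
        org=org_id,
        target=export.id,
    )
    return export
```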
This snippet isn’t about any one vendor. It’s about the muscle memory: flag, audit, and only then broaden access.
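The `audit_log.write(...)` call in that snippet is the part teams most often leave vague. Here is a minimal stand-in. It assumes a hypothetical `AuditLog` class, not any library’s API, that appends one JSON object per line (JSON Lines) with a UTC timestamp and refuses events that lack an action, actor, org, or target.

```python
import io
import json
from datetime import datetime, timezone
from typing import TextIO


class AuditLog:
    """Hypothetical append-only audit writer: one JSON object per line."""

    REQUIRED = ("action", "actor", "org", "target")

    def __init__(self, sink: TextIO):
        self._sink = sink

    def write(self, **event: str) -> dict:
        missing = [key for key in self.REQUIRED if not event.get(key)]
        if missing:
            # Fail loudly: an audit event without an actor or target is useless later.
            raise ValueError(f"audit event missing fields: {missing}")
        record = {"ts": datetime.now(timezone.utc).isoformat(), **event}
        self._sink.write(json.dumps(record, sort_keys=True) + "\n")
        return record


sink = io.StringIO()  # stand-in for a file or log pipeline
audit_log = AuditLog(sink)
audit_log.write(action="bulk_export_requested", actor="u_123", org="org_9", target="exp_42")
print(sink.getvalue().strip())
```

Requiring the same fields on every event is one concrete reading of "treat logs as a product surface": someone will query these records during an incident or an audit, so their shape should not drift.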
Tooling reality: your models are already in the company—govern them like employees
Most teams already route sensitive context into third-party systems: issue trackers, customer tickets, logs, analytics. AI assistants are now part of that data flow. Banning them outright is fantasy: people will use them anyway, just in less visible ways.
Leadership should treat model access like workforce access: define what data classes can be used, which tools are approved, how retention works, and what must never be pasted into a prompt.
OpenAI, Anthropic, Google, Microsoft, and AWS all offer enterprise-oriented plans and controls, but the shape of the problem is the same: your organization is creating a second channel where sensitive context can travel.
Table 2: A leadership checklist for governing AI-assisted development (reference)
| Decision | Options (real examples) | Default stance | Owner |
|---|---|---|---|
| Approved assistants | ChatGPT Enterprise, Microsoft Copilot, Claude for Enterprise, Gemini for Workspace | Small approved set, centralized procurement | CIO/CTO + Security |
| Source code boundaries | Allow in private repos only; block in regulated repos; use GitHub Copilot Business policies | Explicit allowlist by repo sensitivity | Eng leadership + Security |
| Customer data in prompts | Disallow raw PII; allow synthetic examples; require redaction tooling | No raw customer PII outside approved workflows | Privacy + Support ops |
| Retention & audit | Vendor enterprise retention controls; internal logging of assistant usage metadata | Log usage events; document retention posture | Security + Legal |
| Model-output verification | Mandatory tests; static analysis; threat modeling for risky capabilities | Higher scrutiny for auth/data/billing paths | Staff eng + Product |
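For the "require redaction tooling" option, a first pass can be a function that every prompt goes through before it leaves your network. The sketch below is deliberately naive, and treating it as sufficient would be a mistake. It masks email addresses and long digit runs (phone- or card-like numbers) with regular expressions. Real redaction tooling needs much broader coverage, including names, addresses, and free-text identifiers, and should be tested against your own data.

```python
import re

# Naive patterns for illustration only; they will miss many kinds of PII.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS = re.compile(r"\+?\d(?:[\s-]?\d){8,}")  # 9+ digits, optional separators


def redact(text: str) -> tuple[str, int]:
    """Replace matches with placeholders; return the text and the number of redactions."""
    text, n_email = EMAIL.subn("[EMAIL]", text)
    text, n_digits = LONG_DIGITS.subn("[NUMBER]", text)
    return text, n_email + n_digits


prompt = "Customer jane.doe@example.com called from +1 415-555-0134 about order 77."
safe, count = redact(prompt)
print(safe)   # Customer [EMAIL] called from [NUMBER] about order 77.
print(count)  # 2
```

Returning the redaction count makes it easy to log usage metadata, such as how often prompts contained PII, without logging the PII itself.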
The contrarian leadership bet: slow down the start to speed up the finish
AI makes early progress look deceptively good. A demo appears in days. Stakeholders applaud. The team commits to a date. Then the costs arrive: permissions, migrations, incident response, weird edge cases, docs, support training, enterprise requirements, and the slow grind of “make it consistent everywhere.”
So here’s the bet: disciplined teams will look slower in week one and faster in month three. They’ll spend the opening phase writing constraints, enumerating failure modes, and deciding policy. They will treat “definition of done” as a leadership artifact, not an engineering footnote.
A sequence that works in practice
- Write the constraint spec: the non-negotiables (data, permissions, compliance posture, rollout).
- Pick the single success metric you can observe without self-deception (not “engagement” if the feature creates spammy loops).
- Define the kill switch: what you’ll turn off first when things go weird.
- Ship to internal users with real data and real workflows, not staged demos.
- Expand via flags while watching logs that reflect the risks you named in step one.
None of this is glamorous. That’s why it’s leadership work, not a hackathon.
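The kill-switch and flag-expansion steps in the sequence above can be sketched as one flag evaluation. The kill switch overrides everything, internal orgs come first, and everyone else is admitted by a stable percentage bucket. This is a hypothetical, in-memory sketch. The `Flag` fields and the SHA-256 bucketing scheme are assumptions for illustration, not any flag vendor’s API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Flag:
    name: str
    killed: bool = False                # kill switch: wins over every other setting
    internal_orgs: set[str] = field(default_factory=set)
    rollout_percent: int = 0            # 0 = internal only, 100 = everyone


def bucket(flag_name: str, org_id: str) -> int:
    """Map an org to a stable bucket in [0, 100) so raising the percent only adds orgs."""
    digest = hashlib.sha256(f"{flag_name}:{org_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def enabled(flag: Flag, org_id: str) -> bool:
    if flag.killed:
        return False                    # first safe state: off for everyone
    if org_id in flag.internal_orgs:
        return True                     # internal users with real workflows first
    return bucket(flag.name, org_id) < flag.rollout_percent


flag = Flag("bulk_export_v1", internal_orgs={"org_internal"})
print(enabled(flag, "org_internal"), enabled(flag, "org_customer"))  # True False
flag.rollout_percent = 100
print(enabled(flag, "org_customer"))                                 # True
flag.killed = True
print(enabled(flag, "org_internal"), enabled(flag, "org_customer"))  # False False
```

Because each org’s bucket never changes, expanding from 10% to 50% keeps the original 10% enabled. Watching the logs during expansion then compares like with like.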
A question worth sitting with before your next “AI sprint”
If you let a model draft your roadmap, your PRD, your architecture, your tickets, and half your code, what exactly is your organization’s competitive advantage?
There is a good answer, and it isn’t “speed.” It’s taste expressed as constraints: knowing what matters, naming the risks, writing the policies, and committing to tradeoffs openly inside the company. The next time someone asks for faster shipping, hand them a blank constraint spec and ask them to fill the first line: what must never happen?