Most teams are shipping “AI features” like they’re UI widgets. That’s backward. In 2026, the product risk isn’t that your model is wrong. It’s that you quietly shipped a new kind of operator into your system—one that takes actions—without giving the business the controls it would demand for any other operator.
If your AI can create tickets, change customer data, run SQL, push code, issue refunds, publish marketing copy, or contact leads, you didn’t add a feature. You hired a junior employee and gave them API keys. And most products still treat that like an “integration.”
The industry already knows where this goes. In March 2023, an engineer at Google described a bug in an internal AI tool that suggested staff could view another employee’s calendar; Google said it fixed the issue. In July 2023, OpenAI temporarily disabled ChatGPT’s “Browse with Bing” beta after it could be used to reproduce paywalled content, saying the feature could display content in ways the company didn’t want. Those are not “model quality” stories. They’re product control stories.
“Complexity is anything related to the structure of a software system that makes it hard to understand and modify the system.” (John Ousterhout, A Philosophy of Software Design)
AI adds complexity because it introduces non-determinism plus delegated action. Your product needs new primitives. Not “prompt templates.” Primitives.
Autonomy is the new surface area (and you’re probably measuring the wrong thing)
Classic product metrics—activation, retention, task completion—don’t capture what matters once a system can act. The new surface area is: what the AI is allowed to do, on whose behalf, using which data, with what trace, and how quickly you can undo it.
That’s why the interesting product work in 2024–2026 happened in the boring places: identity, policy, audit logs, and connectors. Microsoft put Copilot into Microsoft 365, but the enterprise story hinged on Microsoft Purview, tenant controls, and compliance boundaries. Salesforce pushed Einstein Copilot and then leaned hard into “Trust Layer” messaging and admin controls. Atlassian’s Rovo and “Atlassian Intelligence” rolled out inside permissioned work graphs. The pattern is consistent: vendors realized the product isn’t the chat box. It’s governance at the point of action.
Founders still copy the chat box because it demos well. But chat is the least important part of an agentic product. Chat is just the remote control.
The three primitives that separate “AI toy” from “AI product”
If your AI can take actions, your product needs three things that feel more like security engineering than “product”: permissioning, provenance, and a kill switch. Not as a slide. As first-class UX and API.
1) Permissioning: the AI must act as someone, not as “the app”
Real enterprises already solved this for humans: RBAC, groups, SSO, SCIM, conditional access. Your AI should not bypass those controls by operating under a shared service account. Yet plenty of “agents” still run on a single integration token because it’s easier.
Make the AI assume an identity that maps to an actual user or a tightly-scoped system role. That means:
- Per-user authorization to downstream systems (Google Workspace, Microsoft Graph, GitHub, Jira, Salesforce, Slack) rather than a single omnipotent token.
- Scoped permissions tied to specific tools/actions (read-only vs write, create vs delete).
- Environment boundaries (prod vs sandbox) the AI can’t cross because a prompt asked nicely.
- Time-bounded access where the product forces re-auth for sensitive actions.
- Approval policies for high-impact actions (refunds, payouts, mass email, data exports).
Tools like Okta and Microsoft Entra ID exist because identity is hard. Stop pretending your agent is special.
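To make the list concrete, here is a minimal Python sketch of an authorization check that runs before every tool call. Everything in it is hypothetical: `AgentIdentity`, `authorize`, the scope strings, and the 15-minute re-auth window are illustrative. In a real product the scopes would come from the downstream system’s grant for that specific user, and the approval policy would come from your admin settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical per-run identity: who the agent acts as, and with which grants.
@dataclass(frozen=True)
class AgentIdentity:
    user_id: str                # the human the agent acts on behalf of
    scopes: frozenset           # e.g. {"jira:read", "stripe:write"}, from the user's own grants
    environment: str            # "sandbox" or "prod", fixed when the session starts
    authenticated_at: datetime  # when the user last authenticated

# Hypothetical policy knobs; in a real product these come from the admin console.
SENSITIVE_OPERATIONS = {"refund", "payout", "mass_email", "data_export"}
REAUTH_WINDOW = timedelta(minutes=15)

def authorize(identity: AgentIdentity, tool: str, operation: str,
              target_env: str, now: Optional[datetime] = None) -> str:
    """Return "allow", "deny", "reauth", or "approve" for one proposed action."""
    if now is None:
        now = datetime.now(timezone.utc)
    mode = "read" if operation == "read" else "write"
    if f"{tool}:{mode}" not in identity.scopes:
        return "deny"    # the user's own grant does not cover this tool/mode
    if target_env != identity.environment:
        return "deny"    # a prompt cannot move a sandbox run into prod
    if operation in SENSITIVE_OPERATIONS:
        if now - identity.authenticated_at > REAUTH_WINDOW:
            return "reauth"  # force a fresh login before high-impact actions
        return "approve"     # identity is fresh, but a human still signs off
    return "allow"
```

The check never consults the prompt. Scopes and environment are fixed when the session is created, so nothing the model generates can widen them.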
2) Provenance: every output needs a supply chain
When an AI drafts a customer response, writes a doc, or updates a record, the business will ask: “Where did that come from?” Not philosophically. Operationally. Which sources, which permissions, which time window, which connector, which model, which tool calls, and what was redacted?
If you can’t answer that inside the product, you’re shipping something that can’t be audited. That blocks serious adoption in regulated industries—and it should.
In 2024, OpenAI, Anthropic, Google, and Microsoft all pushed more structured tool use and enterprise controls. Meanwhile, open-source teams shipped inspection and tracing patterns around LLM calls. The direction is obvious: provenance becomes a standard expectation the way “version history” became expected in docs.
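One way to make those operational questions answerable is to attach a provenance record to every AI output at the moment it is produced. The Python sketch below shows one hypothetical shape. `Provenance`, `SourceRef`, the field names, and the model name are illustrative and do not follow any vendor’s schema. The record captures the sources, the identity the permission check ran under, the model, the tool calls, and what was redacted.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class SourceRef:
    connector: str      # e.g. "confluence", "salesforce"
    doc_id: str         # identifier in the source system
    retrieved_at: str   # ISO-8601 timestamp of the read
    checked_as: str     # user id the source permission check ran under

@dataclass
class Provenance:
    output_id: str
    model: str                                                # model identifier actually called
    sources: list[SourceRef] = field(default_factory=list)
    tool_calls: list[str] = field(default_factory=list)       # tool/operation names, in order
    redactions: list[str] = field(default_factory=list)       # categories removed, never the removed text

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so SourceRef entries serialize too.
        return json.dumps(asdict(self), sort_keys=True)

record = Provenance(
    output_id="draft_1",
    model="example-model",  # hypothetical model name
    sources=[SourceRef("confluence", "KB-77", "2026-05-14T18:20:00Z", "u_123")],
    tool_calls=["search_kb", "draft_reply"],
    redactions=["customer_email"],
)
```

Storing redaction categories rather than the redacted values keeps the provenance log from becoming a second copy of the sensitive data.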
3) Kill switch: reversibility is a feature, not an incident response plan
Every system that can act at scale needs fast shutdown and rollback. That’s true for payments, email, deployments, and data pipelines. Agentic features are in the same class. Your product should support:
- Global disable of agent actions (not just the UI) with immediate effect.
- Connector-level disable (turn off Salesforce writes but keep reads; disable GitHub merges but keep PR comments).
- Per-policy disable (block “external email” actions during an incident).
- Action rollback where possible (or at least compensating actions).
- Human checkpointing for irreversible actions.
“We’ll monitor it” is not a control. It’s a hope.
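A kill switch only works if the action runtime checks it on every call, not just the UI. Here is a minimal Python sketch under that assumption. `KillSwitches` and `run_action` are hypothetical names. In production the switch state would live in a store that the action workers read directly, so it still works when the UI is down; this sketch keeps it in process memory instead.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class KillSwitches:
    """Hypothetical switch state, read by the action runtime before every call."""
    global_off: bool = False                                  # stop all agent actions
    connector_off: set = field(default_factory=set)           # e.g. {("salesforce", "write")}
    policy_off: set = field(default_factory=set)              # e.g. {"external_email"}

    def permits(self, connector: str, mode: str, policies: set) -> bool:
        if self.global_off:
            return False
        if (connector, mode) in self.connector_off:
            return False
        # An action is blocked if any policy it falls under is switched off.
        return not (policies & self.policy_off)

def run_action(switches: KillSwitches, connector: str, mode: str,
               policies: set, do_action: Callable[[], str]) -> str:
    if not switches.permits(connector, mode, policies):
        return "halted_by_kill_switch"
    return do_action()
```

Keying connector switches by `(connector, mode)` is what makes “turn off Salesforce writes but keep reads” a single entry rather than a code change.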
Stop picking a model. Start picking an execution model.
Most product debates still start with “Which LLM should we use?” That’s procurement. The product decision is your execution model: where reasoning happens, where data is retrieved, where actions run, and where you log what happened.
Here’s a useful way to compare the main approaches teams actually ship with in 2026. Notice how little of this is about “prompting.”
Table 1: Comparison of common AI execution patterns in shipped products
| Approach | Best for | Strength | Sharp edge |
|---|---|---|---|
| Chat + retrieval (RAG) inside your app | Q&A over docs, support deflection, internal search | Fast to ship; mostly read-only | Weak audit story if sources/permissions aren’t enforced; “answer drift” over time |
| Tool-using assistant (function calling) with bounded actions | Create/update workflow objects: tickets, tasks, CRM records | Deterministic action surface; easier policy enforcement | Temptation to over-scope tools; failures look like product bugs |
| Autonomous agent loop (plan/act/reflect) | Long-running tasks across systems (ops runbooks, research, multi-step changes) | Handles messy tasks without hand-built flows | Hard to bound; needs strong kill switch, budgets, and traceability |
| Human-in-the-loop agent (approvals + drafts) | Regulated domains; high-impact comms; finance and HR | High safety; clear accountability | Slower; users may route around it if UX is heavy |
| Enterprise suite copilot (e.g., Microsoft 365 Copilot, Gemini for Google Workspace) | Cross-app productivity inside one vendor’s stack | Native permissions and admin controls (best-in-class in-suite) | Limited visibility/control for third-party SaaS; hard to differentiate if you’re not the platform |
The contrarian move: pick the most boring execution model that still delivers the user outcome. If your product can win with bounded tools and approvals, don’t race to “fully autonomous.” Autonomy is not a virtue. It’s a liability you accept because it buys something specific.
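The “boring” option in Table 1, a tool-using assistant with bounded actions, mostly comes down to one rule. The model may only propose calls, and your code executes a call only if it matches a registered tool with an exact argument set. Here is a minimal Python sketch under that assumption. `Tool`, `register`, `dispatch`, and `create_ticket` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Tool:
    name: str
    params: frozenset          # exact argument names the tool accepts
    fn: Callable[..., Any]

REGISTRY: dict = {}

def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool

def dispatch(proposed: dict) -> Any:
    """Execute a model-proposed call {"name": ..., "args": {...}} only if it matches a registered tool."""
    tool = REGISTRY.get(proposed.get("name"))
    if tool is None:
        raise PermissionError(f"unknown tool: {proposed.get('name')!r}")
    args = proposed.get("args", {})
    if set(args) != tool.params:
        raise ValueError(f"bad arguments for {tool.name}: {sorted(args)}")
    return tool.fn(**args)

# One bounded tool: it can create a ticket and nothing else.
register(Tool("create_ticket", frozenset({"title", "priority"}),
              lambda title, priority: {"id": "T-1", "title": title, "priority": priority}))
```

Anything the model invents outside the registry fails closed, which is what makes failures look like ordinary, debuggable product bugs rather than surprises.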
The connector tax is the real AI tax
Every founder wants to talk about models. Buyers want to talk about connectors. Because the value is behind the firewall: Google Drive, SharePoint, Confluence, Jira, ServiceNow, Salesforce, SAP, Snowflake, Databricks, GitHub, Slack, Microsoft Teams. If you can’t connect cleanly—and keep permissions intact—you don’t have an AI product. You have a demo.
This is why “enterprise search” vendors matter again (Glean is the obvious example) and why platform vendors keep tightening their own ecosystems (Microsoft Graph, Google Workspace APIs). It’s also why open-source orchestration (LangChain), schema validation for structured outputs (Pydantic), and vector stores (Pinecone, Weaviate, Milvus) became table stakes in the first wave: teams were trying to stitch together a data plane quickly. In the second wave, stitching isn’t enough; you need governance that survives audit.
Key Takeaway
If you can’t describe, in one sentence, how permissions propagate from the source system to your AI output and then to an action, your product will stall in security review.
What “permission-preserving” actually means
Teams often claim this and then quietly do something else. Permission-preserving means your retrieval layer and your action layer both respect the same identity context:
- Retrieval queries filter by the requesting user’s access (not just “org access”).
- Embeddings and indexes don’t become a backdoor for data a user couldn’t read in the source system.
- Cached AI outputs are scoped and expire like the underlying data.
- Actions in downstream tools run under that same user (or an explicit, constrained service role) with logs.
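Here is a minimal Python sketch of the first and third bullets. It assumes each indexed chunk carries a copy of its source ACL (user and group ids), captured at index time. `Chunk`, `retrieve`, and `ScopedCache` are hypothetical, and the keyword match stands in for vector search. In a real vector store you would push the ACL filter into the query itself, because filtering after a top-k cut can silently drop the results a user is allowed to see.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_principals: frozenset  # copied from the source system's ACL at index time

def visible(chunk: Chunk, principals: set) -> bool:
    # principals = the requesting user's id plus their group ids from your IdP
    return bool(chunk.allowed_principals & principals)

def retrieve(index: list, query: str, principals: set, k: int = 3) -> list:
    # Keyword match stands in for similarity search; the ACL filter runs before the top-k cut.
    hits = [c for c in index if query.lower() in c.text.lower()]
    return [c for c in hits if visible(c, principals)][:k]

class ScopedCache:
    """Caches AI outputs per user, with a TTL no longer than the source data's freshness."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict = {}  # (user_id, key) -> (stored_at, value)

    def put(self, user_id: str, key: str, value: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._store[(user_id, key)] = (now, value)

    def get(self, user_id: str, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        entry = self._store.get((user_id, key))
        if entry is None or now - entry[0] > self.ttl:
            return None  # another user's entry, or expired
        return entry[1]
```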
Designing policy the way SRE designs reliability: budgets, gates, and traces
Agentic products need something like an SRE mindset: define what failure looks like, then design budgets and controls around it. Not because regulators told you to. Because your own system will produce weird edge cases at the worst time.
Budgets: cap blast radius before you need heroics
Budgets are not just about compute cost. They’re about limiting how much the agent can do before a human looks. Think in terms users already understand:
- Time budget: how long an agent can run before it must checkpoint.
- Action budget: how many writes, emails, or tickets it can create in one run.
- Scope budget: which accounts/projects/repos it can touch.
- Data budget: which datasets or document collections it can access.
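Here is a minimal Python sketch of those four budgets as a per-run object that the agent loop must consult before acting. `RunBudget` and `BudgetExceeded` are hypothetical names, and any limits you pass in are placeholders, not recommendations.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

class BudgetExceeded(Exception):
    """Raised when a run hits a limit; the loop should stop and checkpoint with a human."""

@dataclass
class RunBudget:
    max_seconds: float            # time budget
    max_writes: int               # action budget
    allowed_scopes: frozenset     # scope budget: accounts/projects/repos
    allowed_datasets: frozenset   # data budget: collections the run may read
    started: float = field(default_factory=time.monotonic)
    writes: int = 0

    def check_time(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if now - self.started > self.max_seconds:
            raise BudgetExceeded("time budget spent")

    def check_read(self, dataset: str) -> None:
        if dataset not in self.allowed_datasets:
            raise BudgetExceeded(f"dataset {dataset!r} is outside the data budget")

    def charge_write(self, scope: str) -> None:
        if scope not in self.allowed_scopes:
            raise BudgetExceeded(f"scope {scope!r} is outside the scope budget")
        if self.writes >= self.max_writes:
            raise BudgetExceeded("action budget spent")
        self.writes += 1
```

Raising instead of returning a flag means an agent loop that forgets to check a result still stops.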
Gates: approvals are not a failure; they’re product-market fit
Founders hate approvals because approvals reduce the “wow.” Buyers love approvals because approvals map to how companies actually work. If your agent can change a Salesforce record, you can put a gate in front of “mass update,” “stage change,” or “close won.” That’s not fear. That’s governance.
The trick is to make approvals low-friction: show the diff, show the sources, show the policy rule that triggered the gate, and make it one click to accept or reject.
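Here is a minimal Python sketch of a gate that produces exactly those pieces: a diff, the sources, and the rule that fired. It uses the standard library’s `difflib.unified_diff` for the diff. The operation names and rule ids in `GATED_OPERATIONS` are hypothetical, loosely modeled on the Salesforce examples above, and so are `ApprovalRequest` and `gate`.

```python
import difflib
from dataclasses import dataclass
from typing import Optional

@dataclass
class ApprovalRequest:
    operation: str
    policy_rule: str              # which rule triggered the gate
    sources: list                 # what the agent relied on
    diff: str                     # current vs proposed, in unified-diff form
    status: str = "pending"
    reviewer: Optional[str] = None

    def decide(self, approve: bool, reviewer: str) -> None:
        # One click: approve or reject, recorded with who decided.
        if self.status != "pending":
            raise RuntimeError("already decided")
        self.status = "approved" if approve else "rejected"
        self.reviewer = reviewer

# Hypothetical gated operations mapped to the policy rule that gates them.
GATED_OPERATIONS = {"mass_update": "bulk_changes_require_approval",
                    "stage_change": "pipeline_stage_gate",
                    "close_won": "close_won_gate"}

def gate(operation: str, before: dict, after: dict, sources: list) -> Optional[ApprovalRequest]:
    rule = GATED_OPERATIONS.get(operation)
    if rule is None:
        return None  # not gated; the action proceeds
    diff = "\n".join(difflib.unified_diff(
        [f"{k}: {v}" for k, v in sorted(before.items())],
        [f"{k}: {v}" for k, v in sorted(after.items())],
        fromfile="current", tofile="proposed", lineterm=""))
    return ApprovalRequest(operation, rule, sources, diff)
```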
Traces: observability for decisions, not just latency
Traditional logs tell you request/response and timing. Agent traces must tell you: what it believed, what it saw, what it tried, and what changed. If you build on OpenTelemetry concepts, great; if you build something custom, fine. The product requirement is consistent: an operator should be able to answer “why did it do that?” without guessing.
Example: a minimal “agent action” log shape (store it in your event pipeline):

```json
{
  "timestamp": "2026-05-14T18:22:11Z",
  "actor": {"type": "ai_agent", "agent_id": "support-agent", "run_id": "run_01"},
  "on_behalf_of": {"user_id": "u_123", "workspace_id": "w_456"},
  "inputs": {"ticket_id": "INC-1082"},
  "retrieval": [{"source": "confluence", "doc_id": "KB-77", "permission": "user"}],
  "decision": {"intent": "issue_refund", "confidence": "n/a", "policy": "refunds_require_approval"},
  "action": {"tool": "stripe", "operation": "create_refund", "status": "blocked_pending_approval"},
  "artifacts": {"proposed_change": "refund $X", "diff": "..."}
}
```
Build the admin product like it’s the product (because it is)
Most AI features fail after the demo because the admin experience is an afterthought. The buyer asks: Can I restrict data sources? Can I turn off actions? Can I see what it did last week? Can I export logs? Can I set different policies for different teams?
If your answers are “we can add that,” your competitor will win the deal with a worse model and a better control plane.
Table 2: Control-plane checklist for shipping agentic features into real organizations
| Control | What “good” looks like | Where it shows up | Real-world reference |
|---|---|---|---|
| Identity + SSO | SAML/OIDC login, SCIM provisioning, role mapping, per-user connector auth | Admin settings, connector onboarding, audit logs | Okta, Microsoft Entra ID, Google Workspace SSO |
| Policy engine | Rules for tools/actions (allow/deny/approve), scoped by team/project | Admin console + runtime enforcement | AWS IAM-style policies as the mental model |
| Audit + export | Immutable action log, searchable, exportable to SIEM | Security/compliance workflows | Splunk, Microsoft Sentinel (common SIEM destinations) |
| Data boundaries | Source allowlists, per-collection access, retention controls | Indexing pipeline + retrieval layer | Confluence/SharePoint permissions as source-of-truth |
| Emergency controls | Global kill switch, connector kill switch, rollback/compensation paths | Status page + admin console + runtime | Incident patterns from payments/email systems (e.g., “stop the send” controls) |
The admin console is not a checkbox for enterprise sales. It’s where trust is created. And in agentic software, trust is the product.
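The “Policy engine” row in Table 2 points at AWS IAM as the mental model. In IAM, access is denied by default, and an explicit deny beats any allow. Here is a minimal Python sketch that borrows those two rules and adds this article’s third effect, “approve,” ranked between them. IAM itself has no approval effect. `Rule`, `evaluate`, and the action strings are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class Rule:
    effect: str       # "allow", "deny", or "approve"
    action: str       # glob, e.g. "salesforce:*" or "email:send_external"
    team: str = "*"   # glob over team names

# Most restrictive effect wins when several rules match.
PRECEDENCE = {"deny": 3, "approve": 2, "allow": 1}

def evaluate(rules: list, action: str, team: str) -> str:
    """Default deny; otherwise return the most restrictive matching effect."""
    matched = [r.effect for r in rules
               if fnmatchcase(action, r.action) and fnmatchcase(team, r.team)]
    if not matched:
        return "deny"
    return max(matched, key=PRECEDENCE.__getitem__)

rules = [
    Rule("allow", "jira:*"),
    Rule("allow", "salesforce:*", team="sales"),
    Rule("approve", "salesforce:mass_update"),
    Rule("deny", "email:send_external", team="support"),
]
```

Because the most restrictive match wins, adding a rule can only tighten behavior for the actions it matches. That property makes admin-console edits safe to reason about.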
A product decision you can make this week: ship one irreversible action, correctly
If you want a forcing function, pick a single high-stakes action in your product—something that changes state outside your app—and implement it with proper permissioning, provenance, and a kill switch. Not ten actions. One.
Examples that expose whether your system is serious: send an email to an external recipient, issue a refund, merge a pull request, change a billing plan, delete a record, publish a page. These actions force you to build the control plane you’ve been avoiding.
- Define the action contract: inputs, outputs, side effects, and what “undo” means.
- Make the AI act under an identity you can explain to an auditor.
- Show provenance in the UI: sources, timestamps, connector, and diff.
- Add a policy gate that can block it, require approval, or rate-limit it.
- Wire a kill switch that works even if the UI is down.
Do that once and your roadmap changes. You stop talking about “adding AI” and start building software that can safely operate inside other people’s businesses.
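Here is a minimal Python sketch of the first of those five steps, the action contract, with “undo” expressed as an optional compensating action. `ActionContract`, `run`, `undo`, and the page-publishing and email examples are hypothetical. An action with no compensating action is treated as irreversible, which is the signal to gate it harder.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class ActionContract:
    name: str
    required_inputs: frozenset
    side_effect: str                               # plain-language description for auditors
    execute: Callable[[dict], dict]
    compensate: Optional[Callable[[dict], None]]   # None means irreversible

def run(contract: ActionContract, inputs: dict, log: list) -> dict:
    missing = contract.required_inputs - set(inputs)
    if missing:
        raise ValueError(f"missing inputs: {sorted(missing)}")
    result = contract.execute(inputs)
    log.append({"action": contract.name, "inputs": inputs, "result": result,
                "undoable": contract.compensate is not None})
    return result

def undo(contract: ActionContract, result: dict, log: list) -> None:
    if contract.compensate is None:
        raise RuntimeError(f"{contract.name} is irreversible; no compensating action")
    contract.compensate(result)
    log.append({"action": f"undo:{contract.name}", "result": result})

# Example: publishing a page is reversible (unpublish); sending an external email is not.
pages = {"p1": "draft"}

def _publish(inputs: dict) -> dict:
    pages[inputs["page_id"]] = "published"
    return {"page_id": inputs["page_id"]}

def _unpublish(result: dict) -> None:
    pages[result["page_id"]] = "draft"

publish_page = ActionContract("publish_page", frozenset({"page_id"}),
                              "makes a page publicly visible", _publish, _unpublish)
send_email = ActionContract("send_external_email", frozenset({"to", "body"}),
                            "delivers mail outside the company",
                            lambda i: {"sent_to": i["to"]}, None)
```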
Prediction worth sitting with: by the time “agents” feel normal, the winners won’t be the teams with the flashiest demos. They’ll be the ones whose permissioning and audit exports make security teams say, “Fine. Ship it.” If your product can’t get that reaction, what are you actually building?