Your Product Doesn’t Need More AI Features. It Needs Permissioning, Provenance, and a Kill Switch.

Most teams are shipping “AI features” like they’re UI widgets. That’s backward. In 2026, the product risk isn’t that your model is wrong. It’s that you quietly shipped a new kind of operator into your system—one that takes actions—without giving the business the controls it would demand for any other operator.

If your AI can create tickets, change customer data, run SQL, push code, issue refunds, publish marketing copy, or contact leads, you didn’t add a feature. You hired a junior employee and gave them API keys. And most products still treat that like an “integration.”

The industry already knows where this goes. In March 2023, an engineer at Google described a bug in an internal AI tool that suggested staff could view another employee’s calendar; Google said it fixed the issue. In July 2023, OpenAI temporarily disabled ChatGPT’s “Browse with Bing” beta after it could be used to retrieve paywalled content, saying the tool sometimes displayed content in ways the company did not want. Those are not “model quality” stories. They’re product control stories.

“Complexity is anything related to the structure of a software system that makes it hard to understand and modify the system.” — John Ousterhout

AI adds complexity because it introduces non-determinism plus delegated action. Your product needs new primitives. Not “prompt templates.” Primitives.

Autonomy is the new surface area (and you’re probably measuring the wrong thing)

Classic product metrics—activation, retention, task completion—don’t capture what matters once a system can act. The new surface area is: what the AI is allowed to do, on whose behalf, using which data, with what trace, and how quickly you can undo it.

That’s why the interesting product work in 2024–2026 happened in the boring places: identity, policy, audit logs, and connectors. Microsoft put Copilot into Microsoft 365, but the enterprise story hinged on Microsoft Purview, tenant controls, and compliance boundaries. Salesforce pushed Einstein Copilot and then leaned hard into “Trust Layer” messaging and admin controls. Atlassian’s Rovo and “Atlassian Intelligence” rolled out inside permissioned work graphs. The pattern is consistent: vendors realized the product isn’t the chat box. It’s governance at the point of action.

Founders still copy the chat box because it demos well. But chat is the least important part of an agentic product. Chat is just the remote control.

[Image: server racks and code representing AI systems operating inside enterprise infrastructure. Caption: As autonomy increases, the product problem shifts from UI to infrastructure-level controls.]

The three primitives that separate “AI toy” from “AI product”

If your AI can take actions, your product needs three things that feel more like security engineering than “product”: permissioning, provenance, and a kill switch. Not as a slide. As first-class UX and API.

1) Permissioning: the AI must act as someone, not as “the app”

Real enterprises already solved this for humans: RBAC, groups, SSO, SCIM, conditional access. Your AI should not bypass those controls by operating under a shared service account. Yet plenty of “agents” still run on a single integration token because it’s easier.

Make the AI assume an identity that maps to an actual user or a tightly-scoped system role. That means:

Per-user authorization to downstream systems (Google Workspace, Microsoft Graph, GitHub, Jira, Salesforce, Slack) rather than a single omnipotent token.
Scoped permissions tied to specific tools/actions (read-only vs write, create vs delete).
Environment boundaries (prod vs sandbox) the AI can’t cross because a prompt asked nicely.
Time-bounded access where the product forces re-auth for sensitive actions.
Approval policies for high-impact actions (refunds, payouts, mass email, data exports).
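
As a rough illustration of the list above, here is a minimal sketch of an authorization check that could run before every tool call. Everything in it is hypothetical: the `Grant` type, the scope strings, the 15-minute re-auth window, and the set of approval-gated scopes are assumptions chosen for the example, not part of any real identity product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# All names below are hypothetical, for illustration only.
REAUTH_WINDOW = timedelta(minutes=15)  # assumed re-auth window
NEEDS_APPROVAL = frozenset({"stripe:refund", "email:mass_send"})  # assumed high-impact scopes


@dataclass(frozen=True)
class Grant:
    """What one user has delegated to the agent for one session."""
    user_id: str
    scopes: frozenset           # e.g. {"jira:read", "jira:create"}
    environment: str            # "prod" or "sandbox"
    authenticated_at: datetime  # last interactive login


def authorize(grant: Grant, user_id: str, scope: str, environment: str,
              now: datetime) -> str:
    """Return "allow", "deny", "reauth", or "needs_approval" for one tool call."""
    if user_id != grant.user_id:
        return "deny"            # the agent acts as exactly one user
    if environment != grant.environment:
        return "deny"            # no crossing from sandbox into prod
    if scope not in grant.scopes:
        return "deny"            # read access never implies write or delete
    if scope in NEEDS_APPROVAL:
        if now - grant.authenticated_at > REAUTH_WINDOW:
            return "reauth"      # stale session: force a fresh login first
        return "needs_approval"  # fresh session, but a human still signs off
    return "allow"
```

The point of the design is that the decision depends only on the grant and the request, never on anything the model said, so a persuasive prompt cannot widen the result.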

Tools like Okta and Microsoft Entra ID exist because identity is hard. Stop pretending your agent is special.

2) Provenance: every output needs a supply chain

When an AI drafts a customer response, writes a doc, or updates a record, the business will ask: “Where did that come from?” Not philosophically. Operationally. Which sources, which permissions, which time window, which connector, which model, which tool calls, and what was redacted?

If you can’t answer that inside the product, you’re shipping something that can’t be audited. That blocks serious adoption in regulated industries—and it should.
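
To make that concrete, here is one possible shape for a provenance record stored alongside each AI output. It is a sketch under assumptions: the field names, the `Source` and `Provenance` types, and the idea of fingerprinting the record with SHA-256 are illustrative choices, not a standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Source:
    """One document the output drew on (hypothetical shape)."""
    connector: str     # e.g. "confluence"
    doc_id: str
    retrieved_at: str  # ISO 8601 timestamp
    accessed_as: str   # identity whose permissions were used


@dataclass
class Provenance:
    """Everything needed to answer "where did that come from?"."""
    output_text: str
    model: str
    sources: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    redactions: list = field(default_factory=list)

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so identical records hash identically.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()
```

Storing the fingerprint next to the output gives an auditor a cheap way to check that the record shown in the UI is the one that was written at generation time.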

In 2024, OpenAI, Anthropic, Google, and Microsoft all pushed more structured tool use and enterprise controls. Meanwhile, open-source teams shipped inspection and tracing patterns around LLM calls. The direction is obvious: provenance becomes a standard expectation the way “version history” became expected in docs.

3) Kill switch: reversibility is a feature, not an incident response plan

Every system that can act at scale needs fast shutdown and rollback. That’s true for payments, email, deployments, and data pipelines. Agentic features are in the same class. Your product should support:

Global disable of agent actions (not just the UI) with immediate effect.
Connector-level disable (turn off Salesforce writes but keep reads; disable GitHub merges but keep PR comments).
Per-policy disable (block “external email” actions during an incident).
Action rollback where possible (or at least compensating actions).
Human checkpointing for irreversible actions.
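
The first three levels in that list can be sketched as a single runtime check that every action passes through on the server side. The `KillSwitch` class, the `(connector, mode)` pairs, and the policy tags are hypothetical names used only for illustration; rollback and human checkpointing are out of scope for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class KillSwitch:
    """Hypothetical switch state, checked immediately before each action runs."""
    global_disabled: bool = False
    disabled_operations: set = field(default_factory=set)  # e.g. ("salesforce", "write")
    disabled_policies: set = field(default_factory=set)    # e.g. "external_email"

    def allows(self, connector: str, mode: str, tags=()) -> bool:
        if self.global_disabled:
            return False  # global disable wins over everything else
        if (connector, mode) in self.disabled_operations:
            return False  # connector-level: writes off, reads still on
        if self.disabled_policies.intersection(tags):
            return False  # policy-level: block a whole class of actions
        return True
```

For example, `KillSwitch(disabled_operations={("salesforce", "write")})` keeps Salesforce reads working while refusing writes. Because the check lives in the execution path rather than the UI, hiding a button is never mistaken for disabling the action.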

“We’ll monitor it” is not a control. It’s a hope.

Stop picking a model. Start picking an execution model.

Most product debates still start with “Which LLM should we use?” That’s procurement. The product decision is your execution model: where reasoning happens, where data is retrieved, where actions run, and where you log what happened.

Here’s a useful way to compare the main approaches teams actually ship with in 2026. Notice how little of this is about “prompting.”

Table 1: Comparison of common AI execution patterns in shipped products

Approach	Best for	Strength	Sharp edge
Chat + retrieval (RAG) inside your app	Q&A over docs, support deflection, internal search	Fast to ship; mostly read-only	Weak audit story if sources/permissions aren’t enforced; “answer drift” over time
Tool-using assistant (function calling) with bounded actions	Create/update workflow objects: tickets, tasks, CRM records	Deterministic action surface; easier policy enforcement	Temptation to over-scope tools; failures look like product bugs
Autonomous agent loop (plan/act/reflect)	Long-running tasks across systems (ops runbooks, research, multi-step changes)	Handles messy tasks without hand-built flows	Hard to bound; needs strong kill switch, budgets, and traceability
Human-in-the-loop agent (approvals + drafts)	Regulated domains; high-impact comms; finance and HR	High safety; clear accountability	Slower; users may route around it if UX is heavy
Enterprise suite copilot (e.g., Microsoft Copilot, Google Gemini for Workspace)	Cross-app productivity inside one vendor’s stack	Native permissions and admin controls (best-in-class in-suite)	Limited visibility/control for third-party SaaS; hard to differentiate if you’re not the platform

The contrarian move: pick the most boring execution model that still delivers the user outcome. If your product can win with bounded tools and approvals, don’t race to “fully autonomous.” Autonomy is not a virtue. It’s a liability you accept because it buys something specific.

[Image: team reviewing product flows and permissions on a whiteboard. Caption: Agentic products live or die on policy design and review flows, not on UI polish.]

The connector tax is the real AI tax

Every founder wants to talk about models. Buyers want to talk about connectors. Because the value is behind the firewall: Google Drive, SharePoint, Confluence, Jira, ServiceNow, Salesforce, SAP, Snowflake, Databricks, GitHub, Slack, Microsoft Teams. If you can’t connect cleanly—and keep permissions intact—you don’t have an AI product. You have a demo.

This is why “enterprise search” vendors matter again (Glean is the obvious example) and why platform vendors keep tightening their own ecosystems (Microsoft Graph, Google Workspace APIs). It’s also why open-source orchestration (LangChain), schema validation for structured outputs (Pydantic), and vector stores (Pinecone, Weaviate, Milvus) became table stakes in the first wave: teams were trying to stitch together a data plane quickly. In the second wave, stitching isn’t enough; you need governance that survives audit.

Key Takeaway

If you can’t describe, in one sentence, how permissions propagate from the source system to your AI output and then to an action, your product will stall in security review.

What “permission-preserving” actually means

Teams often claim this and then quietly do something else. Permission-preserving means your retrieval layer and your action layer both respect the same identity context:

Retrieval queries filter by the requesting user’s access (not just “org access”).
Embeddings and indexes don’t become a backdoor for data a user couldn’t read in the source system.
Cached AI outputs are scoped and expire like the underlying data.
Actions in downstream tools run under that same user (or an explicit, constrained service role) with logs.
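
The retrieval half of that list can be sketched as a filter applied to candidate chunks before anything reaches the model. This is a simplified illustration: the `Chunk` type and `retrieve_for_user` function are hypothetical, and it assumes each indexed chunk carries a copy of the source system's access list, which a real system would also have to keep in sync.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """An indexed passage plus the ACL copied from its source system (assumed)."""
    doc_id: str
    text: str
    score: float                   # similarity score from the index
    allowed_principals: frozenset  # user ids and group ids that may read it


def retrieve_for_user(candidates, user_id, group_ids, top_k=3):
    """Drop anything the requesting user could not read, then rank what is left."""
    principals = {user_id, *group_ids}
    visible = [c for c in candidates if principals & c.allowed_principals]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]
```

Filtering after the search is the simplest form to reason about. Pushing the same filter into the index query is a common refinement, but the rule is the same either way: the identity decides what is visible, not the similarity score.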

Designing policy the way SRE designs reliability: budgets, gates, and traces

Agentic products need something like an SRE mindset: define what failure looks like, then design budgets and controls around it. Not because regulators told you to. Because your own system will produce weird edge cases at the worst time.

Budgets: cap blast radius before you need heroics

Budgets are not just about compute cost. They’re about limiting how much the agent can do before a human looks. Think in terms users already understand:

Time budget: how long an agent can run before it must checkpoint.
Action budget: how many writes, emails, or tickets it can create in one run.
Scope budget: which accounts/projects/repos it can touch.
Data budget: which datasets or document collections it can access.
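
A minimal sketch of how the time, action, and scope budgets might be enforced in one place follows. The `RunBudget` class and `BudgetExceeded` exception are hypothetical names, and the limits are whatever the caller passes in; a data budget would follow the same pattern on the read path.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a run must stop and hand control back to a human."""


@dataclass
class RunBudget:
    max_seconds: float         # time budget for the whole run
    max_writes: int            # action budget
    allowed_scopes: frozenset  # scope budget, e.g. {"repo:docs"}
    started_at: float = field(default_factory=time.monotonic)
    writes: int = 0

    def charge_write(self, scope: str, now: float = None) -> None:
        """Call before every write; raises instead of letting the run continue."""
        now = time.monotonic() if now is None else now
        if now - self.started_at > self.max_seconds:
            raise BudgetExceeded("time budget exhausted; checkpoint required")
        if scope not in self.allowed_scopes:
            raise BudgetExceeded(f"scope {scope!r} is outside this run's budget")
        if self.writes >= self.max_writes:
            raise BudgetExceeded("action budget exhausted; checkpoint required")
        self.writes += 1
```

Raising an exception rather than returning a flag is a deliberate choice in this sketch: the agent loop cannot ignore a budget by forgetting to check a return value.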

Gates: approvals are not a failure; they’re product-market fit

Founders hate approvals because approvals reduce the “wow.” Buyers love approvals because approvals map to how companies actually work. If your agent can change a Salesforce record, you can put a gate in front of “mass update,” “stage change,” or “close won.” That’s not fear. That’s governance.

The trick is to make approvals low-friction: show the diff, show the sources, show the policy rule that triggered the gate, and make it one click to accept or reject.
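
One way to package what the approver sees is a single request object carrying the diff, the sources, and the rule that fired. The sketch below uses Python's standard `difflib` for the diff; the `ApprovalRequest` type and `build_approval` helper are hypothetical names for illustration.

```python
import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApprovalRequest:
    rule: str      # the policy rule that triggered the gate
    sources: list  # what the agent read before proposing the change
    diff: str      # unified diff of current vs proposed state
    status: str = "pending"
    decided_by: Optional[str] = None

    def decide(self, approver_id: str, accept: bool) -> None:
        """The one-click accept/reject; a request can only be decided once."""
        if self.status != "pending":
            raise ValueError("request was already decided")
        self.status = "approved" if accept else "rejected"
        self.decided_by = approver_id


def build_approval(before: str, after: str, rule: str, sources: list) -> ApprovalRequest:
    diff = "\n".join(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="current", tofile="proposed", lineterm=""))
    return ApprovalRequest(rule=rule, sources=sources, diff=diff)
```

Recording `decided_by` on the same object keeps the approval and the accountability in one record, which is what the audit export will need later.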

Traces: observability for decisions, not just latency

Traditional logs tell you request/response and timing. Agent traces must tell you: what it believed, what it saw, what it tried, and what changed. If you build on OpenTelemetry concepts, great; if you build something custom, fine. The product requirement is consistent: an operator should be able to answer “why did it do that?” without guessing.

Example: a minimal “agent action” log shape (store in your event pipeline):
{
  "timestamp": "2026-05-14T18:22:11Z",
  "actor": {"type": "ai_agent", "agent_id": "support-agent", "run_id": "run_01"},
  "on_behalf_of": {"user_id": "u_123", "workspace_id": "w_456"},
  "inputs": {"ticket_id": "INC-1082"},
  "retrieval": [{"source": "confluence", "doc_id": "KB-77", "permission": "user"}],
  "decision": {"intent": "issue_refund", "confidence": "n/a", "policy": "refunds_require_approval"},
  "action": {"tool": "stripe", "operation": "create_refund", "status": "blocked_pending_approval"},
  "artifacts": {"proposed_change": "refund $X", "diff": "..."}
}

[Image: laptop showing monitoring dashboards and logs used to trace automated actions. Caption: If the system can act, you need traces that explain decisions, not just uptime charts.]

Build the admin product like it’s the product (because it is)

Most AI features fail after the demo because the admin experience is an afterthought. The buyer asks: Can I restrict data sources? Can I turn off actions? Can I see what it did last week? Can I export logs? Can I set different policies for different teams?

If your answers are “we can add that,” your competitor will win the deal with a worse model and a better control plane.

Table 2: Control-plane checklist for shipping agentic features into real organizations

Control	What “good” looks like	Where it shows up	Real-world reference
Identity + SSO	SAML/OIDC login, SCIM provisioning, role mapping, per-user connector auth	Admin settings, connector onboarding, audit logs	Okta, Microsoft Entra ID, Google Workspace SSO
Policy engine	Rules for tools/actions (allow/deny/approve), scoped by team/project	Admin console + runtime enforcement	AWS IAM-style policies as the mental model
Audit + export	Immutable action log, searchable, exportable to SIEM	Security/compliance workflows	Splunk, Microsoft Sentinel (common SIEM destinations)
Data boundaries	Source allowlists, per-collection access, retention controls	Indexing pipeline + retrieval layer	Confluence/SharePoint permissions as source-of-truth
Emergency controls	Global kill switch, connector kill switch, rollback/compensation paths	Status page + admin console + runtime	Incident patterns from payments/email systems (e.g., “stop the send” controls)

The admin console is not a checkbox for enterprise sales. It’s where trust is created. And in agentic software, trust is the product.

[Image: administrator reviewing access policies and approvals for automated tools. Caption: The “AI admin” role is real now: someone will own policy, approvals, and incident response.]

A product decision you can make this week: ship one irreversible action, correctly

If you want a forcing function, pick a single high-stakes action in your product—something that changes state outside your app—and implement it with proper permissioning, provenance, and a kill switch. Not ten actions. One.

Examples that expose whether your system is serious: send an email to an external recipient, issue a refund, merge a pull request, change a billing plan, delete a record, publish a page. These actions force you to build the control plane you’ve been avoiding.

Define the action contract: inputs, outputs, side effects, and what “undo” means.
Make the AI act under an identity you can explain to an auditor.
Show provenance in the UI: sources, timestamps, connector, and diff.
Add a policy gate that can block it, require approval, or rate-limit it.
Wire a kill switch that works even if the UI is down.
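
The first step, the action contract, is the one teams most often leave implicit, so here is a small sketch of what writing it down could look like. The `ActionContract` type and its fields are assumptions made for this example; the key idea is that an action claiming to be reversible must actually ship with an undo.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ActionContract:
    """Hypothetical declaration of one state-changing action."""
    name: str
    inputs: tuple        # required input field names
    side_effects: tuple  # plain-language list of what changes outside the app
    reversible: bool
    undo: Optional[Callable[[dict], dict]] = None  # compensating action, if any

    def __post_init__(self):
        # Refuse to register a "reversible" action with no way to reverse it.
        if self.reversible and self.undo is None:
            raise ValueError(f"{self.name}: reversible actions must define undo")

    def validate(self, payload: dict) -> None:
        missing = [key for key in self.inputs if key not in payload]
        if missing:
            raise ValueError(f"{self.name}: missing inputs {missing}")
```

An irreversible action is still allowed under this contract. It simply has to say so up front, which is the signal the policy gate can use to require a human checkpoint.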

Do that once and your roadmap changes. You stop talking about “adding AI” and start building software that can safely operate inside other people’s businesses.

Prediction worth sitting with: by the time “agents” feel normal, the winners won’t be the teams with the flashiest demos. They’ll be the ones whose permissioning and audit exports make security teams say, “Fine. Ship it.” If your product can’t get that reaction, what are you actually building?