Most “AI copilots” are just chatboxes duct-taped onto products that already work. The new wave is different: features that do things—send emails, change settings, open PRs, approve expenses, update records, trigger campaigns. That’s not a UI add-on. That’s product behavior.
And behavior is where products die.
The recurring mistake: teams treat agentic features like an inference problem (“pick a better model”) instead of a product contract problem (“what exactly is allowed, visible, reversible, and billable?”). The model is the least interesting part. The contract is the product.
“We shape our tools and thereafter our tools shape us.” (commonly attributed to Marshall McLuhan; the wording comes from John Culkin’s 1967 summary of McLuhan’s ideas)
Neither of them was talking about LLMs, but the line fits: the moment your product can act, it starts shaping user workflows, compliance posture, and organizational risk tolerance. If you don’t design that shape intentionally, users will do it for you, by disabling the feature, banning it, or routing around it with another tool.
Agentic UX is a permissions problem disguised as intelligence
In 2024–2025, the mainstream move was “assistant everywhere”: Microsoft Copilot across Windows and Microsoft 365, Google’s Gemini across Workspace, Salesforce Einstein Copilot inside CRM, Atlassian Intelligence inside Jira and Confluence, Notion AI inside docs and databases, GitHub Copilot inside IDEs. The user asks, the system answers.
In 2026, the pressure is “assistant that acts.” GitHub Copilot added an “agent” mode (announced in 2025) aimed at doing multi-step coding tasks. OpenAI introduced the Assistants API (2023) and later the Responses API (2025) to help developers build systems that call tools, maintain state, and produce structured outputs. Anthropic pushed tool use and computer-use patterns. Frameworks like LangChain and LlamaIndex normalized “agents” as a product building block. None of this is exotic anymore; it’s table stakes.
What’s missing in too many launches is a hard distinction between:
- Suggestive features (draft, recommend, summarize) where the user stays the actor.
- Agentic features (execute, change, send, approve) where the product becomes an actor.
- Autonomous features (run on a schedule or trigger) where the product acts without a live user in the loop.
When you ship agentic behavior but keep suggestive UX patterns (a chatbox, a “sounds good” button, no explicit scoping), you create a trust vacuum. Users can’t tell what will happen, what did happen, or how to undo it. Engineering can’t tell what to log. Security can’t tell what to permit. Finance can’t tell what to bill.
Key Takeaway
As soon as AI can take an action, your product needs an explicit contract: scope, authorization, audit trail, reversibility, and cost visibility. Without that, you’re shipping a liability with a friendly UI.
The new core primitive: “intent → plan → approval → execution → receipt”
Chat-first UX collapses everything into one blob: user types, model responds. That’s fine for writing. It’s reckless for actions.
For action-taking features, the winning shape is a pipeline with named artifacts. You don’t need to over-theorize it; you need to make it legible:
- Intent: what the user wants (in their words).
- Plan: the system’s proposed steps and affected objects.
- Approval: explicit authorization at the right granularity.
- Execution: tool calls, writes, network actions.
- Receipt: a durable, inspectable record of what happened.
This looks like “extra steps.” It’s not. It’s the minimum structure required for trust, debugging, and compliance. When something goes wrong (and it will), receipts make the difference between a quick fix and a public-relations crisis.
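The five artifacts above can be sketched in a few lines. This is a minimal illustration, not a reference design: `Step`, `Receipt`, and `run_pipeline` are hypothetical names, and the `approve` and `execute` callables stand in for whatever approval UI and tool layer a product actually has.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Step:
    tool: str    # hypothetical tool name, e.g. "crm.update"
    target: str  # the object the step would touch


@dataclass
class Receipt:
    intent: str
    approved_by: str
    executed: list = field(default_factory=list)  # only steps that actually ran
    status: str = "pending"


def run_pipeline(intent: str, plan: list, approver: str,
                 approve: Callable, execute: Callable) -> Receipt:
    """Refuse to execute without approval; always return a receipt after it."""
    if not approve(intent, plan):
        raise PermissionError("plan not approved; nothing was executed")
    receipt = Receipt(intent=intent, approved_by=approver)
    try:
        for step in plan:
            execute(step)                  # tool call, write, or network action
            receipt.executed.append(step)
        receipt.status = "completed"
    except Exception as exc:
        # A partial run still yields a receipt showing how far it got.
        receipt.status = f"failed: {exc}"
    return receipt
```

The design choice worth noticing is that a failure mid-run does not discard the receipt: the record of what already executed is exactly what a user or support agent needs at that point.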
Receipts are not logs
Engineering logs are for engineers. Receipts are product artifacts for users, admins, and auditors. A receipt answers: What changed? Who approved it? Which data sources were touched? What was the model asked? What tools were called? What was rolled back? You can redact sensitive content, but you can’t omit the fact that it was accessed.
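The redaction rule can be made concrete with a small sketch. The `AccessRecord` shape and the `source` identifier format are assumptions for illustration; the only point is that redaction replaces content and never removes the entry.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AccessRecord:
    source: str            # e.g. "gmail:thread/123" (hypothetical format)
    content: str
    sensitive: bool = False


def user_visible(records: list) -> list:
    """Render access records for a receipt: redact content, keep every entry."""
    view = []
    for record in records:
        entry = asdict(record)
        if record.sensitive:
            entry["content"] = "[redacted]"
        view.append(entry)
    return view
```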
Plans are where you control blast radius
Plans aren’t just “here’s what I’m going to do.” Plans are where you bound the action space. If the user asked “clean up my CRM,” the plan should enumerate the affected object types (accounts, contacts, opportunities) with a count for each, and show the proposed transforms before anything is written. If the user asked “open a PR,” the plan should list the files to change, the tests to run, and the exact target branch.
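One way to bound blast radius is to count distinct affected objects per type before any write and refuse plans that exceed a limit. The sketch below assumes a hypothetical change tuple of `(object_type, object_id, field, old, new)`; a real product would use its own change representation.

```python
from collections import Counter


def preview_plan(changes: list, max_objects: int) -> dict:
    """Summarize proposed writes per object type before anything is written.

    `changes` is a list of (object_type, object_id, field, old, new) tuples,
    a hypothetical shape chosen for this sketch.
    """
    # Several field edits to the same object count as one touched object.
    touched = {(obj_type, obj_id) for obj_type, obj_id, *_ in changes}
    if len(touched) > max_objects:
        raise ValueError(
            f"plan touches {len(touched)} objects; limit is {max_objects}"
        )
    return dict(Counter(obj_type for obj_type, _ in touched))
```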
Pick an “action tier” model and enforce it everywhere
If you don’t define action tiers, your product will default to the worst combination: high power, low clarity.
Action tiers are simple: classify every agentic capability by risk and make the UX and permissioning match. Here’s a practical split that maps cleanly to real product behaviors.
Table 1: Comparison of action tiers for agentic product features
| Action tier | Typical capabilities | Recommended guardrails | Where it fits |
|---|---|---|---|
| Read-only | Search, summarize, answer using existing data | Source citations, data access receipts, admin-controlled connectors | Notion AI Q&A, Google Workspace summaries, internal knowledge search |
| Draft | Generate content or code without changing system state | Diff view, lint/test suggestions, clear “not executed” labeling | GitHub Copilot suggestions, doc/email drafting |
| Write-with-approval | Create/update records, open PRs, schedule posts | Plan preview, scoped approval, transactional writes, easy rollback | Jira ticket creation, CRM updates, code PR creation |
| Autonomous | Triggered workflows, background agents, scheduled execution | Hard budgets, rate limits, kill switch, receipts + anomaly alerts | Ops automation, compliance monitoring, routine triage |
| Irreversible / external | Payments, deletions, outbound communications at scale | Two-person rule, time delay, sandbox, mandatory human review | Expense approval, mass email send, destructive admin actions |
Notice what’s not in the table: model names. A stronger model doesn’t fix tier confusion. It only makes it easier to ship dangerous defaults faster.
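Enforcing tiers “everywhere” usually means one lookup that every tool call passes through. The sketch below is an assumption-laden illustration: the tool names and the `TOOL_TIERS` registry are hypothetical, and the approval counts are one reading of Table 1 (in particular, treating autonomous runs as pre-authorized and governed by budgets instead of per-run approval).

```python
from enum import Enum


class Tier(Enum):
    READ_ONLY = "read-only"
    DRAFT = "draft"
    WRITE_WITH_APPROVAL = "write-with-approval"
    AUTONOMOUS = "autonomous"
    IRREVERSIBLE = "irreversible/external"


# Human approvals required before execution, following Table 1.
# AUTONOMOUS = 0 is an assumption: such runs are pre-authorized by an admin
# and held in check by budgets and a kill switch, not per-run approval.
APPROVALS = {
    Tier.READ_ONLY: 0,
    Tier.DRAFT: 0,
    Tier.WRITE_WITH_APPROVAL: 1,
    Tier.AUTONOMOUS: 0,
    Tier.IRREVERSIBLE: 2,  # two-person rule
}

# Hypothetical tool registry; every tool is classified before it ships.
TOOL_TIERS = {
    "jira.search": Tier.READ_ONLY,
    "jira.transition": Tier.WRITE_WITH_APPROVAL,
    "email.send_bulk": Tier.IRREVERSIBLE,
}


def approvals_required(tool: str) -> int:
    """Fail closed: an unclassified tool is an error, never a default."""
    if tool not in TOOL_TIERS:
        raise KeyError(f"tool {tool!r} has no action tier")
    return APPROVALS[TOOL_TIERS[tool]]
```

Failing closed on unclassified tools is the part that prevents “high power, low clarity” from creeping in as new capabilities are added.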
Tooling choices that matter (and the ones that don’t)
Founders and engineering leads still waste time arguing about which LLM is “best.” That’s a procurement mindset. Product outcomes hinge on tool orchestration, isolation, and observability.
Use structured outputs as the default, not a nice-to-have
If your agent is going to call tools, you want structured outputs—JSON schemas, function calling, typed arguments. OpenAI and Anthropic both support tool/function calling patterns; so do many open-weight models via wrappers. The point isn’t vendor allegiance. The point is making the “plan” and “receipt” machine-readable.
A minimal example: require every action proposal to emit a typed plan with explicit objects, permissions, and rollback strategy. Treat malformed outputs as failures, not “best effort.”
```json
{
  "intent": "Close stale Jira tickets older than 90 days with no activity",
  "plan": [
    {"tool": "jira.search", "args": {"jql": "status = Open AND updated < -90d"}},
    {"tool": "jira.transition", "args": {"issueKeys": "<from_search>", "toStatus": "Done"}}
  ],
  "approval_scope": {
    "projectKeys": ["ENG"],
    "maxIssues": 20,
    "dryRun": true
  },
  "rollback": {"supported": false, "note": "Status transitions are reversible only via another transition"}
}
```
This isn’t fancy. It’s enforceable. And enforceable beats clever.
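“Treat malformed outputs as failures” can be enforced with a strict parser in front of the executor. This is a minimal sketch using only the standard library; the field names follow the plan document above, while `ALLOWED_TOOLS`, `PlanRejected`, and `parse_plan` are hypothetical names. A production system would more likely validate against a full JSON Schema.

```python
import json

ALLOWED_TOOLS = {"jira.search", "jira.transition"}  # assumed per-tool allowlist

REQUIRED_FIELDS = (
    ("intent", str),
    ("plan", list),
    ("approval_scope", dict),
    ("rollback", dict),
)


class PlanRejected(ValueError):
    """Raised for any plan that is not exactly what the executor expects."""


def parse_plan(raw: str) -> dict:
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PlanRejected(f"not valid JSON: {exc}") from exc
    if not isinstance(doc, dict):
        raise PlanRejected("top level must be an object")
    for key, kind in REQUIRED_FIELDS:
        if not isinstance(doc.get(key), kind):
            raise PlanRejected(f"missing or mistyped field: {key}")
    for step in doc["plan"]:
        tool = step.get("tool") if isinstance(step, dict) else None
        if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
            raise PlanRejected(f"tool not on allowlist: {tool!r}")
        if not isinstance(step.get("args"), dict):
            raise PlanRejected(f"step for {tool} has no args object")
    return doc
```

There is deliberately no repair path: a plan that almost parses is rejected the same way as one that does not parse at all.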
Isolation is a product feature
Most teams focus on prompt injection as a security concept. Users experience it as “the agent did something weird.” Isolation—scoped credentials, least-privilege tool tokens, per-connector permissioning, environment boundaries—prevents weirdness from becoming damage.
If your agent can access Slack, Gmail, GitHub, and Stripe with one omni-token, you’ve built a single point of catastrophic failure. Enterprise buyers won’t tolerate it, and consumers shouldn’t either.
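The alternative to an omni-token is one narrowly scoped credential per connector, checked on every call. The sketch below only models the check; `ConnectorToken`, `authorize`, and the scope strings are hypothetical, and real connectors would rely on the provider’s own scoped tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorToken:
    connector: str      # e.g. "github"
    scopes: frozenset   # e.g. frozenset({"pull_request:write"})


def authorize(token: ConnectorToken, connector: str, scope: str) -> None:
    """Raise unless the token belongs to this connector and carries the scope."""
    if token.connector != connector:
        raise PermissionError(
            f"token for {token.connector!r} cannot be used on {connector!r}"
        )
    if scope not in token.scopes:
        raise PermissionError(f"{connector!r} token lacks scope {scope!r}")
```

With this shape, a compromised or confused GitHub step cannot reach Stripe at all, because no object in its hands is valid there.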
Observability is not optional; it’s your support queue
Agent failures don’t look like normal bugs. They look like ambiguous partial success: two emails sent, one drafted, a record updated incorrectly, a tool call timed out, then the model “explained” it confidently. Your support team will drown unless you have per-step traces tied to user-visible receipts.
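A per-step trace makes “ambiguous partial success” unambiguous: each step ends as ok, failed, or skipped, under one trace ID that a receipt can reference. This is a sketch under simple assumptions (sequential steps, stop on first failure); `run_with_trace` and the entry fields are hypothetical.

```python
import uuid


def run_with_trace(steps: list) -> tuple:
    """Run (name, callable) pairs in order and record one entry per step.

    After the first failure, remaining steps are recorded as skipped so the
    trace always accounts for every planned step.
    """
    trace_id = uuid.uuid4().hex
    trace = []
    failed = False
    for name, action in steps:
        entry = {"trace_id": trace_id, "step": name}
        if failed:
            entry["status"] = "skipped"
        else:
            try:
                action()
                entry["status"] = "ok"
            except Exception as exc:
                failed = True
                entry["status"] = "failed"
                entry["error"] = repr(exc)
        trace.append(entry)
    return trace_id, trace
```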
The uncomfortable part: pricing and incentives for agents
Chat pricing trained users to think “I pay for access.” Agentic pricing forces a sharper question: “Am I paying for outcomes or for attempts?”
Most products will drift into one of two bad places:
- Opaque consumption billing tied to tokens/credits without mapping to user value. Users resent it because it feels like paying for the model’s internal monologue.
- All-you-can-eat bundles that hide costs until the finance team clamps down with internal bans and procurement friction.
The contrarian stance: agentic features need an explicit budget concept in the product, not just in your cloud bill. Budgets aren’t only about cost. They’re about behavior control.
Budgets should cap risk, not only spend
A budget can be expressed as “max emails per day,” “max records modified per run,” “no external recipients,” “only run during business hours,” “max PRs opened,” “max compute minutes.” These are product constraints users understand. They also map to safety.
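Budgets phrased this way translate directly into a pre-execution check. The sketch below covers four of the constraints named above; the `Budget` fields, the default business-hours window, and the `internal_domain` parameter are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Budget:
    max_records_per_run: int
    max_emails_per_day: int
    allow_external_recipients: bool = False
    window: tuple = (time(9, 0), time(17, 0))  # assumed business hours


def violations(budget: Budget, *, records: int, emails_sent_today: int,
               recipients: list, now: time, internal_domain: str) -> list:
    """Return every violated limit so a paused run can explain itself."""
    found = []
    if records > budget.max_records_per_run:
        found.append("max records modified per run")
    if emails_sent_today + len(recipients) > budget.max_emails_per_day:
        found.append("max emails per day")
    suffix = "@" + internal_domain.lower()
    if not budget.allow_external_recipients and any(
        not r.lower().endswith(suffix) for r in recipients
    ):
        found.append("no external recipients")
    start, end = budget.window
    if not (start <= now <= end):
        found.append("only run during business hours")
    return found
```

Returning the full list of violated limits, instead of stopping at the first, gives the “paused” state a reason the user can read.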
Table 2: A practical “agent contract” checklist you can bake into product requirements
| Contract element | User-visible UX | Engineering requirement | Owner |
|---|---|---|---|
| Scope | Plan lists affected objects, connectors, and limits | Typed plan schema + validation; per-tool allowlist | Product + Eng |
| Authorization | Explicit approve step; per-workspace/admin controls | Least-privilege tokens; approval gates; MFA/SSO where applicable | Security + Platform |
| Reversibility | Undo/rollback UI, or clear “cannot be undone” warnings | Transactional writes; soft-delete; versioning; compensating actions | Eng |
| Receipts & audit | Activity feed with steps, timestamps, approver, diffs | Immutable event log; trace IDs; connector access logs | Platform + Support |
| Budgets & rate limits | User/admin-set caps; “paused” state with reason | Per-tenant quotas; anomaly detection; kill switch | Product Ops |
Design patterns that will win in 2026 (and the ones that will age badly)
The industry is about to repeat an old cycle: early power users tolerate rough edges; mainstream users demand predictability; regulators and enterprise buyers demand control. The products that survive are the ones that treat agents as governed actors, not magical interns.
Pattern: “Diff-first” for any write action
Code has diff. Content has track changes. Data products still too often have “Apply” with no preview. If your agent edits anything—tickets, CRM records, configurations—show a diff. Not a prose summary. A diff.
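For structured records, a diff does not need a diffing library. The sketch below compares two versions of a record field by field and emits `-`/`+` lines; `record_diff` is a hypothetical helper and assumes string field names.

```python
_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def record_diff(before: dict, after: dict) -> list:
    """Return unified-style lines for every field that differs."""
    lines = []
    for key in sorted(before.keys() | after.keys()):
        old = before.get(key, _MISSING)
        new = after.get(key, _MISSING)
        if old == new:
            continue
        if old is not _MISSING:
            lines.append(f"- {key}: {old!r}")
        if new is not _MISSING:
            lines.append(f"+ {key}: {new!r}")
    return lines
```

For example, changing a ticket from `{"status": "Open", "owner": "ana"}` to `{"status": "Done", "owner": "ana", "closed_by": "agent"}` produces three lines: the added `closed_by` field, then the old and new `status`. The unchanged `owner` field does not appear.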
Pattern: “Kill switch” that’s actually reachable
Every agentic feature needs a hard stop that a non-engineer can use. In practice: a prominent pause control, a workspace-level disable, and a way to revoke tokens/connector access without filing a support ticket.
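On the engineering side, a reachable kill switch implies the executor checks a shared flag before every step, not only at the start of a run. A minimal sketch, assuming an in-process flag; `KillSwitch` and `run_steps` are hypothetical, and a multi-node product would back the flag with shared storage.

```python
import threading


class KillSwitch:
    """A pause flag an admin control could flip; checked before every step."""

    def __init__(self):
        self._stopped = threading.Event()
        self.reason = ""

    def stop(self, reason: str) -> None:
        self.reason = reason
        self._stopped.set()

    def is_stopped(self) -> bool:
        return self._stopped.is_set()


def run_steps(steps: list, switch: KillSwitch) -> tuple:
    """Run callables in order; return (completed_count, pause_reason_or_None)."""
    for done, step in enumerate(steps):
        if switch.is_stopped():
            return done, switch.reason
        step()
    return len(steps), None
```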
Pattern: “Narrow agents” beat “general agents”
General agents demo well and fail quietly. Narrow agents ship well and fail loudly. Pick narrow: “triage inbound support tickets in Zendesk and draft replies,” “open a PR with a failing test fix,” “reconcile invoices in QuickBooks.” Each narrow agent can have a crisp contract, scoped permissions, and measurable outcomes.
Anti-pattern: chat as the only interface
Chat is a great input modality. It’s a terrible control modality. If the only way to manage an agent is to talk to it, you’ve built a product that can’t be administered. The admin experience needs switches, limits, logs, roles, and exports. No one runs a company on vibes.
A sharp prediction, and a concrete next move
Prediction: by the end of 2026, “AI agent” won’t be a differentiator. “AI agent with a clear contract” will. The buyers who matter—IT, security, ops leaders, and serious prosumers—will standardize on tools that can be governed. The rest will get quarantined as toys.
Next move: take one agentic workflow you’re building (or already shipped) and write its contract on a single page. Not marketing copy. A contract: allowed actions, required approvals, budgets, receipts, rollback story, and kill switch. If you can’t fit it on a page, the feature is too broad—or you haven’t decided what it is.
Then ask the question most teams avoid: if this agent makes the wrong change once, can a user prove what happened and undo it in under five minutes? If the answer is no, you don’t have an agent. You have an incident generator.