Product decks are full of "agents" now. Animated cursors flying across screens. Flows that supposedly "run themselves." A button that says "Let the AI handle it." In the demo, the agent glides through forms, opens the right tools, clicks the right buttons, and hands you a tidy result.

In production, it looks more like this:

- The agent starts doing something you did not ask for.
- It clicks the wrong item in a table because the DOM changed.
- It half-completes a flow, gets stuck, and leaves your state in a weird in-between.
- Nobody can tell what happened without reading logs.

Autonomy in the UI is not automatically a win. It is a trade. You are exchanging direct manipulation and predictability for speed and delegation. If you make that trade blindly, you get chaos. The real question is not "can we add an agent." It is: in this specific interface, what should be delegated, to what extent, and under which constraints? Everything else is decoration.

## WHAT "AGENTS IN THE UI" ACTUALLY ARE

Strip away the branding. An "agent" in a product interface is usually a loop that can:

- Observe parts of the UI or system state
- Decide what step to take next toward a goal
- Act by clicking, typing, calling APIs, or editing data
- Repeat until it thinks the goal is done or it gets stuck

That loop can be:

- Embedded in the front end, driving the DOM
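Wherever the loop sits, its shape is the same. A minimal sketch in Python; every name here is hypothetical, and a real implementation would add timeouts and permission checks:

```python
# Minimal sketch of the observe-decide-act loop described above.
# All names are hypothetical; the point is the shape, not an API.

def run_agent(goal, observe, decide, act, max_steps=20):
    """Pursue `goal` until done, stuck, or out of budget."""
    history = []
    for _ in range(max_steps):
        state = observe()                    # observe UI or system state
        step = decide(goal, state, history)  # pick the next step
        if step is None:                     # model believes the goal is done
            return ("done", history)
        result = act(step)                   # click, type, call an API
        history.append((step, result))
        if result == "stuck":                # no progress: stop, don't thrash
            return ("stuck", history)
    return ("budget_exhausted", history)     # hard stop instead of looping forever
```

The step budget and the explicit stuck result are the difference between a loop and a runaway.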
- Sitting behind the scenes, orchestrating workflow calls
- Split between model decisions and hard-coded scripts

From a user's point of view, the key changes compared to classic automation are:

- The sequence of steps is no longer fully predetermined
- The system may initiate actions on its own
- The system might change its plan midway based on what it sees

This is where autonomy lives in the UI. And where it can go wrong. The design task is not to make the loop clever. It is to make its behavior:

- Legible
- Contained
- Recoverable

If you cannot guarantee those three, you probably want a simpler pattern.

## WHERE AUTONOMY HELPS

There are narrow, repeatable situations where UI agents genuinely reduce friction.

### Repetitive, multi-step workflows with clear endpoints

Think about flows like:

- Filing a standard expense report from a receipt
- Moving a support ticket through a known resolution path
- Assembling a weekly report from dashboards and docs
- Running a repetitive configuration or provisioning sequence

In each case:

- The goal is clear
- The steps are long and boring
- The cost of a small mistake is low if you catch it before final submit

Here an agent can:

- Extract information from inputs
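That division of labor can be sketched directly. The names here (`extract`, `fill_form`, `submit`, `confirm`) are hypothetical stand-ins, not a real API; the agent prepares everything, but nothing commits without an explicit confirmation:

```python
# Sketch: the agent does the grunt work, the user owns the final submit.
# extract, fill_form, submit, and confirm are hypothetical stand-ins.

def assisted_submit(receipt, extract, fill_form, submit, confirm):
    fields = extract(receipt)              # pull data out of the input
    draft = fill_form(fields)              # walk the screens, fill the form
    if confirm({"fields": fields, "draft": draft}):  # "before you finalize"
        return submit(draft)               # only now does anything commit
    return None                            # declined: nothing was submitted
```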
- Navigate through screens
- Fill forms and toggle switches
- Present a "before you finalize" summary

The autonomy adds value because it compresses time without dragging you into more uncertainty than you had before. You still check the final state. The agent does the grunt work.

### State repair and cleanup

Systems drift into messy states:

- Orphaned records
- Inconsistent tags or statuses
- Duplicate entries
- Old configurations that no one remembers

You could hand users a table with filters and say "go clean this up manually." Or you can offer an agent that:

- Proposes a batch of fixes
- Groups similar cases
- Applies decisions consistently

If you design it well, users:

- See the proposed changes
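A sketch of that cycle, with hypothetical names: each applied fix snapshots the record first, so the whole batch can be rolled back if something looks wrong.

```python
# Sketch of batch state repair: propose fixes, apply only the approved ones,
# and keep an undo log so the whole batch can be rolled back.

def apply_batch(records, propose_fix, approved_ids):
    undo_log = []
    for rec in records:
        fix = propose_fix(rec)
        if fix is None or rec["id"] not in approved_ids:
            continue                              # user did not approve this one
        undo_log.append((rec["id"], dict(rec)))   # snapshot before changing
        rec.update(fix)                           # apply consistently
    return undo_log

def rollback(records, undo_log):
    by_id = {r["id"]: r for r in records}
    for rec_id, snapshot in reversed(undo_log):
        by_id[rec_id].clear()
        by_id[rec_id].update(snapshot)            # restore the pre-fix state
```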
- Approve or adjust in bulk
- Roll back if something looks wrong

Autonomy helps here because the alternative is error-prone manual work that no one really wants to do at all.

### Background tasks that do not touch live UX

Some work does not need to happen in front of the user:

- Periodic enrichment of records
- Recomputation of recommendations
- Reorganization of internal indexes

These are still "agents" in a broad sense, but they never click a visible button. They operate on data and infrastructure, not on the front end. Here autonomy can be higher because:

- Inputs and outputs are well defined
- Failures affect performance, not someone's live flow
- You can monitor and roll back without user confusion

If your "agent in the UI" is really of this kind, keep it out of the UI. Expose the result, not the process.

## WHERE AUTONOMY ONLY ADDS CHAOS

There are also situations where putting an agent in the interface is almost guaranteed to make the experience worse.

### Flows with ambiguous or moving goals

Any task where the user's goal is fluid is a bad candidate for a highly autonomous agent. Examples:

- Exploratory analysis: the user's question changes midway
- Creative work: briefs evolve as drafts appear
- Complex troubleshooting: new facts emerge as you test hypotheses

In these contexts, a rigid "I'll take it from here" agent fights the way people actually think. It surges ahead on an assumption that is already stale. The right pattern here is:

- Keep the agent as a suggester of next steps, not an executor of the whole plan
- Let the user steer, refine, and stop easily
- Make it trivial to backtrack

When you force a fully autonomous loop into an exploratory flow, you get wasted work and a user who feels sidelined by their own tool.

### Multi-actor, high-stakes decisions

Any flow that encodes real responsibility across multiple people and roles is a dangerous place for UI autonomy. Think of:

- Approvals for spend, access, or compliance
- HR decisions with legal implications
- Clinical workflows where multiple clinicians sign off
- Financial moves that cross audit or regulatory boundaries

If an agent can route, sign, or finalize on behalf of users without clear, visible checkpoints, you are undermining the accountability structure that your organization depends on. In these cases:

- Agents can draft, collect information, and propose routes
- Humans must stay in charge of approvals and final actions
- The UI has to make it obvious who did what, and when

Letting an agent blur those lines is not efficiency; it is a governance risk.

### Fragile, partially observable environments

Most UI agents today rely on:

- Selectors that depend on DOM structure
- Heuristics about labels and positions
- Limited visibility into what is happening on the backend

If the environment is:

- Heavily customized per tenant
- Subject to frequent redesign
- Full of conditional elements and feature flags

then a front-end agent is driving half-blind. When you embed that inside the user's main interface, you get:

- Flows that work for some users and silently break for others
- Agents that click the wrong control because a label moved
- Hard-to-reproduce failures because the environment changes under your feet

In that world, it is often better to:

- Move autonomy behind stable APIs
- Use the front end only to visualize and confirm
- Avoid "agent clicks things like a human" as your primary path

If the environment is unstable, an "agent in the UI" is just another source of flakiness.

## WHEN TO OFFER AUTONOMY, WHEN TO OFFER POWER TOOLS

For most products, the better question is: should this be an agent or a power tool?

A power tool:

- Executes exactly when the user tells it to
- Does one thing or one tight bundle of things
- Has simple, predictable inputs and outputs

An agent:

- Holds a goal across multiple steps
- Can react to intermediate results
- Can initiate follow-up actions without another explicit click

Use agents in the few places where holding that goal across steps genuinely saves cognitive effort, not just clicks. Everywhere else, a well-designed power tool is clearer and easier to trust.

## DESIGN PRINCIPLES THAT KEEP AGENTS FROM TAKING OVER

If you decide autonomy is worth it in a particular flow, the interface has to enforce discipline.

### Make the agent's scope explicit

Spell out:

- What this agent can act on
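One way to keep such a scope honest is to hold it as data rather than prose scattered through the code, so the same declaration can both gate execution and render the UI text. A sketch; the action names are hypothetical:

```python
# Sketch: the agent's scope as one declarative contract.
# The same structure can render the UI copy and gate execution.
# Action names are hypothetical examples.

SCOPE = {
    "allowed": {"draft_response", "set_status_pending_review"},
    "forbidden": {"close_ticket", "issue_refund"},
    "goal": "prepare ticket replies for human review",
}

def check_action(action, scope=SCOPE):
    if action in scope["forbidden"]:
        raise PermissionError(f"agent may never perform: {action}")
    if action not in scope["allowed"]:
        raise PermissionError(f"action outside declared scope: {action}")
    return True
```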
- What it will never touch
- What goal it will pursue

Right in the UI, in language like: "This assistant can create draft responses to tickets and set their status to Pending Review. It cannot close tickets or issue refunds."

Scope is a contract. Without it, users will either under-use the agent or over-trust it.

### Separate "plan" from "execution" in visible steps

An agent should not jump from "I think I know what to do" to "I did it" without exposing the plan. For non-trivial flows:

- Show a short, concrete plan first
- Let the user approve, edit, or cancel
- Then execute, with live status

The plan does not have to be verbose. It has to be inspectable: "Fetch invoices from last month. Match payments. Flag mismatches for review. Do not send emails automatically." This pattern gives users a mental model to compare against reality.

### Always provide stop, undo, and audit

No agent should be a runaway train. At minimum:

- A visible stop button while the agent is acting
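Plan approval and the stop control combine naturally in one sketch (all names hypothetical): the plan is a value shown before anything runs, and the stop flag is honored between steps.

```python
# Sketch: plan shown and approved before execution, live status per step,
# and a stop flag honored between steps. All names are hypothetical.

def run_with_plan(plan, approve, execute_step, report, stop_requested):
    plan = approve(list(plan))           # user approves, edits, or cancels
    if plan is None:
        return "cancelled"
    for i, step in enumerate(plan, start=1):
        if stop_requested():             # the visible stop button
            return "stopped"
        report(f"step {i}/{len(plan)}: {step}")  # live, inspectable status
        execute_step(step)
    return "completed"
```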
- A clear set of undoable steps, or at least an easy rollback for the main changes
- An audit trail: who triggered what, what the agent did, in what order, with what results

If users cannot stop an agent midway, they will refuse to use it on anything important. If they cannot undo, they will limit it to trivial tasks. If they cannot see the audit trail, they will not trust it in front of other stakeholders.

### Keep direct manipulation and autonomy side-by-side

Do not replace existing controls entirely with an agent. On the same screen:

- Keep the usual buttons, forms, and filters
- Add the agent as a parallel path: "Do this for me"

This prevents lock-in. When the agent fails or is not appropriate, users can fall back to familiar interactions without context switching to another part of the product. If you hide direct manipulation behind the agent, people will either avoid the product or build their own workarounds.

### Avoid mode confusion

If the agent and the user can both act on the same surface, you get classic mode errors:

- "Did I change this, or did the agent?"
- "Is it still running or is it waiting for me?"
- "Are we in automatic mode or manual?"

You counter that by:

- Clear visual states: idle, planning, executing, waiting for input
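Those states stay consistent only if there is exactly one of them at a time, and both the agent loop and the UI read it. A sketch, with the legal transitions as one plain table:

```python
# Sketch: one explicit agent state, with legal transitions only.
# The UI renders this state; nothing else decides whose move it is.

from enum import Enum

class AgentState(Enum):
    IDLE = "idle"
    PLANNING = "planning"
    EXECUTING = "executing"
    WAITING_FOR_INPUT = "waiting for input"

TRANSITIONS = {
    AgentState.IDLE: {AgentState.PLANNING},
    AgentState.PLANNING: {AgentState.EXECUTING, AgentState.WAITING_FOR_INPUT, AgentState.IDLE},
    AgentState.EXECUTING: {AgentState.WAITING_FOR_INPUT, AgentState.IDLE},
    AgentState.WAITING_FOR_INPUT: {AgentState.EXECUTING, AgentState.IDLE},
}

def transition(current, new):
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

Forcing every change through `transition` means the UI can never show a state the agent is not actually in.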
- Subtle but persistent indicators when the agent owns the next move
- Text that says exactly what is happening: "Assistant is filling in this form based on your last upload."

Silence is the worst mode.

## THINK IN TERMS OF RESPONSIBILITY, NOT MAGIC

Underneath all of this is a blunt question: if the agent does something harmful or wrong, who will be held responsible?

If the only honest answer is "the user, because they clicked the button," you have designed a responsibility offload, not a helper. A healthier split looks like this:

- The system is responsible for staying within clearly stated bounds, logging its actions, and exposing uncertainty
- The user is responsible for choosing goals, approving high-impact actions, and correcting mistakes

The UI should embody that split in:

- What the agent is allowed to do on its own
- When it must ask
- How it shows its work

The more your interface hides autonomy behind a glossy "let the AI handle it," the more you are pushing users into taking blame for behavior they never had a chance to understand.

Agents in the UI are not automatically progress. They are just another way to move work around. When you use them surgically, on the right problems, with tight constraints, they feel like leverage. When you use them everywhere, on vague problems, without discipline, they feel like noise that clicks faster than you can think.



