The popularity of agents that run from users' devices is going to push sites that don't have logins to add them and sites with logins to add tougher captchas.
I think the underlying assumption here is an important question to consider: should we treat agents the way we've treated bots over the decades? I believe treating agents like the traditional bots of old misses an important distinction. Traditional bots act with the intent of serving some external entity's gain (scraping content, attacks, etc.). Agents, while leveraging similar systems, serve the site's end consumer. When I use an agent to shop, I'm still the customer of the shop. As the shop owner, I want to give the best experience, so it's in my best interest to provide an AX that helps the agent deliver a good experience to the end user. Because my target customer is now using an agent to help make a purchase, if I shut my door on their delegated system, I'm telling them to shop somewhere else that does support it.
We are early enough in this evolution to help direct the ship in a way that serves the end user, web owners/creators, and the agent.
I think economic incentives are going to get in the way of that, as is tradition. Amazon’s dev teams in charge of the retail web interface might want to make it easier to sell you more products regardless of interface but there’s always a competing VP with more influence that wants to juice their KPIs by stuffing more advertising down the user’s throat, so they drive top down decisions that impede agents.
It’s almost inevitable since everyone wants more growth and advertising is almost always seen as free money left on the table by decision makers.
I agree! That said, they won't turn down the money through affiliate systems and resellers either.
The economic incentives, the brand control needs, etc. are important dynamics and I don't think it's all in their court alone. It's a combination of where the market goes (the platforms and systems they prefer) and the capabilities unlocked by those platforms.
With that, this evolution will follow the propagation of agent usage. So we will see a lot more initial adoption of AX principles and patterns from developer tools, because the software industry has been the most permeated by the rise of agentic workflows. As that expands, the nature of markets and meeting user needs will drive adoption of AX.
Yes, but competing with that -- imagine how much easier it would be to phish an agent into buying a product on the user's behalf.
That's my reaction to the GP's comment. Shop owners will not optimize for agent ease of use. They will optimize for convincing agents to make a purchase. This will play out like SEO, with everyone other than the bad actors losing out.
There are a few layers to this worth considering.
- In this world, the information delivered to agents should align with the content delivered visibly on the human web. This is essentially how the bulk of SEO manipulation is detected. There needs to be a way to validate this and establish trust - completely solvable. These techniques penalize such schemes from the outset. (This is probably not the best forum to go too deep into that.)
- We're assuming agents have full buying authority here. I do not believe we will see that as commonplace for a long time. Even if we did, the same systems for PCI compliance are in play, and the interfaces pushed by both payment gateways and shopping carts protect against duplicate purchase attempts. Those attempting to abuse this fall more into the malicious-actor camp.
- Phishing and malicious actors are going to do what they have always done. There are some very important security, access control, and compliance measures we should put in place for the most sensitive actions - as we always have, and most existing ones still apply. The agent experience and the ecosystem in general will have to evolve verifiable trust patterns, so that when a human delegates something to an agent, the human has confidence and ways to validate the interactions.
I'll be the first to admit that I don't have all of the answers here but with agents becoming the new entry point or delegation tool for the next generation of digital users, these are questions we have to answer and solve for. It starts by focusing the industry around the domain of this problem, that is AX. How to do it effectively and what needs to evolve to achieve it... that's where the work is.
> Agents, while leveraging similar systems, serve the site's end consumer. When I use an agent to shop, I'm still the customer of the shop. As the shop owner, I want to give the best experience, so it's in my best interest to provide an AX that helps the agent deliver a good experience to the end user.
This is fine until the agent decides to order something the customer did not want. That risk is inherent to the concept of an agent: due to the probabilistic nature of LLMs, and the fact that no agent will ever be able to predict exactly what you want at the moment you want it, this scenario is inevitable.
As the shop owner, this would result in an increased number of returns. You could recommend that the user approve each purchase, but given that you do not define these agents, there is no way for you to ensure that the user is actually following your advice.
There are ways to ensure that the end user provides authorization. While the shop owner does not control the agent, it does control purchase authorization - primitively, that could look like requiring a PIN/CVV, confirming via a code sent by text, etc. You can recursively assume that an agent could do these things on the user's behalf too, but this is where limits, compliance regulations, etc. come in. It's not in the shop's or the agent's interest to integrate poorly with these flows. That said, this is where we should establish conventions so that we can enforce consistency and compliance as well as validate them. It wouldn't be hard to imagine that an agent must prove it is operating correctly before it can initiate actions such as purchase requests, so that the agent's authority is known and it can be held accountable for misuse.
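For illustration, here's a minimal sketch of what that kind of gate could look like on the shop's side, assuming a hypothetical checkout handler and a confirmation code delivered to the human out of band; the names and flow are made up, not any particular payment provider's API.

```typescript
// Hypothetical sketch: a checkout handler that requires human step-up
// confirmation when the caller is a delegated agent. The actor flag and
// code delivery mechanism are assumptions for illustration only.

interface CheckoutRequest {
  cartId: string;
  actor: "human" | "agent";   // hypothetically derived from the access token
  confirmationCode?: string;  // code the human received out of band (SMS/app)
}

interface CheckoutResult {
  status: "completed" | "confirmation_required" | "rejected";
  message: string;
}

function handleCheckout(
  req: CheckoutRequest,
  isValidCode: (code: string) => boolean,
): CheckoutResult {
  if (req.actor === "agent" && !req.confirmationCode) {
    // Pause the flow and push a confirmation prompt to the human's device.
    return {
      status: "confirmation_required",
      message: "Ask the account holder to approve this purchase.",
    };
  }
  if (req.actor === "agent" && !isValidCode(req.confirmationCode!)) {
    return { status: "rejected", message: "Confirmation code invalid or expired." };
  }
  // Human caller, or agent with a valid human-issued confirmation: proceed.
  return { status: "completed", message: `Order placed for cart ${req.cartId}.` };
}
```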
There are no websites that I visit now that don't have a login that I would still visit if they suddenly started putting up captchas
I cannot see the difference in the access mechanism between an agent and what we use today for API consumption. The agent, whatever it is, is basically a client, a P2P node, etc.
Exactly. I also believe the UI would become redundant. In fact, agents don't even need to make decisions by looking at visuals the way we use the web. Imagine your browser being an agent that makes decisions: it knows which GET requests to use to fetch data and how to make payments too.
Wouldn't the agent just send a notification to the user's phone and say "can you solve this please?"
Apparently this is how the "automated" solvers work. I would love to find a source describing how all of this works. One website I frequent uses DataDome, and their captcha has a timer on it. I'm assuming this is a factor in "human-ness". Are we all going to be tied to our phones solving captchas as fast as possible?
It's more likely that the user will need to ask the agent to solve the CAPTCHA, because right now AI bots are better at solving CAPTCHAs than humans are.
This is why I'm so bullish on OAuth for sites with logins - you get a strong real user identity to tie the agent's behavior back to. This means you have (some) proof that the agent is helping your end users consume more of your site, and you can also revoke access to agents that misbehave.
We might live in a world where vetted assistants get VIP access to use websites impersonating their owners without much of a second thought, as long as you're at least on the paid Flash Max Pro™ plan.
Ya, WebAuthn with a hardware requirement would kill it too. Gotta physically touch it. It'll be gross when someone starts to automate that too.
One time I duct taped a cooked sausage to a USB fan and arranged it so the sausage was continually slapping my passive touch two-factor authenticator. Is that the kind of gross you were talking about?
https://www.vice.com/en/article/this-piece-of-meat-just-swip...
It would also kill it for 99% of humans.
My entire extended family has two yubikeys: My key and my spare key.
Captcha solvers are already quite cheap. AI could make it cheaper, but for a single user, I don't think it would make a difference.
Not to take the bait on this bit of content marketing ("the future of agents is OAuth, says company that sells OAuth solution"), but: I disagree with the premise that agents should basically use the same APIs and auth mechanisms that humans & apps currently use.
I realize there's a strong impulse not to "reinvent the wheel," but what we have currently is unsustainable. Specifically, the fact that every service uses a slightly different REST API and its own unique authentication & authorization workflow. It worked fine in the days when application developers would spend a few weeks on each new integration, but it totally breaks down when you want to be able to orchestrate an agent across many user-defined services.
I think a simple protocol based on JSON and bog-standard public key encryption could allow agents to coordinate and spend credits/money based on human-defined budgets.
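As a rough sketch of that idea (not an existing standard), an agent could sign a JSON spend request with its own key, and a wallet service could verify the signature and enforce a human-defined budget before approving; all names here are invented.

```typescript
// Sketch of a signed JSON "spend request" checked against a human-set budget.
// Uses Node's built-in Ed25519 support; the message shape is hypothetical.
import { generateKeyPairSync, sign, verify } from "node:crypto";

const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const spendRequest = {
  agentId: "travel-agent-01",
  merchant: "example-airline.com",
  amountCents: 12_500,
  currency: "USD",
  nonce: Date.now(), // prevents replay of an old approval
};

const payload = Buffer.from(JSON.stringify(spendRequest));
const signature = sign(null, payload, privateKey); // Ed25519 detached signature

// The wallet service would verify the signature and enforce the budget.
const humanBudgetCents = 50_000;
const signatureOk = verify(null, payload, publicKey, signature);
const withinBudget = spendRequest.amountCents <= humanBudgetCents;

console.log(signatureOk && withinBudget ? "approved" : "declined");
```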
We're finally putting the 'agent' in 'user agent'
And the agent actually works for a large corporation with zero fiduciary duty to the user.
Legit chuckle from me!
Back when REST was a new hot buzzword and people were debating its true meaning, I remember thinking that some of the arguments for HATEOAS only really made sense if your client apps were going to be some kind of AI graph navigators. So I wonder if being particular about HATEOAS makes more sense now?
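For reference, here's an illustrative HATEOAS-style response in the common HAL shape; the resource is made up, but it shows how an agent could discover its next possible actions from advertised links rather than hard-coded endpoints.

```typescript
// Illustrative HATEOAS-style resource: the client (human app or AI graph
// navigator) learns what it can do next from `_links` instead of out-of-band
// API docs. The resource and link names are invented for this example.
const orderResource = {
  id: "order-42",
  status: "cart",
  totalCents: 12_500,
  _links: {
    self: { href: "/orders/order-42" },
    addItem: { href: "/orders/order-42/items", method: "POST" },
    checkout: { href: "/orders/order-42/checkout", method: "POST" },
    cancel: { href: "/orders/order-42", method: "DELETE" },
  },
};

// An agent deciding what to do next only needs to pick from advertised links:
const nextActions = Object.keys(orderResource._links).filter((k) => k !== "self");
console.log(nextActions); // ["addItem", "checkout", "cancel"]
```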
Good read, thanks for sharing! I'd love for OAuth to be augmented with agent-friendly scopes. Completely agree that it's a standard that doesn't need to be reinvented. But as things stand today, there are two broad areas where OAuth doesn't quite cut it:
1) long tail of websites that don't have APIs, so the only way for an agent to interact with them on the user's behalf is to log in more conventionally, and
2) even if a website has APIs, there may be tasks to be done that are outside the scope of the provided APIs.
Thoughts?
author of the post here, yeah this is a really good point. I think we're going to see more people investing in building OAuth-compatible apps and more thorough APIs to support agent use cases. but of course, not every site is going to do so, so agents will in many cases effectively just be doing screenscraping. but I think over time, users will prefer applications that make it easier and more secure for agents to interact with them.
I was an early engineer at Plaid and I think it's an interesting parallel: financial data aggregators used to use more of a screenscraping model of integration, but over the past 5+ years it's moved almost fully to OAuth integrations. would expect the adoption curve here to be much steeper than that; banks are notoriously slow, so would expect tech companies to move even more quickly towards OAuth and APIs for agents.
another dimension of this is that it's quite easy to block AI agents from screenscraping. we're able to identify OpenAI's Operator, Anthropic's computer use API, Browserbase, etc. with almost 100% accuracy. so some sites might choose to block agents from screenscraping and require the API path.
all of this is still early too, so excited to see how things develop!
If websites haven't been able to make even consistent logins and forms for humans to use, what makes you think they will be able to make usable APIs for agents to use?
I've tried making a Firefox extension that fills web forms using an LLM, and the things website makers come up with that break their own forms for both humans and agents are just insane.
There are probably over a thousand different ways to ask for someone's address that an agent (and/or human) would struggle to understand, just to name an example.
I think agents will be able to get through them easily, but NOT because website makers are going to do a better job at being easier to use.
Interesting, what are the heuristics for blocking? User agent? Something Playwright does, metadata like resolution, or actual behavior?
The user agent is pretty low hanging fruit, but these days even your most standard captchas / bot detection algorithms are looking at things like mouse movement patterns - a simple bot controlling a mouse might be coded to move the cursor from wherever it is to the destination in the shortest path possible; a human might try for the shortest path, but actually do something that only approximates the most direct path based on their dexterity, where the cursor began, the mouse they’re using, etc.
Tools in this space rely a lot on human use of a computer being much slower, less precise, and more variable than machine use of a computer.
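A toy version of that heuristic, with invented thresholds, might compare the traveled cursor path to the straight-line distance between its endpoints; real detectors combine many such signals rather than relying on one.

```typescript
// Toy mouse-movement heuristic: a ratio near 1.0 (path length ≈ straight-line
// distance) looks scripted; human paths tend to curve, overshoot, and vary.
// The cutoff below is invented for illustration.
type Point = { x: number; y: number; t: number }; // position plus timestamp (ms)

function pathStraightness(points: Point[]): number {
  if (points.length < 2) return 1;
  let traveled = 0;
  for (let i = 1; i < points.length; i++) {
    traveled += Math.hypot(points[i].x - points[i - 1].x, points[i].y - points[i - 1].y);
  }
  const direct = Math.hypot(
    points[points.length - 1].x - points[0].x,
    points[points.length - 1].y - points[0].y,
  );
  return direct === 0 ? 1 : traveled / direct; // 1.0 = perfectly straight
}

function looksScripted(points: Point[]): boolean {
  // Hypothetical cutoff: near-perfect straightness is suspicious on its own.
  return pathStraightness(points) < 1.02;
}
```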
we're looking at signals from the network, device, and browser as well as patterns across requests to identify these agents. in some cases, like operator today, it's quite trivial to identify based on the user agent but that's quite easy to mask if they wanted to.
behavioral data like mouse movements, shortest-path heuristics, etc. is helpful, but likely to be a less deterministic signal than device intelligence based on where and how the request is being made.
we'll have a more in depth blog post on what we're seeing with this next week too.
I also think OAuth could be used to better serve AX in the age of agents, but before the whole industry finds the PMF, shall we not leave the humans (us) behind? So I made something to break the grip of the big IdPs and offer a more secure and easier authentication solution for humans [1].
You can find its dogfooding demo on the Show HN [2].
[1]: https://sign-poc.js.org
[2]: https://news.ycombinator.com/item?id=42076063
How do computer use APIs affect this? Isn't the whole idea that a UI a human can use should also be usable by an agent without a lot of special accommodation? For high-volume automation an API is a lot more efficient, but for lots of typical automation (automating what would take a human 20-30 minutes of work to do themselves) this doesn't matter too much.
I think OAuth can complement computer use. Imagine if an agent went through an OAuth flow to get an access token, and was able to use that access token to interact with the same UI that a human interacted with. You'd get a few benefits:
- The human wouldn't need to share their password information with the agent
- Services would be able to block or ask for approval when agents take sensitive actions. Maybe an e-commerce site is happy to let an agent browse and add items to a cart, but wants a human in the loop for checkout.
- Services would be able to attribute any actions taken to the agent on behalf of the user. Did Joe approve this expense report, or did Joe's agent approve this expense report?
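A minimal sketch of how a service might act on those benefits, assuming hypothetical scope names and token claims: gate sensitive actions by scope and record both the user and the acting agent for attribution.

```typescript
// Sketch of scope-gated actions and actor attribution, assuming the agent
// obtained an access token through a normal OAuth flow. Scope names and the
// token shape are hypothetical, not any specific provider's format.
interface AccessToken {
  subject: string;   // the human user (e.g. "joe")
  actor?: string;    // set when an agent acts on the user's behalf
  scopes: string[];  // e.g. ["catalog:read", "cart:write"]
}

function authorize(token: AccessToken, requiredScope: string): "allow" | "needs_human" {
  if (!token.scopes.includes(requiredScope)) {
    // e.g. the agent was never granted "checkout:confirm": ask the human.
    return "needs_human";
  }
  return "allow";
}

// Audit trail answers "Joe or Joe's agent?" by logging both identities.
function auditEntry(token: AccessToken, action: string): string {
  return token.actor
    ? `${action} by ${token.actor} on behalf of ${token.subject}`
    : `${action} by ${token.subject}`;
}

const agentToken: AccessToken = {
  subject: "joe",
  actor: "shopping-agent",
  scopes: ["catalog:read", "cart:write"],
};
console.log(authorize(agentToken, "cart:write"));        // "allow"
console.log(authorize(agentToken, "checkout:confirm"));  // "needs_human"
console.log(auditEntry(agentToken, "approve expense report"));
```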
hm. API stands for Application Programming Interface, which IMO is not the same as an Application Agentic Interface... similar to how it is not an Application's Human Interface. Maybe closer than that, though.
But parsing documentation? And believing it blindly? hah. Maybe resurrect the Semantic Web as well...
> Maybe resurrect the Semantic Web as well
This gave me a chuckle. I believe the current hype term along this line is "ontologies".
Yeah, interestingly, APIs in their current form are rarely very good for agents. In many cases tools like Operator, using a virtual browser and screenshotting, are better for agent interactions than API specs.
This shows we need to build better approaches to agent interactions that don't operate at the level of "run a virtual browser", but that encode much more of the available workflows than raw APIs do today.
For anything more complex than a single throw-this-data-there, a wizard-like workflow would probably be better. The client initiates it, but then the server leads it instead of being 100% passive, e.g. "enter (date|name)" >then> "enter (amount & currency)" >then> whatever else. I am not sure any such thing exists as a protocol; usual REST APIs are just an alphabet with client-driven alphabet-punching that can be applied combinatorially in any order; the server may very well know the correct order, but it cannot elegantly enforce it.
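As a sketch of that server-led idea (the message shapes are invented, not an existing protocol), each response could tell the client exactly which field it needs next, so the server rather than the client enforces the order.

```typescript
// Sketch of a server-led, wizard-like exchange: each response names the next
// required input. Field names, types, and tokens are illustrative only.
type WizardStep =
  | { state: "need_input"; field: string; type: "date" | "string" | "money"; token: string }
  | { state: "done"; receiptId: string };

// Server side: a tiny state machine over the fields still missing.
const flow = ["date", "name", "amount"] as const;

function nextStep(token: string, answers: Record<string, string>): WizardStep {
  const remaining = flow.filter((f) => !(f in answers));
  if (remaining.length === 0) return { state: "done", receiptId: "rcpt-001" };
  const field = remaining[0];
  return {
    state: "need_input",
    field,
    type: field === "date" ? "date" : field === "amount" ? "money" : "string",
    token,
  };
}

// Client side: the agent simply answers whatever the server asks for, in order.
const answers: Record<string, string> = {};
let step = nextStep("cont-abc", answers);
while (step.state === "need_input") {
  answers[step.field] = step.field === "amount" ? "19.99 USD" : "example";
  step = nextStep(step.token, answers);
}
console.log(step.receiptId); // the server enforced the sequence, not the client
```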
AI agents do not have agency. This is just another sloppy and disturbing way that AI people show their disrespect or incompetence about the nature of humans.
If you think AI has agency then you must think all software has agency. AI is just software.
To those of you who say humans are just software: try deactivating a human and see what happens. Note that this is a different experience than deactivating AI.
Thank you for sharing!