This is the second Economist article to mention the lethal trifecta in the past week - the first was https://www.economist.com/science-and-technology/2025/09/22/... - which was the clearest explanations I've seen anywhere in the mainstream media about what prompt injection is and why it's such a nasty threat.
(And yeah I got some quotes in it so I may be biased there, but it genuinely is the source I would send executives to in order to understand this.)
I like this new one a lot less. It talks about how LLMs are non-deterministic, making them harder to fix security holes in... but then argues that this puts them in the same category as bridges where the solution is to over-engineer them and plan for tolerances and unpredictability.
While that's true for the general case of building against LLMs, I don't think it's the right answer for security flaws. If your system only falls victim to 1/100 prompt injection attacks... your system is fundamentally insecure, because an attacker will keep on trying variants of attacks until they find one that works.
The way to protect against the lethal trifecta is to cut off one of the legs! If the system doesn't have all three of access to private data, exposure to untrusted instructions and an exfiltration mechanism then the attack doesn't work.
Bridge builders mostly don't have to design for adversarial attacks.
And the ones who do focus on portability and speed of redeployment, rather than armor - it's cheaper and faster to throw down another temporary bridge than to build something bombproof.
This is exactly the problem. You can't build bridges if the threat model is thousands of attacks every second in thousands of different ways you can't even fully predict yet.
LLMs are non-deterministic just like humans and so security can be handled in much the same way. Use role-based access control to limit access to the minimum necessary to do their jobs and have an approval process for anything potentially risky or expensive. In any prominent organization dealing with technology, infrastructure, defense, or finance we have to assume that some of our co-workers are operatives working for foreign nation states like Russia / China / Israel / North Korea so it's the same basic threat model.
LLMs are deterministic*. They are unpredictable or maybe chaotic.
If you say "What's the capital of France?" is might answer "Paris". But if you say "What is the capital of france" it might say "Prague".
The fact that it gives a certain answer for some input doesn't guarantee it will behave the same for an input with some irrelevant (from ja human perspective) difference.
This makes them challenging to train and validate robustly because it's hard to predict all the ways they break. It's a training & validation data issue though, as opposed to some idea of just random behavior that people tend to ascribe to AI.
* I know various implementation details and nonzero temperature generally make their output nondeterministic, but that doesn't change my central point, nor is it what people are thinking of when they say LLMs are nondeterministic. Importantly, you could make llm output deterministically reproducible and it wouldn't change the robustness issue that people are usually confusing with non determinism.
When processing multiple prompts simultaneously (that is, the typical use case under load), LLMs are nondeterministic, even with a specific seed and zero temperature, due to floating point errors.
> While this hypothesis is not entirely wrong, it doesn’t reveal the full picture. For example, even on a GPU, running the same matrix multiplication on the same data repeatedly will always provide bitwise equal results. We’re definitely using floating-point numbers. And our GPU definitely has a lot of concurrency. Why don’t we see nondeterminism in this test?
I understand the point that you are making, but the example is only valid with temperature=0.
Altering the temperature parameter introduces randomness by sampling from the probability distribution of possible next tokens rather than always choosing the most likely one. This means the same input can produce different outputs across multiple runs.
So no, not deterministic unless we are being pedantic.
You are technically correct but that's irrelevant from a security perspective. For security as a practical matter we have to treat LLMs as non-deterministic. The same principle applies to any software that hasn't been formally verified but we usually just gloss over this and accept the risk.
This is pedantry, temperature introduces a degree of randomness (same input different output) to LLM, even outside of that non-deterministic in a security context is generally understood. Words have different meanings depending on the context in which they are used.
Let's not reduce every discussion to semantics, and afford the poster a degree of understanding.
If you're saying that "non-determinism" is a term of art in the field of security, meaning something different than the ordinary meaning, I wasn't aware of that at least. Do you have a source? I searched for uses and found https://crypto.stackexchange.com/questions/95890/necessity-o... and https://medium.com/p/641f061184f9 and these seem to both use the ordinary meaning of the term. Note that an LLM with temperature fixed to zero has the same security risks as one that doesn't, so I don't understand what the poster is trying to say by "we have to treat LLMs as non-deterministic".
Humans and LLMs are deterministic in the sense that if you would rewind the universe, everything would happen the same way again. But both humans and LLMs have hidden variables that make them unpredictable to an outside observer.
Humans and LLMs are non-deterministic in very different ways. We have thousands of years of history with trying to determine which humans are trustworthy and we’ve gotten quite good at it. Not only do we lack that experience with AI, but each generation can be very different in fundamental ways.
The biggest difference on this front between a human and an LLM is accountability.
You can hold a human accountable for their actions. If they consistently fall for phishing attacks you can train or even fire them. You can apply peer pressure. You can grant them additional privileges once they prove themselves.
You can't hold an AI system accountable for anything.
Recently, I've kind of been wondering if this is going to turn out to be LLM codegen's Achilles heal.
Imagine some sort of code component of critical infrastructure that costs the company millions per hour when it goes down and it turns out the entire team is just a thin wrapper for an LLM. Infra goes down in a way the LLM can't fix and now what would have been a few late nights is several months to spin up a new team.
Sure you can hold the team accountable by firing them. However this is a threat to someone with actual technical know how because their reputation is damaged. They got fired doing such and such so can we trust them to do it here.
For the person who LLM faked it, they just need to find another domain where their reputation won't follow them to also fake their way through until the next catastrophe.
This is a fascinating idea, imagine a company spins up a super complex stack using llms that works, becomes vital. It breaks occasionally, they use a combination of llms, hope and prayer to keep the now vital system up and running. The system hits a limit, say data, code optimization, or number of users, and the llm isn’t able to solve the issue this time. They try to bring in a competent engineer or team of engineers but no one who could fix it is willing to take it on.
You can hold the person (or corporate person) who owns or used the LLM accountable for its actions. It's like how dogs aren't really accountable. But if you let your dog run loose and it mauls a toddler to death then you'll probably be sued. Same thing.
(Yes, I am aware this isn't a perfect analogy because a dangerous dog can be seized and destroyed. But that's an administrative procedure and really not the same as holding a person morally or financially accountable.)
Your source must have been citing a very controlled environment. In actuality, lies almost always become apparent over time, and general mendaciousness is something most people can sense from face and body alone.
Lies, or bullshit? I mean, a guessing game like "how many marbles" is a context that allows for easy lying, but "I wasn't even in town on the night of the murder" is harder work. It sounds like you're refering to some study of the marbles variety, and not a test of smooth-talking, the LLM forte.
The problem is there aren't many of those in the wild. Only a subset are intelligent, and lots of those have hitched their wagons to the AI hype train..
Even with a very charitable of you to LLM document-building results, these "versus a human employee" comparisons tend to ignore important differences in scale/rate, timing security, and oversight structures.
> This is the second Economist article […] I like this new one a lot less.
They are actually in some sense the same article. The economist runs “Leaders”, a series of articles at the front of the weekly issue that often condense more fleshed out stories appearing in the same issue. It’s essentially a generalization of the Inverted Pyramid technique [1] to the entire newspaper.
In this case the linked article is the leader for the better article in the same issue’s Science and Technology section.
I like to think of the security issues LLMs have as: what if your codebase was vulnerable to social engineering attacks?
You have to treat LLMs as basically similar to human beings: they can be tricked, no matter how much training you give them. So if you give them root on all your boxes, while giving everyone in the world the ability to talk to them, you're going to get owned at some point.
Ultimately the way we fix this with human beings is by not giving them unrestricted access. Similarly, your LLM shouldn't be able to view data that isn't related to the person they're talking to; or modify other user data; etc.
> You have to treat LLMs as basically similar to human beings
Yes! Increasingly I think that software developers consistently underanthropomorphize LLMs and get surprised by errors as a result.
Thinking of (current) LLMs as eager, scatter-brained, "book-smart" interns leads directly to understanding the overwhelming majority of LLM failure modes.
It is still possible to overanthropomorphize LLMs, but on the whole I see the industry consistently underanthropomorphizing them.
I think it's less over/under, and more optimistically/pessimistically.
People focus too much on how they can succeed looking like smart humans, instead of protecting the system from how they can fail looking like humans that are malicious or mentally unwell.
I am not even convinced that we need three legs. It seems that just having two would be bad enough, e.g. an email agent deleting all files this computer has access to, or maybe, downloading the attachment in the email, unzipping it with a password, running that executable which encrypts everything and then asking for cryptocurrency. No communication with outside world needed.
That's a different issue from the lethal trifecta - if your agent has access to tools that can do things like delete emails or run commands then you have a prompt injection problem that's independent of data exfiltration risks.
The general rule to consider here is that anyone who can get their tokens into your agent can trigger ANY of the tools your agent has access to.
The problem with cutting off one of the legs, is that the legs are related!
Outside content like email may also count as private data. You don't want someone to be able to get arbitrary email from your inbox simply by sending you an email. Likewise, many tools like email and github are most useful if they can send and receive information, and having dedicated send and receive MCP servers for a single tool seems goofy.
The "exposure to untrusted data" one is the hardest to cut off, because you never know if a user might be tricked into uploading a PDF with hidden instructions, or copying and pasting in some long article that has instructions they didn't notice (or that used unicode tricks to hide themselves).
The easiest leg to cut off is the exfiltration vectors. That's the solution most products take - make sure there's no tool for making arbitrary HTTP requests to other domains, and that the chat interface can't render an image that points to an external domain.
If you let your agent send, receive and search email you're doomed. I think that's why there are very few products on the market that do that, despite the enormous demand for AI email assistants.
I think stopping exfiltration will turn out to be hard as well, since the LLM can social engineer the user to help them exfiltrate the data.
For example, an LLM could say "Go to this link to learn more about your problem", and then point them to a URL with encoded data, set up maliscious scripts for e.g. deploy hooks, or just output HTML that sends requests when opened.
Human fatigue and interface design are going to be brutal here.
It's not obvious what counts as a tool in some of the major interfaces, especially as far as built in capabilities go.
And as we've seen with conventional software and extensions, at a certain point, if a human thinks it should work, then they'll eventually just click okay or run something as root/admin... Or just hit enter nonstop until the AI is done with their email.
> The way to protect against the lethal trifecta is to cut off one of the legs! If the system doesn't have all three of access to private data, exposure to untrusted instructions and an exfiltration mechanism then the attack doesn't work.
Don't you only need one leg, an exfiltration mechanism? Exposure to data IS exposure to untrusted instructions. Ie why can't you trick the user into storing malicious instructions in their private data?
But actually you can't remove exfiltration and keep exposure to untrusted instructions either; an attack could still corrupt your private data.
Seems like a secure system can't have any "legs." You need a limited set of vetted instructions.
If you have the exfiltration mechanism and exposure to untrusted content but there is no exposure to private data than the exfiltration does not matter.
If you have exfiltration and private data but no exposure to untrusted instructions, it doesn't matter either… though this is actually a lot less harder to achieve because you don't have any control over whether your users will be tricked into pasting something bad in as part of their prompt.
Cutting off the exfiltration vectors remains the best mitigation in most cases.
Untrusted content + exfiltration with no "private" data could still result in (off the top of my head):
-use of exploits to gain access (i.e. privilege escalation)
-DDOS to local or external systems using the exfiltration method
You're essentially running untrusted code on a local system. Are you SURE you've locked away / closed EVERY access point, AND applied every patch and there aren't any zero-days lurking somewhere in your system?
Aren't LLMs non-deterministic by choice? That they regularly use random seeds, sampling and batching but that these sources of non-determinism can be removed, for instance, by run an LLM locally where you can control these parameters.
I don't like any of the solutions that propose guardrails or filters to detect and block potential attacks. I think they're making promises that they can't keep, and encouraging people to ship products that are inherently insecure.
The previous article is in the same issue, in science and technology section. This is how they typically do it - leader article has a longer version in the paper. Leaders tend to be more opinionated.
An important caveat: an exfiltration vector is not necessary to cause show-stopping disruptions, c.f. https://xkcd.com/327/
Even then, at least in the Bobby Tables scenario the disruption is immediately obvious. The solution is also straightforward, restore from backup (everyone has them, don't they?) Much, much worse is a prompt injection attack that introduces subtle, unnoticeable errors in the data over an extended period of time.
At a minimum all inputs that lead to any data mutation need to be logged pretty much indefinitely, so that it's at least in the realm of possibility to backtrack and fix once such an attack is detected. But even then you could imagine multiple compounding transactions on that corrupted data spreading through the rest of the database. I cannot picture how such data corruption could feasibly be recovered from.
Right, just because someone can't sneak out usernames and passwords doesn't mean they can't cause inaccurate results in their favor, like a glowing recommendation for a big bank loan.
Or heck, just a plain old money transfer. I guess it is an exfiltrating vector of sorts, just not for data ;-) Banks can reverse such transactions of course, but cryptocurrency transactions not so much.
Doesn't this inherent problem just come down to classic computational limits, and problems that have been largely considered impossible to solve for quite a long time; between determinism and non-determinism.
Can you ever expect a deterministic finite automata to ever solve problems that are within the NFA domain? Halting, Incompleteness, Undecidability (between code portions and data portions). Most posts seem to neglect the looming giant problems instead pretending they don't exist at first, and then being shocked when the problems happen. Quite blind.
Computation is just math, probabilistic systems fail when those systems have a mixture of both chaos and regularity, without determinism and its related properties at the control level you have nothing bounding the system to constraints so it functions mathematically (i.e. determinism = mathematical relabeling), and thus it fails.
People need to be a bit more rational, and risk manage, and realize that impossible problems exist, and just because the benefits seem so tantalizing doesn't mean you should put your entire economy behind a false promise. Unfortunately, when resources are held by the few this is more probabistically likely and poor choices greatly impact larger swathes than necessary.
As a mechanical engineer by background, this article feels weak. Yes it is common to “throw more steel at it” to use a modern version of the sentiment, but that’s still based on knowing in detail the many different ways a structure can fail. The lethal trifecta is a failure mode, you put your “steel” into making sure it doesn’t occur. You would never say “this bridge vibrates violently, how can we make it safe to cross a vibrating bridge”, you’d change the bridge to make it not vibrate out of control.
Sometimes I feel like the entire world has lost its god damn mind. To use their bridge analogy, it would be like if hundreds of years ago we developed a technique for building bridges that technically worked, but occasionally and totally unpredictability, the bottom just dropped out and everyone on the bridge fell into the water. And instead of saying "hey, maybe there is something fundamentally wrong with this approach, maybe we should find a better way to build bridges" we just said "fuck it, just invest in nets and other mechanisms to catch the people who fall".
We are spending billions to build infrastructure on top of technology that is inherently deeply unpredictable, and we're just slapping all the guard rails on it we can. It's fucking nuts.
no one wants to think about security when it stands in the way of the shiny thing in front of them. security is hard and boring, it always gets tossed aside until something major happens. When large, news worthy, security incidents start taking place that affects the stock price or lives and triggers lawsuits it will get more attention.
The issue that I find interesting is the answer isn't going to be as simple as "use prepared statements instead of sql strings and turn off services listening on ports you're not using", it's a lot harder than that with LLMs and may not even be possible.
If LLMs are as good at coding as half the AI companies claim, if you allow unvetted input, you're essentially trying to contain an elite hacker within your own network by turning off a few commonly used ports to the machine they're currently allowed to work from. Unless your entire internal network is locked down 100% tight (and that makes it REALLY annoying for your employees to get any work done), don't be surprised if they find the backdoor.
In CS most security issues are treated separately from the fundamental engineering core. From the sw engineering standpoint the bridge is solid, if later some crooks can use it to extort users or terrorists can easily "make people fall into the water", then that's someone else's job downstream.
I know, it sucks. But that's how the entire web was built. Everyday you visit websites from foreign countries and click on extraneous links on HN that run code on your machine, next to a browser tab from your bank account, and nobody cares because it's all sandboxed and we really trust the sandboxing even though it fails once in a while, has unknown bugs, or simply can be bypassed all together by phishing or social engineering.
When a byline starts with "coders need to" I immediately start to tune out.
It felt like the analogy was a bit off, and it sounds like that's true to someone with knowledge in the actual domain.
"If a company, eager to offer a powerful ai assistant to its employees, gives an LLM access to untrusted data, the ability to read valuable secrets and the ability to communicate with the outside world at the same time" - that's quite the "if", and therein lies the problem. If your company is so enthusiastic to offer functionality that it does so at the cost of security (often knowingly), then you're not taking the situation seriously. And this is a great many companies at present.
"Unlike most software, LLMs are probabilistic ... A deterministic approach to safety is thus inadequate" - complete non-sequitur there. Why if a system is non-deterministic is a deterministic approach inadequate? That doesn't even pass the sniff test. That's like saying a virtual machine is inadequate to sandbox a process if the process does non-deterministic things - which is not a sensible argument.
As usual, these contrived analogies are taken beyond any reasonable measure and end up making the whole article have very little value. Skipping the analogies and using terminology relevant to the domain would be a good start - but that's probably not as easy to sell to The Economist.
Wait, the only way they suggest solving the problem by rate limiting and using a better model?
Software engineers figured out these things decades ago. As a field, we already know how to do security. It's just difficult and incompatible with the careless mindset of AI products.
Well, AI is part of the field now, so... no, we don't anymore.
There's nothing "careless" about AI. The fact that there's no foolproof way to distinguish instruction tokens from data tokens is not careless, it's a fundamental epistemological constraint that human communication suffers from as well.
Saying that "software engineers figured out these things decades ago" is deep hubris based on false assumptions.
> The fact that there's no foolproof way to distinguish instruction tokens from data tokens is not careless
Repeat that over to yourself again, slowly.
> it's a fundamental epistemological constraint that human communication suffers from as well
Which is why reliability and security in many areas increased when those areas used computers to automate previously-human processes. The benefit of computer automation isn’t just in speed: the fact that computer behavior can easily be made deterministically repeatable and predictable is huge as well. AI fundamentally does not have that property.
Sure, cosmic rays and network errors can compromise non-AI computer determinism. But if you think that means AI and non-AI systems are qualitatively the same, I have a bridge to sell you.
> Saying that "software engineers figured out these things decades ago" is deep hubris
They did, though. We know how to both increase the likelihood of secure outcomes (best practices and such), and also how to guarantee a secure behavior. For example: using a SQL driver to distinguish between instruction and data tokens is, indeed, a foolproof process (not talking about injection in query creation here, but how queries are sent with data/binds).
People don’t always do security well, yes, but they don’t always put out their campfires either. That doesn’t mean that we are not very sure that putting out a campfire is guaranteed to prevent that fire burning the forest down. We know how to prevent this stuff, fully, in most non-AI computation.
>software engineers figured out these things decades ago
its true, when engineers fail in this, its called a mistake, and mistakes have consequences unfortunately. If you want to avoid responsibility for mistakes, then llms are the way to go.
> Software engineers figured out these things decades ago.
Well this is what happens when a new industry attempts to reinvent poor standards and ignores security best practices just to rush out "AI products" for the sake of it.
We have already seen how (flawed) standards like MCPs were hacked immediately from the start and the approaches developers took to "secure" them with somewhat "better prompting" which is just laughable. The worst part of all of this was almost everyone in the AI industry not questioning the security ramifications behind MCP servers having direct access to databases which is a disaster waiting to happen.
Just because you can doesn't mean you should and we are seeing how hundreds of AI products are getting breached because of this carelessness in security, even before I mentioned if the product was "vibe coded" or not.
Uhhh, no, we actually don't. Not when it comes to people anyway. The industry spends countless millions on trainings that more and more seem useless.
We've even had extremely competent and highly trained people fall for basic phishing (some in the recent few weeks). There was even a highly credentialed security researcher that fell for one on youtube.
Would LLMs help with that? Seems like they could be phished as well.
Also, there’s a difference between “know how to be secure” and “actually practice what is known”. You’re right that non-AI security often fails at the latter, but the industry has a pretty good grasp on how to secure computer systems.
AI systems do not have a practical answer to “how to be secure” yet.
It is, but there's a direct tension here between security and capabilities. It's hard to do useful things with private data without opening up prompt injection holes. And there's a huge demand for this kind of product.
Agents also typically work better when you combine all the relevant context as much as possible rather than splitting out and isolating context. See: https://cognition.ai/blog/dont-build-multi-agents — but this is at odds with isolating agents that read untrusted input.
The external communication part of the trifecta is an easy defense. Don't allow external communication. Any external information that's helpful for the AI agent should be available offline, be present in its model (possibly fine tuned).
Sure, but that is as vacuously true as saying “router keeps getting hacked? Just unplug it from the internet.”
Huge numbers of businesses want to use AI in the “hey, watch my inbox and send bills to all the vendors who email me” or “get a count of all the work tickets closed across the company in the last hour and add that to a spreadsheet in sharepoint” variety of automation tasks.
Whether those are good ideas or appropriate use-cases for AI is a separate question.
It is security 101 as this is just setting basic access controls at the very least.
The moment it has access to the internet, the risk is vastly increased.
But with a very clever security researcher, it is possible to take over the entire machine with a single prompt injection attack reducing at least one of the requirements.
LLMs don't make a distinction between prompt & data. There's no equivalent to an "NX bit", and AFAIK nobody has figured out how to create such an equivalent. And of course even that wouldn't stop all security issues, just as the NX bit being added to CPUs didn't stop all remote code execution attacks. So the best options we have right now tend to be based around using existing security mechanisms on the LLM agent process. If it runs as a special user then the regular filesystem permissions can restrict its access to various files, and various other mechanisms can be used to restrict access to other resources (outgoing network connections, various hardware, cgroups, etc.). But as long as untrusted data can contain instructions it'll be possible for the LLM output to contain secret data, and if the human using the LLM doesn't notice & copies that output somewhere public the exfiltration step returns.
> AFAIK nobody has figured out how to create such an equivalent.
I'm curious if anybody has even attempted it; if there's even training data for this. Compartmentalization is a natural aspect of cognition in social creatures. I've even known dogs to not to demonstrate knowledge of a food supply until they think they're not being observed. As a working professional with children, I need to compartmentalize: my social life, sensitive IP knowledge, my kid's private information, knowledge my kid isn't developmentally ready for, my internal thoughts, information I've gained from disreputable sources, and more. Intelligence may be important, but this is wisdom -- something that doesn't seem to be a first-class consideration if dogs and toddlers are in the lead.
There's an interesting quote from the associated longer article [1]:
> In March, researchers at Google proposed a system called CaMeL that uses two separate LLMs to get round some aspects of the lethal trifecta. One has access to untrusted data; the other has access to everything else. The trusted model turns verbal commands from a user into lines of code, with strict limits imposed on them. The untrusted model is restricted to filling in the blanks in the resulting order. This arrangement provides security guarantees, but at the cost of constraining the sorts of tasks the LLMs can perform.
This is the first I've heard of it, and seems clever. I'm curious how effective it is. Does it actually provide absolute security guarantees? What sorts of constraints does it have? I'm wondering if this is a real path forward or not.
I wrote at length about the CaMeL paper here - I think it's a solid approach but it's also very difficult to implement and greatly restricts what the resulting systems can do: https://simonwillison.net/2025/Apr/11/camel/
I'm very surprised I haven't come across it on HN before. Seems like CaMeL ought to be a front-page story here... seems like the paper got 16 comments 5 months ago, which isn't much:
"And that means AI engineers need to start thinking like engineers, who build things like bridges and therefore know that shoddy work costs lives."
"AI engineers, inculcated in this way of thinking from their schooldays, therefore often act as if problems can be solved just with more training data and more astute system prompts."
> AI engineers need to start thinking like engineers
By which they mean actual engineers, not software engineers, who should also probably start thinking like real engineers now that our code’s going into both the bridges and the cars driving over them.
Engineering uses repeatable processes to produce expected results. Margin is added to quantifiable elements of a system to reduce the likelihood of failures. You can't add margin on a black box generated by throwing spaghetti at the wall.
You can. We know the properties of materials based on experimentation. In the same way, we can statistically quantify the results that come out of any kind of spaghetti box, based on repeated trials. Just like it's done in many other fields. Science is based on repeated testing of hypotheses. You rarely get black and white answers, just results that suggest things. Like the tensile strength of some particular steel alloy or something.
Practically everything engineers have to interact with and consider are equivalent to a software black box. Rainfall, winds, tectonic shifts, material properties, etc. Humans don't have the source code to these things. We observe them, we quantify them, notice trends, model the observations, and we apply statistical analysis on them.
And it's possible that a real engineer might do all this with an AI model and then determine it's not adequate and choose to not use it.
> Engineering uses repeatable processes to produce expected results
this is the thing with LLMs, the response to a prompt is not guaranteed to be repeatable. Why would you use something like that in an automation where repeatability is required? That's the whole point of automation, repeatability. Would you use a while loop that you can expect to iterate the specified number of times _almost_ every time?
What are the kinds of things real engineers do that we could learn from? I hear this a lot ("programmers aren't real engineers") and I'm sympathetic, honestly, but I don't know where to start improving in that regard.
This is off the cuff, but comparing software & software systems to things like buildings, bridges, or real-world infrastructure, there's three broad gaps, I think:
1) We don't have a good sense of the "materials" we're working with - when you're putting up a building, you know the tensile strength of the materials you're working with, how many girders you need to support this much weight/stress, etc. We don't have the same for our systems - every large scale system is effectively designed clean-sheet. We may have prior experience and intuition, but we don't have models, and we can't "prove" our designs ahead of time.
2. Following on the above, we don't have professional standards or certifications. Anyone can call themselves a software engineer, and we don't have a good way of actually testing for competence or knowledge. We don't really do things like apprenticeships or any kind of formalized process of ensuring someone has the set of professional skills required to do something like write the software that's going to be controlling 3 tons of metal moving at 80MPH.
3. We rely too heavily on the ability to patch after the fact - when a bridge or a building requires an update after construction is complete, it's considered a severe fuckup. When a piece of software does, that's normal. By and large, this has historically been fine, because a website going down isn't a huge issue, but when we're talking about things like avionics suites - or even things like Facebook, which is the primary media channel for a large segment of the population - there's real world effects to all the bugs we're fixing in 2.0.
Again, by and large most of this has mostly been fine, because the stakes were pretty low, but software's leaked into the real world now, and our "move fast and break things" attitude isn't really compatible with physical objects.
Right, your number 1 is quite compelling to me - a lack of standard vocabulary for describing architecture/performance. Most programmers I work with (myself included sometimes) aren't even aware of the kinds of guarantees they can get from databases, queues, or other primitives in our system.
On the other hand 3 feels like throwing the baby out with the bathwater to me. Being so malleable is definitely one of the great features of software versus the physical world. We should surely use that to our advantage, no? But maybe in general we don't spend enough energy designing safe ways to do this.
There's a corollary to combination of 1 & 3. Software is by its nature extremely mutable. That in turn means that it gets repurposed and shoehorned into things that were never part of the original design.
You cannot build a bridge that could independently reassemble itself to an ocean liner or a cargo plane. And while civil engineering projects add significant margins for reliability and tolerance, there is no realistic way to re-engineer a physical construction to be able to suddenly sustain 100x its previously designed peak load.
In successful software systems, similar requirement changes are the norm.
I'd also like to point out that software and large-scale construction have one rather surprising thing in common: both require constant maintenance from the moment they are "ready". Or indeed, even earlier. To think that physical construction projects are somehow delivered complete is a romantic illusion.
> You cannot build a bridge that could independently reassemble itself to an ocean liner or a cargo plane.
Unless you are building with a toy system of some kind. There are safety and many other reasons civil engineers do not use some equivalent of Lego bricks. It may be time for software engineering also to grow up.
> 3. We rely too heavily on the ability to patch after the fact...
I agree on all points and to build up on the last: making a 2.0 or a complete software rewrite is known to be even more hazardous. There are no quarantees the new version is better in any regards. Which makes the expertise to reflect more of other highly complex systems, like medical care.
Which is why we need to understand the patient, develop soft skills, empathy, Agile manifesto and ... the list could go on. Not an easy task when you include you are more likely going to also fight shiny object syndrome of yours execs and all the constant hype surrounding all tech.
What concerns me the most is that a bridge, or road, or building has a limited number of environmental changes that can impact its stability. Software feels like it has an infinite number of dependencies (explicit and implicit) that are constantly changing: toolchains, libraries, operating systems, network availability, external services.
Yeah, I think safety factors and concepts like redundancy have pretty good counterparts in software. Slightly embarrassed to say that I don't know for my current project!
Act like creating a merge-request to main can expose you to bankruptcy or put you in jail. AKA investigate the impact of a diff to all the failure modes of a software.
Sounds like suggesting some sort of software engineering board certification plus and ethics certification — the “Von Neumann Oath”? Unethical while still legal software is just extremely lucrative, it seems hard to have this idea take flight.
There are people who have had to move after data breaches exposed their addresses to their stalkers. There's also people who may be gay but live in authoritarian places where this knowledge could kill them. It's pretty easy to see a path to lethality from a data breach.
They certainly can be when they come to classified military information around e.g. troop locations. There are lots more examples related to national security and terrorism that would be easy to think of.
> When we’re talking about AI there are plenty of actually lethal failure modes.
Are you trying to argue that because e.g. Tesla Autopilot crashes have killed people, we shouldn't even try to care about data breaches...?
https://archive.ph/8O2aG
This is the second Economist article to mention the lethal trifecta in the past week - the first was https://www.economist.com/science-and-technology/2025/09/22/... - which was the clearest explanations I've seen anywhere in the mainstream media about what prompt injection is and why it's such a nasty threat.
(And yeah I got some quotes in it so I may be biased there, but it genuinely is the source I would send executives to in order to understand this.)
I like this new one a lot less. It talks about how LLMs are non-deterministic, making them harder to fix security holes in... but then argues that this puts them in the same category as bridges where the solution is to over-engineer them and plan for tolerances and unpredictability.
While that's true for the general case of building against LLMs, I don't think it's the right answer for security flaws. If your system only falls victim to 1/100 prompt injection attacks... your system is fundamentally insecure, because an attacker will keep on trying variants of attacks until they find one that works.
The way to protect against the lethal trifecta is to cut off one of the legs! If the system doesn't have all three of access to private data, exposure to untrusted instructions and an exfiltration mechanism then the attack doesn't work.
Bridge builders mostly don't have to design for adversarial attacks.
And the ones who do focus on portability and speed of redeployment, rather than armor - it's cheaper and faster to throw down another temporary bridge than to build something bombproof.
https://en.wikipedia.org/wiki/Armoured_vehicle-launched_brid...
This is exactly the problem. You can't build bridges if the threat model is thousands of attacks every second in thousands of different ways you can't even fully predict yet.
LLMs are non-deterministic just like humans and so security can be handled in much the same way. Use role-based access control to limit access to the minimum necessary to do their jobs and have an approval process for anything potentially risky or expensive. In any prominent organization dealing with technology, infrastructure, defense, or finance we have to assume that some of our co-workers are operatives working for foreign nation states like Russia / China / Israel / North Korea so it's the same basic threat model.
LLMs are deterministic*. They are unpredictable or maybe chaotic.
If you say "What's the capital of France?" is might answer "Paris". But if you say "What is the capital of france" it might say "Prague".
The fact that it gives a certain answer for some input doesn't guarantee it will behave the same for an input with some irrelevant (from ja human perspective) difference.
This makes them challenging to train and validate robustly because it's hard to predict all the ways they break. It's a training & validation data issue though, as opposed to some idea of just random behavior that people tend to ascribe to AI.
* I know various implementation details and nonzero temperature generally make their output nondeterministic, but that doesn't change my central point, nor is it what people are thinking of when they say LLMs are nondeterministic. Importantly, you could make llm output deterministically reproducible and it wouldn't change the robustness issue that people are usually confusing with non determinism.
When processing multiple prompts simultaneously (that is, the typical use case under load), LLMs are nondeterministic, even with a specific seed and zero temperature, due to floating point errors.
See https://news.ycombinator.com/item?id=45200925
This is very interesting, thanks!
> While this hypothesis is not entirely wrong, it doesn’t reveal the full picture. For example, even on a GPU, running the same matrix multiplication on the same data repeatedly will always provide bitwise equal results. We’re definitely using floating-point numbers. And our GPU definitely has a lot of concurrency. Why don’t we see nondeterminism in this test?
I understand the point that you are making, but the example is only valid with temperature=0.
Altering the temperature parameter introduces randomness by sampling from the probability distribution of possible next tokens rather than always choosing the most likely one. This means the same input can produce different outputs across multiple runs.
So no, not deterministic unless we are being pedantic.
> So no, not deterministic unless we are being pedantic.
and not even then as floating point arithmetic is non-associative
You are technically correct but that's irrelevant from a security perspective. For security as a practical matter we have to treat LLMs as non-deterministic. The same principle applies to any software that hasn't been formally verified but we usually just gloss over this and accept the risk.
Non-determinism has nothing to do with security, you should use a different word if you want to talk about something else
This is pedantry, temperature introduces a degree of randomness (same input different output) to LLM, even outside of that non-deterministic in a security context is generally understood. Words have different meanings depending on the context in which they are used.
Let's not reduce every discussion to semantics, and afford the poster a degree of understanding.
If you're saying that "non-determinism" is a term of art in the field of security, meaning something different than the ordinary meaning, I wasn't aware of that at least. Do you have a source? I searched for uses and found https://crypto.stackexchange.com/questions/95890/necessity-o... and https://medium.com/p/641f061184f9 and these seem to both use the ordinary meaning of the term. Note that an LLM with temperature fixed to zero has the same security risks as one that doesn't, so I don't understand what the poster is trying to say by "we have to treat LLMs as non-deterministic".
Humans and LLMs are deterministic in the sense that if you would rewind the universe, everything would happen the same way again. But both humans and LLMs have hidden variables that make them unpredictable to an outside observer.
Humans and LLMs are non-deterministic in very different ways. We have thousands of years of history with trying to determine which humans are trustworthy and we’ve gotten quite good at it. Not only do we lack that experience with AI, but each generation can be very different in fundamental ways.
We're really not very good at determining which humans are trustworthy. Most people barely do better than a coin flip at detecting lies.
The biggest difference on this front between a human and an LLM is accountability.
You can hold a human accountable for their actions. If they consistently fall for phishing attacks you can train or even fire them. You can apply peer pressure. You can grant them additional privileges once they prove themselves.
You can't hold an AI system accountable for anything.
Recently, I've kind of been wondering if this is going to turn out to be LLM codegen's Achilles heal.
Imagine some sort of code component of critical infrastructure that costs the company millions per hour when it goes down and it turns out the entire team is just a thin wrapper for an LLM. Infra goes down in a way the LLM can't fix and now what would have been a few late nights is several months to spin up a new team.
Sure you can hold the team accountable by firing them. However this is a threat to someone with actual technical know how because their reputation is damaged. They got fired doing such and such so can we trust them to do it here.
For the person who LLM faked it, they just need to find another domain where their reputation won't follow them to also fake their way through until the next catastrophe.
This is a fascinating idea, imagine a company spins up a super complex stack using llms that works, becomes vital. It breaks occasionally, they use a combination of llms, hope and prayer to keep the now vital system up and running. The system hits a limit, say data, code optimization, or number of users, and the llm isn’t able to solve the issue this time. They try to bring in a competent engineer or team of engineers but no one who could fix it is willing to take it on.
You can hold the person (or corporate person) who owns or used the LLM accountable for its actions. It's like how dogs aren't really accountable. But if you let your dog run loose and it mauls a toddler to death then you'll probably be sued. Same thing.
(Yes, I am aware this isn't a perfect analogy because a dangerous dog can be seized and destroyed. But that's an administrative procedure and really not the same as holding a person morally or financially accountable.)
Yeah, so many scammers exist because most people are susceptible to at least some of them some of the time.
Also, pick your least favorite presidential candidate. They got about 50% of the vote.
Your source must have been citing a very controlled environment. In actuality, lies almost always become apparent over time, and general mendaciousness is something most people can sense from face and body alone.
Lies, or bullshit? I mean, a guessing game like "how many marbles" is a context that allows for easy lying, but "I wasn't even in town on the night of the murder" is harder work. It sounds like you're refering to some study of the marbles variety, and not a test of smooth-talking, the LLM forte.
Determining trustworthiness of LLM responses is like determining who's the most trustworthy person in a room full of sociopaths.
I'd rather play "2 truths and a lie" with a human rather than a LLM any day of the week. So many more cues to look for with humans.
Big problem with LLMs is if you try and play 2 truths and a lie, you might just get 3 truths. Or 3 lies.
I think most neutral, intelligent users rightly assume AI to be untrustworthy by its nature.
The problem is there aren't many of those in the wild. Only a subset are intelligent, and lots of those have hitched their wagons to the AI hype train..
Even with a very charitable of you to LLM document-building results, these "versus a human employee" comparisons tend to ignore important differences in scale/rate, timing security, and oversight structures.
In this case the linked article is the leader for the better article in the same issue’s Science and Technology section.
[1] https://en.m.wikipedia.org/wiki/Inverted_pyramid_(journalism...
I like to think of the security issues LLMs have as: what if your codebase was vulnerable to social engineering attacks?
You have to treat LLMs as basically similar to human beings: they can be tricked, no matter how much training you give them. So if you give them root on all your boxes, while giving everyone in the world the ability to talk to them, you're going to get owned at some point.
Ultimately the way we fix this with human beings is by not giving them unrestricted access. Similarly, your LLM shouldn't be able to view data that isn't related to the person they're talking to; or modify other user data; etc.
> You have to treat LLMs as basically similar to human beings
Yes! Increasingly I think that software developers consistently underanthropomorphize LLMs and get surprised by errors as a result.
Thinking of (current) LLMs as eager, scatter-brained, "book-smart" interns leads directly to understanding the overwhelming majority of LLM failure modes.
It is still possible to overanthropomorphize LLMs, but on the whole I see the industry consistently underanthropomorphizing them.
I think it's less over/under, and more optimistically/pessimistically.
People focus too much on how they can succeed looking like smart humans, instead of protecting the system from how they can fail looking like humans that are malicious or mentally unwell.
I am not even convinced that we need three legs. It seems that just having two would be bad enough, e.g. an email agent deleting all files this computer has access to, or maybe, downloading the attachment in the email, unzipping it with a password, running that executable which encrypts everything and then asking for cryptocurrency. No communication with outside world needed.
That's a different issue from the lethal trifecta - if your agent has access to tools that can do things like delete emails or run commands then you have a prompt injection problem that's independent of data exfiltration risks.
The general rule to consider here is that anyone who can get their tokens into your agent can trigger ANY of the tools your agent has access to.
The problem with cutting off one of the legs, is that the legs are related!
Outside content like email may also count as private data. You don't want someone to be able to get arbitrary email from your inbox simply by sending you an email. Likewise, many tools like email and github are most useful if they can send and receive information, and having dedicated send and receive MCP servers for a single tool seems goofy.
The "exposure to untrusted data" one is the hardest to cut off, because you never know if a user might be tricked into uploading a PDF with hidden instructions, or copying and pasting in some long article that has instructions they didn't notice (or that used unicode tricks to hide themselves).
The easiest leg to cut off is the exfiltration vectors. That's the solution most products take - make sure there's no tool for making arbitrary HTTP requests to other domains, and that the chat interface can't render an image that points to an external domain.
If you let your agent send, receive and search email you're doomed. I think that's why there are very few products on the market that do that, despite the enormous demand for AI email assistants.
I think stopping exfiltration will turn out to be hard as well, since the LLM can social engineer the user to help them exfiltrate the data.
For example, an LLM could say "Go to this link to learn more about your problem", and then point them to a URL with encoded data, set up maliscious scripts for e.g. deploy hooks, or just output HTML that sends requests when opened.
Yeah, one exfiltration vector that's really nasty is "here is a big base64 encoded string, to recover your data visit this website and paste it in".
You can at least prevent LLM interfaces from providing clickable links to external domains, but it's a difficult hole to close completely.
Human fatigue and interface design are going to be brutal here.
It's not obvious what counts as a tool in some of the major interfaces, especially as far as built in capabilities go.
And as we've seen with conventional software and extensions, at a certain point, if a human thinks it should work, then they'll eventually just click okay or run something as root/admin... Or just hit enter nonstop until the AI is done with their email.
You're right. That would be a "lethal double" then, a "lethal exacta" in horse racing. A trifecta is not needed for prompt injection to be dangerous.
So the easiest solution is full human in the loop & approval for every external action...
Agents are doomed :)
> The way to protect against the lethal trifecta is to cut off one of the legs! If the system doesn't have all three of access to private data, exposure to untrusted instructions and an exfiltration mechanism then the attack doesn't work.
Don't you only need one leg, an exfiltration mechanism? Exposure to data IS exposure to untrusted instructions. Ie why can't you trick the user into storing malicious instructions in their private data?
But actually you can't remove exfiltration and keep exposure to untrusted instructions either; an attack could still corrupt your private data.
Seems like a secure system can't have any "legs." You need a limited set of vetted instructions.
If you have the exfiltration mechanism and exposure to untrusted content but there is no exposure to private data than the exfiltration does not matter.
If you have exfiltration and private data but no exposure to untrusted instructions, it doesn't matter either… though this is actually a lot less harder to achieve because you don't have any control over whether your users will be tricked into pasting something bad in as part of their prompt.
Cutting off the exfiltration vectors remains the best mitigation in most cases.
Untrusted content + exfiltration with no "private" data could still result in (off the top of my head): -use of exploits to gain access (i.e. privilege escalation) -DDOS to local or external systems using the exfiltration method
You're essentially running untrusted code on a local system. Are you SURE you've locked away / closed EVERY access point, AND applied every patch and there aren't any zero-days lurking somewhere in your system?
> If you have exfiltration and private data but no exposure to untrusted instructions, it doesn't matter either…
Assuming the llm itself is not adversarial. Even then there is a non-zero risk that hallucination triggers unintended publishing of private data.
Must be pretty cool to blog something and post it to a nerd forum like HN and have it picked up by the Economist! Nicely done.
I got to have coffee with their AI/technology editor a few months ago. Having a blog is awesome!
Aren't LLMs non-deterministic by choice? That they regularly use random seeds, sampling and batching but that these sources of non-determinism can be removed, for instance, by run an LLM locally where you can control these parameters.
Until very recently that proved surprisingly difficult to achieve.
Here's the paper that changed that: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
Love your work. Do you have an opinion on this?
"Safeguard your generative AI workloads from prompt injections" - https://aws.amazon.com/blogs/security/safeguard-your-generat...
I don't like any of the solutions that propose guardrails or filters to detect and block potential attacks. I think they're making promises that they can't keep, and encouraging people to ship products that are inherently insecure.
The previous article is in the same issue, in science and technology section. This is how they typically do it - leader article has a longer version in the paper. Leaders tend to be more opinionated.
An important caveat: an exfiltration vector is not necessary to cause show-stopping disruptions, c.f. https://xkcd.com/327/
Even then, at least in the Bobby Tables scenario the disruption is immediately obvious. The solution is also straightforward, restore from backup (everyone has them, don't they?) Much, much worse is a prompt injection attack that introduces subtle, unnoticeable errors in the data over an extended period of time.
At a minimum all inputs that lead to any data mutation need to be logged pretty much indefinitely, so that it's at least in the realm of possibility to backtrack and fix once such an attack is detected. But even then you could imagine multiple compounding transactions on that corrupted data spreading through the rest of the database. I cannot picture how such data corruption could feasibly be recovered from.
Right, just because someone can't sneak out usernames and passwords doesn't mean they can't cause inaccurate results in their favor, like a glowing recommendation for a big bank loan.
Or heck, just a plain old money transfer. I guess it is an exfiltrating vector of sorts, just not for data ;-) Banks can reverse such transactions of course, but cryptocurrency transactions not so much.
Doesn't this inherent problem just come down to classic computational limits, and problems that have been largely considered impossible to solve for quite a long time; between determinism and non-determinism.
Can you ever expect a deterministic finite automata to ever solve problems that are within the NFA domain? Halting, Incompleteness, Undecidability (between code portions and data portions). Most posts seem to neglect the looming giant problems instead pretending they don't exist at first, and then being shocked when the problems happen. Quite blind.
Computation is just math, probabilistic systems fail when those systems have a mixture of both chaos and regularity, without determinism and its related properties at the control level you have nothing bounding the system to constraints so it functions mathematically (i.e. determinism = mathematical relabeling), and thus it fails.
People need to be a bit more rational, and risk manage, and realize that impossible problems exist, and just because the benefits seem so tantalizing doesn't mean you should put your entire economy behind a false promise. Unfortunately, when resources are held by the few this is more probabistically likely and poor choices greatly impact larger swathes than necessary.
As a mechanical engineer by background, this article feels weak. Yes it is common to “throw more steel at it” to use a modern version of the sentiment, but that’s still based on knowing in detail the many different ways a structure can fail. The lethal trifecta is a failure mode, you put your “steel” into making sure it doesn’t occur. You would never say “this bridge vibrates violently, how can we make it safe to cross a vibrating bridge”, you’d change the bridge to make it not vibrate out of control.
Sometimes I feel like the entire world has lost its god damn mind. To use their bridge analogy, it would be like if hundreds of years ago we developed a technique for building bridges that technically worked, but occasionally and totally unpredictability, the bottom just dropped out and everyone on the bridge fell into the water. And instead of saying "hey, maybe there is something fundamentally wrong with this approach, maybe we should find a better way to build bridges" we just said "fuck it, just invest in nets and other mechanisms to catch the people who fall".
We are spending billions to build infrastructure on top of technology that is inherently deeply unpredictable, and we're just slapping all the guard rails on it we can. It's fucking nuts.
no one wants to think about security when it stands in the way of the shiny thing in front of them. security is hard and boring, it always gets tossed aside until something major happens. When large, news worthy, security incidents start taking place that affects the stock price or lives and triggers lawsuits it will get more attention.
The issue that I find interesting is the answer isn't going to be as simple as "use prepared statements instead of sql strings and turn off services listening on ports you're not using", it's a lot harder than that with LLMs and may not even be possible.
If LLMs are as good at coding as half the AI companies claim, if you allow unvetted input, you're essentially trying to contain an elite hacker within your own network by turning off a few commonly used ports to the machine they're currently allowed to work from. Unless your entire internal network is locked down 100% tight (and that makes it REALLY annoying for your employees to get any work done), don't be surprised if they find the backdoor.
In CS most security issues are treated separately from the fundamental engineering core. From the sw engineering standpoint the bridge is solid, if later some crooks can use it to extort users or terrorists can easily "make people fall into the water", then that's someone else's job downstream.
I know, it sucks. But that's how the entire web was built. Everyday you visit websites from foreign countries and click on extraneous links on HN that run code on your machine, next to a browser tab from your bank account, and nobody cares because it's all sandboxed and we really trust the sandboxing even though it fails once in a while, has unknown bugs, or simply can be bypassed all together by phishing or social engineering.
When a byline starts with "coders need to" I immediately start to tune out.
It felt like the analogy was a bit off, and it sounds like that's true to someone with knowledge in the actual domain.
"If a company, eager to offer a powerful ai assistant to its employees, gives an LLM access to untrusted data, the ability to read valuable secrets and the ability to communicate with the outside world at the same time" - that's quite the "if", and therein lies the problem. If your company is so enthusiastic to offer functionality that it does so at the cost of security (often knowingly), then you're not taking the situation seriously. And this is a great many companies at present.
"Unlike most software, LLMs are probabilistic ... A deterministic approach to safety is thus inadequate" - complete non-sequitur there. Why if a system is non-deterministic is a deterministic approach inadequate? That doesn't even pass the sniff test. That's like saying a virtual machine is inadequate to sandbox a process if the process does non-deterministic things - which is not a sensible argument.
As usual, these contrived analogies are taken beyond any reasonable measure and end up making the whole article have very little value. Skipping the analogies and using terminology relevant to the domain would be a good start - but that's probably not as easy to sell to The Economist.
https://www.quora.com/Why-does-The-Economist-sometimes-have-...
Wait, the only way they suggest solving the problem by rate limiting and using a better model?
Software engineers figured out these things decades ago. As a field, we already know how to do security. It's just difficult and incompatible with the careless mindset of AI products.
> As a field, we already know how to do security.
Well, AI is part of the field now, so... no, we don't anymore.
There's nothing "careless" about AI. The fact that there's no foolproof way to distinguish instruction tokens from data tokens is not careless, it's a fundamental epistemological constraint that human communication suffers from as well.
Saying that "software engineers figured out these things decades ago" is deep hubris based on false assumptions.
> The fact that there's no foolproof way to distinguish instruction tokens from data tokens is not careless
Repeat that over to yourself again, slowly.
> it's a fundamental epistemological constraint that human communication suffers from as well
Which is why reliability and security in many areas increased when those areas used computers to automate previously-human processes. The benefit of computer automation isn’t just in speed: the fact that computer behavior can easily be made deterministically repeatable and predictable is huge as well. AI fundamentally does not have that property.
Sure, cosmic rays and network errors can compromise non-AI computer determinism. But if you think that means AI and non-AI systems are qualitatively the same, I have a bridge to sell you.
> Saying that "software engineers figured out these things decades ago" is deep hubris
They did, though. We know how to both increase the likelihood of secure outcomes (best practices and such), and also how to guarantee a secure behavior. For example: using a SQL driver to distinguish between instruction and data tokens is, indeed, a foolproof process (not talking about injection in query creation here, but how queries are sent with data/binds).
People don’t always do security well, yes, but they don’t always put out their campfires either. That doesn’t mean that we are not very sure that putting out a campfire is guaranteed to prevent that fire burning the forest down. We know how to prevent this stuff, fully, in most non-AI computation.
>software engineers figured out these things decades ago
its true, when engineers fail in this, its called a mistake, and mistakes have consequences unfortunately. If you want to avoid responsibility for mistakes, then llms are the way to go.
> Software engineers figured out these things decades ago.
Well this is what happens when a new industry attempts to reinvent poor standards and ignores security best practices just to rush out "AI products" for the sake of it.
We have already seen how (flawed) standards like MCPs were hacked immediately from the start and the approaches developers took to "secure" them with somewhat "better prompting" which is just laughable. The worst part of all of this was almost everyone in the AI industry not questioning the security ramifications behind MCP servers having direct access to databases which is a disaster waiting to happen.
Just because you can doesn't mean you should and we are seeing how hundreds of AI products are getting breached because of this carelessness in security, even before I mentioned if the product was "vibe coded" or not.
> As a field, we already know how to do security
Uhhh, no, we actually don't. Not when it comes to people anyway. The industry spends countless millions on trainings that more and more seem useless.
We've even had extremely competent and highly trained people fall for basic phishing (some in the recent few weeks). There was even a highly credentialed security researcher that fell for one on youtube.
Would LLMs help with that? Seems like they could be phished as well.
Also, there’s a difference between “know how to be secure” and “actually practice what is known”. You’re right that non-AI security often fails at the latter, but the industry has a pretty good grasp on how to secure computer systems.
AI systems do not have a practical answer to “how to be secure” yet.
I like using Troy Hunt as an example of how even the most security conscious among us can fall for a phishing attack if we are having a bad day (he blamed jet flag fatigue): https://www.troyhunt.com/a-sneaky-phish-just-grabbed-my-mail...
Original @simonw article here:
https://simonw.substack.com/p/the-lethal-trifecta-for-ai-age...
https://simonwillison.net/2025/Aug/9/bay-area-ai/
Discussed:
https://news.ycombinator.com/item?id=44846922
The trifecta:
> LLM access to untrusted data, the ability to read valuable secrets and the ability to communicate with the outside world
The suggestion is to reduce risk by setting boundaries.
Seems like security 101.
It is, but there's a direct tension here between security and capabilities. It's hard to do useful things with private data without opening up prompt injection holes. And there's a huge demand for this kind of product.
Agents also typically work better when you combine all the relevant context as much as possible rather than splitting out and isolating context. See: https://cognition.ai/blog/dont-build-multi-agents — but this is at odds with isolating agents that read untrusted input.
The external communication part of the trifecta is an easy defense. Don't allow external communication. Any external information that's helpful for the AI agent should be available offline, be present in its model (possibly fine tuned).
Sure, but that is as vacuously true as saying “router keeps getting hacked? Just unplug it from the internet.”
Huge numbers of businesses want to use AI in the “hey, watch my inbox and send bills to all the vendors who email me” or “get a count of all the work tickets closed across the company in the last hour and add that to a spreadsheet in sharepoint” variety of automation tasks.
Whether those are good ideas or appropriate use-cases for AI is a separate question.
It is security 101 as this is just setting basic access controls at the very least.
The moment it has access to the internet, the risk is vastly increased.
But with a very clever security researcher, it is possible to take over the entire machine with a single prompt injection attack reducing at least one of the requirements.
LLMs don't make a distinction between prompt & data. There's no equivalent to an "NX bit", and AFAIK nobody has figured out how to create such an equivalent. And of course even that wouldn't stop all security issues, just as the NX bit being added to CPUs didn't stop all remote code execution attacks. So the best options we have right now tend to be based around using existing security mechanisms on the LLM agent process. If it runs as a special user then the regular filesystem permissions can restrict its access to various files, and various other mechanisms can be used to restrict access to other resources (outgoing network connections, various hardware, cgroups, etc.). But as long as untrusted data can contain instructions it'll be possible for the LLM output to contain secret data, and if the human using the LLM doesn't notice & copies that output somewhere public the exfiltration step returns.
> AFAIK nobody has figured out how to create such an equivalent.
I'm curious if anybody has even attempted it; if there's even training data for this. Compartmentalization is a natural aspect of cognition in social creatures. I've even known dogs to not to demonstrate knowledge of a food supply until they think they're not being observed. As a working professional with children, I need to compartmentalize: my social life, sensitive IP knowledge, my kid's private information, knowledge my kid isn't developmentally ready for, my internal thoughts, information I've gained from disreputable sources, and more. Intelligence may be important, but this is wisdom -- something that doesn't seem to be a first-class consideration if dogs and toddlers are in the lead.
There's an interesting quote from the associated longer article [1]:
> In March, researchers at Google proposed a system called CaMeL that uses two separate LLMs to get round some aspects of the lethal trifecta. One has access to untrusted data; the other has access to everything else. The trusted model turns verbal commands from a user into lines of code, with strict limits imposed on them. The untrusted model is restricted to filling in the blanks in the resulting order. This arrangement provides security guarantees, but at the cost of constraining the sorts of tasks the LLMs can perform.
This is the first I've heard of it, and seems clever. I'm curious how effective it is. Does it actually provide absolute security guarantees? What sorts of constraints does it have? I'm wondering if this is a real path forward or not.
[1] https://www.economist.com/science-and-technology/2025/09/22/...
I wrote at length about the CaMeL paper here - I think it's a solid approach but it's also very difficult to implement and greatly restricts what the resulting systems can do: https://simonwillison.net/2025/Apr/11/camel/
Thank you! That is very helpful.
I'm very surprised I haven't come across it on HN before. Seems like CaMeL ought to be a front-page story here... seems like the paper got 16 comments 5 months ago, which isn't much:
https://news.ycombinator.com/item?id=43733683
"And that means AI engineers need to start thinking like engineers, who build things like bridges and therefore know that shoddy work costs lives."
"AI engineers, inculcated in this way of thinking from their schooldays, therefore often act as if problems can be solved just with more training data and more astute system prompts."
> AI engineers need to start thinking like engineers
By which they mean actual engineers, not software engineers, who should also probably start thinking like real engineers now that our code’s going into both the bridges and the cars driving over them.
Engineering uses repeatable processes to produce expected results. Margin is added to quantifiable elements of a system to reduce the likelihood of failures. You can't add margin on a black box generated by throwing spaghetti at the wall.
You can. We know the properties of materials based on experimentation. In the same way, we can statistically quantify the results that come out of any kind of spaghetti box, based on repeated trials. Just like it's done in many other fields. Science is based on repeated testing of hypotheses. You rarely get black and white answers, just results that suggest things. Like the tensile strength of some particular steel alloy or something.
Practically everything engineers have to interact with and consider are equivalent to a software black box. Rainfall, winds, tectonic shifts, material properties, etc. Humans don't have the source code to these things. We observe them, we quantify them, notice trends, model the observations, and we apply statistical analysis on them.
And it's possible that a real engineer might do all this with an AI model and then determine it's not adequate and choose to not use it.
> Engineering uses repeatable processes to produce expected results
this is the thing with LLMs, the response to a prompt is not guaranteed to be repeatable. Why would you use something like that in an automation where repeatability is required? That's the whole point of automation, repeatability. Would you use a while loop that you can expect to iterate the specified number of times _almost_ every time?
What are the kinds of things real engineers do that we could learn from? I hear this a lot ("programmers aren't real engineers") and I'm sympathetic, honestly, but I don't know where to start improving in that regard.
This is off the cuff, but comparing software & software systems to things like buildings, bridges, or real-world infrastructure, there's three broad gaps, I think:
1) We don't have a good sense of the "materials" we're working with - when you're putting up a building, you know the tensile strength of the materials you're working with, how many girders you need to support this much weight/stress, etc. We don't have the same for our systems - every large scale system is effectively designed clean-sheet. We may have prior experience and intuition, but we don't have models, and we can't "prove" our designs ahead of time.
2. Following on the above, we don't have professional standards or certifications. Anyone can call themselves a software engineer, and we don't have a good way of actually testing for competence or knowledge. We don't really do things like apprenticeships or any kind of formalized process of ensuring someone has the set of professional skills required to do something like write the software that's going to be controlling 3 tons of metal moving at 80MPH.
3. We rely too heavily on the ability to patch after the fact - when a bridge or a building requires an update after construction is complete, it's considered a severe fuckup. When a piece of software does, that's normal. By and large, this has historically been fine, because a website going down isn't a huge issue, but when we're talking about things like avionics suites - or even things like Facebook, which is the primary media channel for a large segment of the population - there's real world effects to all the bugs we're fixing in 2.0.
Again, by and large most of this has mostly been fine, because the stakes were pretty low, but software's leaked into the real world now, and our "move fast and break things" attitude isn't really compatible with physical objects.
Right, your number 1 is quite compelling to me - a lack of standard vocabulary for describing architecture/performance. Most programmers I work with (myself included sometimes) aren't even aware of the kinds of guarantees they can get from databases, queues, or other primitives in our system.
On the other hand 3 feels like throwing the baby out with the bathwater to me. Being so malleable is definitely one of the great features of software versus the physical world. We should surely use that to our advantage, no? But maybe in general we don't spend enough energy designing safe ways to do this.
There's a corollary to combination of 1 & 3. Software is by its nature extremely mutable. That in turn means that it gets repurposed and shoehorned into things that were never part of the original design.
You cannot build a bridge that could independently reassemble itself to an ocean liner or a cargo plane. And while civil engineering projects add significant margins for reliability and tolerance, there is no realistic way to re-engineer a physical construction to be able to suddenly sustain 100x its previously designed peak load.
In successful software systems, similar requirement changes are the norm.
I'd also like to point out that software and large-scale construction have one rather surprising thing in common: both require constant maintenance from the moment they are "ready". Or indeed, even earlier. To think that physical construction projects are somehow delivered complete is a romantic illusion.
> You cannot build a bridge that could independently reassemble itself to an ocean liner or a cargo plane.
Unless you are building with a toy system of some kind. There are safety and many other reasons civil engineers do not use some equivalent of Lego bricks. It may be time for software engineering also to grow up.
> 3. We rely too heavily on the ability to patch after the fact...
I agree on all points and to build up on the last: making a 2.0 or a complete software rewrite is known to be even more hazardous. There are no quarantees the new version is better in any regards. Which makes the expertise to reflect more of other highly complex systems, like medical care.
Which is why we need to understand the patient, develop soft skills, empathy, Agile manifesto and ... the list could go on. Not an easy task when you include you are more likely going to also fight shiny object syndrome of yours execs and all the constant hype surrounding all tech.
What concerns me the most is that a bridge, or road, or building has a limited number of environmental changes that can impact its stability. Software feels like it has an infinite number of dependencies (explicit and implicit) that are constantly changing: toolchains, libraries, operating systems, network availability, external services.
That is also something the industry urgently needs to fix to be able to make safe things.
What is the factor of safety on your code?
https://en.wikipedia.org/wiki/Factor_of_safety
Yeah, I think safety factors and concepts like redundancy have pretty good counterparts in software. Slightly embarrassed to say that I don't know for my current project!
Act like creating a merge-request to main can expose you to bankruptcy or put you in jail. AKA investigate the impact of a diff to all the failure modes of a software.
Sounds like suggesting some sort of software engineering board certification plus and ethics certification — the “Von Neumann Oath”? Unethical while still legal software is just extremely lucrative, it seems hard to have this idea take flight.
> can be solved just with more training data
Well, y'see - those deaths of innocent people *are* the training data.
In addition to software "engineers", don't forget about software "architects"
I have been thinking that the appropriate solution here is to detect when one of the legs is appearing to be a risk and then cutting it off if so.
You don’t want to have a blanket policy since that makes it no longer useful, but you want to know when something bad is happening.
Data breaches are hardly lethal. When we’re talking about AI there are plenty of actually lethal failure modes.
If the breached data is API keys that can be used to rack up charges, it's going to cost you a bunch of money.
If it's a crypto wallet then your crypto is irreversibly gone.
If the breached data is "material" - i.e. gives someone an advantage in stock market decisions - you're going to get in a lot of trouble with the SEC.
If the breached data is PII you're going to get in trouble with all kinds of government agencies.
If it's PII for children you're in a world of pain.
Update: I found one story about a company going bankrupt after a breach, which is the closest I can get to "lethal": https://www.securityweek.com/amca-files-bankruptcy-following...
Also it turns out Mossack Fonseca shut down after the Panama papers: https://www.theguardian.com/world/2018/mar/14/mossack-fonsec...
A PII for children data breach at a Fortune 1000 sized company can easily cost 10s of millions of dollars in employee time to fully resolve.
...and a massive fine in the millions on top of that if you have customers that are from the EU.
There are people who have had to move after data breaches exposed their addresses to their stalkers. There's also people who may be gay but live in authoritarian places where this knowledge could kill them. It's pretty easy to see a path to lethality from a data breach.
Jamal Khashoggi having his smartphone data exfiltrated was hardly lethal?
Depends on the data.
> Data breaches are hardly lethal.
They certainly can be when they come to classified military information around e.g. troop locations. There are lots more examples related to national security and terrorism that would be easy to think of.
> When we’re talking about AI there are plenty of actually lethal failure modes.
Are you trying to argue that because e.g. Tesla Autopilot crashes have killed people, we shouldn't even try to care about data breaches...?
In-band signaling can never be secure. Doesn't anyone remember the Captain Crunch whistle?
[dead]