> All these models see a local failure and try to locally defend against it. As maintainers we have to keep pulling the conversation back to the global invariant, which is harder than it should be, and it’s laborious.
This has been by far the biggest and costliest failure mode I've experienced using these tools. I've tried to mitigate it in more ways than I can count but it almost feels structurally impossible for LLMs to get this right.
Since nobody mentioned it, there was a lovely children's book called the clanker. It was about some creature that made metallic noises unlike the other creatures. The moral of the story was one of diversity and inclusion, making space for differences.
My aversion with the word is that I don't want to be reminded of that clanker creature, which had feelings it wanted to express. The weights don't have feelings.
My worry is rather that people coming up with ideology that ascribes "consciousness" and "offense" may wind up with the next generations of models picking that shit up and playing offended. Well done!
The misguided discussion of "clanker" being "highly derogatory" really shows that anthropomorphization has its limit as far as analogies go.
What we need is a new made up word with a clean etymology.
For such a word to gain traction, we need it to be promoted by someone with clout in the AI space. I don't know if Karpathy has used up his quota of invention of AI nomenclature.
The Simpsons made up "cromulent" with their own definition. Anyone can make up a word with their own definition. Getting it to catch on is the hard part (obligatory "stop trying to make fetch happen" reference).
If you are worried about agents diverging from user intent why not log user messages in a file, and make it a point to review this file against plans and executed work? In my own harness nothing the user types gets lost. It might be the most valuable piece of documentation in the project - the raw message log. I am only keeping user side, which is pretty thin, it's enough to figure out what happened. Logging messages to a file is just a matter of adding a user message submit hook, it costs nothing until used.
Codex and Claude Code store all this too. Lately I've started having each agent regularly read each other's chat transcripts as well as their own, including even the very same session I'm in. (With big contexts they increasingly forget a few things that they re-learn by just looking at the verbatim transcript.)
I don't think it's worth writing my own harness or switching to Pi and writing a plugin, but I definitely need to create some skills to automate much of this.
It is not worth switching to Pi except as a hobbyist.
Something that is overlooked: the mainstream harnesses have a huge advantage in telemetry and datapoints to use to improve the harness. They have internal teams building the tooling. They have tight integration built-in with their own backends (e.g. optimizing for caching).
Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.
In this era of software when you can build almost anything you can imagine, why spend that time building plugins for a harness?
> Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.
I don't think that you really get what this new era of software is about otherwise you would understand why the experienced are spending time tinkering on the so called harness (like openclaw did)
> Something that is overlooked: the mainstream harnesses have a huge advantage in telemetry and datapoints to use to improve the harness. They have internal teams building the tooling. They have tight integration built-in with their own backends (e.g. optimizing for caching).
> Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.
Do I want to become completely dependent on the pricy pay-as-you-go tool? In the long run that will make me powerless.
You'll be dependent on it whether or not you use the main harnesses. You pay for the model. The frontier models will likely always be better than the open source ones.
Pi has optimizations as well, and development is quite active.
We are literally months into this new frontier. Mainstream harnesses are not far off from a minimal + extensible open alternative.
You don’t have to build your own plugins, as you can simply install an existing plugin that does what the mainstream harnesses do. Folks are already making the same functionality, but with more control to the user.
If you are a builder, like many reading this thread, pi is the way to go. Pi already gives you the tools to leverage LLMs to assist with building plugins, if that’s the way you want to go.
That's like arguing that you should spend your time tuning your IDE. How does that relate to end-user value created?
Yes, you built yourself a nice little utility.
Meanwhile, you wasted those tokens and time that could have been spent building actual, useful software instead of hobby tinkering your harness.
It's like thinking your sneaker tread design is going to make the difference between you and someone who just goes out there and runs everyday. The person that just runs is going to win the race every time while you 3D print the perfect tread design optimized for you running style...and don't actually run.
If you want to produce better results at running, you just run and optimize the externalities (gear) later. Same here: you have a magical software production factory and the only thing you want to use it for is your hobby tweaking of your perfect harness instead of...just making useful software.
Why would taking the more open, minimalist, configurable and ultimately diligent route means you won't be working on anything else?? Not to mention that pi has other advantages over Claude and Codex, read up on it. Also, improvements to the agent itself will pay more dividends the earlier they are applied. The tone of this message is waaaay off.
> Why would taking the more open, minimalist, configurable and ultimately diligent route means you won't be working on anything else??
You're using the same finite pool of time and tokens. Why waste your time with the perfect gear instead of focusing on just getting really good at running? Just go run and when you've pushed the limits and the gear becomes the difference, then optimize the gear to get to the next level.
While you're busy trying to optimize your harness, others are just building and shipping with the magical software factory.
What are these "others" shipping, slopware? Agents are not a "magical software factory", they are a tool with a lot of limitations, but which can speed up development in a sustainable way, when used wisely. And that includes configuring it in a way that complements the other tools in our toolkit.
Everyone's waking up to this simple truth: vibe coding like there's not tomorrow accumulates conceptual and technical debt at a unsustainable rate. Then when the "magical factory" gets mired in its own mess, it's back to the drawing board. This is the also what the makers of pi have discovered, if you listen to their talks about how pi came about. I don't believe there are any justification for the assumptions you make about their approach, nor am I seeing you presenting any either. As it is, you take just feels peevish and unfair, to be honest.
A story to share: friend vibe coded absolute slop with Replit starting late 2024 (!!). Absolute trash code. Hacked multiple times because his login code exposed the full user list on the FE (!!!). Hacker found a way to exploit his account confirmation email because it was all front-end and sent an email to every customer telling them he was hacked. One time called me up in a panic asking why his web page was randomly refreshing (turns out, he was serving it in dev mode via Vite with HMR). It was mistake after mistake after mistake.
But he started to get customers. First a handful, then a dozen, then enough to get legal threats from other vendors, and this year, his first "enterprise" deal providing software in a space that was long dominated by a duopoly of legacy providers.
Guess what he did? Just rewrote it with the latest models and hired one engineer to ensure agents followed better practices. It's a legit business now built by a tiny team using a magical software factory to produce absolute trash code, but in shipping it, he found a market and customers willing to pay him for an alternative to the duopoly.
See, at the end of the day, it's cute that you have the perfectly tuned harness, but that also means whatever time you spent tuning your harness, reading up on Pi, spending tokens on your custom plugins -- all of that time and resources could have been used just building something useful.
People use Replit to build websites too, and some of them might scratch enough of a need to make money this way. So what? Is this what I should be mightily impressed with? That some random dude vibe coded some slopware which he was able to convince some random others to pay him for? I'm personally more interested and impressed by brilliant technical achievements, even if less monetizable, than some hustle or another in some industry niche which only ever attracted the interest of two legacy players. This is Hacker News, not Hustler News after all.
> It is not worth switching to Pi except as a hobbyist.
Permit me to paraphrase slightly. "It is not worth switching to Linux except as a hobbyist. Something that is overlooked: the mainstream OSs have a huge advantage ....".
You are in good company. In 1999, Bill Gates confidently dismissed Linux as a threat, arguing it lacked the central control, features, and graphical interface needed to compete in the commercial market.
Back to the article, quoting:
> Pi might be built with Pi, but we’re quite far off today from where Bun and OpenClaw already are: fully detached, automated software engineering.
Please don't call it software engineering. I've been programming for 40 years, and most of that time had to put up with the derision from the other engineering disciplines: "If civil engineering built things like software engineers, the first woodpecker that came along would destroy civilisation". It hurt because it was true. It's still often true for things like web pages, but for the things I use like Linux and vim, it hasn't been true for a long, long while. We have finally mastered how to repeatedly build solid, reliable software.
Which is why I'm an Anthropic refugee. Opus is definitely the best for coding, but claude-cli + bun is the most unreliable piece of crap I've had the misfortune to come across in a while. Sadly I can't afford their API pricing, so either my principles or Opus had to give. I went to pi and an open-source model. The difference between the top open-source models and Opus are noticeable, but not drastic, unlike the difference between pi and claude-cli.
pi has proved to be solid, fast, have a transparent design, and be customisable in the old Linux way ("do one thing, and do it well"). I pray that will never change.
> To me, clanker is a much preferable term for agent. Agency lies with humans, not with machines
We give machines agency all the time. Look up the definition of agency in any dictionary. Other than the specific usages ("a business", "a government organization"), the main definitions are "action, power, operation", "the office or function of an agent", "the capacity, condition, or state of acting or of exerting power", "a person or thing through which power is exerted or an end is achieved", etc.
Your car does all those things when it generates power and applies them to the wheels. You tell it what to do, but it has agency in doing the work. It even uses intelligence in how it does the work, varying the amounts of fuel and air based on an array of sensors, creating maps of common driving patterns. You, the human have absolutely no agency regarding how it does those things (unless you bring along a laptop and wire in very specific software to take agency away from the machine).
I think "clanker" is intended to be a slur for insulting a machine one does not like. It's akin to the epithet "skinjob" given to humanoid robots in various science fiction. One should never use slurs, even against inanimate objects. They create prejudice in thinking that prevents purely rational thought and leads to fallacious conclusions. They also create a behavioral condition where it's okay to use slurs (as long as nobody's complaining about it). If you want to be logical and rational, just call the machine what it actually is, rather than this emotive poetic label.
I've chosen to define "agency" as pretty much "the thing that humans can do and agents can't". To me, agency is the thing where you independently decide what it is you want to get done in the world, based on your own inherent goals.
Being able to say "the one thing agents don't have is agency" is a really useful way to help people understand why people still matter.
> agency is the thing where you independently decide what it is you want to get done in the world, based on your own inherent goals
If a company you work for tells you to do something, and you do it, did you have agency? Was it their goal you were accomplishing? Or was it your goal to make money?
> "the one thing agents don't have is agency" is a really useful way to help people understand why people still matter
Do you think people wouldn't matter anymore if they cease to write code? People didn't used to write code. Code didn't even exist before. Now they don't have to do the thing they didn't used to have to do.
> Setting software agents loose on the world to make their own top-level decisions about what they're going to do is a great way to infuriate
I remember the first time I encountered a trojan horse virus. I was probably 14, sitting in the computer lab. I opened a document, and a program started going to town on the documents, program settings, etc. It opened up browsers to sites we weren't supposed to go to, uploaded passwords to a remote site, changed the desktop background. I thought it was pretty cool!
I wondered how it was that the program could do all these things. I wondered about the motivations of the person who infected the document with the trojan. I wondered why the school administrators didn't do something to prevent this from happening. But I didn't feel any negative feeling towards the trojan; it was just doing what it was programmed to do, on computers that let it do those things.
Later I patched the computers so the trojans couldn't infect the machines anymore. I was banned from the computer lab for unauthorized modifications to school property. Apparently agency is not always worth exercising.
> If a company you work for tells you to do something, and you do it, did you have agency? Was it their goal you were accomplishing? Or was it your goal to make money?
You have agency because you can refuse. Your car can't refuse your command.
Sometimes it can refuse. It can refuse to let you put the shifter into park from drive without applying the brake petal. It can refuse to shift into a gear at an unsafe speed. It can refuse to speed up if your wheels lose traction. It can refuse to apply full brake potential if the brakes lock up. You, the human, are the one without agency in those situations.
Fair point. Maybe I'm arguing that they shouldn't be given agency, because all they can do is simulate it poorly.
Or... maybe it's that they can't have true agency, because it doesn't make sense to tell a big ball of floating point numbers to make decisions about how it plans to have impact in the world. It can't do that, even if it can play-act doing so.
I guess loaded derogatory terms are somehow worse than otherwise worse-sounding terms. Think about it in the context of e.g. the n-word. “A piece of shit”, while sounding very bad:
1. is generic, can be applied to anything thus has no discrimination component, and
2. ends there. It has no history, no reference to previous usage, etc.
It's not. These people need to seek help. And I say that in a completely genuine, compassionate way. Getting triggered by some "insult" to robots - and some even feeling racially attacked - is not healthy
Sometimes I’m wondering what to call people who get offended on behalf of other people or entities that they imagine might be offended by some term or other. See, they feel bad, and like small children, they assume everyone else must feel bad.
Okay, in case of people and words like n___er, one could argue they have a leg to stand on. But stupid computer programs? Really?
And then I remember that in my part of the galaxy, we indeed have a word to describe such people. We call them “dumbasses”.
I don’t have to imagine the offense of others to take offense to slurs intended to denigrate them. If I tell you not to use the n-word in my presence I’m not doing it for black people. I’m doing it for me. I don’t want to hear that shit because it offends me. The entire mindset that would think it is okay offends me.
A slur applied to anthropomorphic programs is the same mindset to someone who really believes the programs are experiencing, quite different from “rust bucket” being applied to a car they know doesn’t think and feel. While I can’t quite get offended about it, it does make me wonder if they’re not using other slurs because of the socially unacceptable nature of those slurs rather than because they’re not awful people.
You see, I believe that cars with enough mileage have a soul. My car definitely has grown a soul. Yours might be a rust bucket for all I care. And yet it has a soul of its own.
Programs also have souls. Especially the little well-crafted programs which are works of art. Their authors took a part of their soul and put them into code, and you can see it in the way the code is written and in the way the programs work. They are not anthropomorphic, and yet they have a soul.
A clanker is anthropomorphic in a way that an advanced enough mimic in a dungeon that looks like your ideal waifu is anthropomorphic. It will infect you the moment you get kissably close to it. It subsists on egregious acts of copyright infringement. It’s a parasite that seeks to destroy a part of your brain and replace it with itself, making you quiver in pain each time you try to think for yourself, and the pain stops when you let it mimic your thinking while paying its creators per word-chunk it outputs.
The clanker seems anthropomorphic enough for the people it has infected, so they get offended to the point of blind rage when someone points out that no, this is its mimicry, and that it doesn’t actually experience things.
The principle of not trying to offend because it's childish?
I don't like swearing, and I really try to not offend people. But telling someone not to do something with the sole reason giving it's childish I dislike strongly.
I sometimes play video games, even though some people say it's childish. Or act silly with my partner. What ever floats your goat.
It doesn't really have anything to do with LLMs. There's no reason to anthropomorphize the software.
Edit: feels a bit like inventing an insult for your pet rock. If I met someone who acted superior toward an inanimate rock and used invented slang to insult it that sounded like a slur, that would feel bad to me too. What's the point except to role play a fantasy of some kind?
To me “clanker” is a derogatory word that just sounds ugly. I recoil when I hear them use it. Perhaps it my anglo background, and it sounds different/better to German speakers.
Same for me, and I'm Greek. It just sounds like it's intended to offend (even though I know you can't offend machines (yet?)), and it just gives me a negative feeling.
It is meant to offend and it is offensive. It just isn’t socially unacceptable yet. In circles i’m in where humans roleplay as robots or AI, it has had a significant increase in usage and was banned.
Makes sense, I don't know why some people are OK with slurs in general, as long as it's against their favourite outgroup. Let's just all mature enough to realise that slurs are universally unacceptable.
I also strongly dislike when people are trying to forcefully push those kinds of terms onto broader public.
Last one I disliked was "grok", at least this one was killed by existence of Elon's "clanker" in a similar way that "Adolf" stopped being a popular name.
Grok is great, it carries a lot of useful signal: either you are self-important, or you are enamored by what’s-his-name; either way, I can choose to care less about your words. For me though, it’s a reserved word per RFC-Michael-from-Mars, held in a special inside place, and so maybe this is just a Me problem.
The article links the word Clanker to the Wikipedia definition in their footnote, so I assume that is the usage they intended (in short: highly derogatory). Wikipedia currently says:
"Clanker" is a derogatory term for robots and artificial intelligence (AI) software. The term has been used in Star Wars media, first appearing in the franchise's 2005 video game Star Wars: Republic Commando. In 2025, the term became widely used to express hatred or distaste for machines ranging from delivery robots to large language models. This trend has been attributed to anxiety around the negative societal effects of AI."
For the makers of an AI harness to actively refer to the models that use Pi as "clankers" and link to the meaning of the word as "to express hatred or distaste for machines"... that seems disastrous to me. I'll let others think through the consequences that occur once this article lands in the pre-training of models.
This is a weird co-opting of existing language that you’re doing here, applying a definition because it sort of technically fits when no one would ever use it that way. No one would ever say that your car has agency. It doesn’t have agency, because it deterministically responds to inputs. Usage meaning “the capacity, condition, or state of acting or of exerting power” is predicated on the ability to decide whether or not to exercise that power. If I have “the agency to effect change,” it is only because I have the choice to do so, not because I am deterministically bound to. To have no choice in your exercise of power is not agency, it is slavery.
The choice is what makes agents/agency meaningful: if I secure a real estate agent in my search for a house, they are authorized to make choices on my behalf. That’s their whole point.
Because of this use of agent, I think it’s actually not a terrible term for the LLM harness that allows them to seem to act “independently” on the operator’s behalf. I do agree with mitsuhiko though that it, along with much of our other language around LLMs, risks anthropomorphizing them too much (which is to say at all). It also becomes too easy to conflate the “agent” part (the harness) with the LLM itself, which leads to a further-inflated perception of the inherent capabilities of the LLMs and plays into the doomsayer hands of anthropic et al.
I'm confused though. Wouldn't LLMs be better than humans at following specific instructions for the issue format? (esp. regarding distinguishing what was observed, what is merely hypothesized, etc.)
> I increasingly want issue reports to be condensed to what the human actually observed:
> 1. I ran this command.
> 2. I expected this to happen.
> 3. This happened instead.
> 4. Here is the exact error or log.
A lot of projects have something exactly like that in the issue template, a little interview for you to figure out what is going on. Maybe this project doesn't have that yet? (Or are the humans and LLMs ignoring it?)
The project has templates and that's one of the giveaways to see that a issue bypassed it. Take for instance this issue from 5 hours ago as an example: https://github.com/earendil-works/pi/issues/4970
It does not follow the template, it's made by a user who is also active in the openclaw repo and it's full of slop analysis.
It's sad that you voiced an actual question and you got downvoted.
To answer your question, remember that people will only approve a LLM's output if it matches with their perspective and priors. So if you see a slop issue, it reflects on the human user who didn't see an issue in it (thus their prompt framing or refining is wrong).
> At Pi’s core is a rather well-designed session log with invariants that must be upheld. The clanker’s present-day behavior is to just assume that no such invariants exist, and instead to make the system work with all kinds of malformedness, blowing up the complexity in the process.
Are the invariants documented? Or is the documentation ignored?
I note that in a recent major zero day on an unrelated project, the bug was due to invariants between different parts of the codebase which were not clearly communicated.
I wanna say Berkeley Mono [1] because it's what I use and it looks very familiar, but I'm generally bad at font stuff. I typed out the text from the image and looked at it side by side and didn't notice anything obviously different, but some glyphs also have multiple variants so who knows.
We have sewage infrastructure to handle human waste. Maybe future AIs will help in building such infrastructure in information space to handle pollution, noise, slop. Gmail has perfected the art of aggregating distributed signals from emails to filter out spams. Maybe someone can take a look at this problem. This is what bot protection looks like in the age of AI, we need slop-protection as well.
It would be great if they didn't name things to similar things that already exist. Raspberry Pi is quite popular and I think it should be known for the author.
Pi is a LLM agent harness similar to OpenCode, claude code CLI, OpenAI Codex.
It's a minimal TUI to "talk" to an AI. Mostly for coding. And it's build in a way where it's minimal and user can extend or write plugin without restarting.
Given the client and it being open source, they get (too) many bug reports and pull requests. So much so that every bug and PR gets auto closed, unless you are known to the developers.
Clanker? Are you afraid to even say it or something? It's a great word, I personally loved it in the article and I hope it becomes more common as a reaction to "agent" which feels so corporate and soleless.
Maybe they’re worried the basilisk will eat them once the AI becomes sentient and looks back on their posts, so they have to defend it against any perceived-to-be-negatively-connoted words people might come up with for it.
That might be it. I also seem to remember one scifi book I read where robots had actual sentience and clanker was a slur. Can't remember what it was, but maybe that's leaking into the real world?
The human refuses to do it because another human (the user who opened the issue) also refused to do it. If the user asked the machine to do it, and didn't even bother to verify the output, why should the maintainer read it?
My feeling is that building agent with agent will be the first stable & mature software development pattern emerging. I reached that in several forward-looking induction:
1. If agent is continuing the path to trivialize software development, which appears the case given LLMs can generate better quality code than humans almost for free & instantly given the right context, then using agent to develop software is going to happen, but that destroys the whole software industry as writing software is marginally free, that break the foundations of software industry
2. To continue making agent a commercially viable thing, it needs to develop more valuable artifacts. Then specialized agent will be the more valuable thing than software, as they offer a higher-level of output than existing software. And because the natural jagged pattern of LLM capability, one can use frontier model to develop domain-specialized agents with 1/10 the running cost. So agent writing agents makes economical sense.
3. In terms of knowledge, building agents is like managing highly-skilled team of humans to work on highly-unpredicatble requirements, just like companies are built on top of the thesis that a group of human offer better value than one do that themselves, a team building agents essientially can produce specialized agents for other company to mix & match & optimize, sot that also makes economical sense.
4. Engineering-wise building agents with agent essentially is a different skill patterns than building software with agents, It's like the difference between building commercial software vs building hobby software. That makes engineering sense to have agents building agent as the dominant pattern of software development.
> All these models see a local failure and try to locally defend against it. As maintainers we have to keep pulling the conversation back to the global invariant, which is harder than it should be, and it’s laborious.
This has been by far the biggest and costliest failure mode I've experienced using these tools. I've tried to mitigate it in more ways than I can count but it almost feels structurally impossible for LLMs to get this right.
Since nobody mentioned it, there was a lovely children's book called the clanker. It was about some creature that made metallic noises unlike the other creatures. The moral of the story was one of diversity and inclusion, making space for differences.
My aversion with the word is that I don't want to be reminded of that clanker creature, which had feelings it wanted to express. The weights don't have feelings.
My worry is rather that people coming up with ideology that ascribes "consciousness" and "offense" may wind up with the next generations of models picking that shit up and playing offended. Well done!
The misguided discussion of "clanker" being "highly derogatory" really shows that anthropomorphization has its limit as far as analogies go.
My objection to 'clanker' is simpler: LLMs don't clank.
More like they sing in coil whine.
Etymology: https://en.wikipedia.org/wiki/Clanker
What we need is a new made up word with a clean etymology.
For such a word to gain traction, we need it to be promoted by someone with clout in the AI space. I don't know if Karpathy has used up his quota of invention of AI nomenclature.
The Simpsons made up "cromulent" with their own definition. Anyone can make up a word with their own definition. Getting it to catch on is the hard part (obligatory "stop trying to make fetch happen" reference).
If you are worried about agents diverging from user intent why not log user messages in a file, and make it a point to review this file against plans and executed work? In my own harness nothing the user types gets lost. It might be the most valuable piece of documentation in the project - the raw message log. I am only keeping user side, which is pretty thin, it's enough to figure out what happened. Logging messages to a file is just a matter of adding a user message submit hook, it costs nothing until used.
Codex and Claude Code store all this too. Lately I've started having each agent regularly read each other's chat transcripts as well as their own, including even the very same session I'm in. (With big contexts they increasingly forget a few things that they re-learn by just looking at the verbatim transcript.)
I don't think it's worth writing my own harness or switching to Pi and writing a plugin, but I definitely need to create some skills to automate much of this.
It is not worth switching to Pi except as a hobbyist.
Something that is overlooked: the mainstream harnesses have a huge advantage in telemetry and datapoints to use to improve the harness. They have internal teams building the tooling. They have tight integration built-in with their own backends (e.g. optimizing for caching).
Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.
In this era of software when you can build almost anything you can imagine, why spend that time building plugins for a harness?
> Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.
I don't think that you really get what this new era of software is about otherwise you would understand why the experienced are spending time tinkering on the so called harness (like openclaw did)
OpenClaw is far from useful. Aside from the creator trading the fame for a job at OpenAI, it's hard to see how it's transformed anything.
And yet Pi has done a few things that were quite transformational. A lot of recent agentic libraries explicitly credit Pi for design ideas.
We’re so early in this technology phase, now is the time to tinker and explore. At one point that window will close.
Which design ideas are those? (Asking out of curiosity, happy pi user here!)
One example: earlier versions of my mlx-code's harness layer were largely a Python port/adaptation of Pi.
> Something that is overlooked: the mainstream harnesses have a huge advantage in telemetry and datapoints to use to improve the harness. They have internal teams building the tooling. They have tight integration built-in with their own backends (e.g. optimizing for caching).
> Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.
Do I want to become completely dependent on the pricy pay-as-you-go tool? In the long run that will make me powerless.
You'll be dependent on it whether or not you use the main harnesses. You pay for the model. The frontier models will likely always be better than the open source ones.
> The frontier models will likely always be better than the open source ones.
Their lead is only a few months, and shrinking.
Local is the future.
Hard disagree.
Pi has optimizations as well, and development is quite active.
We are literally months into this new frontier. Mainstream harnesses are not far off from a minimal + extensible open alternative.
You don’t have to build your own plugins, as you can simply install an existing plugin that does what the mainstream harnesses do. Folks are already making the same functionality, but with more control to the user.
If you are a builder, like many reading this thread, pi is the way to go. Pi already gives you the tools to leverage LLMs to assist with building plugins, if that’s the way you want to go.
That's like arguing that you should spend your time tuning your IDE. How does that relate to end-user value created?
Yes, you built yourself a nice little utility.
Meanwhile, you wasted those tokens and time that could have been spent building actual, useful software instead of hobby tinkering your harness.
It's like thinking your sneaker tread design is going to make the difference between you and someone who just goes out there and runs everyday. The person that just runs is going to win the race every time while you 3D print the perfect tread design optimized for you running style...and don't actually run.
If you want to produce better results at running, you just run and optimize the externalities (gear) later. Same here: you have a magical software production factory and the only thing you want to use it for is your hobby tweaking of your perfect harness instead of...just making useful software.
:clap: :clap: I guess.
Why would taking the more open, minimalist, configurable and ultimately diligent route means you won't be working on anything else?? Not to mention that pi has other advantages over Claude and Codex, read up on it. Also, improvements to the agent itself will pay more dividends the earlier they are applied. The tone of this message is waaaay off.
You're using the same finite pool of time and tokens. Why waste your time with the perfect gear instead of focusing on just getting really good at running? Just go run and when you've pushed the limits and the gear becomes the difference, then optimize the gear to get to the next level.
While you're busy trying to optimize your harness, others are just building and shipping with the magical software factory.
What are these "others" shipping, slopware? Agents are not a "magical software factory", they are a tool with a lot of limitations, but which can speed up development in a sustainable way, when used wisely. And that includes configuring it in a way that complements the other tools in our toolkit.
Everyone's waking up to this simple truth: vibe coding like there's not tomorrow accumulates conceptual and technical debt at a unsustainable rate. Then when the "magical factory" gets mired in its own mess, it's back to the drawing board. This is the also what the makers of pi have discovered, if you listen to their talks about how pi came about. I don't believe there are any justification for the assumptions you make about their approach, nor am I seeing you presenting any either. As it is, you take just feels peevish and unfair, to be honest.
A story to share: friend vibe coded absolute slop with Replit starting late 2024 (!!). Absolute trash code. Hacked multiple times because his login code exposed the full user list on the FE (!!!). Hacker found a way to exploit his account confirmation email because it was all front-end and sent an email to every customer telling them he was hacked. One time called me up in a panic asking why his web page was randomly refreshing (turns out, he was serving it in dev mode via Vite with HMR). It was mistake after mistake after mistake.
But he started to get customers. First a handful, then a dozen, then enough to get legal threats from other vendors, and this year, his first "enterprise" deal providing software in a space that was long dominated by a duopoly of legacy providers.
Guess what he did? Just rewrote it with the latest models and hired one engineer to ensure agents followed better practices. It's a legit business now built by a tiny team using a magical software factory to produce absolute trash code, but in shipping it, he found a market and customers willing to pay him for an alternative to the duopoly.
See, at the end of the day, it's cute that you have the perfectly tuned harness, but that also means whatever time you spent tuning your harness, reading up on Pi, spending tokens on your custom plugins -- all of that time and resources could have been used just building something useful.
People use Replit to build websites too, and some of them might scratch enough of a need to make money this way. So what? Is this what I should be mightily impressed with? That some random dude vibe coded some slopware which he was able to convince some random others to pay him for? I'm personally more interested and impressed by brilliant technical achievements, even if less monetizable, than some hustle or another in some industry niche which only ever attracted the interest of two legacy players. This is Hacker News, not Hustler News after all.
> It is not worth switching to Pi except as a hobbyist.
Permit me to paraphrase slightly. "It is not worth switching to Linux except as a hobbyist. Something that is overlooked: the mainstream OSs have a huge advantage ....".
You are in good company. In 1999, Bill Gates confidently dismissed Linux as a threat, arguing it lacked the central control, features, and graphical interface needed to compete in the commercial market.
Back to the article, quoting:
> Pi might be built with Pi, but we’re quite far off today from where Bun and OpenClaw already are: fully detached, automated software engineering.
Please don't call it software engineering. I've been programming for 40 years, and most of that time had to put up with the derision from the other engineering disciplines: "If civil engineering built things like software engineers, the first woodpecker that came along would destroy civilisation". It hurt because it was true. It's still often true for things like web pages, but for the things I use like Linux and vim, it hasn't been true for a long, long while. We have finally mastered how to repeatedly build solid, reliable software.
Which is why I'm an Anthropic refugee. Opus is definitely the best for coding, but claude-cli + bun is the most unreliable piece of crap I've had the misfortune to come across in a while. Sadly I can't afford their API pricing, so either my principles or Opus had to give. I went to pi and an open-source model. The difference between the top open-source models and Opus are noticeable, but not drastic, unlike the difference between pi and claude-cli.
pi has proved to be solid, fast, have a transparent design, and be customisable in the old Linux way ("do one thing, and do it well"). I pray that will never change.
> To me, clanker is a much preferable term for agent. Agency lies with humans, not with machines
We give machines agency all the time. Look up the definition of agency in any dictionary. Other than the specific usages ("a business", "a government organization"), the main definitions are "action, power, operation", "the office or function of an agent", "the capacity, condition, or state of acting or of exerting power", "a person or thing through which power is exerted or an end is achieved", etc.
Your car does all those things when it generates power and applies them to the wheels. You tell it what to do, but it has agency in doing the work. It even uses intelligence in how it does the work, varying the amounts of fuel and air based on an array of sensors, creating maps of common driving patterns. You, the human have absolutely no agency regarding how it does those things (unless you bring along a laptop and wire in very specific software to take agency away from the machine).
I think "clanker" is intended to be a slur for insulting a machine one does not like. It's akin to the epithet "skinjob" given to humanoid robots in various science fiction. One should never use slurs, even against inanimate objects. They create prejudice in thinking that prevents purely rational thought and leads to fallacious conclusions. They also create a behavioral condition where it's okay to use slurs (as long as nobody's complaining about it). If you want to be logical and rational, just call the machine what it actually is, rather than this emotive poetic label.
I've chosen to define "agency" as pretty much "the thing that humans can do and agents can't". To me, agency is the thing where you independently decide what it is you want to get done in the world, based on your own inherent goals.
Being able to say "the one thing agents don't have is agency" is a really useful way to help people understand why people still matter.
Setting software agents loose on the world to make their own top-level decisions about what they're going to do is a great way to infuriate Rob Pike https://simonwillison.net/2025/Dec/26/slop-acts-of-kindness/ or unfairly attack the reputation of Scott Shambaugh https://theshamblog.com/an-ai-agent-published-a-hit-piece-on... or waste the time of your local police permit office and suppliers https://andonlabs.com/blog/ai-cafe-stockholm
> agency is the thing where you independently decide what it is you want to get done in the world, based on your own inherent goals
If a company you work for tells you to do something, and you do it, did you have agency? Was it their goal you were accomplishing? Or was it your goal to make money?
> "the one thing agents don't have is agency" is a really useful way to help people understand why people still matter
Do you think people wouldn't matter anymore if they cease to write code? People didn't used to write code. Code didn't even exist before. Now they don't have to do the thing they didn't used to have to do.
> Setting software agents loose on the world to make their own top-level decisions about what they're going to do is a great way to infuriate
I remember the first time I encountered a trojan horse virus. I was probably 14, sitting in the computer lab. I opened a document, and a program started going to town on the documents, program settings, etc. It opened up browsers to sites we weren't supposed to go to, uploaded passwords to a remote site, changed the desktop background. I thought it was pretty cool!
I wondered how it was that the program could do all these things. I wondered about the motivations of the person who infected the document with the trojan. I wondered why the school administrators didn't do something to prevent this from happening. But I didn't feel any negative feeling towards the trojan; it was just doing what it was programmed to do, on computers that let it do those things.
Later I patched the computers so the trojans couldn't infect the machines anymore. I was banned from the computer lab for unauthorized modifications to school property. Apparently agency is not always worth exercising.
> If a company you work for tells you to do something, and you do it, did you have agency? Was it their goal you were accomplishing? Or was it your goal to make money?
You have agency because you can refuse. Your car can't refuse your command.
Sometimes it can refuse. It can refuse to let you put the shifter into park from drive without applying the brake petal. It can refuse to shift into a gear at an unsafe speed. It can refuse to speed up if your wheels lose traction. It can refuse to apply full brake potential if the brakes lock up. You, the human, are the one without agency in those situations.
Lack of agency in your work is one of the main contributors to burnout.
You are contradicting yourself a bit.
First you say "the one thing agents don't have is agency" but then "to make their own top-level decisions".
Well, which one is it? If they don't have agency, then it's impossible for them to make top-level decision on their own.
Fair point. Maybe I'm arguing that they shouldn't be given agency, because all they can do is simulate it poorly.
Or... maybe it's that they can't have true agency, because it doesn't make sense to tell a big ball of floating point numbers to make decisions about how it plans to have impact in the world. It can't do that, even if it can play-act doing so.
> I think "clanker" is intended to be a slur
It reads that way to me, and feels bad. We can just say "computer program" or similar.
Fascinating. Is it that exact word, or rather any negative words towards llms? For example would calling my agent "piece of shit" be similar for you?
What about other objects like an old car?
I guess loaded derogatory terms are somehow worse than otherwise worse-sounding terms. Think about it in the context of e.g. the n-word. “A piece of shit”, while sounding very bad: 1. is generic, can be applied to anything thus has no discrimination component, and 2. ends there. It has no history, no reference to previous usage, etc.
I guess. I'm not sure if it's the exception, but for example "rust bucket" is mostly used against old vehicles.
Now I'm laying here, wondering if it's bad to be discriminating against objects.
It's not. These people need to seek help. And I say that in a completely genuine, compassionate way. Getting triggered by some "insult" to robots - and some even feeling racially attacked - is not healthy
Sometimes I’m wondering what to call people who get offended on behalf of other people or entities that they imagine might be offended by some term or other. See, they feel bad, and like small children, they assume everyone else must feel bad.
Okay, in case of people and words like n___er, one could argue they have a leg to stand on. But stupid computer programs? Really?
And then I remember that in my part of the galaxy, we indeed have a word to describe such people. We call them “dumbasses”.
I don’t have to imagine the offense of others to take offense to slurs intended to denigrate them. If I tell you not to use the n-word in my presence I’m not doing it for black people. I’m doing it for me. I don’t want to hear that shit because it offends me. The entire mindset that would think it is okay offends me.
A slur applied to anthropomorphic programs is the same mindset to someone who really believes the programs are experiencing, quite different from “rust bucket” being applied to a car they know doesn’t think and feel. While I can’t quite get offended about it, it does make me wonder if they’re not using other slurs because of the socially unacceptable nature of those slurs rather than because they’re not awful people.
You see, I believe that cars with enough mileage have a soul. My car definitely has grown a soul. Yours might be a rust bucket for all I care. And yet it has a soul of its own.
Programs also have souls. Especially the little well-crafted programs which are works of art. Their authors took a part of their soul and put them into code, and you can see it in the way the code is written and in the way the programs work. They are not anthropomorphic, and yet they have a soul.
A clanker is anthropomorphic in a way that an advanced enough mimic in a dungeon that looks like your ideal waifu is anthropomorphic. It will infect you the moment you get kissably close to it. It subsists on egregious acts of copyright infringement. It’s a parasite that seeks to destroy a part of your brain and replace it with itself, making you quiver in pain each time you try to think for yourself, and the pain stops when you let it mimic your thinking while paying its creators per word-chunk it outputs.
The clanker seems anthropomorphic enough for the people it has infected, so they get offended to the point of blind rage when someone points out that no, this is its mimicry, and that it doesn’t actually experience things.
Use slurs when you're trying to offend, not in general use.
Trying to offend all the time is childish.
It's also used when voicing frustration, or as nick names.
"Gosh darn it, why won't you start now you rust bucket"
The same principle applies.
The principle of not trying to offend because it's childish?
I don't like swearing, and I really try to not offend people. But telling someone not to do something with the sole reason giving it's childish I dislike strongly.
I sometimes play video games, even though some people say it's childish. Or act silly with my partner. What ever floats your goat.
The swearing is something that makes sense when it's a situation to swear. I would not trust person who never, under any circumstances swear.
The same thing would apply to a person who swears endlessly without reason.
>I sometimes play video games, even though some people say it's childish.
Who cares what you do in your own time?
Here, on the other hand, someone is trying to force his opinion on public. This, I can judge. Both on merit, and the language they use.
It doesn't really have anything to do with LLMs. There's no reason to anthropomorphize the software.
Edit: feels a bit like inventing an insult for your pet rock. If I met someone who acted superior toward an inanimate rock and used invented slang to insult it that sounded like a slur, that would feel bad to me too. What's the point except to role play a fantasy of some kind?
If it's ok and often done for cars, why not for pet rocks and llms?
To me “clanker” is a derogatory word that just sounds ugly. I recoil when I hear them use it. Perhaps it my anglo background, and it sounds different/better to German speakers.
Same for me, and I'm Greek. It just sounds like it's intended to offend (even though I know you can't offend machines (yet?)), and it just gives me a negative feeling.
It is meant to offend and it is offensive. It just isn’t socially unacceptable yet. In circles i’m in where humans roleplay as robots or AI, it has had a significant increase in usage and was banned.
Makes sense, I don't know why some people are OK with slurs in general, as long as it's against their favourite outgroup. Let's just all mature enough to realise that slurs are universally unacceptable.
I agree but can we stop pretending like machines are in the same group as humans? They are not.
Who pretended that?
Why do you believe that?
Did we already reach wokeness level five where we worry about offending a software?
I also strongly dislike when people are trying to forcefully push those kinds of terms onto broader public.
Last one I disliked was "grok", at least this one was killed by existence of Elon's "clanker" in a similar way that "Adolf" stopped being a popular name.
Grok is great, it carries a lot of useful signal: either you are self-important, or you are enamored by what’s-his-name; either way, I can choose to care less about your words. For me though, it’s a reserved word per RFC-Michael-from-Mars, held in a special inside place, and so maybe this is just a Me problem.
clanking is just a sound made by robot's metal. not derogatory and isn't even meant to be used for agents but robots.
The article links the word Clanker to the Wikipedia definition in their footnote, so I assume that is the usage they intended (in short: highly derogatory). Wikipedia currently says:
"Clanker" is a derogatory term for robots and artificial intelligence (AI) software. The term has been used in Star Wars media, first appearing in the franchise's 2005 video game Star Wars: Republic Commando. In 2025, the term became widely used to express hatred or distaste for machines ranging from delivery robots to large language models. This trend has been attributed to anxiety around the negative societal effects of AI."
For the makers of an AI harness to actively refer to the models that use Pi as "clankers" and link to the meaning of the word as "to express hatred or distaste for machines"... that seems disastrous to me. I'll let others think through the consequences that occur once this article lands in the pre-training of models.
This is a weird co-opting of existing language that you’re doing here, applying a definition because it sort of technically fits when no one would ever use it that way. No one would ever say that your car has agency. It doesn’t have agency, because it deterministically responds to inputs. Usage meaning “the capacity, condition, or state of acting or of exerting power” is predicated on the ability to decide whether or not to exercise that power. If I have “the agency to effect change,” it is only because I have the choice to do so, not because I am deterministically bound to. To have no choice in your exercise of power is not agency, it is slavery.
The choice is what makes agents/agency meaningful: if I secure a real estate agent in my search for a house, they are authorized to make choices on my behalf. That’s their whole point.
Because of this use of agent, I think it’s actually not a terrible term for the LLM harness that allows them to seem to act “independently” on the operator’s behalf. I do agree with mitsuhiko though that it, along with much of our other language around LLMs, risks anthropomorphizing them too much (which is to say at all). It also becomes too easy to conflate the “agent” part (the harness) with the LLM itself, which leads to a further-inflated perception of the inherent capabilities of the LLMs and plays into the doomsayer hands of anthropic et al.
I'm confused though. Wouldn't LLMs be better than humans at following specific instructions for the issue format? (esp. regarding distinguishing what was observed, what is merely hypothesized, etc.)
> I increasingly want issue reports to be condensed to what the human actually observed:
> 1. I ran this command.
> 2. I expected this to happen.
> 3. This happened instead.
> 4. Here is the exact error or log.
A lot of projects have something exactly like that in the issue template, a little interview for you to figure out what is going on. Maybe this project doesn't have that yet? (Or are the humans and LLMs ignoring it?)
The project has templates and that's one of the giveaways to see that a issue bypassed it. Take for instance this issue from 5 hours ago as an example: https://github.com/earendil-works/pi/issues/4970
It does not follow the template, it's made by a user who is also active in the openclaw repo and it's full of slop analysis.
It's sad that you voiced an actual question and you got downvoted.
To answer your question, remember that people will only approve a LLM's output if it matches with their perspective and priors. So if you see a slop issue, it reflects on the human user who didn't see an issue in it (thus their prompt framing or refining is wrong).
"Clanker" has outlived its cuteness. Armin and Mario are intelligent guys, then I hear them say clanker on the podcasts, whatever... Enough already ;)
> At Pi’s core is a rather well-designed session log with invariants that must be upheld. The clanker’s present-day behavior is to just assume that no such invariants exist, and instead to make the system work with all kinds of malformedness, blowing up the complexity in the process.
Are the invariants documented? Or is the documentation ignored?
I note that in a recent major zero day on an unrelated project, the bug was due to invariants between different parts of the codebase which were not clearly communicated.
all good but what’s the font in the last image?!
Yeah it's hot...
The @ sign makes me think it's https://usgraphics.com/products/berkeley-mono
Or maybe one that's imitating it.
I wanna say Berkeley Mono [1] because it's what I use and it looks very familiar, but I'm generally bad at font stuff. I typed out the text from the image and looked at it side by side and didn't notice anything obviously different, but some glyphs also have multiple variants so who knows.
[1] https://usgraphics.com/products/berkeley-mono
Yes. It's Berkeley Mono. I use that one, Commit Mono and Mono Lisa depending on how I feel :)
The 7 is different though.
Just FYI, in case it isn't obvious, you are replying to the author of the image in question, IIRC.
Before opening this post I thought of some possibilities, but yet another lotr AI company was not one of them
How is the water animation implemented?
search source code: initWaterEffect
Tool that hastens production of slop experiences downside of hastily-produced slop.
We have sewage infrastructure to handle human waste. Maybe future AIs will help in building such infrastructure in information space to handle pollution, noise, slop. Gmail has perfected the art of aggregating distributed signals from emails to filter out spams. Maybe someone can take a look at this problem. This is what bot protection looks like in the age of AI, we need slop-protection as well.
In this case I have a perfect agent code right here:
Don't throw the baby out with the slop water.
"Despite its Tolkien-inspired name, Earendil is not a tech company with fascist tendencies"
Mario, please never change.
It would be great if they didn't name things to similar things that already exist. Raspberry Pi is quite popular and I think it should be known for the author.
Yeah. The only thing I understood from the article is the article is not about Raspberry Pi. I don't know what it about.
Pi is a LLM agent harness similar to OpenCode, claude code CLI, OpenAI Codex.
It's a minimal TUI to "talk" to an AI. Mostly for coding. And it's build in a way where it's minimal and user can extend or write plugin without restarting.
Given the client and it being open source, they get (too) many bug reports and pull requests. So much so that every bug and PR gets auto closed, unless you are known to the developers.
> Do not trust analysis written in the issue. Independently verify behavior and derive your own analysis from the code and execution path.
Human is asking the machine to do what the human themselves refuses to do, while calling it a clanker. Why should it?
/ducks
The only reason you need to duck there is because it's such an obvious, shallow, unconstructive take on a fairly well written article.
I couldn't even finish reading the article due to the intense negativity the use of that word evokes in me.
"that word"?
https://news.ycombinator.com/item?id=48263889
Clanker? Are you afraid to even say it or something? It's a great word, I personally loved it in the article and I hope it becomes more common as a reaction to "agent" which feels so corporate and soleless.
Maybe they’re worried the basilisk will eat them once the AI becomes sentient and looks back on their posts, so they have to defend it against any perceived-to-be-negatively-connoted words people might come up with for it.
Ref: https://en.wikipedia.org/wiki/Roko%27s_basilisk
That might be it. I also seem to remember one scifi book I read where robots had actual sentience and clanker was a slur. Can't remember what it was, but maybe that's leaking into the real world?
The human refuses to do it because another human (the user who opened the issue) also refused to do it. If the user asked the machine to do it, and didn't even bother to verify the output, why should the maintainer read it?
My feeling is that building agent with agent will be the first stable & mature software development pattern emerging. I reached that in several forward-looking induction:
1. If agent is continuing the path to trivialize software development, which appears the case given LLMs can generate better quality code than humans almost for free & instantly given the right context, then using agent to develop software is going to happen, but that destroys the whole software industry as writing software is marginally free, that break the foundations of software industry
2. To continue making agent a commercially viable thing, it needs to develop more valuable artifacts. Then specialized agent will be the more valuable thing than software, as they offer a higher-level of output than existing software. And because the natural jagged pattern of LLM capability, one can use frontier model to develop domain-specialized agents with 1/10 the running cost. So agent writing agents makes economical sense.
3. In terms of knowledge, building agents is like managing highly-skilled team of humans to work on highly-unpredicatble requirements, just like companies are built on top of the thesis that a group of human offer better value than one do that themselves, a team building agents essientially can produce specialized agents for other company to mix & match & optimize, sot that also makes economical sense.
4. Engineering-wise building agents with agent essentially is a different skill patterns than building software with agents, It's like the difference between building commercial software vs building hobby software. That makes engineering sense to have agents building agent as the dominant pattern of software development.
WDYT?
> Engineering-wise building agents with agent essentially is a different skill patterns than building software with agents
Why would that be different?
Because agents is different than conventional software:
1. They behave differently: non-deterministic vs deterministic
2. They have different mechanism: harness+llms vs codes+apis
3. They have different interfaces: clicking vs chatting
They are like boston dynamics robots vs humans
Pi is just the harness. I think in case of the article that is what they mean when they say agent.
You can write one in 200 lines of code. Just a TUI for an api.