> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Many lower-budget individuals are now moving to China open weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inferencing costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.
Per token costs will fall, but the harnesses will get more token hungry. Instead of just centering the div it’ll spin up a battery of agents to architect, critique, advise, code, review, refactor and so on.
I wish I could disable most of these. I already hate all the "oh you're actually right, let me fix that" nonsense. Then it proceeds to burn 50k tokens on the git history instead of copying logic A from a different part of the codebase to logic B, where I want that exact logic without having to write the boilerplate myself...
Makes me think of how my Claude.md files specifies to use the built in framework code-generators (rails). Those generators are deterministically right every time.
I wonder how often the Agent actually follows the guidance. I do see them follow it when I look. But it doesn't seem so every time.
This is tricky since it can and will ignore your md directions. When possible I try to lean on tool call hooks or skills that invoke deterministic scripts. As much as you can remove the "choice" the better though still there's a lot of randomness in how reliably it invokes skills ime.
Hooks are incredibly underused by most people and are the easiest way to establish a first line of defense against bad behavior. Things like blocking tool calls that will read .env file or execute "create or replace table".
A lot of the time if you're copying code from one place to another what you actually want to do is abstract it so you can reuse it in both places.
The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.
A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.
Nah the codebase is legacy fucked and I cant be bothered to try and optimize business flows without the fear of other stuff breaking.
Claude 100% of the time even thinks we use laravel despite the project being some old lumen codebase, so most of laravels features are not available. It also gets the PHP version we are using wrong 100% of the time.
This is a spicy take, unless the business is willing to face some down time, and I am hired to do exactly what you said, I’d never touch any line of code unless I absolutely have to. Different environments don’t help as much.
We tend to obsess over software quality when it’s the least important thing for a business. It’s just a means to an end.
This is what its about, we have multiple ecom shops running 24/7 and cant simply afford downtime or a change of business flow that maybe doesnt affect shop A and B but definitely affects shop C and D...
- Takes weeks or months to get simple features out the door, and when they're out they're buggy as hell and the bugs never get fixed. Sound familiar?
> I’d never touch any line of code unless I absolutely have to
And this is how legacy code is made. Years of everyone "never touching anything they don't have to" leads to a giant steaming pile of shit.
> unless the business is willing to face some down time
How does a simple refactor cause downtime? I do this kind of stuff all the time and pretty much never cause any downtime. In the very rare cases that prod downtime does occur it's generally not because of some simple code refactor, and we have it back up in no time by just rolling it back. Unless it's not related to the code at all, in which case it also wasn't a refactor that caused it.
Are you some kind of entitled corporate dev that barely has any influence on the codebase? If I fuck up a whole business goes down as I am the only dev there currently. We cant afford that happening.
Also why would I mess with anything claude.md related? I just use the CLI tool. LLM enthusiasts always claim how smart these things are so they should figure it out on their own, you know?
I have full control of my codebase. I'm not afraid to make changes to it because I know what I'm doing.
You would edit Claude.md to say things like what tech the project is using, because that's the entire point of claude.md. It's literally the solution to the exact problem you're complaining about. Any information you want it to know, you put in there and then it knows it. And you can tell Claude to make or update the file for you.
I'm not one of the people telling you how smart LLMs are. I'm telling you how to use it efficiently, by not expecting it to know everything but rather provide the information that it needs in order to be a more useful tool.
> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.
An inference only platform selling good open weight model inference without the research overhead could capture a-lot of market for lower size model uses (haiky, gemeni flash). Diffusion-transformers and clever cashing can drop inference even lower, which is improving at a high rate.
The biggest reason large models are un-attainable for local applications is the lack hardware with large amount of unified/graphics memory (and the cost of the platforms that do). Once the memory slog goes back to normal and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity effectively opening the door for slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity)
At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is increasingly looking bleak, with a average catch-up of just a few months) or market dominance through integration (integration in platforms and OS, exclusive deals with companies or governments).
For coding agents, i suspect no player will manage lock in enough market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be further hit with a lot of AI integrated into our other tooling costs, such as GitHub, Microsoft suite, G-suite, forcing in AI functions as a value-ad into the total cost without giving the option to exclude them. (using their market position)
So my question remains the same: How are the players investing 100s of billions in buildout going to hope to make this back? Market capture looks bleak, inference looks like a race to the bottom. End users look like they could be beneficiaries. Where do the big boys go?
The American big boys are hoping to create "labor as a service" rather than sell tools. You don't hire an accountant that uses Claude, you hire Claude and it just does everything, without the visibility of current agents. They'll need to make it remote and obfuscated to protect their secret sauce from distillation and reverse engineering. It'll be really expensive, and be focused on enabling rich business types and upper managers.
AI may get so commoditized for certain use cases that you will not even be able sell inference at a profit. AI might be bundled in with other services, just like cursor bundles in their own AI model for auto complete with their editor. I.e. cameras might have AI for image recognition bundled in etc.
Agreed, this is where google is really, really set up to win the market. They can combine gemini subscription with a moderately more expensive google workspace and steal MSFTs entire $50 billion enterprise productivity software market. MSFT is quickly trying to get copilot in a good enough state but without TPUs I think itll be tough for them to serve a good enough model at a price people will accept.
Prices can go down while tokens sold increases so that profit increases. The labs number one goal right now is moving past software engineers so that every white collar worker in the country finds ai assistants indispensable. Speculation here but I think openAI/antrhopic api inference is insanely profitable, it just needs more volume to amortize the training costs.
> Speculation here but I think openAI/antrhopic api inference is insanely profitable, it just needs more volume to amortize the training costs.
Well, they just rent their hardware, so I'm not so sure. But they'll both be public soon and we should get that breakout in their cost structures, somewhat.
“Any” is a very high bar Unless laws prevent it, I don’t see why a substantial minority wouldn’t buy services from where they can get them at a similar quality and much lower price.
Any IT cost center will send to the lowest bidder. This isn’t intellectual property: it’s annoying shit that is an unwelcome cost of doing business. China might copy our tedious scripts? Will they make a product out of it? Can I buy it and fire my IT staff? Great!
Not everyone using AI is using it to code core value IP.
Looking around their catalogue more, most of their models seem quite outdated, aside from the OpenAI and Anthropic ones (but those get more expensive). I wouldn't willingly pick Bedrock and would instead throw money at OpenRouter, that has both a bunch of providers, as well as almost any model for you to try.
I wonder if I could start a US-based company with good data regulation and just serve open-weight models at a competitive price. I feel like the real barrier is just that most companies willing to adopt AI usage enough to make it worth it at this point don't want to be using inferior models.
Yes, you can. There are multiple inference providers out there. The problem is, it’s hard to beat the Chinese providers in cost. And you also have to compete with frontier model providers’ subsidized offerings.
They charge the exact same prices. So many people in these comments have no idea what they're talking about. Even if they did charge less, nobody is going to deal with the latency of sending requests to China.
edit: Actually American inference providers are cheaper for Chinese models. There's way more competition here because the Chinese aren't idiots and investing every last dollar they have into data centers for llms that don't make money..
By "cost" I think the parent means the provider's own costs, not the cost of inference to the customer. The cost of land, labor, and electricity are significantly lower in China than in the US.
Can you please link me DeepSeekV4 provider that's cheaper than their official offering? And not all tasks require low latency.
Also, there are a lot of competition in China. Like a lot. You might know better than me as well, but although the biggest AI-labs are based in USA, the adoption is weirdly global. Like as a general sense of what's going on - you can see AI-related ads literally everywhere in Tokyo, almost all the time, in every single screen in public.
Deepseek's api platform for V4 Pro is the only example of this, and Deepseek V4 Flash is cheaper (usually) than from Deepseek itself on openrouter via DeepInfra.
Deepseek shot themselves in the foot because they never intended to serve V4 Pro for .80c mm ouput, that was a promotional price that was meant to expire (and still might). They intended for v4 to cost $4.00 per million but Western inference providers drove down the price because they can operate at negative margins to try and push competition out. I can assure you they are losing a ton of money @ ~80cents.
My point is, its Western inference providers that are establishing the floor price of inference. They are willing to operate at a loss in order to put their competition out of business. Chinese providers are typically at or above the prices set by American/western providers if you go looking on the Chinese internet. You aren't going to get deals from China for inference except through this one instance with Deepseek v4 Pro which wasn't even supposed to be permanent pricing.
Of course though they are not necessarily a viable solution for companies with security requirements etc. given it is just a single person project, but they still serve as a proof it can be done.
Here's a free startup idea: operate an open-weight model service, and offer "Verified AI Integrity," which signs the input tokens, the seed for the randomness in selecting outputs, and the model ID, proving that the result of the call to AI was completely "organic" and was not interfered with.
Your main audience would be snake oil salesmen trying to prove their AI products are unbiased and not under the thumb of any outside influence. This doesn't address the biases of the model itself, but that's not your business. Your business is selling tokens and security certificates. If you can get the right angel investor, you could maybe have your new standard required for some government applications.
There are plenty of US-based inference providers available, including AWS, that serve Chinese models at competitive prices (vs frontier US models). They also have lots of usage. Not necessarily for coding, but for other enterprise tasks.
It's called AWS. Bedrock is right there. Price or data policy is never the issue. The models themselves are the problem -- most large US companies are not going to touch them.
Source: directly involved in these discussions. You can downvote as much as you'd like but you can't ignore the facts.
Some suits with no understanding of how LLMs work are scared that the models might hack them, or believe that they'd have to send data to China because they do not know that open models can be run on your own infra.
Then don't use the cloud-based Chinese providers, use cloud-base US/EU providers using Chinese models. The interesting Chinese models are all open making this issue mostly moot.
A key point here is open in terms of being able to download and use it, not open as knowing what data and instructions were fed into it when training.
A paranoid part of me thinks that these models are all inherently biased and instructed to be pro CCP, with specific gaps in their training data related to undesirable historic events and political ideas.
The same thing applies to US models. Check out various system prompt leak repos on github. There are also prompt injections by various parallel "alignment" models that pre-process the prompt before it's sent to the main one with questionable guidance.
You'd be surprised how much of bias exists in easily extractable information. Now imagine how much of that happens during training, that you can't easily extract.
So this is largely a moot point. Yes, Chinese models will likely have some weird things injected into them. But so do the US models. Do I care? Not in the slightest. Models are my code monkeys, and if the code leaves my machine, I assume IP is leaked be it a Chinese model that clearly tells me they do use the data, or US models that pinky promise they don't.
There are some objections here saying that some US firms are using Chinese AI providers, but I wonder if any of those are subject to compliance. Large firms that are disproportionately responsible for AI spending are all subject to compliance.
One aspect Paul Kedrosky mentioned recently is the concept of „duration mismatch“. The price per token goes down over time (either because the AI vendor reduces due to competition pressure, or because customers are now incentivized to use older cheaper models). But datacenters are financed through debt, with the assumption their revenue increases over time. Quoting him: „[AI vendors are] paying for a fixed cost with a depreciating commodity“[0].
So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt.
When everything is said and done it'll be datacenters in American competing with ones in China that have several times lower electricity prices. Token prices will drop to a level that will be unprofitable for American data centers and they will need to close.
Today's data center GPUs are essentially overclocked, and so at limit of how much the chip materials can physically handle, and therefore degrade over time. For example, GH200s operate at 1W/superchip but the actual safe power is somewhere around 650W which will allow them to function for a decade or more. But that leads to around 15% slowdown and that is unacceptable in today's competition. So current GPUs are destined to be depreciating assets.
In future, we might have fixed cost GPUs but not today.
I would presume the reason they are overclocked is because they are trying to make up for the shortage. In time, the shortage of computing components will be remedied, and tokens produced at lower power pulls will be cheaper.
There are data centers that use and rent out 10 year old server GPUs.
They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance.
They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center.
Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand.
The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home.
As long as the demand for GPUs keeps increasing, there are more data centers being built to house them.
When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great.
If I as a customer have a use case for a machine learning model I developed awhile ago, so an insect identification model, I had an ML researcher/eng develop it back in 2019, and it runs fine on a 2018-era T4 GPU (NVidia 2080 era), why mess with it?
What do you think are running on the T4 GPUs in AWS? A lot of the use cases I know of for them are mid-level computer vision models that don't need to be frontier level.
I can no longer edit this, but want to expand on my comment.
I've seen those vision researchers want to train on H100s at the time and being told know, wait for the T4s.
I've seen T4s running BERT models for document classification.
When there are enough Blackwells in data centers that H100s are useless for inference by your standards (I don't know if we've arrived there or not yet), there will be people who, say, want to run the Taco Bell ordering chatbot on them. There will be people who have applications that are just fine with Qwen 2.5 who will be happy renting them.
There seems to be this crazy consensus that hyperscalers are going to go into their datacenters and throw away their old GPUs. The reality is they have a ton of paying customers for them.
And there may be insect identification apps from 2019 that say "you know what? H100s have gotten cheap enough I can use a VLLM so the user can describe where they saw the insect too", or the McDonald's website support chatbot developers say "Hey, the bigger cheapers have gotten cheap enough we can upgrade our models to Qwen 2.5".
The frontier level GPUs in e.g. AWS have a huge premium. When the newer generations come out, they will be able to cut prices to a bit of a premium over the operational costs and still make a profit, and there are a ton of down-market customers who will be interested, who aren't willing to try to outbid Anthropic for Blackwells.
except for you know the enterprise customers who won't change their code and will pay to run old inefficent hardware just to keep from dealing with upgrades?
I'd agree. but also that's too scary. and the bottleneck is the massive manual change control process since there's no automation around any of this. :)
Why take risk when you can spend money and take no risk
Yes, even if the hardware is untouched. As technology advances, the power cost per compute cycle goes down. A gpu using old tech costs progressively more to operate compared to the newer models. So its value goes down over time = depreciation.
As for duty cycles, the chips are perfectly happy at 100% operation. Cooling and power componants fail, not the chips. But it costs manpower to repair such things and manpower is inconveniant these days. A gpu with any sort of fault just gets dumped.
In addition to the physical depreciations other comments mentioned I'd also mention that old chips will settle into a low price and then actually go up on a per unit basis if you're trying to buy a significant amount of them. With a limitation on fabrication facilities continuing to pump out older cards is an opportunity cost to the manufacturers that would prefer to be producing newer cards. If you were in a place where you suddenly wanted to buy 10,000 3080s, as an example, I'm not certain if the market could actually fulfill that demand and no one with the ability to increase the available supply to meet that demand actually wants to do so.
Chips do wear out and need to be replaced (entropy do be like that and durability is not a primary concern for chip design) so you'll need to refresh your stock and, even if you don't need cutting edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released PS1s were extremely cheap - but if you're struggling to understand this effect from your experiences in the consumer market you're actually looking at the price factor that starts making antiques increase in value since at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out from demand from the general consumer market but the demand remaining (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking.
> There are no moving parts, I dont think memory chips or GPU chips deteriorate naturally
I believe they do, but I too would love to know more details because there are several ways this can happen. Electromigration, package failures, VRAM failures, dielectric breakdown... Hopefully there will be studies soon similar to that old Google paper on HDD failures!
I used to work in datacenters, during spinning disk era we had technicians from vendors basically every couple of days to replace some broken part. When the massive switch to ssd happened instead of having them every couple of days it was 3 or 4 times per month.
Despite no moving parts things broke anyway and, even if it doesn't break, the vendor can make you change the technology just by playing with maintenance cost of the older one, limiting or removing spare parts from the market.
My understanding is that a lot of AI data centers are still heavily relying on spinning HDDs, which is why seagate, western digital are selling more HDDs than ever before.
I assumed the issue was similar to crypto mining, where given finite amounts of space and power it makes sense to always be running the latest and most powerful GPUs instead of keeping older hardware running. There's definitely a secondary market for these GPUs as well.
the hardware itself is still useful, but random failures happen every so often, so if you're trying to run a fixed sized fleet then your fleet shrinks when you can't get spares any more
Chips do deteriorate and fail naturally at datacenter scale or in timescales of decades, though not exactly like on financial reports. Leak current increases or electro-migrations occur at junctions or whatever those words mean.
And yeah, it does feel like GPUs will start losing values slower going forward with Moore's Law being dead for a while. It used to be that 3-5 years old GPUs were more useful as space heaters than GPUs, but that's much less of the case today.
They do degrade physically, but the bigger thing is they stop being competitive quickly. Each year or so we see doubling of GPU speeds for the same amount of power.
If you build a 100MW data center with GPU compute and three years laster a new data center opens with the same cost for GPUs and same electricity cost you do, but can do twice as much compute, you quickly lose business unless the market is just so constrained customers can't afford to be picky. But the moment there's slack in the market you'll see major migrations off of providers that have the same cost but half, or quarter of the same performance.
So when you see someone talking about GPUs fully deprecating in value in 1-3 years this is what they're talking about. Right now it's not a big deal because there's no slack in the market. But once there is, the bottom will drop out.
Gradually, and especially when hot. Modern chips are pretty close to the physical limits of how small they can be made, and that means atomic/chemical effects like electromigration are accounted for and determine the lifetime. Every extra 10 degrees Celsius of temperature doubles the speed of chemical reactions.
When they stray too close to the line ... you get Intel's 13/14th gen chips that wear out after 1-2 years instead of 10-20 years. Intel calls it "Vmin drift" because that doesn't sound scary, but the actual point is that various wear-out mechanisms push the chip outside of its design envelope - increasing the voltage or lowering the clock speed may get it to run for a while longer, but you're living on borrowed time as the various circuits just stop working right and you get unpredictable instruction mis-execution: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in...
sounds like planned depreciation on Intel's part, they definitely do not design server grade chips for longevity since that would harm their own revenues
It was not planned depreciation, as many chips were failing even before 2 years and this impacted not only PC Builders and Gamers, but also some server infra providers too.
This was simply poor design, it took Intel ages to really figure out what went wrong and "resolve" it.
Chips age and fail with age. You can check hot-carrier injection, bias-temperature instability and electromigration as they are the main aging mechanisms. All if these are a linear function of time but exponentieal of temperature. 90-100C these chips are running at are really tough, so they are likely to fail at couple of percent to 10% range in 2-3 years depending on the margins they have in the design.
The solder joints are notorious to fail at a high rate too.
Depends, the SMD caps spread across the board the tiny ones do start to fail and go out of spec over time. they are a right pain to replace and hard to spot one that has gone out of spec to cause the chip to start crashing.
Can you not just move the epxensive part (the gpu itself) to a new carrier board in that situation? Also isn't most of the cost of the GPU itself the design of the board, not actually making one, esp if you can move the heat sinks around?
BGA Reflow rework is not rocket science, How do you think the PCBA gets assembled in the first place? Its much easier if you dont care about the boards at all and with the huge die sizes on these accelerator chips its worth it to do a board swap
Nothing is stopping them, it's just not worth it: Have a look at e.g. vast.ai's pricing (https://vast.ai/pricing).
The V100 (2017 -> 9 years old) can be rented from $0.02 to $0.37/h (right now I can find a V100 with a Xeon Gold 6140 and 48GB RAM for $0.165/h). Let's assume the guy you rent it to pins it at its 250W TDP and let's ignore the running costs of CPU/RAM/etc...
Then you draw 1/4 kwh for that compute hour. The industrial electricity prices in the US vary between 7.5 and 25 ct per kwh (depending on state, time of day, etc...), so at 100% efficiency, assuming nothing ever breaks, and the CPU consumes 0W you earn about 14ct/h.
And remember: V100s hours are sometimes sold at 1/10th the price.
If I pick average conditions you need to start thinking of whether it is worth it to rent them out: Usually it isn't unless you have them anyways and just sell idle capacity.
It's barely worth it to run them in a pure "is it profitable" sense, if we also account for the opportunity cost of taking up a slot in your datacenter it seizes to be worth it really quickly.
When it was profitable to mine crypto with GPUs people used to sell these miner GPUs on the used market after about two years.
These were about half of the cost of an used GPU just used for gaming. By that pricr, I'd say a GPU kept busy has twice as high a chance of failure after two years of use.
"So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt."
Not necessarily, the bond holders could simply take a massive hair cut and lose shitloads of money. On the topic of bubbles and exuberance, Jeff Bezos made the salient point that there was a massive over-invested biotech boom in the 1990s and tons of sophisticated investors ended up losing lots of money. But humanity still kept the medical advancements made by the boom. Stocks going down didn't un-research drugs, and it won't un-research new GPUs or un-build datacenters.
In order to not un-build the data centers, they at least have to make more than it costs to operate them, and also not have some attractive liquidation value (the land, maybe).
I could imagine something like “inference is done at home or in China, that’s the price to beat” and it’s not worth keeping all those GPUs cool out in Nevada.
But the parent comment was that one of the bigger costs in these data centers was the interest expense on the borrowed money. A restructuring removes or heavily reduces that amount.
The fiber laid during the dotcom bubble never paid back the investors or lenders, but it's still profitably connecting customers all these years later.
It’s true once built the data center can operate right up to a financed data center value of zero. The investors will loose money but the costs of AI will go down as they do
Drugs cost pennies to manufacture after they are researched and make their way through the approval pipeline. There are many generic drug manufacturers who can work off the existing formulas.
The more apt comparison is that LLMs won't be un-trained. Opus 4.8 now exists. Even if Anthropic somehow went bankrupt, that particular asset could, at the very least, be sold for proverbial pennies on the dollar to a "generic" inference provider.
Research does get lost over time. The whole point of the patent system is keeping that from happening; if the drug company goes bankrupt, even if they lose all their internal documentation in the process, hopefully the patents and other public paperwork provides enough information for an unrelated company -- either having acquired the patent rights, or after the patent period ends -- to reconstruct the processes with less investment then the original research.
If a bankrupt AI company maintains enough of a skeleton crew to consolidate and archive its intellectual property it could be sold off to another company, but there are also timelines where it all ends up digital dust in the wind.
> If a bankrupt AI company maintains enough of a skeleton crew to consolidate and archive its intellectual property it could be sold off to another company, but there are also timelines where it all ends up digital dust in the wind.
Only if that skeleton crew had deep deep pockets. If Anthropic closed their doors tomorrow because the market collectively saw that AI was not profitable and so open sourced everything, there wouldn't be any money to train Opus 5.0... it would then have to fall on governments to put money into the hat (which I can't see happening unless it was Europe)
Datacentres aren't the same as infrastructure or research though. All the hardware in them has a finite, useful lifespan. In 10 years time it'll be totally useless
Hardware fails, and also scales out in terms of efficacy to run it as more power efficient, modern hardware turns up. It requires constant investment to keep it useful, and cost efficient
When AI pops, we'll temporarily have some extra compute capacity that will be horrendously uneconomical to run due to the high grid load and low consumer demand, before they get shutdown. There's simply no real use for them at this scale
Those data centers are specifically for AI workloads. Let’s say everything crashes and we now have all the data centers, what do you do with them? GPU are pretty specialized hardware, without AI a data center full of outdated graphics cards isn’t really too valuable.
It’s really not obvious the infrastructure we are building for AI stuff is something that will benefit humanity over time.
Without talking about the fact that bubbles are extremely destructive. Bezos is obviously someone who came out ok from the dotcom bubble but we are talking about something that destroys a lot of value globally. That has real, direct consequences, not just investors losing some money. The US economy is currently only growing because of the AI bet
Has there ever been a market for cloud gaming apart from middle class people with macbooks who casually want to play one particular game but not enough to pay for a whole PC or console?
I have a big beefy gaming PC. I still use cloud gaming from time to time. It means I don't need to juggle so many 100GB installs on my gaming handheld or cheap personal laptop, both of which can sometimes struggle to play actually demanding games. Battery life on those mobile computers are significantly better when cloud streaming a game instead of running computationally demanding games locally. It also makes the friction around trying out a game significantly lower, all I need to do is click play and the game is running instead of having to wait for it to download, play it a bit, decide I don't really like the game, and then uninstall it.
The feature being bundled in with GamePass makes it worth it. I used to VPN home and try and run games remotely, but it was honestly a bit of a pain. Just pressing a button and having the game launch is quite nice.
AI GPUs have terrible graphical capabilities, if at all. They can run shaders, but they are lacking in texture units, rasterization, etc... huge bottleneck here.
These AI "GPUs" are worse for gaming than even the crappiest actual GPUs (with a G as in Graphics). Also, the display drivers won't support them, not officially at least.
> AI accelerators used in DC are not really "graphic cards" any more, you ain't running gaming on it
I think the lighter 40 series cards like L40 still have OK graphics features. But otherwise yeah, after the Ampere generation graphics features went down the drain. The A100 and A40 cards can do graphics well but it already makes no sense in terms of power-to-performance ratio.
AI data centers are being already used at max capacity, aren't they? I have a hard time imagining people would suddenly use AI less than they do as of today, let alone collectively drop it altogether. So the worst case scenario is that they'd need to be auctioned off way under what they'd be worth now, but still for someone to use them for AI.
Inference is much cheaper than training a new model, so running them just for inference is a completely different thing than having to price in the fact that at the moment all of these companies need to compromise between compute for inference and compute for training new models. If no new models were to be trained, and all the compute was inference only, that would change everything when it comes to the overall compute cost of AI.
Dotcom infra buildup is a bad comparison, in that it wasn't even close to being all utilized. The infra was completely overproportional to the day to day usage.
I would day that the dotcom was directionally correct but the timing was wrong. For instance you had pets.com in 1999 but in 2020 you had chewy.com. It's like you had broadcast.com in 2000 but by 2020 you had YouTube that was making more in ad revenue than the next 4 largest competitors.
AI data centers that exist and are operational are running at maximum capacity. That's why you see things like the tiny little data center run by xai showing up as a valuable resource to xai (on the sale side) and anthropic (buy side). It is "only" 300 megawatts and there's a 1.25 billion rent on it per month.
If all these other data centers were anywhere near coming on line, that 300mw data center would be a rounding error not a line item as it is right now.
So someone's signed contracts for way more and way larger data centers, someone's purchased billions in hardware for these not yet operational data centers. I'm wondering how depreciation's going to work on all these assets...
Anyhow, I'm not really sure what "max capacity" is here, nor am I really aware when they're going to be delivering the operational assets that are currently levered to their eyeballs and consuming 1/3rd of the memory made on the planet.
As far as inference vs training, have new gotten radically better than old models or only marginally (at the cost of 10x or more the training costs)?
I imagine the trend for AI usage will go up over the very long term (5-10yrs etc.), but short term how much usage is being propped up by employer's forcing their employees to use it? Or by user's being curious about the novelty but ultimately abandoning it if it doesn't do what they want? It'll be interesting to see what changes as tokenmaxxing disappears.
Big AI investor tells us that investing in AI is good. Oh, the surprise!
Does that invalidate this point? Yes. Because it makes no sense. The big money is not going to R&D but to build infrastructure that will be outdated in 5 years.
Current AI datacenter/model development investment rate is roughly 1T/year. That's a lot. But the US economy is 33T/year. So the investment pays back (roughly) over ten years if, each year, the AI investments increase overall productivity by 0.6%, assuming the AI companies can capture half of the value of that productivity gain.
> „[AI vendors are] paying for a fixed cost with a depreciating commodity“
That's just a confusing way to say you don't think future models will be worth the development costs.
Because if future models are significantly better, why would the price of tokens to access those models deprecate?
The $1T number seems more promises than reality, which is closer to the $300B to $500B level. Still a big number, but between a third and a half of the value used in the popular media.
These are similar numbers to the dotcom bubble. With GDP growth and the percentage of productivity AI contributes staying the same in this scenario this requires regular gains in revenue or growth. If things just stumble, like with most datacenters going unbuilt the bubble will pop.
Companies whose main core competency is writing code were already making up a big chunk of the economy before AI. Also, less wealthy companies were constrained in their use of software by the inability to afford the salaries of talented programmers (and ripoff practices from software consulting companies who in theory could help). Lowering the cost of building software systems ought to unblock a good amount of economic activity as the technology diffuses.
Those companies are certainly writing more code. But It isn’t clear that they are increasing their economic productivity. It could even conceivably have the opposite effect by fueling a race to the bottom.
e.g. an interesting possible canary in this coal mine is that there’s been a 200% increase in the rate of new apps appearing on Apple’s App Store, but it has not been accompanied by a 200% increase in the rate at which people are buying apps.
If the quality of all apps remains high, but if there is an increase of low quality apps it may not necessarily be great for consumers as it becomes difficult to distinguish which are the good and bad quality apps, making it risky to purchase apps.
The AI pundits often seem to apply the logic that code output is directly proportional to revenue and/or profit, and as such it follows that an AI usage increase leads to more code which leads to more revenue.
I don't believe this aligns with the reality of any major company, unless your business is in the literal sense "selling code" your revenue and profit is tangential to the quantity of code you produce. Google is a good example of this: most of their revenue and profit comes from their ad network, which is disconnected from their development productivity and instead heavily reliant on network effects and time in market. If I was a new competitor with infinite AI funds to throw at whatever problem I choose, I can't simply capture their market by developing an exact copy of Google's ad platform. In the same way, Google can't substantially grow their ad network by coding "more" or "better", they still need more customers and consumers to interact with their network to see any increase in revenue.
So it doesn't directly follow that a productivity increase will inherently follow an AI usage increase.
I would go as far as to say writing more Code has almost no impact on their economic productivity. What drives those companies is infrastructure and networks
So far the place where I've seen "more code being written" having a postive effect, has been in paying down tech debt and reduction of overhead. We've rewritten services (bringing multiple microservices back under moduliths) and cut costs. But I'm talking about net-negative code. That's not the point you're making. I agree that puking out 20 new features likely wouldn't gain us more revenue.
I am yet to see that ‘companies with great ideas which simply cannot afford those very expensive developers’. For the most, issue is not programmer costs. Mostly it’s inability to formulate the MVP which makes sense.
‘uber for my industry’ is not a sensible business strategy
Honestly, if you know guys whose bottleneck is pure software dev — please let me know, I have a good, experienced team in Eastern Europe, we can do wonders in product development. But coming up with sensible business ideas and executing on them in the real world is crazy hard and extremely rare.
You are wrong, sir. Their core competency is building out infrastructure and networks to support their software and user base. software is by far the least complicated thing they do.
what makes YouTube YouTube is not the video player it’s the servers that can handle petabytes of uploads a day and billions of views. YouTube software wise, is no different from the 100s of porn websites that are coded by small European teams
But what if it kills current ad-tech as we know it (paying to show ads on random sites without any way to verify that the site is legit), and the flow of ad money for legitimate goods turns back to journalism, magazines and other publications?
That would be half a trillion[1] redirected to regular people just from Google Ads.
The other day I watched a YouTube video on a work machine with no history and got 2 AI generated video ads for scam products before the video played.
An AI generated man talking about his product building journey to make a pressure washer hose that didn't need power (in the AI video it didn't even have a water supply connected!) that was going to be banned in a week because it was too powerful so buy now.
I've seen AI slop before and scam ads before but the combination of the two gave me some real tingly spider-sense that things are going to get worse and that some unethical people will make a lot of money from it so be in no hurry to stop it.
I mean, that says a lot about the kind of crisis out current economy is in. How much longer can the United States Be a world leader when it’s primary function is social media and advertising
A few things, I think you’re missing the point here
- most tasks do not require the latest frontier models, even if they are a magnitude more intelligent (we don’t actually know if that will be the case). Current Gemini flash is cheap, fast, and pretty capable with good guidance for most tasks
- now that companies pay API costs instead of a subscription they will be setting restrictions on token use to not have their budget explode (like Uber in this submission), that’s a strong incentive to NOT use expensive models, and limit their thinking budget
- there is competitive pressure from China and others who can offer very decent performances at a fraction of the token price
- the price of tokens for the frontier models is likely to go up, but the price to access older models is what depreciates! The overall price per token is going down now that we are in a new world where companies understand that token maxing is one of the stupidest concept ever created by humankind.
The cost of power cost increase alone on industry gonna erase all gains from it.
You can't consider it in vacuum. AI takes limited resources. So far it winded up cost on near every consumer electronics that runs an OS, and it winded up cost of energy that is used by the entire industry and every single customer
It's not just the cost of datacenters, it's cost of infrastructure (that given current direction of US govt will just be paid from people's fucking taxes and bills..) and cost of other industries turning outright unprofitable "thanks" to demands of AI
Local privacy respecting inference can be worth it. I use a local model to log everything I do all week to automate my timesheet. I also have it do a bunch of other data tasks. I won't say that larger SOTA models wouldn't do these tasks better than a local model but PII is a concern and my employer wouldn't approve of me just setting tokens on fire everyday to do what I could do myself.
Not at all! My company has 100s of clients and we track time in 6 minute increments. I feed in my browser history, terminal logs, session scripts, calendar, git commits, etc etc into it and voila it produces a highly accurate timesheet in no time flat.
Automating it has been way better for me than the alternative of breaking my flow whenever I'm switching tasks to chart my time, or logging all my hours for the week in one sitting. Different strokes for different folks I suppose.
If you have a good model router, you can route to older, cheaper models that run on older hardware, for simpler tasks. That helps labs extend the economic life of their hardware investments. They will likely fight it at first though as they see it as reducing ASP.
This is why I'm building role-model, a routing protocol and a router runtime: https://role-model.dev/
For better performance of ~equivalent tasks. That's what all the harness tooling people are using does: (often) increasing output quality by significantly increasing token counts.
Today's frontier models will be tomorrows low-end option. I think whatever model you are using today will be less expensive to use a year or two from now.
They aren't going down, but in the meantime they'll cover their ass by bribing their way into the S&P 500 and then use your 60 year old mother's 401k and teacher's pension to fund their risky capital expenditure.
Can anyone expand on this point? I read an article saying that the big AI co's datacentre spend was a bunch of lies because they can't build datacentres at anywhere near the rate they want to.
> they can't build datacenters at anywhere near the rate they want to
That was because the supplies the datacentre needed were constrained - supply-constrained, not end-user demand constrained, so would be in agreement with the GP comment (and the article I read didn't imply anything about lying).
Don't worry, they'll just lobby to ban Chinese models instead to keep their token revenues high.
> Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.
If you do the math, they don't have a choice. If China captures America's AI market it'll cause a major depression. They'll give it the BYD treatment, though it'll be a lot less effective.
Please explain to me how that works. If I download gguf file and run inference with it, how is it collecting and sending data back to China?
This makes no sense, 99% of the people using Chinese models are using them via Western inference providers who are running them and serving them to people over openrouter or whatever. If anyone is stealing your data it would be an American or European inference provider. A model has no ability to send data anywhere.
I don't think they'll offer open models for long. Since they've actually invested in power, cheap chips, cheap memory and can subsidize tokens - they'll keep undercutting big models to capture data forever. Bonus if they remove ridiculous safeguards and China will be unstoppable.
Pretty sure they'll offer them at least so long as it takes to bring OpenAI and Anthropic into insolvency. Why wouldn't they? The Chinese models are way more nimble to train and run, bring in a ton of goodwill globally, and put immense pressure on the VC furnace that is the US AI sector.
And apparently OpenAI and Anthropic think so, too - why else would they try so hard to ban them instead of outcompeting them?
So, have you ever been to China and could hadely found anything familay?
- Oh, they must have been blocked from entering the Chinese market!
But none of that is true. You could see global brands everywhere here — Tesla, Unilever, KFC, Apple, and so on.
---
Or have you ever actually done cross-border trade? Or any international business collaboration? If you had, you’d definitely realize that what’s really stopping you is U.S. legislation.
At least, that was the case with our former U.S. partner
You don't have to remove the safeguards if you can prompt your way around them.
There's a subreddit for people wanting to sex-talk to various models. It just so happens that the same prompt they use to 'jailbreak' SOTA models for sex talks also works if you want to have model write malware, or tell you how to design a highly illegal device.
We can tell that the inferencing costs for many of these models are low enough that these models are being sold close to real costs on the basis that many of them are open weight and available from third party providers who have no incentive to subsidize them.
I think the frontier labs will need to drop their high per-token prices at least for their low and mid-level models for the reason that several Chinese models (at least Qwen, DeepSeek, Kimi and GLM) are "close enough" that with the right harness they are cost effective alternatives.
They won't necessarily need to close the gap - at least not yet -, because these models won't necessarily compete at the same token counts. E.g. at least some of them need to do far more work to solve the same problems.
But, yeah, the prices will come down one way or the other.
At the same time, even the subscriptions for the cheap Chinese models are probably subsidised, and those subscriptions are likely to get less generous over time.
I really doubt Deepseek is subsidised. It's roughly the same price everywhere you look. Deepseek is using the Huawei hardware (as far as I managed to understand from various articles) and hence the savings.
Don't know why people keep parroting this, this is incorrect. Chinese electricity prices are equal or slightly cheaper then most of North America. But significant pockets such as those around the Quebec or other hydro plants are significantly cheaper then Chinese power pricing.
Not only that, China may subsidize AI, but so does the US.
Okay interesting. I presume that China also has low cost areas too no? Their grid at least seems more stable. Datacenter construction is more likely to raise prices in the US than there.
Yeah, this argument is bullshit. You can head over to Openrouter and look at the token cost for deepseek-v4-flash and deepseek-v4-pro. They are very competitive on the open market
> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Raise, they are going to raise the prices. We will spend more on AI infrastructure in 2026 and 2027 than the gross sales of the entire global software and services sector. Current pricing is at a major loss for current providers.
There's no way that all AI inference providers are colluding and/or all running at a massive loss, meaning the cheap Chinese model prices must be the real cost it takes to run frontier-class models PLUS their margin.
Look at Deepseek 4 Pro. https://openrouter.ai/deepseek/deepseek-v4-pro/providers
Deepseek and Baidu are subsidising prices but they probably train on inputs.
I have no model training and ZDR in OpenRouter enabled, and the first provider that shows up there is Deepinfra, significantly more expensive than Deepseek.
BUT much cheaper than Sonnet 4.6 and ChatGPT GPT-5.4.
Why would I even pay for deepseek? I get deepseek v4 flash for free with opencode. If I somehow run out of tokens for the day, I can just then on my vpn
How many more months do we need to wait, until big companies realize that flash models work just fine if you:
1) Don't ask LLMs for big changes
2) Review everything and point them in the right direction
Large models still suck at big changes, they produce questionable architecture and you still have to review the code, if your project is serious enough.
The codebase quickly become a mess, if you don't pay enough attention. Does not matter which model.
So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.
I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly? Perhaps, if it can measure complexity, even generate a quote?
Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.
> I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly?
This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
Many harnesses do this, I've recently dropped all my big subscriptions for using deepseek. Codewhale (formerly deepseek-tui) will use pro for large tasks and route smaller ones to flash. It's pretty good, but I just use pro and everything as the cost is quite low.
This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3billion tokens at the cost of 4$. (99-100% cache hit)
The easy decision is to just go with the biggest SOTA model you can afford.
But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.
The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.
It's the pipeline, not the model, that gets you quality at a given token budget.
It's pretty simple; organizations are willing to tolerate paying $1500/month/engineer, which seems to be roughly inline with "normal" consumption for most full-time engineers. If that number grows significantly, then I bet companies will start exploring flash models more, as you propose.
They are willing to tolerate it now, which is quite a switch up from the free for all we had a few weeks ago, and if they aren’t able to tie in this new ~$1500p/m cap to demonstrable productivity and revenue increases then that will be kneecapped even faster
There are plenty of expenses in this order of magnitude that are not tied to direct increases in productivity. I think it may become a serious hiring impediment for companies to be really skimpy on these budgets for example.
If you had a business task to complete that was only possible with ai and it cost you >$1500/month of work, how long would you have to delay the task so that it's cheaper long run to buy hardware and do local models?
$1,500/mo * 14 months = $21,000.
If local models are 14mo behind as many in HN say it may be profitable to just wait. Maybe just spend a few hundred dollars of your tokens and buy hardware piece by piece.
I agree, outside of the AI bubble, there's a lot of wait-and-see happening in the B2B world right now, I'd say we're currently 6-8 months into that 14 months.
Nearly no one is doing anything that is “only possible with AI”. This doesn’t seem like a relevant calculation. People spend on AI as an investment in their current productivity.
I'm legit annoyed at opus 4.8 at any setting above 4.8.
I believe it can be great for vibe coding, but mundane day work? Hell no, I'd rather work with Haiku. It's too slow, checks too many things, it's annoying as hell.
> That means each employee's AI spending cap is ~11% of that median compensation package.
Probably better to use the fully-loaded cost of the engineer, which is much higher than their compensation package. The fully-loaded cost is the total cost paid for the labor power of the engineer, and it includes big ticket items such as office space, food, equipment, insurance, payroll tax, fringe benefits, recruiting costs.
If the median compensation package is $330k/year then the median fully loaded cost is probably around $450-500k.
My usual rule of thumb for the US is north of double the received compensation but something in that range sounds reasonable with such high compensation. It's actually really interesting and underappreciated how that fully-loaded cost varies from country to country. Canada (for most salary ranges) is about half again instead of double owing to the insurance portion coming out of income tax rather than being a hidden expense so Vancouver ends up being attractive for trading 160k USD for like 120k CAD in compensation and then also lowering overhead from 100k USD down to like 60k CAD. The savings can be extremely dramatic.
Why would double be a good rule of thumb for typical US SWEs? Most of the costs aren't proportional to salary, and the ones which are aren't anywhere approaching 50%, much less double.
The costs to hire management and "support staff" like TPMs that scale with SWEs that help them meet goals is proportional to SWEs - often that is taken for the higher end fully loaded costs, depending on how you define it. Office space in downtown SF, Mountain View, or Palo Alto costs more than office space for back office workers in Nashville or Utah. Firms that hire SWEs often have fringe benefits like free food etc. and while they may apply to all workers, it tends to go along with hiring lots of SWEs.
But yeah, double is insane. When I saw prices for COBRA from Facebook, it was $3300 a month, and that was god-tier insurance - the insurance benefits were so good they had a custom list of what was covered that was probably way better than anything available on the market (e.g. you want brand name drugs? no problem. You don't want to try both ambien and trazadone before taking a sleep medication doctors actually recommend? No problem - etc.) - but for my needs it was barely better than COBRA costing way less than half. $3300/mo, or even $1200/mo for an entry level ops worker is a lot of their salary, and probably where the double comes from. At SWE compensation most of it ceases to scale.
The fully loaded costs including proportional management costs isn't relevant to the true marginal engineer, but estimates I've gotten from higher-ups definitely factor into engineering decisions about "should we spend engineering time to save money/make more money - how much will doing this thing cost the company" (opportunity costs are also relevant, but usually less grounded, since most projects don't have concrete benefits like "we will save $x/yr in infra costs")
DORAs. Rather than being sedatives, they directly target receptors in your brain that make you think you should sleep. I think the oldest one came out in like 2011.
It's kind of like neuroscientists found the trigger to tell your brain "we're going to do a clean shutdown now, trigger transition to runlevel 0".
Quiviviq, Dayvigo, Belsomra. All still on-patent, so they don't have generics and are pretty expensive (like $1000/mo if your insurance doesn't cover them). A lot of doctors won't recommend them in practice because most of their patients won't yet be able to get them covered.
GoodRX is always worth checking out, a ton of manufacturers will have coupons if you have insurance but they won't cover it.
Ask your doctor about them, look them up in your insurance's formulary to see what's required (e.g. if you have tried both Ambien and Trazadone and can document it), and see what they can do, before writing it off!
The expectation is Belsomra will lose its patent in 2029 and then generic makers can try to get one approved - so it's not that far off!
While the fully burdened cost of an engineer being double his salary sounds suspicious, this is indeed broadly the case. It has been (sometimes significantly) more than double in the case in every US employer where I worked and where I saw both numbers. In one case it was a hair under 3x.
My experience was not with pure software houses; we had some labs, measurement and RF equipment, but even without the hardware component the offices, insurance, admin expenses, HR, janitors, conference travel and so on would easily bump the total employee cost to double the salary. My 2c.
It’s also worth noting that’s the peak benefit. Expect most engineers to not hit those limits on the regular (if at all, since limiting this puts skills in focus again), and that limit to come down over time as the easy processes are automated and humans are re-tasked with harder problems relative to their TC.
This is not a good bellwether for the AI industry, including its adherents. Their growth assumed a level of indispensability that’s not being reflected in hard numbers and real costs, which lends credence to the notion that these IPOs being fast-tracked are meant to try and cash out before the bubble really pops in earnest. There’s no way consuming enterprises are going to pay such insane costs for such minimal uplift in the long run, and the AI companies can’t keep offering subsidized tokens via subscription plans at their current pricing.
I’ve even heard the rule “twice the salary” being used here in EU, but the tax and insurance burden may be higher. All kinds of those are based primarily on total payroll amount.
That number usually includes cost of habitat and others. It's also a stupid number as it is skewed by how much you can squeeze out of your employees. A better number would be to compare it vs revenue per capita.
It is also possible that capping at $1500 will give you ~99% of the benefits. So even with gains that are much higher, a cap could be a rational decision. Also, most decisions, especially around AI aren't exactly rational, so I wouldn't read to much into this number.
Why there are so many people that still believe that AI coding is a fad? It's something that started less than two years ago and companies are already paying thousands per seat. I know one that gives you 5k per month. Which other tool went from nothing to this level of acceptance so quickly?
No disagreement on computing 2.0, but companies spending 3-5k per employee for hardware isn't generally a monthly cost. It's a at the time of hire, and then once every 3 to 5 years after that, for a monthly amortized cost of about $50/employee.
I have my concerns with current inference pricing in that there's a non-zero possibility for a rug pull in the future for the subscription plans for organizations and individuals that can still use them. For now, its only companies larger than ~150 users that need to pay per token, but what if that wasn't the case? Not every company can afford over $1k/month/employee to give them access to AI tooling, further making it harder to compete against the behemoths. If we get to a point where an individual can no longer pay $100/month for nearly unlimited usage and instead must pay per token, that's going to be a problem.
Personal computing eventually became an equalizer (until we started centralizing on mainframes again, aka the cloud) because it got cheap. My hope is that inference also gets just as, if not cheaper.
I have high hopes for local AI and open weight models and we will continue the ethos of local, personal computing and not needing to offload everything to OpenAI/Anthropic/Google, etc. to get work done once the hardware and hardware availability catch up.
No, but you do want Opus-tier models to do desktop and office software automation (think about people who intensely use Excel and the like). Actually those might take even more tokens that coding in a lot of cases. Why do you think Claude Cowork is successful, and why do you think Codex is leaning so hard into Computer use?
I wonder if you will see app makers begin to open APIs (MCPs) up in ways that replace computer use. Computer use via human interfaces is pretty hacky IME, and if you can use an app that exposes spreadsheets in a way that reduces token costs by 90%.
I'm optimistic that the demand for AI accessibility will drive programmatic interfaces in places where companies were previously reluctant to.
Any kind of rug-pull is a serious concern. Companies are re-orienting their entire development processes around these tools. Sure they can go back, but it will require a much larger and more expensive effort than to transition in the first place.
All companies who make this transition will be more or less at the mercy of model providers.
Two things can be true at the same time. It can be true that this is here to stay. It can also be true that companies are grossly overvalued right now and that the market is irrationally exuberant. This would mean we could both have a crash and also see AI coding be the new future.
Hardware's not generally a subscription, monthly cost though.
You update it for them every 3/4 years (if they're lucky).
It probably makes a bit more sense to compare it to existing software subscriptions like Office, or the old-school 'per-seat' licenses per user for software.
There's some software that can cost $1k or more per seat/month, but it's pretty rare. Big tier ERPs usually fall in the ~$600/seat/moth range, specialty engineering stuff can hit over $1k, Bloomberg terminal, etc. I wonder if what Uber's building with that $1.5k/month/employee is actually delivering the same value that something like an ERP would to the entire org...
I think the right comparison is the invention of the microprocessor. At that time people were grappling with a lot of the same things we are today - would it automate jobs away, would it transform education and the work place, etc.
The general thrust that everything would be online was correct, it was just that the market mistimed and misallocated of capital by a decade or more. There was massive spending on infrastructure capacity that we wouldn't end up needing until the 2010s. There were hype driven valuations completely disconnected from business fundamentals just because a company was an 'internet' company. Things were going from cutting edge to obsolete in less than a year. There were breathless promises that this was business 2.0! Of course, none of that sounds remotely like what is going on today...
I'm optimistic about AI, but I also don't think that it is going to change everything as fast as promised.
The question you always have to ask is what problems does it directly solve. I personally think most of the current problems in software development and really the world at large are not time-bound problems but alignment issues, and all an LLM can really do there is be some 3rd party oracle that gives you an answer without needing other humans to agree with you.
I agree with you. I think that if we're talking about actual reliable problem solving, we have to be discussing robotic / drone systems. Software is as complex as you want to make it, and always has been.
> The question you always have to ask is what problems does it directly solve
Most directly, human labour. Labour is always a problem for capital. At a certain level of AI competence, businesses don't need to pay humans to complete the work they need doing in order to operate. I don't think anyone would dispute AI competence isn't growing steadily.
I would use these exact facts as a sign that it's maybe not what it seems. It's much too big and too fast to feel stable. It might keep at that level, increase even more, or drop down to a saner level of use / allocation.
I can see a corporate future where tokens are haggled over in department budgets just like any other line item. Some projects will get more of them, other projects will get less of them. "Use AI for everything" will become "use AI economically and build things that outlast our budget for it."
> It might keep at that level, increase even more, or drop down
Bold prediction. :)
I think anyone predicting a drop or near-term flattening is not thinking beyond the online bubbles where these tools are discussed. In a local tech meetup a lot of the normal companies are barely coming online with AI tools at their company, and even then with very low limits.
“AI coding is a fad” is not just one big camp of similar-minded people. Different groups have to give up on their pre-existing beliefs in order to be ok with AI coding.
Think of people who were very strict with variable names. People who pushed for multiple-levels deep of abstractions for a single API logic that’s not going to be reused. People who believed that coding is craft, rather than just a process to get to the end during work hours. This makes most of these people’s points more-or-less moot.
I was in some of those camps, but I’ve seen coding evolve in the last 15 years. So I understand that these priors need to be updated, as most arguments don’t apply to today’s world.
What's an int vs a float vs a boolean? What's a function? What's a class? What's a variable? You don't actually need to know the answer to those questions in order to vibe code. That's a lot of priors to update!
Just to go on record, as of today, I’m a big believer that a person that knows all that stuff is much more productive with AI-coding than a person who doesn’t.
I have no idea how we can get people motivated to learn these through trial-and-error when AI coding exists though. I remember the days of spending hours on stupid bugs that AI can resolve within a minute. But I recall learning heavily from those experiences. Oh well…
I honestly feel like my own learning has accelerated after using AI. Simply because now it's so easy to write the same thing in so many different languages, I can e.g. learn pros and cons of each language, which otherwise would have been I think unfathomable to me. I have now created so much stuff I wouldn't have had time to create.
I setup k3s, and tons of what would be otherwise unnecessarily complicated stuff on my laptop for my side projects with additional home servers, smart house stuff. Otherwise k8s and things like that would have been daunting to learn and in theory and without constant professional exposure, etc...
Microservices in Go, Rust, which I didn't have any previous experience with, games in C and other languages. Didn't know anything about low level memory management before. Was just mainly TypeScript person. Just constantly building random fun stuff.
The question is if you already had intuitive understanding of what those things “are”. The languages and systems have been easier to learn once you picked up a couple. Same applies here as well.
The question is, how quickly does a junior with no experience builds intuition without trial and error.
But surely, it's a matter of curiousity? If you are curious you will naturally want to look deeper to understand what is going on. If you are not curious, then you wouldn't have done very well before either.
I like the presentation I heard from a Principal, that AI tools amplify your competence. If you start out incompetent, it'll just allow you to be incompetent with greater scope and (negative) impact.
yes, but a person who doesn't know any of this stuff is infinitely more productive with ai than someone who isn't when it comes to many things.
we've got product folks vibing out prototypes (not shippable but clickable) in our main front end in a few minutes to an hour. This would previously have involved 3 people and several weeks, or a ton of figma and documents to fill in the gaps. This saves weeks to months and lets them really experience the items.
Then they hand it off to someone who knows all that stuff who is also using AI and the impl also gets done faster.
The PMs are either moving infinitely faster, or at least 30x faster and not blocked constantly by others.
basically you're not comparing people who don't know much (tech) with those who do, you're comparing them before and after access to AI.
And, you don't have to vibe code. A competent developer can make great use of AI. I think a developer that can develop the system themselves is the most accelerated user.
When I started I learnt something about coding from VBA macros to automate excel.
Often that started with the macro recorder. Then you worked out what that "recorded" code/sludge did, removed the crud you didn't need or want, improved the logic and so on. I bought books to understand it better. Now you can ask a (different) LLM "what is this? why is it used? How would I?" etc which is probably a faster learning curve than books, newsgroups and old school personal home pages with good info.
I would have been quite surprised when I first used a VBA macro in anger just how far I would go down the rabbit hole. C, asm, verilog, Linux were no part of what I originally signed up for!
Some people will specialise in the equivalent of recording macros and go no further. And this will be fine for code that gets it done but doesn't matter too much in the other dimensions (security, reliability, usefulness without the authors' support, etc.) Much like VBA utilities inside companies that were useful way back when. Other people will want what they produce to be better, even good, and they will learn about floating point [1] and all the rest, much as I did. Probably learn pretty fast too. [2]
[2] Working out how to write an excel vba webserver and using it to collect and and collate summary data from various divisions into reports was seedy as hell, solved the actual business problem (given ridiculous but intractable constraints) and isn't something you can record. We all have stories from a misspent youth that we're simultaneously ashamed and yet somehow proud of.
"as most arguments don't apply to today's world" makes me want to roll my eyes so hard at you. The vast majority of problems we had with building complicated systems are all still just sitting there. People are speedrunning relearning things we've known about software engineering for decades.
The more things change, the more they stay the same.
Between AI and the stock market (which of course relates directly to AI), I’ve lost count of the number of times I’ve heard lately another variation of “this time is different.” Sometimes so close to those words that I wonder why the person speaking them doesn’t feel a bit tingly. Great big warning signs all around.
The examples I gave, and the arguments that usually support them don’t really translate into “building complicated systems”. I was talking about the arguments in support of variable naming flamewars, etc.
I’m not proponent of AI generating everything without any supervision as of now. But willing to change my mind when it gets better.
Most software engineering jobs are not cutting-edge tech, or research, or solving unsolved problems. Integrations, APIs, figma-to-react pipelines, devops and etc. is what people get hired for. All those can be done much faster in the same-or-better quality by an experienced person with the supplement of AI. It’s hard to imagine any company would go against the grain and slow things down on purpose.
So I accept that “nonsense arguments are nonsense”, but with some minor differences of opinion. Naming of things matters insofar as you care as a human to actually conceptualize the system you’re building. You can call all of this stuff minutiae, and on some level I kind of agree, except for the general vibe of _caring about the quality of the stuff you produce_. That is something that still matters whether it “works”. Like, yes you can get an LLM to gen some junk, but _is it any good_ is still something you are in charge of.
As far as “boring systems are boring”, I can tell you from experience that I work on a pretty boring system, and AI is not all that meaningful in terms of its impact, and it’s not for a lack of trying.
Can it help me create a migration and add an endpoint and such? Sure. But those aren’t the hard problems. They never were.
It’s funny that you think the idea of slowing down is such a bad one, but it is another well-established truth. Slow is smooth, and smooth is fast. This notion of break/fixing your way to prosperity by way of 10,000 ill-conceived PRs is a fool’s game.
I'm sorry, you might be right. But this simply doesn't reflect my daily reality. All I can say is, nobody in my org is creating 10,000 PRs. But everyone is using Claude Code for virtually all commits. We've been doing it since about Opus 4.5ish. So far, so good.
Generally we've modified our timelines heavily, systems are working as intended, company is still making money. There are some AI-authored commits that had mistakes that we didn't catch, but I'm sure this could've been an issue even if all were human-authored. I know first-hand multiple other companies who are doing exactly the same thing.
I agree with "slow is smooth, and smooth is fast" for mission critical systems. But super majority of systems are, indeed, not mission critical.
Because companies are betting that this spending will allow them to reduce cost by firing people.
Right now the AI LLM PRs we're seeing are just introducing more work for other people, while these so-called builders are looking good with their new dashboards and functionality they're demoing.
But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
It's not built up from the ground with experience from x people taken into account. It's materialized from nothing, with no foundational separation, and barely any abstractions.
No one wants to touch it. The PRs are too large, and the 'authors' of the PRs aren't on call with us.
They get all the glory, but do none of the work.
It's kinda like designing a house and then sending it to an architect and engineer saying: make this work.
Let's be real. Most of the time you ask an LLM "Why did you do it like this?", it responds with something along the lines of "Oops. My bad. You're right to point this out."
You even have a fair chance of getting a response like that when there isn't anything wrong and the question wasn't rhetorical - which perfectly illustrates the level of the genuine understanding LLMs operate at.
I encounter it constantly with the latest models. Claude is particularly prone to it.
> I shouldn’t have said that with confidence
> I got ahead of myself there
> I overstepped, allow me to correct that
It’s wild seeing how often it’s wrong, and I only know it’s wrong because I am an SME or actually reading the sources. Most of my coworkers are not SMEs with what they are asking and do not read the sources.
A huge part of my job now is fixing fuck ups and failures resulting from these slop jockeys who have already moved on to slop up the next task.
This has happened to me, so I put this in my global CLAUDE.md, and it seems to help (I don't remember getting the response you mentioned for awhile now):
**Lead with the answer when asked how/which/whether.** Name the command/mechanism first; a question seeking understanding isn't a go-ahead to execute. Answer, then offer to act.
To adequately validate work you must be at least at the same level, so if you were right (which dunning-kruger suggests unlikely) that would mean your "terrible" average employee is given a tool that will 10x their output which they cannot even check for correctness. And correctness will be low if the average employee is bad like you say, because it means they will give badly specified tasks and even with the best of us it's garbage in, garbage out. I am sure there is no way this can backfire.
All enablers also enable mediocrity. That's not new. At least when the non-mediocre engineer has to work with someone, they can have a tireless responsive partner.
I find this varies by individual, but the AI taking care of so much boilerplate and rote work of coding, and taking the role of architect, test designer, and reviewer is a lot more productive for me. Check the code may take the same skill, but it's an order of magnitude less work.
Perhaps if you need that much boilerplate it's not going to be a well-architected codebase in the first place. Abstract it out, make a lib out of it. Easier to review & test in separation. Loose coupling, high cohesion.
I remember hearing (perhaps last year?) that the model companies have specifically tried to obfuscate the "thinking/reasoning" behind the decisions the models make so as to prevent cheaper models from training on the reasoning logs. So asking one "why did you do it like this" might be not fruitful.
Not sure if that's true or if it might be influencing what you're seeing, but it's a thought.
I think that has to do more with the thinking "train of thought" that some models show as what the model is processing before making the response. There shouldn't be a distillation risk with actually asking the model to explain why it made a decision and getting the response.
That's because of a fundamental misunderstanding of what an LLM is. The only correct answer to "Why did you do it like this?" is that the specific combination of input text and RNG state caused this particular output. There's no reasoning to be had.
* EDIT *
What's with the downvoting? That's a correct description of what happened. You can't ask an LLM why it did something and expect a coherent response, because there's no thinking chain, and no stored thinking state... At best, you can get a reconstruction of how the context relates to the output (basically a summarization of the context).
It's so fucking bad. I'm watching a team try to maintain a huge dashboard/control application that interfaces with a large amount of hardware using solely AI workflows.
Literally nothing works, all the timers/time counters are different across the pages, constantly commands hardware to do stupid shit, breaks during critical moments/in front of clients.
Eventually mgmt had to institute change freezes for high profile events because the team was breaking too much shit all the time.
The average C suite dipshit doesn't realize that the performance drops off a cliff once your project is more than some fraction of the context window so they will make pretty dashboards all day long but once you need to cover all the edge cases of a real system it all explodes.
AI isn't trained on the type of software style we'll need to create systems using AI, it's trained on how we used to write software. It doesn't reuse code or elegantly structure annoying, it just adds more code until the thing builds and passes some fake tests, even if half of it is functionally dead/unused.
> But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
There are plenty of valid criticisms or warnings about over-reliance on AI coding, but this is not one of them. Today, I am using a semi-autonomous agentic coding system which has an `interview` functionality built in - when it spits out the PR from the input, if you have questions about the motivation or context for a particular choice, you can start up a clone of the original agent in a sandbox to question it.
Now, you might claim that those responses aren't always reliable, accurate, or consistent, and that claim has a little more weight (though, in my experience, decreasingly so) - but it is _certainly_ not the case that you cannot interview an agent about choices made. I'm literally doing it every day.
>Why there are so many people that still believe that AI coding is a fad?
Because there's not a single piece of evidence that this has improved the quality of the delivered software, or for that matter even the speed of features any of these companies produce, in fact if anything the opposite.
The point of software development, the hint is in the name, is to develop software, not consume tokens. If Uber was now full of 10x engineers the stock price of Uber would be up, not down on a yearly basis. Hilariously enough the only company whose stock price is up appears to be Antrophic
I don't believe that the quality is the best metric for these companies. I doubt that Google has top-notch code quality in every product they developed, but it does not matter if they are making billions per month. Furthermore, I honestly believe that the quality stayed the same, at least.
That's just a non sequitur. "companies are already paying thousands per seat" has zero correlation with something being a fad or not. There are much more reasonable rationales explaining why companies are acting the way they are than "because AI coding is not a fad"
Can you name a service that charged companies thousands/seat/month that turned out to be almost or completely useless? There's lots of random services sold to corporates that are not very useful (all the random benefits besides health care, life insurance, and other big-ticket items), but the per-seat charge of those is much smaller.
Google Jam Board (and other digital whiteboards) had high upfront capex and lowish opex. Probably close to the price for how often they were used before being killed off.
Same with the MS surface(?) tables (not tablets). I saw load of companies buy into the hype and then discard.
Hey I'm a consultant. They pay me to be a regular developer but they cannot hire since they just fired thousands of people which they apparently did need, turns out.
Companies love to waste money on that kind of service, before this website became everything about AI, every week someone would post how they saved a gazillion dollars by leaving vercel or AWS to self hosting as an example.
> Can you name a service that charged companies thousands/seat/month that turned out to be almost or completely useless?
The Concorde turned out to be fad (not "useless" - which was your reframing.) Touted as the future of travel, each seat cost about $20,000 of today's dollars, but it turned out even at those high prices people and companies were willing to pay per-passenger, supersonic trans-Atlantic air travel is not economically viable, and was discontinued.
There is a whole spectrum between "ai coding is a fad" and "unlimited tokens for every employees we don't even care if it actually ends up being a net positive financially"
Because writing huge amounts of code is easy for humans too. Agents already proved that they can do it. But are agents able to maintain it? I do not know and unless I know for sure, I am not fully committing to AI generated code.
i.e. I am able to write about 1k lines of code of "acceptable" quality per week. Which means in 1 year, there will be about 5Ok LoC. I am pretty sure, that I would have to spent like 60-80% of time to maintain 1st year code and the rest to make new features in the second year so I would have to hire more people and spent time to onboard them to maintain velocity. All of that are rough estimates, probably overoptimistic and way worse in 3rd year. Good luck doing such estimates with code agents. Even worse if you already have huge amounts of legacy code.
It's cope. People desperately want to believe that AI coding is going away so that they can go back to partying like it's 2020.
So there's a huge number of HN posters claiming that the price of tokens will go UP over time rather than down (that's how Moore's Law works, right???) or that code bases that AI contributes to will spontaneously combust, or something.
I don't think it is unreasonable to say both will happen, is it?
In the long term, tokens will fall in price. Obviously. (If "tokens" continues to be the unit)
In the short to medium term, for the IPOs to succeed, people have to start actually paying for what they are using, so the price will go up, and is going up, quite a lot. Once their value is set they will slowly fall from that point (or some point maybe halfway, depending on how much the market is willing to continue to subsidise).
I am an AI cynic, but I am now an informed cynic; I am learning agentic tools so I know where they are useful and I know my enemy.
I think the "fad" here is cloud-based, metered AI being a dominant work mode.
Nothing, so far, has suggested to me that any other outcome is likely than edge- to local-scale, on-device, on-laptop, on-prem models getting good enough to the point where people use them by default and use the cloud models only when they need the extra oomph.
I cannot believe that there is anything other than an enormous incentive for companies like Uber to find local, small model and on-premises solutions to their problems, not least while pricing is so changeable and people are getting nasty surprises.
Betting on OpenAI and Anthropic being around over the long term in the form that they are now, that feels like valley hopium. Utility monopolies essentially always derive from physical/geograpical limitations, don't they?
I mean, there's an "enormous incentive" for people to run their own data centers rather than using AWS. And yet, cloud is growing and on-premise is shrinking.
While I hope local AI continues to exist, I'm skeptical that it will take over, for the same reason running your own servers hasn't taken over. It's just hard, and involves spending huge sums of money up front.
It's also not really clear how much tokens are being subsidized. The discussion reminds me of Uber. For years people on HN claimed that Uber was going to collapse once they ran out of VC money. Then... that never happened, and everyone just moved on to discussing other things.
Infrastructure is massively complex and multi cloud is super hard to do. Switching LLMs is... a drop down.
Now, that doesn't mean running your own LLM will be easy, but this will mean it's a lot more likely that there will be at least regional LLMs, in my opinion. I.e. there will be Google, whichever (if any) is left standing of OpenAI or Anthropic, and then there will be Chinese hosted LLMs, probably Indian hosted LLMs, European hosted LLMs, plus LLMs hosted on managed services (i.e. Bedrock). For sure I see large banks on the like being able to host the best OSS or even licensed LLMs on their own cloud infrastructure accounts (i.e. at AWS, Azure, etc).
And that's on top of the LLMs running on owned server infrastructure plus actual local, on device LLMs.
You're using the future tense, but all of those things already exist. Google exists, Amazon Bedrock exists, DeepSeek's cloud product exists, etc. etc. But this isn't relevant to what the post you are replying to said, which is that "cloud-based, metered AI being a dominant work mode [is a] fad". Since all of those things are cloud-based, metered AI.
I was talking more about on-premises, on private cloud and on-device stuff, as I said.
If you look at what Uber is spending per developer per month, they clearly have some headroom to consider whether more-local, unmetered AI tools on device, on premises, in private cloud, can be cost-effectively used to cut down how much money they are pouring into Anthropic and OpenAI. Not least because a bit of centralised effort might lead them to distilled models that are better for their purposes. Some of that budget could go into simply putting a bit more capacity on a developer's desk.
Can they do it now for everything? Obviously not. But IMO there is no reason at all for planning and scaffolding tasks to be done with cloud models, and there are many reasons why it might be better to do document processing without leaving the premises.
The incentives are there on the technical, operations and particularly on the business levels, and the relative disruption of the switch really small, considering that all the tooling can use different models for different tasks already. They must at least be investigating the possibility; it's irresponsible not to.
Token costs do go down over time for sure due to software optimizations (i.e. better attention kernals) but acting like hardware INFLATION isn't happening for at least a few more years is just nonsense. Objectively an A100 is more expensive to rent today than it was in 2024 (a 7 year old GPU - Big short guy is a turbo idiot) and rising. As such, over short time horizons, it's possible to see limited amounts of "price per token goes up" for the same model.
It's a mix. If the current wave of LLM businesses crater, demand for LLM specific hardware (and related hardware) will crater. GPUs were propped up by crypto currencies and now by LLMs. They're still great at doing fundamental math operations, but for their value to stay up another massive business opportunity involving matrix multiplication and the like would need to rise as soon as the current business cycle winds down.
> So there's a huge number of HN posters claiming that the price of tokens will go UP over time rather than down (that's how Moore's Law works, right???)
I mean, Github Copilot's pricing just went up considerably, so I guess they were right?
Because we have spent a lot of time and money using AI to generate code and have been unimpressed with the results.
As for why they got accepted so quickly 1) the industry's long running desperation to deskill computer programming 2) the addictive psychology baked into LLMs "That's an elegant solution! Shall I ... ?"
Also, a bucket for VC to put all that NFT, IoT, blockchain, VR investment into. VCs gonna VC and the last 15 years of bets failed so the last few years have been a transition away from those toward "the next thing".
Because the vibe coded stuff is sometimes great, sometimes it breaks stuff, sometimes it breaks things that we fixed multiple times earlier. The PRs are too large, nobody can review that mess and you better be on call for your deployment. Maybe it will get better, maybe not. I dont know yet.
The massive PRs is something that probably has to end. You can ai generate smaller changes in reviewable PR sizes. It probably even helps the AI code review tools to break the work in to smaller logical chunks too.
1500/Mo per engineer is such a small price considering the base salary of these employees, Maybe Uber knows something we don't (the 5X engineering ROI isn't there for them?).
Judging the ROI of an engineer is hard. Adding AI on top of that makes things worse, I think. I've heard AI makes engineers 3X, 5X, 10X and even 100X.
If I told my CEO that I was 4X more effective with AI, I am doubtful he would be willing to spend even 1X my salary on tokens. Even though he would be making out in the end.
So touche, but since it's usage per task it's kind of weird.
This means that the average engineer is efficient at (say) identifying the first 10 tasks they should do but there are diminishing returns after that? That seems like a weird pattern. Wouldn't it be more likely that certain tasks have a ROI based on how efficient the task is generated?
Like I'm trying to imagine in my head, if you think an engineer is more efficient with the tool, why deny them more tokens. I guess so they think to use them more efficiently?
So, maybe I conclude that I think your conclusion that there must be $1500 per engineer is flawed. And even if it were true, I don't think the benefit would be evenly distributed. I suspect this is a first pass at figuring how to budget them and there will be a second pass.
While it certainly reeks of motivated reasoning, Jensen Huang assertion that an expensive engineer should be using at least their salary in tokens feels more logically sound to me (assuming the average engineer is efficient at using tokens, I have a feeling it's a normal distribution)
Setting a cap motivates developers to invest their tokens wisely such as choosing the right models and not burning tokens for fun or side projects, same as any budget.. it’s not any deeper than that.
At my company we can ask for temporary cap limits if it’s justified, which is fairly common.
As a side note, I wonder when we'll hear the first reports about employees reselling (parts of) their token budget.
Probably not worth it risking your job for a 200$/month good, but at 5K, I'm sure some folks will be tempted. Especially if companies do stupid things like token usage leaderboards.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
at their scale they could also just run a large on-premise or rented (basically still cloud, but cheaper) GPU cluster and run through that. fixed costs, even license a SOTA model’s weights if you’d like
> even license a SOTA model’s weights if you’d like
Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.
Anthropic and OpenAI license to the public clouds. Google reportedly licenses to Apple. licensing to Fortune 100 companies running on their own infra is an obvious next step
it is a race to the bottom and I’m not sure the labs win that race. we’ll see!
I'm not sure the labs will win either. I wouldn't be surprised to see OpenAI & Anthropic just get acquired, either by Microsoft or Amazon and their models just become another product offering in their public cloud and and some hybrid on-prem offering like Azure Stack HCI or Azure Stack Hub (already basically a "cloud in a black box" that could become "AI in a box")
The problem isn't really Uber, Microsoft or Nvidia, it's all the smaller none IT companies that also have developers on staff. They are screwed. $1500 per seat per month is just way to expensive, but they also can't afford to build and maintain their own on-premise solution. If Microsoft can't afford to run CoPilot for their own developer, what chance does any of their customers stand?
If the large, well founded IT companies in the world believes the current AI cost is to high, then Anthropic, OpenAI and CoPilot have no actual customer base. AI is then relegated to very profitable niche business, but that can't fund the R&D for the models.
In hcol locations yes, but in south of spain you can get full time talent for that figure. It's also an entry-level salary in eastern europe, with ukraine and turkey even being somewhat cheaper.
So on the lower end that's (1500 USD ~ 1300 EUR) close to half the total expenses of such a developer, on the high end here around 15-20%. That's quite significant, depends on whether their productivity also improves (if that's what the orgs care about).
And we’re not even the country with the worst pay out there, but pay the same for tokens, cause regional pricing isn’t a thing!
Why are smaller non-IT companies "screwed" because they can't pay out the nose for their developers' AI usage? They're non-IT companies, developers are presumably not on their critical path, or not their bottleneck. Developers can keep on writing code the old way, or doing it with a more reasonable AI spend. I don't see how this "screws" any company.
WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using ai coding tools there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this far entrenched in these workflows and still not know whether they actually do anything useful?
I can say at least for me at a small-ish company (~40 FTE) there has been a surge in internal productivity tools. Nothing to improve the end user product directly but a lot of tools to make processes easier and less error prone.
What would previously be janky internal dashboards or excel sheets are now actually nice to use tools. That said of course the maintenance cost of all that has yet to be discovered, and the ROI is questionable.
About the same ~40 FTE team. We're doing the same thing. Smattering of internal tools, but no net gain in external revenue. Who knows which of those tools will have any value or ppl are just doing it because it's cool now to make fancy dashboards.
Yeah this seems to be a pretty widespread story, from what I've heard as well. The thing about those janky dashboards and spreadsheets though is that somebody understood them and built them with intent to solve a particular problem. Despite the rickety appearance, they're trustworthy tools. A polished single page app might look nicer but it's harder to debug than an excel sheet, and much less transparent in its internal workings--especially if nobody actually wrote it...
Imo its pretty clear that anyone who is taking the issue at least somewhat seriously knows the amount of value they provide is not non-zero. However, the problems are manifold: firstly, toolchains vary wildly, from fancy autocomplete, to engineers chatting with codebases they're unfamiliar with, to people integrating them into devops and infra, to people doing spec driven development, with a thousand philosophies inbetween. Many people suspect that those above them in the ladder are on the cusp of massive failure due to losing track of the code, and many people higher on the ladder think those below them are overly cautious. I hate to be the guy saying "oh it must be somewhere in the middle", but I will say at the very least I like being able to use it to read docs for me, and to synthesize syntax and simple scripts (give me a join that works across these tables and gives me column x, y and z - give me a python script that parses a file like this example and extracts abc data - given this api spec figure out how I can get this data from this endpoint, go)
as for building actually complex software, the art of that is not in simply chaining together such scripts. Its the art of using architecture and testing to shape uncertainty, and developing requirements (and extrapolating sensibly from incomplete requirements). I don't think llms are great at this, but they arent terrible either. A lot of the more active users in the space are doing stuff where theyve realised they need more detailed specs, which like, yeah, we knew this already - better defined problems lead to better software.
I agree the most interesting use cases I've heard of are about increasing the rigor of software development practices, but there's definitely a lack of coherence in methodology.. I believe that some users and companies are successful in this effort, but the odd (and interesting!) thing is that so far we don't seem to know how to communicate how to do it successfully.
There can be an increase in productivity without a corresponding increase in total output. The gains could be captured by software engineers doing a days work in an hour then fucking off in a variety of ways.
Quite possibly. Doubftul it will happen all at once. If you can get 8 hours of work done in 1 they'd need to ramp up demand 8x. Would be interesting to see that happen over night. Happy monday. Here, take these 30 tickets.
~70 FTE Engineering team. We are shipping more features, especially features that previously would not have survived the cut to make it on the roadmap. Even though we are shipping more, our total amount of escaped bugs has not increased, so our escape rate has actually lowered. On top of that we are able to triage and fix escaped bugs more quickly now. And then of course there has been an uptick in internal tooling that makes the rest of the company more efficient, and we have been able to address tech debt at a higher rate than before.
I don't think this would have been possible without having solid engineering culture and processes in place before bringing in ai coding tools.
And I don't want to sugarcoat it, this hasn't been easy, requires continued discipline, and took well over a year to get good at. And we still have to continuously learn, experiment and adapt our training, tooling, and processes.
> There was probably a reason it was on the backlog (because it didn't really have value).
There are definitely things in the backlog with low value. We don't work those items, even if we could now. The additional bandwidth we have now goes to valuable features that drive revenue and retention metrics. The reason they were on the backlog were because we just didn't have the bandwidth to execute on them well and they were just somewhat less valuable than the critical path items on the roadmap.
You can ask the same for the median 330k salary in the US for Uber Engineering...
and being a bit snarky, attending Uber engineers talks here and there at a few conferences, looks like. they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.
EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.
This is what all "platform engineers" have to do once things are working nicely: you have to keep inventing work.
I don't know; I'm a Ron Popeil "set it and forget it" kind of guy. Make the dumbest, simplest thing that's going to work with some clear path for scaling. Then go do valuable things instead.
But most Platform Engineering teams in smaller companies (and especially non-US) add a layer on top of existing technologies. A layer that usually maps to the specific culture and idiosyncrasies of that company; a bit like the deployment flow which is usually very specifically shaped on how a company is.
But in Uber's case, they tend to reinvent lower level pieces of platform/infra.
> You can ask the same for the median 330k salary in the US for Uber Engineering
People DO.
It's well known that most tech companies are ran incompetently. As you say, it's not the engineers' fault.
But most projects and hiring in these companies exists to juice promotion criteria. And that, depending on perspective, these companies are either massively overstaffed or massively underproductive.
The comparison to AI spending being wasteful holds up pretty well, these are companies that readily piss away billions in pointless spending.
The massive misalignment in large companies is no secret. But neither is the fact that when someone comes to cut, they also have no idea of who is doing load bearing work that matters, and who doesn't. I look at recent cuts around my large corp, and it's clear they are made at levels that have no visibility of the ground, and are uninterested in said visibility. Obvious mistakes that are worse than what claude would have told you (yes, I asked Claude to pretend to make the budget cuts in our org y looking at the same data an exec could probably get. They were better than what happened)
I think it's a general problem, but in my rare conversations with execs nowadays, they seem rather uninterested in improving their decision making there. The actual performance of the organization does not appear to be all that relevant to them.
This is a very good answer but there's a flip side too.
The idea of "if you add intelligence you make more money" is contradicted by the fact companies don't just always hire more people. Wy doesn't google just hire everyone?
Sure, but has their rate of value added increased as a result? It's a good question to ask. They added value before LLM coding, and now are more expensive than before thanks to token costs.
What's the point of running it locally though? Inference for open models is quite cheap already. They could just selfhost, anyway. The experience of running LLMs locally will be excruciatingly bad in comparison at least for the near future.
Right - the future of LLMs is like ol' windows XP+Dell. Commercialized "things" you run locally offline, co-designed with hardware, with a known productivity suite, and large businesses building the next generation thing and suite with 18mo release cycles (ish).
XP? I can see the argument for enterprise support but in that case the latest windows OS is going to be virtually free and I dont know if MS and Dell etc. would even support an XP machine. Might even be required for hardware. If no enterprise support wouldnt Linux make a lot more sense?
I get that if it's offline the security downside of XP doesnt matter, and I assume XP is free, but being free doesnt really seem that valuable compared to alternatives (free linux and virtually free OS if buying wholesale).
"Windows XP+Dell" should have been in quotes. It's similar to the way enterprise productivity software was developed, packaged co-designed with hardware, and sold on an 18mo upgrade cycle assumption. It's not literally windows xp.
I don't see it. Leasing equipment and paying per seat license fees makes a lot of accounting and cash flow sense. Maybe when it gets to the point where you can run SOTA LLMs on consumer hardware. But that seems a solid decade and probably much more away.
Even then it makes more sense to rent the bigger GPU and get your answer faster.
There's waayyyy too much money betting on that not happening, to the point I feel there'll be regulations popping up for "safety reasons" etc to ensure the big players control this.
3/4 of Microsoft's BUILD conference the past two days were about local AI, foundry local and Windows ML along with a big section in the keynote about running local workloads on their new hardware with Nvidia. Say what you want about Microsoft's reputation, but they are a "big player" and seem to be moving in the direction of local AI first.
Your last question is really important. What did they accomplish with all that spend?
I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?
If you believe a 128gb machine that is essentially DGX Spark in a laptop chassis can run models comparable to SOTA you either never ran open models on hard tasks, or you aren't scratching the surface of SOTA closed LLM capability in how you're using them.
Can you show me an example of a hard task that can't be achieved using light models? When we don't want the model to work on autopilot without reviewing the code at all. Even SOTA models will produce garbage code, if you don't guide them all the time.
Hard tasks require a lot of guidance and code reviewing, unless you are creating another throw away project where correctness, maintainability and code understanding does not matter.
You can't get an edge using local models, these guys may have competitors that will spend on SOTA models. They won't likely ever consider local machines even for some offloading scenarios, the complexity and costs will be even higher.
Consider rewiring your perspective: getting an edge doesn't really matter; the only thing that matters is will customers pay for this? Is this a useful, valuable problem to solve?
Coding faster doesn't really solve that.
Uber makes more money if people buy more rides, order more food, have some breakthrough in autonomous driving. They can save money if they can optimize some ops or spend somewhere. Is there any evidence that with the spend on AI that they achieved any of this? If they did, I'm sure we'd hear about it in some engineering blog.
I am wondering more and more if this becomes true as these smaller models take off. I might be old fashioned but I have yet to crack the workflows some of the hype people spout like Claude codes Boris where he and others talk about running hundreds of agents overnight.
I have still found the sweet spot for me is using LLMs but I am still in the drivers seat.
That's because for some of these folks, the cost of the tokens doesn't have to match the value of the output; the hype from the story is all they need.
Normal people have to produce something of value from that spend. So starting 100 agents and then waking up to something cool but useless just means you spent a few thousand dollars and created nothing of value............
$18K a year is a fraction of the salary of a junior engineer.
Claude has allowed me to do refactors that would have taken weeks to instead take a couple of days. It has, objectively, increased the velocity of the engineering component of greenfield features by 40% in my org. You can put a number value on that and decide if it gives you favorable ROI.
The point of a refactor is for you to think deeply about the code you are responsibility for, so you can make it better (faster, easier to work on, more tests, whatever).
You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
It’s a lose/lose situation for…I would say anyone employed as an engineer or programmer. I’m not taking responsible for AI output, the same way I won’t try to fix auto-generated code: because you just regenerate it.
The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
> The point of a refactor is for you to think deeply about the code you are responsibility for, so you can make it better (faster, easier to work on, more tests, whatever).
Absolutely false. Refactors (in my case) can be as simple as dropping old packages for newer packages with slightly different semantics. It can be moving legacy pages from jQuery to Vue.
> You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
I've 25 years coding, trust me, I don't lose anything by not finding out on my own that the semantics of a jQuery promise changed between major versions.
> The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
You have no idea of what you're talking about. There are entire classes of K8s networking issues that would have taken me a day to debug which Claude solved in minutes just because it can run 20 diagnostics commands in two minutes and deal with technical minutae that is time-consuming but ultimately irrelevant to my business goals.
> The point of a refactor is for you to think deeply about the code you are responsibility for, so you can make it better (faster, easier to work on, more tests, whatever).
I'm pretty pessimistic on AI and don't have access to good agentic workflows, but refactors are exactly the thing where it seems to me like agents could be really strong - once I've refactored something architecturally, I might have hundreds of instances of a thing that needs to be updated in a predictable way, but is complicated enough that it's going to be faster for me to manually update hundreds of instances rather than writing a generalizable find/replace tool.
Sure they’re fine at that sort of rote find/replace job as long as it’s relatively straightforward. But it only really works if you do the hard parts yourself then tell the agent to go and do the rote part. Even then I’ve had it turn to slop more often than not as the agent has to start contorting the code into weird shapes to try and finish the job. It’ll never stop and be like “hey maybe this was a bad idea, let’s try something else”. And by the time you get to review it, you’ve spent 20 bucks on something that needs to be thrown away.
In the old world, the refactor probably won't happen in the first place, but the effort would be put elsewhere. "Increased velocity of ..
greenfield features" doesn't directly translate to additional revenue, and your number is very questionable in the first place.
Software engineers like to talk as if business and finance are as easy as pushing code out and refactoring. It's not and never has been.
I think this is the part I struggle with. The code I write makes me money or is a way of teaching me something, both of which are reasons that I would write the code regardless.
I don’t think I have any projects in mind that I’d be willing to spend half of a car on that I also wouldn’t have written myself.
Obviously just a personal take though. I’m glad you get the usage you want out of it.
My "job" is building open source software for data journalism (and anyone else who needs the tools data journalists need, which is pretty much everyone else). I can build more of those tools, and better, in exchange for a fraction of the cost it would take to hire a team to help.
I reached my own productivity limit on several projects (in my case, I'm building a fully automated microscope that uses realtime computer vision to solve a number of longstanding problems with microscopes). As much as I'd want to write the code for it, I hit a wall when it came to debugging some particularly tricky issues- either I couldn't do it, or the time investment was too high.
I use Gemini/ChatGPT/Claude to do that work and it unblocked the enjoyable parts of the project while taking care of the tedium.
I also find LLMs help me learn faster because they can often take a paper and turn it into working code, which I find to be a very slow process.
I agree on the basic point, but running $1500/mo's worth of SOTA local AI is non-trivial already, and that's a figure for a single seat. That's equivalent to generating at least 20 tok/s on a 24/7 basis, in fact probably quite a bit more than that (because open-weight models are vastly cheaper than proprietary ones even when served from reputable Western providers - reaching the same spend would take around 100 tok/s or more, which is well within datacenter hardware territory).
You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
I think companies will eventually just buy a local AI server.
Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.
These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.
I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.
I don’t think companies will do that. Why don’t they just buy local on-premise infrastructure even though it’s cheaper than AWS?
“AI in a box” sounds a heck of a lot like “the box” from the Silicon Valley TV show. Or the Google search appliance. Or name any other on-premise thing that is equally dinosauric.
The real finding of this article is that AI tokens are direct competitors with offshoring. $1,500/month buys you a whole employee in India.
And this is before AI companies inevitably increase pricing after the conclusion of the growth phase.
> I don’t think companies will do that. Why don’t they just buy local on-premise infrastructure even though it’s cheaper than AWS?
For customer facing, production software, its worth paying a cloud tax to get the reliability guarantee. For tools that are used by engineers for code development, there is no need for such bulletproof guarantees.
Yep, its already quite easy to do so with tools like opencode/openrouter. Ive used some open source models and they seem … ok? Im not doing foundational math, just refactoring code, understanding existing code etc. I don’t see a future where companies blow 11% of employee compensation on a single tool; the hosted AI server + oss models will 99% win out.
How is tok/s not a bottleneck I? I assume most people still use ai agents interactively rather than leaving them to do their own thing during the night.
I find anything below 50 tps or so entirely unusable...
Regardless its Apples to oranges anyway, inference is quite cheap for open weight models its just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or various provider on OpenRouter since open models are a commodity.
I startup 4 or so projects then go do other things for 4 hours. I don’t have enough energy to steer overnight, but I’m at least “semi afk” for daytime steering. So throughput is king for me, tokens per hour. Not latency or actual tokens per second.
Running locally is even worse for this, because if you're running 4 jobs at once they just run at 1/4 speed. Not literally, you can make up some of the difference with batching, but you have limited resources instead of spreading your requests out on an API provider's nodes.
I would expect the overwhelming majority of output tokens would not be the actual code but used for analysis, reasoning, testing and iteration. If you only use the agent for autocomplete then yes, the calculation is probably different.
yea, and understanding that too is important. the idea you dont need to read code or analysis seems to align with the depwndcy addiction being shoved in thw pipe.
Is interactive use for coding something that actually works today? With unsafe mode, even frontier hosted models are slow enough I end up just tabbing out to work on other tasks. It would need to be much faster if I am to sit and stare at it while it churns. Local models might be a lot slower but workflow-wise it doesn't change much for me.
I think probably the correct spend is something closer to 10x that if people can figure agent coordination problems out. It's not even really about capability at this point, it's about keeping track of what agents are doing.
Even if companies decided to move away from expensive models from the major labs, it probably much more economical to pay a cloud provider to host some open weights model which could then be amortized across all (internal) users and do inference at a substantial batch size, rather than giving everyone their own hardware -- which means the company would need to provision for peak usage and inference at batch size of one.
You’re way better to run your own on premise models. Laptops are depreciating assets, do not benefit from economy of scale, have fixed specs, result in a fragmented fleet where you need to keep models up to date. Without talking about power consumption and cooling issues. I really don’t see why companies would go that direction
Even if the laptop costs $5k and you upgrade it every year with the latest hardware and run local models (assuming your workload can tolerate smaller models at slower tok/s), you win.
You don't need to run on laptops, desktops plugged into mains power get more power consumption and better cooling. I want my laptop to work, but I can accept when I'm on an airplane at 32k feet I get less abilities.
128GB machines can't run anything locally that is even nearly as capable as a frontier model like Claude. We can get an idea from deepseek v4 pro being 1.6T model, requiring approx. 860GB VRAM to run.
I don't think it's necessarily what Uber build, but the gained productivity. If the engineers use the AI tools the correct way, it can drastically increase the productivity and that means they can actually use the LLM as a junior or an associate engineer. $1500/mo is way cheaper for that level of productivity where as they would have had to pay far more for a human engineer.
>WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
Uber (and quite a few bay area companies and startups) can afford to spend that money. There is no expectation of profit, Uber lost ~62B and growing: https://uberlosses.com/
As much as I love to hate on Uber, that website is from 2022. Uber has been profitable since 2023.
It's profit margin seems to have stabilized around 10%.
The real economic crime is losing at least $40bn over 10 years scaling a business that ended up having retail profit margins (i.e. low profit margins).
> How did it meaningfully impact their revenue in a positive direction?
It probably allowed them to avoid hiring as many people to build a certain amount of software. Even if it didn't increase revenue, it could have lowered human labor costs.
> 128 GB machines that can run local LLMs are a bargain even if priced $5-8k.
Don't forget the energy costs. Searching around, advanced models use an average of 25 Wh/1000Tok.
$1500/month gets you about 150M tokens.
At the aforementioned energy/token, that's 3750kWh.
What are your local office electricity rates/tariffs? (Hint: they are going up because of AI data centers). Even if my price and energy assumptions are wrong above, you probably aren't going to get the rates that the hyperscalers do.
Even at cheap (i.e Texas) retail electricity rates, that many tokens will probably cost you hundreds per month. In most other electricity markets, probably far more.
I use the $100/mo sub but my 30 day API cost is about $1700/mo.
It really depends how you use it, if you're using prompts to generate detailed designs, breaking those into lists of tasks, and then feeding those to multiple agents - it's really easy to burn through many thousands.
If you're being more deliberate and using a few agents at a time interactively, having it review PRs/resolve issues, automated clean-ups and performance optimization, etc it could be more like $1500.
If you're just throwing it one-off questions like a better stack-overflow that is well under a $100.
I've really gotten into /goal, if you can find something verifiable and leave it overnight - it's kinda like christmas morning to see where it landed.
Plenty of comparisons here between salaries and token costs. All fair but very much assumes that salaries are rational. Why do we pay some engineers 10x as much for the same role just because they are in a different location? The WFH discussion surfaced some of that. If money is cheap, all sorts of funny things are happening. Is it worth to spend 1500 USD on AI? I don’t know. Is it worth paying engineers 300k USD instead of 30k? Honestly, I don’t know
> All fair but very much assumes that salaries are rational. Why do we pay some engineers 10x as much for the same role just because they are in a different location?
Who's this "we" you're talking about? Are you a software engineer or a temporarily embarrassed billionaire? Do you think the rational thing is to pay the lowest regional salary worldwide?
This kind of race-to-the-bottom logic needs to be rejected: by workers, business culture, and the government.
Unfortunately business culture embraces races to the bottom (for everyone but owners and executives), and uses its lobbying might to push the government into tolerating or even supporting it. And there are a lot of deluded workers who (for some reason) seem to be feel smart when they parrot the ideas of people who want to screw them.
As well as rational vs irrational they are also just different types of spending.
Hiring someone vs paying a vendor for a service:
- different level of commitment
- might tie your org to a physical location
- different legal risks
- shows investors a different picture (probably this would even influence a bank loan)
- manager has to fight a different bureaucracy
Not to mention that comparing the cost of a hire by looking at their salary is pretty dumb. ISTR hearing at Google that the overall estimated cost of employing a SWE is like 4X their compensation? Can't remember the exact figures though.
Just to put this in context. If every company did this, all over the world, with that same limit, we are talking about something around $45B monthly in revenue for all AI companies to share.
That's a bold assumption. Increasing costs by roughly $18 000 per employee worldwide is highly unlikely. For reference even at FAANG in Europe, that would be a 7-15% cost increase for a senior developer. More like 15-30% for non FAANG and even more for non-European markets.
I don't think it's a bold assumption, but I also don't think the assumption would lead to the conclusion.
1. Why it's not a bold assumption: it's a bit shocking now. But in two years or so, many/most companies will realize this is the cost of doing business. Just like people are ok with using Outlook, or Office 365, or (in the case of Wall Street) Bloomberg terminals, people will realize that developers will need AI coding assistants.
2. Why the conclusion does not follow from the assumption: if the limit is set at $1500/developer/month, it does not mean all developers will use it. Companies will set incentives for people to not be very wasteful. It is more likely that on average developers will consume $100-200 worth of tokens per month, and there will be some outliers who will consume 10, 100, or 1000 times as much, but they'll be few.
One could hire a competent developer here in Brazil for that amount. I know because my workplace has hired competent developers for that amount. You can even call them senior developers, but you can't get "non-startup seniors" with actual experience, those expect a bit more.
I just wanted to take their number at face value. It's not like it needs more real information to make AI a bubble.
World bank says there are 3.7B employed humans. Putting the total addressable market at around 67T if all of us spend USD 1.5k on tokens every month. This lines up well with current forecasts from the major AI labs
The $1500 number is less interesting than the fact that they hit a ceiling at all. Most engineering teams I've talked to have no idea what their AI spend is per developer because it's buried in a consolidated cloud bill. Having a hard cap forces two useful conversations: what workflows actually justify API calls vs local inference, and whether the output is being measured against any real productivity metric. Without that feedback loop it's just a race to see who can burn tokens fastest.
1,5k. For two months of that spend you could buy a machine that can self-host decent models, plus a year's worth of electricity. It's not up there in terms of quality, but with a bit more effort it works pretty decently. I'm completely baffled that that's not way more common, is it really just the quality?
I'd think for most companies the pace of change is too high at the moment. Give it a few years, a bit of a plateau in the improvements in frontier models and I can't see how many of these companies don't implode under the weight of competition on inference prices.
I think we're all past the "bet-money-can-buy" stage. The most expensive models are an order of magnitude more expensive than the middle ground ones, so you need to be selective about what you run where.
And with a bit of careful routing - there isn't a lot stopping you sending the hard stuff to a cloud model and the average stuff to an on prem model.
Second here. From recent Alibaba Qwen conference: the all-in-one box (DC in a box - I think I was called Apsara, 0.6x0.6x1.5m) plug and play, 1.5TB GPU RAM, capability to run in a fully air gapped environment, any open models... All of that is roughly $300k one time.
And this box can do non LLM tasks as well.
Performance (throughput) around 20k t/s. Delivery time - around 2 months.
For any medium sized company its perhaps cheaper to just buy it once than spending 1.5k for cloud per user
I think the main thing companies should try to understand is avoiding the use of 'claude -p'.
I definitely have written a goal file, and then just ran claude in a loop over the goal in order to 'token max'... why not? I'm doing research and have some clear KPIs where research into all kinds of techniques / tuning can improve the results. I can spend my budget on a "experiment with blah blah blah to improve blah blah" or give it a list of things to try that I know will take awhile.
Its no problem hitting hundreds of $ of API spend while sitting at a computer with 3 monitors have 6 windows of useful claude code interactive sessions, while working on 2 or 3 projects and using worktrees, and it's a little weird when you hit your limit by 2 o'clock and have to wait for token budgets to reset; god forbid, I manually edit code... which I did do for the first time in months.
You can also start to generate a lot of token spend if you do something like "hey make me a stylized slide deck using internal skill / agent XYZ based on commits A through C", which as an engineer, makes presentations building much less painful.
This uber limit is not high compared to the big SV companies.
I also randomly wrote some code in a bind yesterday, while I was on the toilet, and it felt so strange. That was the first I'd written in probably 6 months.
Nope I'm a couple levels too far removed from the code at this point for that. Closest I get is during meta-management (modularizing, complexity reduction, etc) with agents
Lock-in / switching costs are increasingly concerning me. I am using Claude for a good year now and have been accumulating so much "knowledge" in there by now. If Claude became less favorable in terms of price/performance in the future, that would worry me. I've started to think about a distributed solution, where my storage is detached from the inference, but currently Claude is still the way to go for me. Wondering if anyone has similar concerns?
Unless you work in some obscure domain, chances are that any general "knowledge" Claude has "learned" is already public data somewhere.
If you don't believe me, launch Codex and immediately start working on the same project (s). You might discover that all the knowledge accumulated means almost nothing.
Claude Code definitely remembers things about you. For just one of the more obvious examples: I was recently asking it to make some suggestions on software alternatives, and part of the answer included (paraphrased) "While a hosted service may be attractive due to your small ops team size, your experience with hosting Linux container-based services puts this squarely in the realm of an option for you." My prompt mentioned nothing about this.
This isn't something that is public knowledge, in the sense that you mean it.
Just earlier today it asked me if I wanted to create a jira ticket for something I asked it about doing. My prompt mentioned nothing about jira.
If you use Claude Code, you might want to take a look at the "auto memories" files that it creates. See "/memory" for some more information.
This.^
I realized this first when moving a design spec from Claude chat to Claude Code and panicked. I literally had to build something like Notion but for agents to act as a portable memory between all cloud and local models and agents. But honestly it paid off!
If you are interested you can try it out at markbase.cloud (disclaimer and all that). I am not charging for it.
We run a "context" repository that enables us to transition pretty seamlessly from model to model (usually codex to claude and back). It has skills / plugins / connectors / tooling in relatively malleable MD files. That's what I see as the future. Rather than exporting IDE settings we'll just carry our markdown to the next best tool.
It's hedging a bet at this point, but that's why people say there's no moat. If the tools are properly used + maintained, there should be no reason we can't use a new provider even next week (maybe with a little tweaking).
that's an interesting approach and something i also considered (using git to avoid conflicts). one thing i needed was a "database" (basically a folder of markdowns) with a fixed schema so i can let the agents record their decisions in (for example when the code conflicts with product design spec). this combined with search has been a real lifesaver.
Believe it or not, after writing this comment I was doing some more reading on the task. I'm planning to reorganize our context repo after finding this paper (it argues that AI generated context files can stunt the performance of models):
Do you believe the same people were saying those things? (Were they really?) The idea that "different attitudes towards labor have been expressed by different people" doesn't feel too remarkable
Why isn't self hosting (even just renting a GPU server, not necessarily on premise) at large companies or hosting via something like together AI to run the open weight models not more common? I've tried the open weight models and the premium models like Opus and Gemini Pro, and I find that the latter are a little better, but not nearly to the degree to justify the extreme price difference, since the differences largely don't matter for what I've tried them for, and I expect that many other users likely have similar use cases.
If the premium models are just about 10% better - that could justify the price vs. self hosting a ~0.5-1T open weights model.
Remember that utilization of these huge racks will not be 24h/7, and these are usually not GPU intensive shops that would train models on the spare compute. With prices of 100-200k USD and north with ~2 years lifetime, that would be hard to justify financially.
Self hosting could easily amount to ~1000 USD a month amortized across many developers. In rush hours - there will be hard rate limits.
Would that 1500-1000=500$ monthly USD justify the 10% decrease in "AI Productivity" ? I guess not. In most cases.
For everyone that asks me around, I'd say that in short term, unless there's a really good reason to self host these coding assistant models, then the big 2/3 coding assistants providers are the better choice.
There’s probably plenty of money to be made in LLMs as a service - but not enough time has passed for the commodification to occur. I’m with you in that when the dust settles I don’t think any of the frontier model providers will have a moat. Just like during the dotcom boom a catchy URL and a webpage that could accept payments wasn’t a moat, either.
Why do you think it would be more common? The pooling of GPUs to serve multiple users and connecting to docs/datalakes while respecting security controls, as a start, is non-trivial. You'd end up paying a team to manage that.
I just went through a similar discussion in my $WORK (traditional finance company on NYSE with average IT expertise) and I think the thought process is as such: it's one thing to just give your stellar dev/hacker a beefy GPU server and run whatever model they can run; it's another thing to maintain such platform for company wide. You would need human resource (likely way above normal software dev paygrade) to understand and maintain such models, maintain backend, availability etc. All these extra hassle make it just easier to pay a top tier external lab + slap a reasonable spending limit on everybody.
For the same reasons companies are not building data centers for their "regular" hosting and storage needs but put things on AWS, Azure etc.
It costs money to maintain the hardware and hire experts to manage the services. For something as common as LLM models, there is absolutely no reason a company serves models on their own hardware unless they are maniac about sending bytes to AWS.
You tried that on a personal machine for yourself once. It's completely different calculation when serving a model to 3000 employees with ever evolving hardware and software requirements. You'll need dedicated hardware in data centers and experts to run them. A company will need to figure out how to manage acquisition, assets and expenses plus 1000 other things, in addition to its actual business. Guess who has figured out all of that already? AWS/Azure/OpenAI etc.
I use Claude every day. Often for multiple hours a day.
Basically doing my job not worrying how many tokens I spend (as in too many or too few). This is a pretty complex code base (database optimizer and related).
Just looked at spent for the past 30 day, didn't even come to $600. 95% of my tokens are from cache. If I were to reach even $1500 I have to let claude run unsupervised over night (and with the amount of mistakes it still makes and guidance it needs, I do not believe we are there yet.)
> A $1,500 monthly limit per tool strikes me as a rational policy response to over-spending,...
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
This whole article seems to me like Multi level marketing "businesses" where 'Diamonds' have made their money by promoting MLM in seminars and telling hopefuls at bottom that "Buying AI subscription now is their one shot to be a winner in life"
Perhaps there is something to MLM vs LLM to create a FOMO effect.
Simon is very fascinated by AI and at times he can be a little too optimistic but he is generally balanced and his perspective evolves over time which can be seen in his writing.
Nerd who loves nerd things a little too much? Sure. Paid shill by Big LLM? Nah.
“I'm finding that coding agents can take me from a vague idea to a working solution, one with tests and documentation and that looks like a carefully considered project evolved over the course of many weeks... in less than an hour.
Even if the code is rock solid, there's a limit to how many projects like that I can sensibly care for - and if they're instantly abandoned, what value was there from creating them in the first place?”
Here is Simon questioning a fundamental belief held by the pro-LLM lobby. Would a paid shill question that?
Simon is, without question, an enthusiastic pro-LLM person. I disagree with what he says often, the product market fit post was a bad take. But I don’t believe he is shying away from sharing his thoughts when they’re not favorable to the industry.
That's not at all negative about LLMs, just negative about his own usage of LLMs. He's still very heavily and unrealistically (unless he has very poor coding standards and skills, which I won't rule out) praising LLMs in the sentences you've quoted.
Note that it's not surprising that he finds his own usage (described in the quote) negative, since his real job is as a blogger, not anything else.
Literally none of those articles are critizing LLMs, only use made of them by 3rd party actors outside of the providers. It really has nothing to do with LLMs themselves.
The fact that you had to dig to August 2025 to find a single article that's actually a critic of something produced by the AI labs is just further proof.
The prompt injection stuff is very critical of both the technology and the LLM providers especially when I call out that their solution is still to say "they're getting better at avoiding the attacks" when my line has consistently been that "99% is a failing grade".
Your unwavering praise of LLMs' performance which does not match anyone's reality?
OpenAI or Anthropic would be paying you, like they pay bot farms and other influencers, and they would expect marketing in return, which you provide in boatloads.
Your job is to be an influencer, I'm not sure why anyone would be surprised that this is a possibility.
The reason so many people read my writing and find it useful is that they see me as a credible source of information: in a world full of clickbait and misinformation, I have a reputation for providing an independent voice that occupies that rare middle ground between "AI will kill us all" doomerism and "AI will solve everything" hype.
Credibility is hard to earn and easy to squander. I've been blogging for 24 years now, which has helped me build credibility with a large array of people across many different interest areas.
The modern influencer business model is to grow an audience and then sell things to them, through partnerships and sponsored content. I refuse to do that, because it strikes directly at that credibility. The moment you say "I've partnered with X to tell you about product Y" you're no longer an independent voice.
Nilay Patel of the Verge (and the excellent Decoder podcast) refuses to read ads from sponsors himself, at significant financial cost to his publication. I've adopted the same policy - I will not let anyone else pay me to put words in my mouth, because it strikes directly at the credibility I value so much.
Until a few months ago the only money I made from my blog was an https://ethicalads.io banner which pulled in a few hundred dollars a month (more if I had a high traffic piece). It helped cover some of my hosting costs for my various projects.
That changed in February - https://simonwillison.net/2026/Feb/19/sponsorship/ - when I added a Troy Hunt-style sponsor banner to my site (no cookies, no JavaScript) - currently sold by an agency called Freeman & Forrest. Sponsored slots are sold on a weekly basis and get a mention in my email newsletter in addition to the blog banner.
I'm earning enough from those that I no longer feel the opportunity cost of not going and getting a proper Silicon Valley engineering job.
If I was a publication like the Verge I'd have a complete firewall between editorial and advertising. I don't have a team, but I've tried to replicate that as much as I can by having Freeman & Forrest sort out the sponsors while I stay hands off. I'll veto sponsors if I have to (no prediction markets etc) but thankfully that hasn't been necessary so far.
The Verge policy I'm currently not fulfilling is "Our policy against receiving anything of value from companies we cover includes, but is not limited to, things like gifts, meals, discounted services, or paid trips and junkets. Vox Media and The Verge pay for all travel expenses to all events, including transportation, food, and hotels." - I've occasionally accepted flights, dinners, accommodation and some pretty absurd swag (Microsoft just gave me a jacket with my name stitched onto it as part of the GitHub Stars programme, and a bunch of gadgets in a pelican case) which didn't bother me so much when the blog was a side project, but I think I need to start refusing those kind of gifts.
The day after the jacket I wrote a piece about their new models - https://simonwillison.net/2026/Jun/2/microsofts-new-models/ - which I later had to update because I missed some crucial details. Was I subconsciously influenced by the freebies? I don't think so, but the whole point of "subconsciously" is you don't know for sure.
> That means each employee's AI spending cap is ~11% of that median compensation package.
when looking at costs - numbers make sense. however decisions as an org/company/solo founder - costs help you set prices, but to reach profitability you want to model around ROI.
now the question is what's the ROI for a $36K/investment per engineer or $90M for the total org ?
I'm in a similar boat - it's hard to measure, but let's say you pay an engineer 150K. Giving them a tool that costs 15K a year is effectively a 10% increase in that expense.
If we were seeing 3X, 5X etc improvement from individual engineers, that 10% increase in expense would be a fantastic investment (even 3 engineers for the price of 1.1??!). I have a feeling they are just not seeing that much of an improvement.
A blanket cap makes no sense to me. There's a power distribution of AI use in my company and I'd imagine it's the same at a much greater scale at Uber.
I'd guess there should be a few people Uber is bascially allocating unlimited AI spending to and a large swath they're giving basically nothing.
I would assume that at least one of two things are true:
1. They're costs are so so out of control that they need to impose a blanket cap immediately. Figuring out an allocation mechanism that can be deployed company wide is time consuming and they need to staunch the bleeding immediately, despite it being obviously suboptimal.
2. The few people who should have unlimited tokens were given exactly that. No reason to introduce such nuance to a public PR move. The hard-cap limit is a great negotiating posture with token providers.
That's a lot. On my usual day I burn less than $1 on Opus. I could get beyond $10 only if I have a complex and well-defined problem, which is rare (the second part at least).
Let's just say their performance (OKR, KPI, whatever "impact" metric you want) was indistinguishable from a peer that used the AI/LLM monthly allowance in full.
It's disturbingly anti-merotocratic. You're not allowed to prove that you're more useful without AI because they just assume that AI is a 10x multiplier on everyone.
These are still at currently subsidized prices. We'll see if they think they're getting $1500/month of value when that buys significantly fewer tokens.
True but they will raise prices slowly so people will optimize their workflow so they aren't just throwing as much inference as fast as possible like the current state. Right now you should do everything you wanted to try out because it is cheap (as long as you don't become dependent ... the risk).
This is market introductory pricing that hasn't factored in cost recovery. Most of it has been run on early investment with the assumption they will recover costs in the long run. The prices are subsidized across the board and they will need to go up signficantly to recover them.
Assuming this were accurate, then presumably the AI companies would be betting that inference costs come down before the bill is due - I don't see enterprises being willing to absorb another ~10x price increase for tokens (as they've just done going from subscription prices to per-token pricing)
For claude shops this was a huge hit. But lets back this up. There are some companies that haven't even built a break-even model at this price because they are funded by investment. As soon as those investors lose patience the first dominos will fall. For those who have somewhat of a business model, will it survive a price increase? The bigger question is do the base model providers have enough runway and have a way to keep going as they need to recover costs.
Aren’t the Chinese labs quickly turning them into a commodity?
The open-weight models will have a steady race to the bottom on inference costs just by dint of competition between providers. They aren’t at the frontier yet, but they are rapidly eating the flash market.
Yeah, that's not going to work if you can get e.g. 80% of value by using 10-20x or more cheaper open models. At some point it would just make sense for large companies to rent compute and deploy their version of DeepSeek or whatever (if they don't trust Chinese providers)
The fact that Anthropic models are offered at the same API pricing by not just themselves but AWS, Azure and Vertex despite Anthropic taking a major slice on licensing along with the cost an open weight 1T parameter model like K2.6 costs to run on any third-party provider, make it unlikely that API inference cost are subsidized by the labs.
Openrouter? i.e. Even excluding Deep Seek inference for very large open models is way cheaper. Maybe these providers are not very profitable but its highly unlikely that they are losing $4 for every $1 they make since selling inference is their only product...
yes, and theres no evidence that they arent (or can't) use profitable inference to subsidise those other expenses. Some companies will keep spending massively to train better models, and some other companies will not, and offer good api prices. Which will end up being used? That depends on whether the spending turns into better value models
The evidence that per-token inference _is_ subsidized is (a) competition is a bloodbath (b) these companies are raising more money than any company has raised ever (c) a maybe-profitable quarter is maybe-coming for Anthropic after maybe-signing a compute deal with SpaceX that legitimizes both companies.
The evidence that per-token inference _is not_ subsidized is... a quote or two from Dario and Sam Altman
How are people using so many tokens? I'm on the $200/month enterprise plan for Claude Code (because it's a better deal than the API pricing) and I don't come close to the limits.
If you use stuff like opusplan and /advisor so you use Sonnet for most of the work and only Opus for the really complex stuff then it's quite easy to keep costs low without affecting performance.
All new/renewing enterprise contracts with Claude Enterprise and ChatGPT Enterprise no longer offer usage-based subscriptions, but instead will charge API pricing for all tokens consumed, and as you've said, the subs are better deals than raw API pricing.
I wonder what they are doing with $1500 per month. I'm on Claude Pro $20 plan and I'm doing well. That's 3 days per week. On the other 2 days I'm using a customer's Claude Max, I don't know if it's the $100 or the $200 plan, but I'm sharing it with some of its other developers.
Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly. These plans are economical only if majority of users spend less tokens in $ than the plan's costs. This subsidizes the gap vs. power users who spend multiple k$ monthly in API tokens.
Yea, I’m sure the personal plans are subsidized. I have $200 Claude Max at home and straight API pricing at work and equivalent work would easily cost me 5x if not more on the API.
> Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly.
Or the fixed cost plans reflect the real cost and the people paying API prices give them the profit.
Anyway, none of my customers will let me bill them $1500 more (about $75 per day) because I'm using AI. And what for? I'm not working to move money from the pockets of my customers to the pockets of AI companies.
No, we know from the financials of these companies that API prices are close to being at cost and the individual developer plans are heavily subsidized (because they are roughly 10% of API cost per token[1]).
If plans were at cost and API pricing was marked up that would mean there’s a 90%+ profit margin on tokens and instead of raising money and talking about revenue, Anthropic and OpenAI would be talking about their obscene profits.
[1] the caveat is that the average plan user probably doesn’t use all of their quota, I guess maybe 30% is the average across all users.
I'm on a $100 Claude Max plan, my usage is only about 50% of the plan limits, but in the last 30 days my usage was equivalent to API token spend of $1850. If you save all your Claude Code conversations, the saved files include API costs and you can calculate this yourself.
One of my most expensive sessions cost me over $100 in token spend in a single evening. I'd just found out that the time tracking & invoicing SaaS I use is increasing their monthly pricing by 2.4x - so I assigned Claude Opus 4.8 to recreate the entire SaaS for myself, and load in 13 years of my historical data. I've only completed a full read-only implementation so far, with adding & editing of records still to come, but I do expect Claude will have fully recreated the entire SaaS for me at an API cost less than a single 1 year seat of continued subscription to their service. And since I'm actually on a Max plan, it didn't actually cost me $200 of tokens at all.
coff i would not buy the Bending Spoons IPO coff saaspocalypse
I could ramble on about where the other $1750 of usage goes, but I imagine it's similar for most heavy Claude / AI users. Interactive coding sessions, a daily personalized podcast, some automated overnight agentic "proactive" sessions, a daemon that wakes up if I send Claude an email or voicetext to check something when I'm out. I've also noticed that if Claude's tool-use goes haywire & Claude gets confused or lost, sometimes a single email reply session that would normally be just $1 of API might spiral to $12 of API while it bangs its head against trying to run a program that's in a different folder to the one it's currently in. Sometimes a simple 'pwd' would save you a lot of headache, Claude....
It's among a wave of fresh "non-insane" takes on AI in the enterprise. Maybe we can reel things in to a sustainable level before a giant bubble bursts.
It's not so simple to determine and generalize how much value AI adds. It's going to be different on a per-company basis and a per-engineer basis. It's also affected by the competitive market place and how many other companies are using AI for their engineers.
For example, what if you're a tiny startup and you're considering whether to hire an extra engineer or do all the coding yourself. I would estimate that AI is worth far more than $18,000 a year in that situation where you might reasonably decide to put off hiring an engineer.
I find it really doubtful anyone has managed to quantify that in any meaningful way. Seems like mostly an arbitrary number. Also the article does claim that's its actual several times more than 18k if you are fine with using Codex, Cursor or etc. when you Claude tokens run out.
Not really. There are clearly diminishing marginal returns, so it's likely that the first $2,400/engineer/year adds >>$2,400 of value, even if 18,001st $/engineer/year adds <$1 of value.
No, that's not what it means at all even if just doing it purely in math terms. Really it is just a reasonable amount to cap at to stop the long tail of super spenders (tokenmaxxers). You could also call it "the amount of AI spend after which Uber has decided there is diminishing returns for the average engineer".
It means Uber thinks they can sustain that level of expense. Whether engineers at Uber are representative of the rest of the work force is an easily debatable question.
And $1500 a month is on the very high end of where most companies will land. When you run the numbers there isn’t a realistic path that connects the dots between likely market size and the claimed valuation of the AI companies. The math simply does not add up.
This week an S&P 20 company with previously unlimited Claude limits also set a $250/mo/person limit; though its unclear to me how widely the limits are being enforced, may be the case that its just non-software engineers. Do with this info what you will.
In my experience, this is far below the cost the average dev will incur per month so this seems very reasonable to me. And, no doubt there are exceptions for heavy users so they can get some extra token usage when they need it.
unless they changed something in the like 2 months (edit: besides implementing a cap for claude code specifically, since other tools already had caps) since ive left my job there im pretty sure 1500$ is the very max you can use after maxing out free calls, initial budget, then 2 extensions individually reviewed by your manager
higher ups pushed for these last 2 years to be AI focused so I don't think this restriction is a measure of "don't use too much AI" as much as it is a measure of "don't use only 'manual' AI tooling" since we had a dozen more specialized tools in-house running locally or otherwise that didn't count towards the budget
It finally puts a number on productivity gain of engineers with AI. This is probably less than 10% of the cost of an average uber developer. So they don't assume much more productivity gain from AI than 10%.
(Cost of an employee is much higher than their salary, it includes things like office space, supporting structures like HR/accounting, insurance, hardware/software, and much more)
Uber is in the business of experimenting with robotaxis and automated food delivery.
They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.
There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.
Probably they'll quietly reduce the number more soon.
1) This happened because they fundementally misunderstand how to use AI and how AI is priced
2) Most organizations are throwing everything in for analyses and not limiting the answer they want. You need to be specific of about what you analyze and what answers you want
3) People undervalue prompting or templated responses. I will have written. validated and sanity checked a prompt several times and run it across several models before I say its ready for use. But when it is, I know what it will give me and that the scope of its research and answer is as close to what I want as it can be. As little excess as I can. This all saves tokens
I don't think at $1,500 you're not forced to code on your own at all, in the sense of typing code. You're simply forced to not yolo-max twelve parallel agents at all times.
The big question is, will the productivity gains be absorbed by the needs? Societies don't have a need for infinite amount of luxury and laziness offered by the productivity of the machines. At some point, you would shake off things, get up from the couch and start walking again, breathing afresh.
The tool categories that pay for themselves fastest: (1) Anything that gets invoices out faster and makes it easier for clients to pay. (2) Scheduling links that eliminate email back-and-forth. Everything else is optimization. I keep notes on which freelancer tools hit each threshold at freelancerkit.surge.sh
If you estimate 10k salary per engineer that means the moment it’s cheaper for them to hire another engineer but that doesn’t mean it’s improving productivity 15% but if 15% is the moment it stopped being better than another human we can assume 7.5%?
Probably even less because you would spend those 1500 extra per employee also if you just save 10% so 150 per employee that’s 1.5% on salary.
This is imho one of the best ranges we can assume for now how much would that be on the whole swe market?
Seems odd limit, especially since it highly dependant on Token provider used, with Opus this is not much and could easily be burnt in a week or less, but with something like deepseek the 1500 can literarily be an annual budget.
That being said, I do have to wonder why someone as bug as say Uber, simply not rollout OSS model in the cloud for their team, I'd imagine that would be cheapest & most flexible option, while also keeping all the data shared with LLM private.
It’s not just about the model but also setting up the system to create and share compute (GPUs) which is quite complicated on its own. Ubers primary business focus isn’t infrastructure.
Electricity actually is only a small part of the data center costs. There are challenges in getting enough electricity that create problems, but the cost of the electricity really isn’t an issue.
If I were paying API rates this year, I would have already burned through $20k in tokens. Looking forward to the costs of this level of capability coming down.
Oh that's actually really economical! I wonder if they're doing a lot on locally running models or managing a shared context or knowledge-base in some clever way, maybe just encouraging employees to be efficient and mindful.
...
> each employee
...
> per AI coding tool
...
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI
What on this godforsaken earth are all you rich idiots doing???
A lot of talk about cheaper models here. Just curios, is there any non-Anthropic model that can do UI well? GPT-5.5 is laughably bad, and I'm never restarting my Anthropic subscription after their 6-month sprint of gaslighting, even if opus was really good at UI.
I think there's too much variance between what model you're using and how much you turn your brain off. If I just paste a ticket number into 4.8xHigh its going to use a lot more tokens than if I read the ticket, tell Sonnet what it needs to do, make my commit, run unit tests myself, etc.
ccusage for codex tells me the medium feature I prompted in codex, with a $200 subscription, running for 72 hours and still not delivering full result would have cost ~ $2200 at API rates.
I also misconfigured something in my agent's configuration and a simple web tool request (maybe 4 turns) through OR went to GPT-5.5 accidentally and that cost me ~$0.4.
I have no idea how any business can afford API rates without having a mindset of casually setting money on fire.
the real interesting way to address the question of token effectiveness would be internal alpha vs beta testing and measuringing marginal revenue generated by similar teams using ai and at different usage levels. right now $1500 a month is not a meaningful signal of anything beyond current executive willingness to spend. in the long run executives will cut spending where it does not support income generation.
Token costs rising because data center build costs must be paid down.. is not the whole picture. It is actually possible for token costs to fall despite the spending frenzy.
Naively you’d expect to always keep paying more - but growth in token usage is what changes the equation. Amortizing debt over an exponentially growing amount of spend across a growing customer base (not per customer) lets the debt be paid off & costs covered even as each individual’s spend stays steady or even goes down - but it only works if there’s growth beyond some threshold that makes the whole thing hang together. No one on the outside knows how much growth that is, and everyone chases maximum growth.
Jevons Paradox ends up being your friend as well as the friend of the inference providers as well as the friend of the inference financiers.
If it’s a strong enough effect, it has potential to cancel out all the circular financing too, and let everyone ride out the bursting of the bubble.
Why are people getting these high spending numbers? A 200 USD subscription for either Codex or Claude should give you plenty of usage. What am I missing? Are they just being dumb?
The subscriptions are not available to enterprise users. Enterprise users must pay per-token. A $200 subscription gives you roughly the equivalent of $1500 in per-token billing.
What is the point of allowing a developer to spend $18,000 a year on AI subscriptions? Can't they hire a decent developer who is capable of producing a quality solution faster?
Clearly, these decisions are all made by high-level management team.
I was recently talking to an HR person from a European company, and she goes: 'We are forcing our developers to use AI coding agents, but they are still kind of hesitant.' This person had never written a single line of code, nor did she know what software engineering is. For these people, using AI coding agents = faster delivery without breaking anything.
It costs a lot more than $18,000 to hire a decent developer, pretty much anywhere in the world. Also using a model is better than another developer in some ways, because there aren't two independent minds trying to work with each other.
I still have never hit a ceiling with my Claude Max $100 account, much less the Max $200 account. I'm not burning tokens needlessly, nor running it all day, but I do use CC almost daily. What are these devs doing that they are burning more than $1500 in tokens a month?
Maybe it's just me, but I still find that I really have to "shepherd" the AI and work with it to get the results I want. And I read every line of code added and challenge the model's logic. So that limits my token burning. Maybe these people are just "vibe-coding" without really checking the results?
I would not be surprised if they have engineers vibecoding 2-3 projects each simultaneously, nonstop, on largely un-moderated review-suggest-iterate-test feedback loops.
All the code gets summarized and fed into their manager's agent contexts, probably duplicated several times across levels and departments, with some generated back-and-forth emails pinging around the org chart, eventually generating 2-3 long-winded reports that nobody will read chock full of generated visualizations that can all get consolidated into a generated slide deck that they'll show (maybe, at some point) to a handful of humans with more money than a human brain can conceptualize to demonstrate all of the innovation they're doing.
I am increasingly convinced that many of these companies are dead trees whose only function is to burn money lest it fall into the hands of the peasantry.
I have strong conviction that companies will now choose tech stack/programming languages based on 'tokenomics'. I am vibe coding using Clojure, a language I can read but cannot write and I never hit the usage limits even when using the latest model on Claude. I have similar experience with F#, which is a bit more verbose than clojure but absolutely beats every OOP language, Python, Typescript etc.
The reason, I use F# & Clojure is they hit JVM and CLR, two popular enterprise stacks.
In my not so humble opinion Lisp(Clojure) still remains the language of AI.
no....the fact that you could buy a reasonably prices MAC or AMD395+ thats AI tool pricing; it loads a big enough model and spits out tokens just fast enough that you can read what it's doing and comprehend it instead of magic.
That's the most useful signal. Pre OpenAI mafia RAM pricing, that comes out to $250/month.
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Many lower-budget individuals are now moving to China open weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inferencing costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.
Per token costs will fall, but the harnesses will get more token hungry. Instead of just centering the div it’ll spin up a battery of agents to architect, critique, advise, code, review, refactor and so on.
I wish I could disable most of these. I already hate all the "oh you're actually right, let me fix that" nonsense. Then it proceeds to burn 50k tokens on the git history instead of copying logic A from a different part of the codebase to logic B, where I want that exact logic without having to write the boilerplate myself...
Makes me think of how my Claude.md files specifies to use the built in framework code-generators (rails). Those generators are deterministically right every time.
I wonder how often the Agent actually follows the guidance. I do see them follow it when I look. But it doesn't seem so every time.
This is tricky since it can and will ignore your md directions. When possible I try to lean on tool call hooks or skills that invoke deterministic scripts. As much as you can remove the "choice" the better though still there's a lot of randomness in how reliably it invokes skills ime.
Hooks are incredibly underused by most people and are the easiest way to establish a first line of defense against bad behavior. Things like blocking tool calls that will read .env file or execute "create or replace table".
A lot of the time if you're copying code from one place to another what you actually want to do is abstract it so you can reuse it in both places.
The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.
A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.
Nah the codebase is legacy fucked and I cant be bothered to try and optimize business flows without the fear of other stuff breaking.
Claude 100% of the time even thinks we use laravel despite the project being some old lumen codebase, so most of laravels features are not available. It also gets the PHP version we are using wrong 100% of the time.
Have you tried adding this information to claude.md so it knows?
I also think your excuse is bad. "The code is legacy fucked so I'll just legacy fuck it some more because I can't be bothered to make an effort"
This is a spicy take, unless the business is willing to face some down time, and I am hired to do exactly what you said, I’d never touch any line of code unless I absolutely have to. Different environments don’t help as much.
We tend to obsess over software quality when it’s the least important thing for a business. It’s just a means to an end.
This is what its about, we have multiple ecom shops running 24/7 and cant simply afford downtime or a change of business flow that maybe doesnt affect shop A and B but definitely affects shop C and D...
> Least important thing for a business
- Takes weeks or months to get simple features out the door, and when they're out they're buggy as hell and the bugs never get fixed. Sound familiar?
> I’d never touch any line of code unless I absolutely have to
And this is how legacy code is made. Years of everyone "never touching anything they don't have to" leads to a giant steaming pile of shit.
> unless the business is willing to face some down time
How does a simple refactor cause downtime? I do this kind of stuff all the time and pretty much never cause any downtime. In the very rare cases that prod downtime does occur it's generally not because of some simple code refactor, and we have it back up in no time by just rolling it back. Unless it's not related to the code at all, in which case it also wasn't a refactor that caused it.
Are you some kind of entitled corporate dev that barely has any influence on the codebase? If I fuck up a whole business goes down as I am the only dev there currently. We cant afford that happening. Also why would I mess with anything claude.md related? I just use the CLI tool. LLM enthusiasts always claim how smart these things are so they should figure it out on their own, you know?
I have full control of my codebase. I'm not afraid to make changes to it because I know what I'm doing.
You would edit Claude.md to say things like what tech the project is using, because that's the entire point of claude.md. It's literally the solution to the exact problem you're complaining about. Any information you want it to know, you put in there and then it knows it. And you can tell Claude to make or update the file for you.
I'm not one of the people telling you how smart LLMs are. I'm telling you how to use it efficiently, by not expecting it to know everything but rather provide the information that it needs in order to be a more useful tool.
> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.
An inference only platform selling good open weight model inference without the research overhead could capture a-lot of market for lower size model uses (haiky, gemeni flash). Diffusion-transformers and clever cashing can drop inference even lower, which is improving at a high rate.
The biggest reason large models are un-attainable for local applications is the lack hardware with large amount of unified/graphics memory (and the cost of the platforms that do). Once the memory slog goes back to normal and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity effectively opening the door for slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity)
At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is increasingly looking bleak, with a average catch-up of just a few months) or market dominance through integration (integration in platforms and OS, exclusive deals with companies or governments).
For coding agents, i suspect no player will manage lock in enough market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be further hit with a lot of AI integrated into our other tooling costs, such as GitHub, Microsoft suite, G-suite, forcing in AI functions as a value-ad into the total cost without giving the option to exclude them. (using their market position)
I agree with all of this.
So my question remains the same: How are the players investing 100s of billions in buildout going to hope to make this back? Market capture looks bleak, inference looks like a race to the bottom. End users look like they could be beneficiaries. Where do the big boys go?
The American big boys are hoping to create "labor as a service" rather than sell tools. You don't hire an accountant that uses Claude, you hire Claude and it just does everything, without the visibility of current agents. They'll need to make it remote and obfuscated to protect their secret sauce from distillation and reverse engineering. It'll be really expensive, and be focused on enabling rich business types and upper managers.
AI may get so commoditized for certain use cases that you will not even be able sell inference at a profit. AI might be bundled in with other services, just like cursor bundles in their own AI model for auto complete with their editor. I.e. cameras might have AI for image recognition bundled in etc.
Agreed, this is where google is really, really set up to win the market. They can combine gemini subscription with a moderately more expensive google workspace and steal MSFTs entire $50 billion enterprise productivity software market. MSFT is quickly trying to get copilot in a good enough state but without TPUs I think itll be tough for them to serve a good enough model at a price people will accept.
Prices can go down while tokens sold increases so that profit increases. The labs number one goal right now is moving past software engineers so that every white collar worker in the country finds ai assistants indispensable. Speculation here but I think openAI/antrhopic api inference is insanely profitable, it just needs more volume to amortize the training costs.
> Speculation here but I think openAI/antrhopic api inference is insanely profitable, it just needs more volume to amortize the training costs.
Well, they just rent their hardware, so I'm not so sure. But they'll both be public soon and we should get that breakout in their cost structures, somewhat.
id be amazed any american business will aend data to china
HuggingFace offers DeepSeek as one of its models— it's pretty simple to spin up instances under your control.
I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek.
For reference, Cursor has their first own light fork of Kimi that they use as their baseline coding and review model.
The majority of Deepseek providers on OpenRouter for v4 pro are in the US. Especially interesting is that they are in the same ballpark for pricing.
They are in the same ballpark for deepseek-v4-flash, but deepseek-v4-pro from deepseek is still around 1/2 of the alternatives.
I'm pretty sure that Deepseek said that pricing was promotional. Be curious to see if it lasts.
V3 pricing from them was right in line with what the commodity providers are charging.
They announced a few weeks back that the promotional pricing was permanent.
“Any” is a very high bar Unless laws prevent it, I don’t see why a substantial minority wouldn’t buy services from where they can get them at a similar quality and much lower price.
Any IT cost center will send to the lowest bidder. This isn’t intellectual property: it’s annoying shit that is an unwelcome cost of doing business. China might copy our tedious scripts? Will they make a product out of it? Can I buy it and fire my IT staff? Great!
Not everyone using AI is using it to code core value IP.
Together.ai provide many open weights models and as far as I’m are their servers are US based (the company certainly is)
Most sane US companies will disallow use of cloud-based Chinese AI providers, because everything including code, data, PII, etc is being sent to them.
Deepseek has some models in Bedrock. There is definitely a huge market for a "good enough" model running within the country of the company
> Deepseek has some models in Bedrock.
Just looked into it, seems like at most they have just 3.2, not 4: https://aws.amazon.com/bedrock/pricing/
Looking around their catalogue more, most of their models seem quite outdated, aside from the OpenAI and Anthropic ones (but those get more expensive). I wouldn't willingly pick Bedrock and would instead throw money at OpenRouter, that has both a bunch of providers, as well as almost any model for you to try.
Saner companies ask the same question about models from their own country too.
You can run DeepSeek as it's open weights, unlike Claude or GPT.
I wonder if I could start a US-based company with good data regulation and just serve open-weight models at a competitive price. I feel like the real barrier is just that most companies willing to adopt AI usage enough to make it worth it at this point don't want to be using inferior models.
Yes, you can. There are multiple inference providers out there. The problem is, it’s hard to beat the Chinese providers in cost. And you also have to compete with frontier model providers’ subsidized offerings.
They charge the exact same prices. So many people in these comments have no idea what they're talking about. Even if they did charge less, nobody is going to deal with the latency of sending requests to China.
edit: Actually American inference providers are cheaper for Chinese models. There's way more competition here because the Chinese aren't idiots and investing every last dollar they have into data centers for llms that don't make money..
By "cost" I think the parent means the provider's own costs, not the cost of inference to the customer. The cost of land, labor, and electricity are significantly lower in China than in the US.
Can you please link me DeepSeekV4 provider that's cheaper than their official offering? And not all tasks require low latency.
Also, there are a lot of competition in China. Like a lot. You might know better than me as well, but although the biggest AI-labs are based in USA, the adoption is weirdly global. Like as a general sense of what's going on - you can see AI-related ads literally everywhere in Tokyo, almost all the time, in every single screen in public.
Deepseek's api platform for V4 Pro is the only example of this, and Deepseek V4 Flash is cheaper (usually) than from Deepseek itself on openrouter via DeepInfra.
Deepseek shot themselves in the foot because they never intended to serve V4 Pro for .80c mm ouput, that was a promotional price that was meant to expire (and still might). They intended for v4 to cost $4.00 per million but Western inference providers drove down the price because they can operate at negative margins to try and push competition out. I can assure you they are losing a ton of money @ ~80cents.
My point is, its Western inference providers that are establishing the floor price of inference. They are willing to operate at a loss in order to put their competition out of business. Chinese providers are typically at or above the prices set by American/western providers if you go looking on the Chinese internet. You aren't going to get deals from China for inference except through this one instance with Deepseek v4 Pro which wasn't even supposed to be permanent pricing.
Cro.ai seems to be: https://crof.ai/
Of course though they are not necessarily a viable solution for companies with security requirements etc. given it is just a single person project, but they still serve as a proof it can be done.
This costs more.
Not as far as I can tell. Are we seeing different things?
For deepseek-v4-pro:
- $0.350 in, $0.003000 cache, $0.80 out https://crof.ai/pricing
- $0.435 in, $0.003625 cache, $0.87 out https://api-docs.deepseek.com/quick_start/pricing
Here's a free startup idea: operate an open-weight model service, and offer "Verified AI Integrity," which signs the input tokens, the seed for the randomness in selecting outputs, and the model ID, proving that the result of the call to AI was completely "organic" and was not interfered with.
Your main audience would be snake oil salesmen trying to prove their AI products are unbiased and not under the thumb of any outside influence. This doesn't address the biases of the model itself, but that's not your business. Your business is selling tokens and security certificates. If you can get the right angel investor, you could maybe have your new standard required for some government applications.
There are plenty of US-based inference providers available, including AWS, that serve Chinese models at competitive prices (vs frontier US models). They also have lots of usage. Not necessarily for coding, but for other enterprise tasks.
It's called AWS. Bedrock is right there. Price or data policy is never the issue. The models themselves are the problem -- most large US companies are not going to touch them.
Source: directly involved in these discussions. You can downvote as much as you'd like but you can't ignore the facts.
> The models themselves are the problem -- most large US companies are not going to touch them.
Can you expand on this?
Some suits with no understanding of how LLMs work are scared that the models might hack them, or believe that they'd have to send data to China because they do not know that open models can be run on your own infra.
Have you heard of openrouter? There's 1000 of these companies already. Do something else.
Then don't use the cloud-based Chinese providers, use cloud-base US/EU providers using Chinese models. The interesting Chinese models are all open making this issue mostly moot.
A key point here is open in terms of being able to download and use it, not open as knowing what data and instructions were fed into it when training.
A paranoid part of me thinks that these models are all inherently biased and instructed to be pro CCP, with specific gaps in their training data related to undesirable historic events and political ideas.
Sure but that goes both ways. Any dataset has a bias. My coding doesn’t need to know about Tienamen square.
Applies both ways, ask it about Israel.
The same thing applies to US models. Check out various system prompt leak repos on github. There are also prompt injections by various parallel "alignment" models that pre-process the prompt before it's sent to the main one with questionable guidance.
You'd be surprised how much of bias exists in easily extractable information. Now imagine how much of that happens during training, that you can't easily extract.
So this is largely a moot point. Yes, Chinese models will likely have some weird things injected into them. But so do the US models. Do I care? Not in the slightest. Models are my code monkeys, and if the code leaves my machine, I assume IP is leaked be it a Chinese model that clearly tells me they do use the data, or US models that pinky promise they don't.
There are some objections here saying that some US firms are using Chinese AI providers, but I wonder if any of those are subject to compliance. Large firms that are disproportionately responsible for AI spending are all subject to compliance.
Do you trust OpenAI with your code, data, PII? What makes you so sure it's not all part of the next training set anyway?
One aspect Paul Kedrosky mentioned recently is the concept of „duration mismatch“. The price per token goes down over time (either because the AI vendor reduces due to competition pressure, or because customers are now incentivized to use older cheaper models). But datacenters are financed through debt, with the assumption their revenue increases over time. Quoting him: „[AI vendors are] paying for a fixed cost with a depreciating commodity“[0].
So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt.
0: https://youtu.be/wGZboZcSGDY?is=64GuKyqBh_4aSjTE
do GPU chips really depreciate physically? There are no moving parts, I dont think memory chips or GPU chips deteriorate naturally.
I think its only accounting depreciation.
I have been using my laptop for a decade, what is stopping datacenters from using the purchased GPU chips for a decade?
GPU do depreciate indeed, but here the depreciating commodity is the token, not the hardware. You sell cheaper token with the same hardware
When everything is said and done it'll be datacenters in American competing with ones in China that have several times lower electricity prices. Token prices will drop to a level that will be unprofitable for American data centers and they will need to close.
Thats the main issue here.
Your laptop doesn't have a 100% duty cycle. If you ran it like a data center it would indeed wear out much faster.
Today's data center GPUs are essentially overclocked, and so at limit of how much the chip materials can physically handle, and therefore degrade over time. For example, GH200s operate at 1W/superchip but the actual safe power is somewhere around 650W which will allow them to function for a decade or more. But that leads to around 15% slowdown and that is unacceptable in today's competition. So current GPUs are destined to be depreciating assets.
In future, we might have fixed cost GPUs but not today.
I would presume the reason they are overclocked is because they are trying to make up for the shortage. In time, the shortage of computing components will be remedied, and tokens produced at lower power pulls will be cheaper.
i think its reasonable to give up 15% of speed for a decade more lifetime. This depreciation change alters economics of GPU
That extra decade might provide almost no revenue. The long tail isn’t profitable
There are data centers that use and rent out 10 year old server GPUs.
They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance.
They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center.
Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand.
The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home.
As long as the demand for GPUs keeps increasing, there are more data centers being built to house them.
When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great.
If I as a customer have a use case for a machine learning model I developed awhile ago, so an insect identification model, I had an ML researcher/eng develop it back in 2019, and it runs fine on a 2018-era T4 GPU (NVidia 2080 era), why mess with it?
We aren't talking about insect identification models from 2019.
What do you think are running on the T4 GPUs in AWS? A lot of the use cases I know of for them are mid-level computer vision models that don't need to be frontier level.
I can no longer edit this, but want to expand on my comment.
I've seen those vision researchers want to train on H100s at the time and being told know, wait for the T4s.
I've seen T4s running BERT models for document classification.
When there are enough Blackwells in data centers that H100s are useless for inference by your standards (I don't know if we've arrived there or not yet), there will be people who, say, want to run the Taco Bell ordering chatbot on them. There will be people who have applications that are just fine with Qwen 2.5 who will be happy renting them.
There seems to be this crazy consensus that hyperscalers are going to go into their datacenters and throw away their old GPUs. The reality is they have a ton of paying customers for them.
And there may be insect identification apps from 2019 that say "you know what? H100s have gotten cheap enough I can use a VLLM so the user can describe where they saw the insect too", or the McDonald's website support chatbot developers say "Hey, the bigger cheapers have gotten cheap enough we can upgrade our models to Qwen 2.5".
The frontier level GPUs in e.g. AWS have a huge premium. When the newer generations come out, they will be able to cut prices to a bit of a premium over the operational costs and still make a profit, and there are a ton of down-market customers who will be interested, who aren't willing to try to outbid Anthropic for Blackwells.
except for you know the enterprise customers who won't change their code and will pay to run old inefficent hardware just to keep from dealing with upgrades?
They can just ask Claude to upgrade it for them, completing the circle!
I'd agree. but also that's too scary. and the bottleneck is the massive manual change control process since there's no automation around any of this. :)
Why take risk when you can spend money and take no risk
Yes, even if the hardware is untouched. As technology advances, the power cost per compute cycle goes down. A gpu using old tech costs progressively more to operate compared to the newer models. So its value goes down over time = depreciation.
As for duty cycles, the chips are perfectly happy at 100% operation. Cooling and power componants fail, not the chips. But it costs manpower to repair such things and manpower is inconveniant these days. A gpu with any sort of fault just gets dumped.
In addition to the physical depreciations other comments mentioned I'd also mention that old chips will settle into a low price and then actually go up on a per unit basis if you're trying to buy a significant amount of them. With a limitation on fabrication facilities continuing to pump out older cards is an opportunity cost to the manufacturers that would prefer to be producing newer cards. If you were in a place where you suddenly wanted to buy 10,000 3080s, as an example, I'm not certain if the market could actually fulfill that demand and no one with the ability to increase the available supply to meet that demand actually wants to do so.
Chips do wear out and need to be replaced (entropy do be like that and durability is not a primary concern for chip design) so you'll need to refresh your stock and, even if you don't need cutting edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released PS1s were extremely cheap - but if you're struggling to understand this effect from your experiences in the consumer market you're actually looking at the price factor that starts making antiques increase in value since at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out from demand from the general consumer market but the demand remaining (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking.
> There are no moving parts, I dont think memory chips or GPU chips deteriorate naturally
I believe they do, but I too would love to know more details because there are several ways this can happen. Electromigration, package failures, VRAM failures, dielectric breakdown... Hopefully there will be studies soon similar to that old Google paper on HDD failures!
Currently it's a pretty big ask to look at the several hundred billion transistors and the interconnects between them to find what broke.
Though, those capabilities are maybe just a few years out, funnily it's taking AI to make it potentially doable.
I used to work in datacenters, during spinning disk era we had technicians from vendors basically every couple of days to replace some broken part. When the massive switch to ssd happened instead of having them every couple of days it was 3 or 4 times per month.
Despite no moving parts things broke anyway and, even if it doesn't break, the vendor can make you change the technology just by playing with maintenance cost of the older one, limiting or removing spare parts from the market.
My understanding is that a lot of AI data centers are still heavily relying on spinning HDDs, which is why seagate, western digital are selling more HDDs than ever before.
Huh, TIL. Here's the Seagate financials for Q3FY26:
https://s24.q4cdn.com/101481333/files/doc_financials/2026/q3...
"Hard Drive exabyte shipments of 199EB, up 39% YoY, with ~90% shipped to data center customers"
"Data center revenue of $2.5B, up 55% YoY, driven by strengthening cloud and enterprise demand"
And an article: https://www.seagate.com/stories/articles/the-ai-era-doesnt-r...
I assumed the issue was similar to crypto mining, where given finite amounts of space and power it makes sense to always be running the latest and most powerful GPUs instead of keeping older hardware running. There's definitely a secondary market for these GPUs as well.
the hardware itself is still useful, but random failures happen every so often, so if you're trying to run a fixed sized fleet then your fleet shrinks when you can't get spares any more
Chips do deteriorate and fail naturally at datacenter scale or in timescales of decades, though not exactly like on financial reports. Leak current increases or electro-migrations occur at junctions or whatever those words mean.
And yeah, it does feel like GPUs will start losing values slower going forward with Moore's Law being dead for a while. It used to be that 3-5 years old GPUs were more useful as space heaters than GPUs, but that's much less of the case today.
They do degrade physically, but the bigger thing is they stop being competitive quickly. Each year or so we see doubling of GPU speeds for the same amount of power.
If you build a 100MW data center with GPU compute and three years laster a new data center opens with the same cost for GPUs and same electricity cost you do, but can do twice as much compute, you quickly lose business unless the market is just so constrained customers can't afford to be picky. But the moment there's slack in the market you'll see major migrations off of providers that have the same cost but half, or quarter of the same performance.
So when you see someone talking about GPUs fully deprecating in value in 1-3 years this is what they're talking about. Right now it's not a big deal because there's no slack in the market. But once there is, the bottom will drop out.
Gradually, and especially when hot. Modern chips are pretty close to the physical limits of how small they can be made, and that means atomic/chemical effects like electromigration are accounted for and determine the lifetime. Every extra 10 degrees Celsius of temperature doubles the speed of chemical reactions.
When they stray too close to the line ... you get Intel's 13/14th gen chips that wear out after 1-2 years instead of 10-20 years. Intel calls it "Vmin drift" because that doesn't sound scary, but the actual point is that various wear-out mechanisms push the chip outside of its design envelope - increasing the voltage or lowering the clock speed may get it to run for a while longer, but you're living on borrowed time as the various circuits just stop working right and you get unpredictable instruction mis-execution: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in...
sounds like planned depreciation on Intel's part, they definitely do not design server grade chips for longevity since that would harm their own revenues
It was not planned depreciation, as many chips were failing even before 2 years and this impacted not only PC Builders and Gamers, but also some server infra providers too.
This was simply poor design, it took Intel ages to really figure out what went wrong and "resolve" it.
It cost them far more than it made.
They didn't replace all the chips like with the FDIV bug though. What did it cost them? Only reputation?
Not even that in the end.
Chips age and fail with age. You can check hot-carrier injection, bias-temperature instability and electromigration as they are the main aging mechanisms. All if these are a linear function of time but exponentieal of temperature. 90-100C these chips are running at are really tough, so they are likely to fail at couple of percent to 10% range in 2-3 years depending on the margins they have in the design.
The solder joints are notorious to fail at a high rate too.
If those don't go the caps and coils will eventually.
Caps also have a rapid aging with temp.
those are easy and cheap to replace
Depends, the SMD caps spread across the board the tiny ones do start to fail and go out of spec over time. they are a right pain to replace and hard to spot one that has gone out of spec to cause the chip to start crashing.
Can you not just move the epxensive part (the gpu itself) to a new carrier board in that situation? Also isn't most of the cost of the GPU itself the design of the board, not actually making one, esp if you can move the heat sinks around?
"just"
BGA Reflow rework is not rocket science, How do you think the PCBA gets assembled in the first place? Its much easier if you dont care about the boards at all and with the huge die sizes on these accelerator chips its worth it to do a board swap
Not if you account for labour.
Nothing is stopping them, it's just not worth it: Have a look at e.g. vast.ai's pricing (https://vast.ai/pricing).
The V100 (2017 -> 9 years old) can be rented from $0.02 to $0.37/h (right now I can find a V100 with a Xeon Gold 6140 and 48GB RAM for $0.165/h). Let's assume the guy you rent it to pins it at its 250W TDP and let's ignore the running costs of CPU/RAM/etc... Then you draw 1/4 kwh for that compute hour. The industrial electricity prices in the US vary between 7.5 and 25 ct per kwh (depending on state, time of day, etc...), so at 100% efficiency, assuming nothing ever breaks, and the CPU consumes 0W you earn about 14ct/h.
And remember: V100s hours are sometimes sold at 1/10th the price.
If I pick average conditions you need to start thinking of whether it is worth it to rent them out: Usually it isn't unless you have them anyways and just sell idle capacity.
It's barely worth it to run them in a pure "is it profitable" sense, if we also account for the opportunity cost of taking up a slot in your datacenter it seizes to be worth it really quickly.
Transistors do wear out. Not going to elaborate as it is easy to ask GPT
When it was profitable to mine crypto with GPUs people used to sell these miner GPUs on the used market after about two years.
These were about half of the cost of an used GPU just used for gaming. By that pricr, I'd say a GPU kept busy has twice as high a chance of failure after two years of use.
Not great, not terrible.
"So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt."
Not necessarily, the bond holders could simply take a massive hair cut and lose shitloads of money. On the topic of bubbles and exuberance, Jeff Bezos made the salient point that there was a massive over-invested biotech boom in the 1990s and tons of sophisticated investors ended up losing lots of money. But humanity still kept the medical advancements made by the boom. Stocks going down didn't un-research drugs, and it won't un-research new GPUs or un-build datacenters.
In order to not un-build the data centers, they at least have to make more than it costs to operate them, and also not have some attractive liquidation value (the land, maybe).
I could imagine something like “inference is done at home or in China, that’s the price to beat” and it’s not worth keeping all those GPUs cool out in Nevada.
But the parent comment was that one of the bigger costs in these data centers was the interest expense on the borrowed money. A restructuring removes or heavily reduces that amount.
The fiber laid during the dotcom bubble never paid back the investors or lenders, but it's still profitably connecting customers all these years later.
It’s true once built the data center can operate right up to a financed data center value of zero. The investors will loose money but the costs of AI will go down as they do
Yup, that is the real economic benefit of bankruptcy - a reset.
Isn't something like 90% of the fiber laid during the dot com bubble still dark?
> Stocks going down didn't un-research drugs
Drugs cost pennies to manufacture after they are researched and make their way through the approval pipeline. There are many generic drug manufacturers who can work off the existing formulas.
The more apt comparison is that LLMs won't be un-trained. Opus 4.8 now exists. Even if Anthropic somehow went bankrupt, that particular asset could, at the very least, be sold for proverbial pennies on the dollar to a "generic" inference provider.
Or locked away in litigation for decades… See what became of the Amiga
Research does get lost over time. The whole point of the patent system is keeping that from happening; if the drug company goes bankrupt, even if they lose all their internal documentation in the process, hopefully the patents and other public paperwork provides enough information for an unrelated company -- either having acquired the patent rights, or after the patent period ends -- to reconstruct the processes with less investment then the original research.
If a bankrupt AI company maintains enough of a skeleton crew to consolidate and archive its intellectual property it could be sold off to another company, but there are also timelines where it all ends up digital dust in the wind.
> If a bankrupt AI company maintains enough of a skeleton crew to consolidate and archive its intellectual property it could be sold off to another company, but there are also timelines where it all ends up digital dust in the wind.
Only if that skeleton crew had deep deep pockets. If Anthropic closed their doors tomorrow because the market collectively saw that AI was not profitable and so open sourced everything, there wouldn't be any money to train Opus 5.0... it would then have to fall on governments to put money into the hat (which I can't see happening unless it was Europe)
Datacentres aren't the same as infrastructure or research though. All the hardware in them has a finite, useful lifespan. In 10 years time it'll be totally useless
Hardware fails, and also scales out in terms of efficacy to run it as more power efficient, modern hardware turns up. It requires constant investment to keep it useful, and cost efficient
When AI pops, we'll temporarily have some extra compute capacity that will be horrendously uneconomical to run due to the high grid load and low consumer demand, before they get shutdown. There's simply no real use for them at this scale
Those data centers are specifically for AI workloads. Let’s say everything crashes and we now have all the data centers, what do you do with them? GPU are pretty specialized hardware, without AI a data center full of outdated graphics cards isn’t really too valuable.
It’s really not obvious the infrastructure we are building for AI stuff is something that will benefit humanity over time.
Without talking about the fact that bubbles are extremely destructive. Bezos is obviously someone who came out ok from the dotcom bubble but we are talking about something that destroys a lot of value globally. That has real, direct consequences, not just investors losing some money. The US economy is currently only growing because of the AI bet
You sell the GPU's to remote gaming companies.
Replace servers with regular compute.
Nvidia would have to ship game ready drivers for H100s but it could work.
They don't have display-out. You'd have to send back the screen data over pcie to the motherboard for display.
Not exactly a problem for cloud gaming.
Has there ever been a market for cloud gaming apart from middle class people with macbooks who casually want to play one particular game but not enough to pay for a whole PC or console?
I have a big beefy gaming PC. I still use cloud gaming from time to time. It means I don't need to juggle so many 100GB installs on my gaming handheld or cheap personal laptop, both of which can sometimes struggle to play actually demanding games. Battery life on those mobile computers are significantly better when cloud streaming a game instead of running computationally demanding games locally. It also makes the friction around trying out a game significantly lower, all I need to do is click play and the game is running instead of having to wait for it to download, play it a bit, decide I don't really like the game, and then uninstall it.
The feature being bundled in with GamePass makes it worth it. I used to VPN home and try and run games remotely, but it was honestly a bit of a pain. Just pressing a button and having the game launch is quite nice.
Not gonna run game on fucking tensor cores alone
Just do software rasterization and ray tracing and play Cyberpunk 2077 on medium at 720p/30fps, what's the problem?
AI GPUs have terrible graphical capabilities, if at all. They can run shaders, but they are lacking in texture units, rasterization, etc... huge bottleneck here.
These AI "GPUs" are worse for gaming than even the crappiest actual GPUs (with a G as in Graphics). Also, the display drivers won't support them, not officially at least.
The G in AI GPU stands for "grift"
I imagine that the big incentive for remote gaming would be massive price increases in gaming hardware driven by the AI industry...
If the AI industry collapses, it would seem like the price of DDR etc. would dramatically decrease and lower demand for remote gaming
> Those data centers are specifically for AI workloads. Let’s say everything crashes and we now have all the data centers, what do you do with them?
You just run the models and sell the tokens. The demand will still be there even if there will be less money in chasing new frontier model
> GPU are pretty specialized hardware, without AI a data center full of outdated graphics cards isn’t really too valuable.
AI accelerators used in DC are not really "graphic cards" any more, you ain't running gaming on it
> AI accelerators used in DC are not really "graphic cards" any more, you ain't running gaming on it
I think the lighter 40 series cards like L40 still have OK graphics features. But otherwise yeah, after the Ampere generation graphics features went down the drain. The A100 and A40 cards can do graphics well but it already makes no sense in terms of power-to-performance ratio.
You still have to pay for power and water. Those are not insignificant costs.
AI data centers are being already used at max capacity, aren't they? I have a hard time imagining people would suddenly use AI less than they do as of today, let alone collectively drop it altogether. So the worst case scenario is that they'd need to be auctioned off way under what they'd be worth now, but still for someone to use them for AI.
Inference is much cheaper than training a new model, so running them just for inference is a completely different thing than having to price in the fact that at the moment all of these companies need to compromise between compute for inference and compute for training new models. If no new models were to be trained, and all the compute was inference only, that would change everything when it comes to the overall compute cost of AI.
Dotcom infra buildup is a bad comparison, in that it wasn't even close to being all utilized. The infra was completely overproportional to the day to day usage.
I would day that the dotcom was directionally correct but the timing was wrong. For instance you had pets.com in 1999 but in 2020 you had chewy.com. It's like you had broadcast.com in 2000 but by 2020 you had YouTube that was making more in ad revenue than the next 4 largest competitors.
With investing timing matters a lot.
AI data centers that exist and are operational are running at maximum capacity. That's why you see things like the tiny little data center run by xai showing up as a valuable resource to xai (on the sale side) and anthropic (buy side). It is "only" 300 megawatts and there's a 1.25 billion rent on it per month.
If all these other data centers were anywhere near coming on line, that 300mw data center would be a rounding error not a line item as it is right now.
So someone's signed contracts for way more and way larger data centers, someone's purchased billions in hardware for these not yet operational data centers. I'm wondering how depreciation's going to work on all these assets...
Anyhow, I'm not really sure what "max capacity" is here, nor am I really aware when they're going to be delivering the operational assets that are currently levered to their eyeballs and consuming 1/3rd of the memory made on the planet.
As far as inference vs training, have new gotten radically better than old models or only marginally (at the cost of 10x or more the training costs)?
Very exciting stuff.
I imagine the trend for AI usage will go up over the very long term (5-10yrs etc.), but short term how much usage is being propped up by employer's forcing their employees to use it? Or by user's being curious about the novelty but ultimately abandoning it if it doesn't do what they want? It'll be interesting to see what changes as tokenmaxxing disappears.
> Jeff Bezos made the salient point...
Big AI investor tells us that investing in AI is good. Oh, the surprise!
Does that invalidate this point? Yes. Because it makes no sense. The big money is not going to R&D but to build infrastructure that will be outdated in 5 years.
Current AI datacenter/model development investment rate is roughly 1T/year. That's a lot. But the US economy is 33T/year. So the investment pays back (roughly) over ten years if, each year, the AI investments increase overall productivity by 0.6%, assuming the AI companies can capture half of the value of that productivity gain.
> „[AI vendors are] paying for a fixed cost with a depreciating commodity“
That's just a confusing way to say you don't think future models will be worth the development costs. Because if future models are significantly better, why would the price of tokens to access those models deprecate?
The $1T number seems more promises than reality, which is closer to the $300B to $500B level. Still a big number, but between a third and a half of the value used in the popular media.
These are similar numbers to the dotcom bubble. With GDP growth and the percentage of productivity AI contributes staying the same in this scenario this requires regular gains in revenue or growth. If things just stumble, like with most datacenters going unbuilt the bubble will pop.
I'm surprised people think LLMs, a thing which mainly excels at advertising, spam and writing code is going to generate that much economic activity.
Companies whose main core competency is writing code were already making up a big chunk of the economy before AI. Also, less wealthy companies were constrained in their use of software by the inability to afford the salaries of talented programmers (and ripoff practices from software consulting companies who in theory could help). Lowering the cost of building software systems ought to unblock a good amount of economic activity as the technology diffuses.
Those companies are certainly writing more code. But It isn’t clear that they are increasing their economic productivity. It could even conceivably have the opposite effect by fueling a race to the bottom.
e.g. an interesting possible canary in this coal mine is that there’s been a 200% increase in the rate of new apps appearing on Apple’s App Store, but it has not been accompanied by a 200% increase in the rate at which people are buying apps.
That’s great for consumers.
If the quality of all apps remains high, but if there is an increase of low quality apps it may not necessarily be great for consumers as it becomes difficult to distinguish which are the good and bad quality apps, making it risky to purchase apps.
A lower signal/noise ratio is never better for consumers.
Not necessarily. European grocery shoppers report higher satisfaction with the shopping experience than American grocery shoppers do.
The AI pundits often seem to apply the logic that code output is directly proportional to revenue and/or profit, and as such it follows that an AI usage increase leads to more code which leads to more revenue.
I don't believe this aligns with the reality of any major company, unless your business is in the literal sense "selling code" your revenue and profit is tangential to the quantity of code you produce. Google is a good example of this: most of their revenue and profit comes from their ad network, which is disconnected from their development productivity and instead heavily reliant on network effects and time in market. If I was a new competitor with infinite AI funds to throw at whatever problem I choose, I can't simply capture their market by developing an exact copy of Google's ad platform. In the same way, Google can't substantially grow their ad network by coding "more" or "better", they still need more customers and consumers to interact with their network to see any increase in revenue.
So it doesn't directly follow that a productivity increase will inherently follow an AI usage increase.
I would go as far as to say writing more Code has almost no impact on their economic productivity. What drives those companies is infrastructure and networks
So far the place where I've seen "more code being written" having a postive effect, has been in paying down tech debt and reduction of overhead. We've rewritten services (bringing multiple microservices back under moduliths) and cut costs. But I'm talking about net-negative code. That's not the point you're making. I agree that puking out 20 new features likely wouldn't gain us more revenue.
If we talking about Meta, Google, etc. code is only incidental to them earning money.
I am yet to see that ‘companies with great ideas which simply cannot afford those very expensive developers’. For the most, issue is not programmer costs. Mostly it’s inability to formulate the MVP which makes sense.
‘uber for my industry’ is not a sensible business strategy
Honestly, if you know guys whose bottleneck is pure software dev — please let me know, I have a good, experienced team in Eastern Europe, we can do wonders in product development. But coming up with sensible business ideas and executing on them in the real world is crazy hard and extremely rare.
You are wrong, sir. Their core competency is building out infrastructure and networks to support their software and user base. software is by far the least complicated thing they do.
what makes YouTube YouTube is not the video player it’s the servers that can handle petabytes of uploads a day and billions of views. YouTube software wise, is no different from the 100s of porn websites that are coded by small European teams
But what if it kills current ad-tech as we know it (paying to show ads on random sites without any way to verify that the site is legit), and the flow of ad money for legitimate goods turns back to journalism, magazines and other publications?
That would be half a trillion[1] redirected to regular people just from Google Ads.
[1] snatched my number from here: https://pixis.ai/blog/2025-google-advertising-benchmarks-for...
The other day I watched a YouTube video on a work machine with no history and got 2 AI generated video ads for scam products before the video played.
An AI generated man talking about his product building journey to make a pressure washer hose that didn't need power (in the AI video it didn't even have a water supply connected!) that was going to be banned in a week because it was too powerful so buy now.
I've seen AI slop before and scam ads before but the combination of the two gave me some real tingly spider-sense that things are going to get worse and that some unethical people will make a lot of money from it so be in no hurry to stop it.
Two of the things you’ve listed are some of the most profitable activities in our economy.
I mean, that says a lot about the kind of crisis out current economy is in. How much longer can the United States Be a world leader when it’s primary function is social media and advertising
A few things, I think you’re missing the point here
- most tasks do not require the latest frontier models, even if they are a magnitude more intelligent (we don’t actually know if that will be the case). Current Gemini flash is cheap, fast, and pretty capable with good guidance for most tasks
- now that companies pay API costs instead of a subscription they will be setting restrictions on token use to not have their budget explode (like Uber in this submission), that’s a strong incentive to NOT use expensive models, and limit their thinking budget
- there is competitive pressure from China and others who can offer very decent performances at a fraction of the token price
- the price of tokens for the frontier models is likely to go up, but the price to access older models is what depreciates! The overall price per token is going down now that we are in a new world where companies understand that token maxing is one of the stupidest concept ever created by humankind.
The cost of power cost increase alone on industry gonna erase all gains from it.
You can't consider it in vacuum. AI takes limited resources. So far it winded up cost on near every consumer electronics that runs an OS, and it winded up cost of energy that is used by the entire industry and every single customer
It's not just the cost of datacenters, it's cost of infrastructure (that given current direction of US govt will just be paid from people's fucking taxes and bills..) and cost of other industries turning outright unprofitable "thanks" to demands of AI
Using a shittier model is just more work for the user, I’m not sure why anyone does it, unless they’re playing with it like a toy.
Local privacy respecting inference can be worth it. I use a local model to log everything I do all week to automate my timesheet. I also have it do a bunch of other data tasks. I won't say that larger SOTA models wouldn't do these tasks better than a local model but PII is a concern and my employer wouldn't approve of me just setting tokens on fire everyday to do what I could do myself.
> I use a local model to log everything I do all week to automate my timesheet.
Isn’t that just more work than logging it yourself?
Not at all! My company has 100s of clients and we track time in 6 minute increments. I feed in my browser history, terminal logs, session scripts, calendar, git commits, etc etc into it and voila it produces a highly accurate timesheet in no time flat.
Automating it has been way better for me than the alternative of breaking my flow whenever I'm switching tasks to chart my time, or logging all my hours for the week in one sitting. Different strokes for different folks I suppose.
I sometimes let Claude Opus create plans, DeepSeek v4 pro implements and writes tests. Claude reviews and corrects.
Saves like $2-3 per session. Same quality code.
> more work for the user
Model routers allow this to happen automatically without any more work by the user.
> a shittier model
A ton of tasks don't require the most expensive frontier models, etc.
> I’m not sure why anyone does it
1. Faster solutions from the LLM - also reduces employee costs of having the employee waiting on the LLM
2. Avoiding things like the half-billion dollar per month bill for a single company’s LLM use recently reported in Axios
What you call a shittier model is what was considered frontier and fantastic one generation ago…
If you have a good model router, you can route to older, cheaper models that run on older hardware, for simpler tasks. That helps labs extend the economic life of their hardware investments. They will likely fight it at first though as they see it as reducing ASP.
This is why I'm building role-model, a routing protocol and a router runtime: https://role-model.dev/
Running cheaper models on newer hardware is always going to beat running them on older hardware.
The other part of that is that while price per token may be going down, tokens per task is going up
For ~equivalent tasks/results, or because we’re expecting more or better from tasks?
The real measure should be cost per ~equivalent task result, not cost per token nor tokens per task.
For better performance of ~equivalent tasks. That's what all the harness tooling people are using does: (often) increasing output quality by significantly increasing token counts.
I really wouldn’t be surprised if we saw some of these data centers scrapped in the next few years
Raise them, more likely. NVidia says that GPU hardware prices won't decrease until at least 2030. The world is out of fab capacity.
Seriously, they’re trying to justify trillion+ IPO’s while setting piles of money on fire, prices aren’t going DOWN.
Today's frontier models will be tomorrows low-end option. I think whatever model you are using today will be less expensive to use a year or two from now.
Last year's o3 was more expensive than 5.5 is. Whatever model we are using now is probably be more expensive than next year's leading models will be.
Price per M/tokens is also a fuzzy metric when newer models reason longer, and then burn more tokens while doing so.
Isn't 5.5 a router, though? As in, some prompts get automatically sent to a cheaper model?
They aren't going down, but in the meantime they'll cover their ass by bribing their way into the S&P 500 and then use your 60 year old mother's 401k and teacher's pension to fund their risky capital expenditure.
> The world is out of fab capacity.
Can anyone expand on this point? I read an article saying that the big AI co's datacentre spend was a bunch of lies because they can't build datacentres at anywhere near the rate they want to.
> they can't build datacenters at anywhere near the rate they want to
That was because the supplies the datacentre needed were constrained - supply-constrained, not end-user demand constrained, so would be in agreement with the GP comment (and the article I read didn't imply anything about lying).
From what I understand it’s mostly TSMC and the memory providers being out of capacity over the next few years.
So it’s not even about datacenters.
Here’s a Reuters article about TSMC: https://www.reuters.com/world/asia-pacific/broadcom-flags-su...
So this is actual committed contracts with all kinds of companies such as Apple, NVidia, AMD.
Also, the whole reason they can’t build data centers faster is precisely because of this.
Meanwhile, Google...
Google also needs fabs to build their TPUs.
Don't worry, they'll just lobby to ban Chinese models instead to keep their token revenues high.
> Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.
https://www.anthropic.com/research/2028-ai-leadership
If you do the math, they don't have a choice. If China captures America's AI market it'll cause a major depression. They'll give it the BYD treatment, though it'll be a lot less effective.
They'll ban them because (unless run locally or self-hosted) they are just data capture tools for the China.
Please explain to me how that works. If I download gguf file and run inference with it, how is it collecting and sending data back to China?
This makes no sense, 99% of the people using Chinese models are using them via Western inference providers who are running them and serving them to people over openrouter or whatever. If anyone is stealing your data it would be an American or European inference provider. A model has no ability to send data anywhere.
China bad by default, right?
> unless run locally or self-hosted
You will see soon that china uses illegal uyghur children labor to train these models so we should all boycott them
If it’s open weight then anyone can run it for you. Presumably someone you trust just as much as US proprietary models.
I don't think they'll offer open models for long. Since they've actually invested in power, cheap chips, cheap memory and can subsidize tokens - they'll keep undercutting big models to capture data forever. Bonus if they remove ridiculous safeguards and China will be unstoppable.
Pretty sure they'll offer them at least so long as it takes to bring OpenAI and Anthropic into insolvency. Why wouldn't they? The Chinese models are way more nimble to train and run, bring in a ton of goodwill globally, and put immense pressure on the VC furnace that is the US AI sector.
And apparently OpenAI and Anthropic think so, too - why else would they try so hard to ban them instead of outcompeting them?
You dont think CIA and NSA are reading the data Asian and European companies and individuals send to openai and antropic?
The “you wouldn’t download a car” meme applies here
China is the worst trading partner in the world. They banned most companies from functioning in their country for decades
So, have you ever been to China and could hadely found anything familay?
- Oh, they must have been blocked from entering the Chinese market!
But none of that is true. You could see global brands everywhere here — Tesla, Unilever, KFC, Apple, and so on.
---
Or have you ever actually done cross-border trade? Or any international business collaboration? If you had, you’d definitely realize that what’s really stopping you is U.S. legislation. At least, that was the case with our former U.S. partner
Have you ever heard forced IP transfer and partnerships?
One-Drop Rule + Long-Arm Jurisdiction = Everything eventually comes under US control. That's what I see, don't need to 'hear' it from
Why even bother with 'forced IP transfer' when you can just take it?
> Once a model is open-weight, safeguards that do exist can be removed
Safeguards trained into the model (ie exist in the weights) can’t be removed.
You don't have to remove the safeguards if you can prompt your way around them.
There's a subreddit for people wanting to sex-talk to various models. It just so happens that the same prompt they use to 'jailbreak' SOTA models for sex talks also works if you want to have model write malware, or tell you how to design a highly illegal device.
Search for "heretic"+Gemma/qwen/DeepSeek for examples where exactly this has been done.
We can tell that the inferencing costs for many of these models are low enough that these models are being sold close to real costs on the basis that many of them are open weight and available from third party providers who have no incentive to subsidize them.
I think the frontier labs will need to drop their high per-token prices at least for their low and mid-level models for the reason that several Chinese models (at least Qwen, DeepSeek, Kimi and GLM) are "close enough" that with the right harness they are cost effective alternatives.
They won't necessarily need to close the gap - at least not yet -, because these models won't necessarily compete at the same token counts. E.g. at least some of them need to do far more work to solve the same problems.
But, yeah, the prices will come down one way or the other.
At the same time, even the subscriptions for the cheap Chinese models are probably subsidised, and those subscriptions are likely to get less generous over time.
I really doubt Deepseek is subsidised. It's roughly the same price everywhere you look. Deepseek is using the Huawei hardware (as far as I managed to understand from various articles) and hence the savings.
And Chinese electricity prices are some of the lowest
Don't know why people keep parroting this, this is incorrect. Chinese electricity prices are equal or slightly cheaper then most of North America. But significant pockets such as those around the Quebec or other hydro plants are significantly cheaper then Chinese power pricing.
Not only that, China may subsidize AI, but so does the US.
Okay interesting. I presume that China also has low cost areas too no? Their grid at least seems more stable. Datacenter construction is more likely to raise prices in the US than there.
Yeah, this argument is bullshit. You can head over to Openrouter and look at the token cost for deepseek-v4-flash and deepseek-v4-pro. They are very competitive on the open market
Add MiMo 2.5 to the list. Priced like DeepSeek, performs similarly but it also has vision capability.
> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Raise, they are going to raise the prices. We will spend more on AI infrastructure in 2026 and 2027 than the gross sales of the entire global software and services sector. Current pricing is at a major loss for current providers.
They're going to need to bring in a few trillion dollars fast to meet wall street expectations. Expect prices to rise.
If Anthropic are then they are making a big mistake, their token hungry Claude code is far too greedy
API prices of Anthropic, OpenAI, and Google are massively inflated.
https://martinalderson.com/posts/no-it-doesnt-cost-anthropic...
There's no way that all AI inference providers are colluding and/or all running at a massive loss, meaning the cheap Chinese model prices must be the real cost it takes to run frontier-class models PLUS their margin.
Look at Deepseek 4 Pro. https://openrouter.ai/deepseek/deepseek-v4-pro/providers Deepseek and Baidu are subsidising prices but they probably train on inputs. I have no model training and ZDR in OpenRouter enabled, and the first provider that shows up there is Deepinfra, significantly more expensive than Deepseek. BUT much cheaper than Sonnet 4.6 and ChatGPT GPT-5.4.
> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Are they even making money off them now ?
Why would I even pay for deepseek? I get deepseek v4 flash for free with opencode. If I somehow run out of tokens for the day, I can just then on my vpn
How many more months do we need to wait, until big companies realize that flash models work just fine if you:
1) Don't ask LLMs for big changes
2) Review everything and point them in the right direction
Large models still suck at big changes, they produce questionable architecture and you still have to review the code, if your project is serious enough.
The codebase quickly become a mess, if you don't pay enough attention. Does not matter which model.
So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.
> Don't ask LLMs for big changes
> Review everything and point them in the right direction
Sorry upper management doesn't care. That's an engineering problem that you need to solve.
They were proposing a solution.. To use flash models and use them in a way that best amplifies your work.
He was making a joke.
Indeed I was. But that's lost on people here.
I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly? Perhaps, if it can measure complexity, even generate a quote?
Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.
> I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly?
This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
Yes, they are all already doing this
Many harnesses do this, I've recently dropped all my big subscriptions for using deepseek. Codewhale (formerly deepseek-tui) will use pro for large tasks and route smaller ones to flash. It's pretty good, but I just use pro and everything as the cost is quite low.
This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3billion tokens at the cost of 4$. (99-100% cache hit)
The easy decision is to just go with the biggest SOTA model you can afford.
But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.
The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.
It's the pipeline, not the model, that gets you quality at a given token budget.
It's pretty simple; organizations are willing to tolerate paying $1500/month/engineer, which seems to be roughly inline with "normal" consumption for most full-time engineers. If that number grows significantly, then I bet companies will start exploring flash models more, as you propose.
They are willing to tolerate it now, which is quite a switch up from the free for all we had a few weeks ago, and if they aren’t able to tie in this new ~$1500p/m cap to demonstrable productivity and revenue increases then that will be kneecapped even faster
There are plenty of expenses in this order of magnitude that are not tied to direct increases in productivity. I think it may become a serious hiring impediment for companies to be really skimpy on these budgets for example.
> organizations are willing to tolerate paying $1500/month/engineer
One organization, that is a software company
> which seems to be roughly inline with "normal" consumption for most full-time engineers
My peers are using $20/mo plans, only a handful are using more than $100/mo in tokens. We haven’t had any limits imposed yet.
Which organizations?
Uber is not representative of any trend beyond big tech and VC over funded startups.
This a thousand times. The bigger models also have a habit of overcomplicating things.
Is your argument that $1500 / mo is too much? Why would the engineering team not be more rigorous in their model selection given a constraint?
If you had a business task to complete that was only possible with ai and it cost you >$1500/month of work, how long would you have to delay the task so that it's cheaper long run to buy hardware and do local models?
$1,500/mo * 14 months = $21,000.
If local models are 14mo behind as many in HN say it may be profitable to just wait. Maybe just spend a few hundred dollars of your tokens and buy hardware piece by piece.
There's a lot of opportunity cost to waiting 14 months to build something.
I agree, outside of the AI bubble, there's a lot of wait-and-see happening in the B2B world right now, I'd say we're currently 6-8 months into that 14 months.
It also presupposes that open models will bridge that gap towards opus4.5, which was really when I drank the AI coding koolaid
Nearly no one is doing anything that is “only possible with AI”. This doesn’t seem like a relevant calculation. People spend on AI as an investment in their current productivity.
I'm legit annoyed at opus 4.8 at any setting above 4.8.
I believe it can be great for vibe coding, but mundane day work? Hell no, I'd rather work with Haiku. It's too slow, checks too many things, it's annoying as hell.
There is something about using the most advanced tooling possible. Why would you pay for IntelliJ, if Eclipse can do the same thing a bit worse?
You want to master your craft, develop "optimal" systems, understand where things are going by utilizing SOTA.
You can call it FOMO, but you get the point.
> That means each employee's AI spending cap is ~11% of that median compensation package.
Probably better to use the fully-loaded cost of the engineer, which is much higher than their compensation package. The fully-loaded cost is the total cost paid for the labor power of the engineer, and it includes big ticket items such as office space, food, equipment, insurance, payroll tax, fringe benefits, recruiting costs.
If the median compensation package is $330k/year then the median fully loaded cost is probably around $450-500k.
My usual rule of thumb for the US is north of double the received compensation but something in that range sounds reasonable with such high compensation. It's actually really interesting and underappreciated how that fully-loaded cost varies from country to country. Canada (for most salary ranges) is about half again instead of double owing to the insurance portion coming out of income tax rather than being a hidden expense so Vancouver ends up being attractive for trading 160k USD for like 120k CAD in compensation and then also lowering overhead from 100k USD down to like 60k CAD. The savings can be extremely dramatic.
Why would double be a good rule of thumb for typical US SWEs? Most of the costs aren't proportional to salary, and the ones which are aren't anywhere approaching 50%, much less double.
The costs to hire management and "support staff" like TPMs that scale with SWEs that help them meet goals is proportional to SWEs - often that is taken for the higher end fully loaded costs, depending on how you define it. Office space in downtown SF, Mountain View, or Palo Alto costs more than office space for back office workers in Nashville or Utah. Firms that hire SWEs often have fringe benefits like free food etc. and while they may apply to all workers, it tends to go along with hiring lots of SWEs.
But yeah, double is insane. When I saw prices for COBRA from Facebook, it was $3300 a month, and that was god-tier insurance - the insurance benefits were so good they had a custom list of what was covered that was probably way better than anything available on the market (e.g. you want brand name drugs? no problem. You don't want to try both ambien and trazadone before taking a sleep medication doctors actually recommend? No problem - etc.) - but for my needs it was barely better than COBRA costing way less than half. $3300/mo, or even $1200/mo for an entry level ops worker is a lot of their salary, and probably where the double comes from. At SWE compensation most of it ceases to scale.
The fully loaded costs including proportional management costs isn't relevant to the true marginal engineer, but estimates I've gotten from higher-ups definitely factor into engineering decisions about "should we spend engineering time to save money/make more money - how much will doing this thing cost the company" (opportunity costs are also relevant, but usually less grounded, since most projects don't have concrete benefits like "we will save $x/yr in infra costs")
Wait what are the sleep medications they actually recommend?
DORAs. Rather than being sedatives, they directly target receptors in your brain that make you think you should sleep. I think the oldest one came out in like 2011.
It's kind of like neuroscientists found the trigger to tell your brain "we're going to do a clean shutdown now, trigger transition to runlevel 0".
Quiviviq, Dayvigo, Belsomra. All still on-patent, so they don't have generics and are pretty expensive (like $1000/mo if your insurance doesn't cover them). A lot of doctors won't recommend them in practice because most of their patients won't yet be able to get them covered.
Genuinely thank you. I’ve had sleep issues my whole life and no one has mentioned these. Not saying I will pay that, but more info is always good.
GoodRX is always worth checking out, a ton of manufacturers will have coupons if you have insurance but they won't cover it.
Ask your doctor about them, look them up in your insurance's formulary to see what's required (e.g. if you have tried both Ambien and Trazadone and can document it), and see what they can do, before writing it off!
The expectation is Belsomra will lose its patent in 2029 and then generic makers can try to get one approved - so it's not that far off!
While the fully burdened cost of an engineer being double his salary sounds suspicious, this is indeed broadly the case. It has been (sometimes significantly) more than double in the case in every US employer where I worked and where I saw both numbers. In one case it was a hair under 3x.
My experience was not with pure software houses; we had some labs, measurement and RF equipment, but even without the hardware component the offices, insurance, admin expenses, HR, janitors, conference travel and so on would easily bump the total employee cost to double the salary. My 2c.
It’s also worth noting that’s the peak benefit. Expect most engineers to not hit those limits on the regular (if at all, since limiting this puts skills in focus again), and that limit to come down over time as the easy processes are automated and humans are re-tasked with harder problems relative to their TC.
This is not a good bellwether for the AI industry, including its adherents. Their growth assumed a level of indispensability that’s not being reflected in hard numbers and real costs, which lends credence to the notion that these IPOs being fast-tracked are meant to try and cash out before the bubble really pops in earnest. There’s no way consuming enterprises are going to pay such insane costs for such minimal uplift in the long run, and the AI companies can’t keep offering subsidized tokens via subscription plans at their current pricing.
"$330k/year" Lol. I thought I clicked on hacker news 2022.
Is it too high or too low? Honestly cannot tell
Quoting the article : > Levels.fyi lists the median yearly compensation package for Uber software engineers in the USA at $330,000.
I’ve even heard the rule “twice the salary” being used here in EU, but the tax and insurance burden may be higher. All kinds of those are based primarily on total payroll amount.
That number usually includes cost of habitat and others. It's also a stupid number as it is skewed by how much you can squeeze out of your employees. A better number would be to compare it vs revenue per capita.
Both metrics are valuable.
If one uses AI minimally and is able to out perform peers who are maxing out AI spend, one might want to use that in salary negotiations.
It is also possible that capping at $1500 will give you ~99% of the benefits. So even with gains that are much higher, a cap could be a rational decision. Also, most decisions, especially around AI aren't exactly rational, so I wouldn't read to much into this number.
Why there are so many people that still believe that AI coding is a fad? It's something that started less than two years ago and companies are already paying thousands per seat. I know one that gives you 5k per month. Which other tool went from nothing to this level of acceptance so quickly?
perhaps the personal computer? Companies were spending 3-5k (10-15k inflation adjusted) on every employee for just hardware.
everyone making comparisons to the dotcom bubble seems misguided. this is clearly computing 2.0 imo
No disagreement on computing 2.0, but companies spending 3-5k per employee for hardware isn't generally a monthly cost. It's a at the time of hire, and then once every 3 to 5 years after that, for a monthly amortized cost of about $50/employee.
I have my concerns with current inference pricing in that there's a non-zero possibility for a rug pull in the future for the subscription plans for organizations and individuals that can still use them. For now, its only companies larger than ~150 users that need to pay per token, but what if that wasn't the case? Not every company can afford over $1k/month/employee to give them access to AI tooling, further making it harder to compete against the behemoths. If we get to a point where an individual can no longer pay $100/month for nearly unlimited usage and instead must pay per token, that's going to be a problem.
Personal computing eventually became an equalizer (until we started centralizing on mainframes again, aka the cloud) because it got cheap. My hope is that inference also gets just as, if not cheaper.
I have high hopes for local AI and open weight models and we will continue the ethos of local, personal computing and not needing to offload everything to OpenAI/Anthropic/Google, etc. to get work done once the hardware and hardware availability catch up.
Every employee doesn't need $1k in token spend per month, either. That kind of spend makes sense for technical workers in r+d.
Most other workers are served fine by $20-30 worth of tokens on a budget model. You don't need Opus to help support write emails.
No, but you do want Opus-tier models to do desktop and office software automation (think about people who intensely use Excel and the like). Actually those might take even more tokens that coding in a lot of cases. Why do you think Claude Cowork is successful, and why do you think Codex is leaning so hard into Computer use?
I wonder if you will see app makers begin to open APIs (MCPs) up in ways that replace computer use. Computer use via human interfaces is pretty hacky IME, and if you can use an app that exposes spreadsheets in a way that reduces token costs by 90%.
I'm optimistic that the demand for AI accessibility will drive programmatic interfaces in places where companies were previously reluctant to.
Any kind of rug-pull is a serious concern. Companies are re-orienting their entire development processes around these tools. Sure they can go back, but it will require a much larger and more expensive effort than to transition in the first place.
All companies who make this transition will be more or less at the mercy of model providers.
Two things can be true at the same time. It can be true that this is here to stay. It can also be true that companies are grossly overvalued right now and that the market is irrationally exuberant. This would mean we could both have a crash and also see AI coding be the new future.
Hardware's not generally a subscription, monthly cost though.
You update it for them every 3/4 years (if they're lucky).
It probably makes a bit more sense to compare it to existing software subscriptions like Office, or the old-school 'per-seat' licenses per user for software.
There's some software that can cost $1k or more per seat/month, but it's pretty rare. Big tier ERPs usually fall in the ~$600/seat/moth range, specialty engineering stuff can hit over $1k, Bloomberg terminal, etc. I wonder if what Uber's building with that $1.5k/month/employee is actually delivering the same value that something like an ERP would to the entire org...
I think the right comparison is the invention of the microprocessor. At that time people were grappling with a lot of the same things we are today - would it automate jobs away, would it transform education and the work place, etc.
The Dotcom bubble is an interesting comparison.
The general thrust that everything would be online was correct, it was just that the market mistimed and misallocated of capital by a decade or more. There was massive spending on infrastructure capacity that we wouldn't end up needing until the 2010s. There were hype driven valuations completely disconnected from business fundamentals just because a company was an 'internet' company. Things were going from cutting edge to obsolete in less than a year. There were breathless promises that this was business 2.0! Of course, none of that sounds remotely like what is going on today...
I'm optimistic about AI, but I also don't think that it is going to change everything as fast as promised.
The question you always have to ask is what problems does it directly solve. I personally think most of the current problems in software development and really the world at large are not time-bound problems but alignment issues, and all an LLM can really do there is be some 3rd party oracle that gives you an answer without needing other humans to agree with you.
I agree with you. I think that if we're talking about actual reliable problem solving, we have to be discussing robotic / drone systems. Software is as complex as you want to make it, and always has been.
> The question you always have to ask is what problems does it directly solve
Most directly, human labour. Labour is always a problem for capital. At a certain level of AI competence, businesses don't need to pay humans to complete the work they need doing in order to operate. I don't think anyone would dispute AI competence isn't growing steadily.
I would use these exact facts as a sign that it's maybe not what it seems. It's much too big and too fast to feel stable. It might keep at that level, increase even more, or drop down to a saner level of use / allocation.
So it might either go up, stay the same, or go down? :)
heh yeah, i'm also selling trading advice :p
I can see a corporate future where tokens are haggled over in department budgets just like any other line item. Some projects will get more of them, other projects will get less of them. "Use AI for everything" will become "use AI economically and build things that outlast our budget for it."
Neat fact, those kind of conversations are already happening at ${DAY_JOB}.
> It might keep at that level, increase even more, or drop down
Bold prediction. :)
I think anyone predicting a drop or near-term flattening is not thinking beyond the online bubbles where these tools are discussed. In a local tech meetup a lot of the normal companies are barely coming online with AI tools at their company, and even then with very low limits.
“AI coding is a fad” is not just one big camp of similar-minded people. Different groups have to give up on their pre-existing beliefs in order to be ok with AI coding.
Think of people who were very strict with variable names. People who pushed for multiple-levels deep of abstractions for a single API logic that’s not going to be reused. People who believed that coding is craft, rather than just a process to get to the end during work hours. This makes most of these people’s points more-or-less moot.
I was in some of those camps, but I’ve seen coding evolve in the last 15 years. So I understand that these priors need to be updated, as most arguments don’t apply to today’s world.
What's an int vs a float vs a boolean? What's a function? What's a class? What's a variable? You don't actually need to know the answer to those questions in order to vibe code. That's a lot of priors to update!
Just to go on record, as of today, I’m a big believer that a person that knows all that stuff is much more productive with AI-coding than a person who doesn’t.
I have no idea how we can get people motivated to learn these through trial-and-error when AI coding exists though. I remember the days of spending hours on stupid bugs that AI can resolve within a minute. But I recall learning heavily from those experiences. Oh well…
I honestly feel like my own learning has accelerated after using AI. Simply because now it's so easy to write the same thing in so many different languages, I can e.g. learn pros and cons of each language, which otherwise would have been I think unfathomable to me. I have now created so much stuff I wouldn't have had time to create.
I setup k3s, and tons of what would be otherwise unnecessarily complicated stuff on my laptop for my side projects with additional home servers, smart house stuff. Otherwise k8s and things like that would have been daunting to learn and in theory and without constant professional exposure, etc...
Microservices in Go, Rust, which I didn't have any previous experience with, games in C and other languages. Didn't know anything about low level memory management before. Was just mainly TypeScript person. Just constantly building random fun stuff.
The question is if you already had intuitive understanding of what those things “are”. The languages and systems have been easier to learn once you picked up a couple. Same applies here as well.
The question is, how quickly does a junior with no experience builds intuition without trial and error.
But surely, it's a matter of curiousity? If you are curious you will naturally want to look deeper to understand what is going on. If you are not curious, then you wouldn't have done very well before either.
I like the presentation I heard from a Principal, that AI tools amplify your competence. If you start out incompetent, it'll just allow you to be incompetent with greater scope and (negative) impact.
yes, but a person who doesn't know any of this stuff is infinitely more productive with ai than someone who isn't when it comes to many things.
we've got product folks vibing out prototypes (not shippable but clickable) in our main front end in a few minutes to an hour. This would previously have involved 3 people and several weeks, or a ton of figma and documents to fill in the gaps. This saves weeks to months and lets them really experience the items.
Then they hand it off to someone who knows all that stuff who is also using AI and the impl also gets done faster.
The PMs are either moving infinitely faster, or at least 30x faster and not blocked constantly by others.
basically you're not comparing people who don't know much (tech) with those who do, you're comparing them before and after access to AI.
And, you don't have to vibe code. A competent developer can make great use of AI. I think a developer that can develop the system themselves is the most accelerated user.
> You don't actually need to know the answer to those questions in order to vibe code
No, but you do need to know the answer to respond to that 3AM page about prod being down.
When I started I learnt something about coding from VBA macros to automate excel.
Often that started with the macro recorder. Then you worked out what that "recorded" code/sludge did, removed the crud you didn't need or want, improved the logic and so on. I bought books to understand it better. Now you can ask a (different) LLM "what is this? why is it used? How would I?" etc which is probably a faster learning curve than books, newsgroups and old school personal home pages with good info.
I would have been quite surprised when I first used a VBA macro in anger just how far I would go down the rabbit hole. C, asm, verilog, Linux were no part of what I originally signed up for!
Some people will specialise in the equivalent of recording macros and go no further. And this will be fine for code that gets it done but doesn't matter too much in the other dimensions (security, reliability, usefulness without the authors' support, etc.) Much like VBA utilities inside companies that were useful way back when. Other people will want what they produce to be better, even good, and they will learn about floating point [1] and all the rest, much as I did. Probably learn pretty fast too. [2]
[1] https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...
[2] Working out how to write an excel vba webserver and using it to collect and and collate summary data from various divisions into reports was seedy as hell, solved the actual business problem (given ridiculous but intractable constraints) and isn't something you can record. We all have stories from a misspent youth that we're simultaneously ashamed and yet somehow proud of.
"as most arguments don't apply to today's world" makes me want to roll my eyes so hard at you. The vast majority of problems we had with building complicated systems are all still just sitting there. People are speedrunning relearning things we've known about software engineering for decades.
The more things change, the more they stay the same.
Between AI and the stock market (which of course relates directly to AI), I’ve lost count of the number of times I’ve heard lately another variation of “this time is different.” Sometimes so close to those words that I wonder why the person speaking them doesn’t feel a bit tingly. Great big warning signs all around.
The examples I gave, and the arguments that usually support them don’t really translate into “building complicated systems”. I was talking about the arguments in support of variable naming flamewars, etc.
I’m not proponent of AI generating everything without any supervision as of now. But willing to change my mind when it gets better.
Most software engineering jobs are not cutting-edge tech, or research, or solving unsolved problems. Integrations, APIs, figma-to-react pipelines, devops and etc. is what people get hired for. All those can be done much faster in the same-or-better quality by an experienced person with the supplement of AI. It’s hard to imagine any company would go against the grain and slow things down on purpose.
So I accept that “nonsense arguments are nonsense”, but with some minor differences of opinion. Naming of things matters insofar as you care as a human to actually conceptualize the system you’re building. You can call all of this stuff minutiae, and on some level I kind of agree, except for the general vibe of _caring about the quality of the stuff you produce_. That is something that still matters whether it “works”. Like, yes you can get an LLM to gen some junk, but _is it any good_ is still something you are in charge of.
As far as “boring systems are boring”, I can tell you from experience that I work on a pretty boring system, and AI is not all that meaningful in terms of its impact, and it’s not for a lack of trying.
Can it help me create a migration and add an endpoint and such? Sure. But those aren’t the hard problems. They never were.
It’s funny that you think the idea of slowing down is such a bad one, but it is another well-established truth. Slow is smooth, and smooth is fast. This notion of break/fixing your way to prosperity by way of 10,000 ill-conceived PRs is a fool’s game.
I'm sorry, you might be right. But this simply doesn't reflect my daily reality. All I can say is, nobody in my org is creating 10,000 PRs. But everyone is using Claude Code for virtually all commits. We've been doing it since about Opus 4.5ish. So far, so good.
Generally we've modified our timelines heavily, systems are working as intended, company is still making money. There are some AI-authored commits that had mistakes that we didn't catch, but I'm sure this could've been an issue even if all were human-authored. I know first-hand multiple other companies who are doing exactly the same thing.
I agree with "slow is smooth, and smooth is fast" for mission critical systems. But super majority of systems are, indeed, not mission critical.
I have the same experience. Slow is smooth with AI is still productivity improvement.
But is it enough of an improvement to justify the cost? (Since the current raises are probably just the beginning)
Because companies are betting that this spending will allow them to reduce cost by firing people.
Right now the AI LLM PRs we're seeing are just introducing more work for other people, while these so-called builders are looking good with their new dashboards and functionality they're demoing.
But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
It's not built up from the ground with experience from x people taken into account. It's materialized from nothing, with no foundational separation, and barely any abstractions.
No one wants to touch it. The PRs are too large, and the 'authors' of the PRs aren't on call with us.
They get all the glory, but do none of the work.
It's kinda like designing a house and then sending it to an architect and engineer saying: make this work.
> But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
You can absolutely do this. It's even right most of the time.
I believe the “them” the OP was talking about was referring to the people opening the PRs, not the LLMs.
My mistake, that is definitely a different scene.
Let's be real. Most of the time you ask an LLM "Why did you do it like this?", it responds with something along the lines of "Oops. My bad. You're right to point this out."
You even have a fair chance of getting a response like that when there isn't anything wrong and the question wasn't rhetorical - which perfectly illustrates the level of the genuine understanding LLMs operate at.
Can't remember the last time that happened.
Happened to me at least three times the past 14 days. I point out where it made a design decision that causes data loss. «Oops my mistake»
I encounter it constantly with the latest models. Claude is particularly prone to it.
> I shouldn’t have said that with confidence
> I got ahead of myself there
> I overstepped, allow me to correct that
It’s wild seeing how often it’s wrong, and I only know it’s wrong because I am an SME or actually reading the sources. Most of my coworkers are not SMEs with what they are asking and do not read the sources.
A huge part of my job now is fixing fuck ups and failures resulting from these slop jockeys who have already moved on to slop up the next task.
This has happened to me, so I put this in my global CLAUDE.md, and it seems to help (I don't remember getting the response you mentioned for awhile now):
When you criticize AI, always remember that the alternative is the average employee. Today's models are pretty good.
A lot of people think they're above average. A lot of them are wrong.
A lot of average people are producing gigantic messes. At least previous to this they were gated by their mediocrity.
and have they totally got rid of the average employees? They can blame the models for the production outages already?
> the alternative is the average employee. Today's models are pretty good.
I have never seen anywhere in the world people that hates so much the working class as people do in the USA.
In my country the average employee is competent, they do their work and create wealth for the nation.
Again, only in the USA people think that billionaires are the ones creating value. Total non-sense indoctrination.
I'm not American or ever worked in the USA. It's not a judgement of human value. It's a judgement of work output.
To adequately validate work you must be at least at the same level, so if you were right (which dunning-kruger suggests unlikely) that would mean your "terrible" average employee is given a tool that will 10x their output which they cannot even check for correctness. And correctness will be low if the average employee is bad like you say, because it means they will give badly specified tasks and even with the best of us it's garbage in, garbage out. I am sure there is no way this can backfire.
All enablers also enable mediocrity. That's not new. At least when the non-mediocre engineer has to work with someone, they can have a tireless responsive partner.
I find this varies by individual, but the AI taking care of so much boilerplate and rote work of coding, and taking the role of architect, test designer, and reviewer is a lot more productive for me. Check the code may take the same skill, but it's an order of magnitude less work.
Perhaps if you need that much boilerplate it's not going to be a well-architected codebase in the first place. Abstract it out, make a lib out of it. Easier to review & test in separation. Loose coupling, high cohesion.
I remember hearing (perhaps last year?) that the model companies have specifically tried to obfuscate the "thinking/reasoning" behind the decisions the models make so as to prevent cheaper models from training on the reasoning logs. So asking one "why did you do it like this" might be not fruitful.
Not sure if that's true or if it might be influencing what you're seeing, but it's a thought.
I think that has to do more with the thinking "train of thought" that some models show as what the model is processing before making the response. There shouldn't be a distillation risk with actually asking the model to explain why it made a decision and getting the response.
That's because of a fundamental misunderstanding of what an LLM is. The only correct answer to "Why did you do it like this?" is that the specific combination of input text and RNG state caused this particular output. There's no reasoning to be had.
* EDIT * What's with the downvoting? That's a correct description of what happened. You can't ask an LLM why it did something and expect a coherent response, because there's no thinking chain, and no stored thinking state... At best, you can get a reconstruction of how the context relates to the output (basically a summarization of the context).
So what? That doesn’t negate the value they provide.
And you can certainly tell it the flow you want (and any other constraints) in the prompt.
It's so fucking bad. I'm watching a team try to maintain a huge dashboard/control application that interfaces with a large amount of hardware using solely AI workflows.
Literally nothing works, all the timers/time counters are different across the pages, constantly commands hardware to do stupid shit, breaks during critical moments/in front of clients.
Eventually mgmt had to institute change freezes for high profile events because the team was breaking too much shit all the time.
The average C suite dipshit doesn't realize that the performance drops off a cliff once your project is more than some fraction of the context window so they will make pretty dashboards all day long but once you need to cover all the edge cases of a real system it all explodes.
AI isn't trained on the type of software style we'll need to create systems using AI, it's trained on how we used to write software. It doesn't reuse code or elegantly structure annoying, it just adds more code until the thing builds and passes some fake tests, even if half of it is functionally dead/unused.
Literally in the middle of ripping apart a vibe coded mess at work to figure out what's even worth keeping. Not fun :(
What happens if you just keep vibe coding is? Does it whack-a-mole fix one area and break another?
use ai to do that
> But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
There are plenty of valid criticisms or warnings about over-reliance on AI coding, but this is not one of them. Today, I am using a semi-autonomous agentic coding system which has an `interview` functionality built in - when it spits out the PR from the input, if you have questions about the motivation or context for a particular choice, you can start up a clone of the original agent in a sandbox to question it.
Now, you might claim that those responses aren't always reliable, accurate, or consistent, and that claim has a little more weight (though, in my experience, decreasingly so) - but it is _certainly_ not the case that you cannot interview an agent about choices made. I'm literally doing it every day.
Sorry, I meant interviewing the PR author for certain choices.
> Because companies are betting that this spending will allow them to reduce cost by firing people.
I've never worked at a company that didn't have a technical backlog measured in years.
If they don't hire to get it done it means they don't think it's really important to get it done.
That is an amazing point that invalidates the backlog in my mind. Stated vs revealed preferences in the end.
>Why there are so many people that still believe that AI coding is a fad?
Because there's not a single piece of evidence that this has improved the quality of the delivered software, or for that matter even the speed of features any of these companies produce, in fact if anything the opposite.
The point of software development, the hint is in the name, is to develop software, not consume tokens. If Uber was now full of 10x engineers the stock price of Uber would be up, not down on a yearly basis. Hilariously enough the only company whose stock price is up appears to be Antrophic
How dare you mention evidence! This isn't engineering you know!
I don't believe that the quality is the best metric for these companies. I doubt that Google has top-notch code quality in every product they developed, but it does not matter if they are making billions per month. Furthermore, I honestly believe that the quality stayed the same, at least.
That's just a non sequitur. "companies are already paying thousands per seat" has zero correlation with something being a fad or not. There are much more reasonable rationales explaining why companies are acting the way they are than "because AI coding is not a fad"
Can you name a service that charged companies thousands/seat/month that turned out to be almost or completely useless? There's lots of random services sold to corporates that are not very useful (all the random benefits besides health care, life insurance, and other big-ticket items), but the per-seat charge of those is much smaller.
Google Jam Board (and other digital whiteboards) had high upfront capex and lowish opex. Probably close to the price for how often they were used before being killed off.
Same with the MS surface(?) tables (not tablets). I saw load of companies buy into the hype and then discard.
Every consultant ever, but to be fair that's not per seat.
Hey I'm a consultant. They pay me to be a regular developer but they cannot hire since they just fired thousands of people which they apparently did need, turns out.
There are so many. Can I start with Oracle databases?
Oracle DBs have powered enormous numbers of applications and economic value.
Not a service, but do you remember Scrum Masters? We had them as full time employees not so long ago. Pure fad.
I hope this is sarcasm, or "half" my job doesn't exist or something. Or you talking about full time non-dev scrum masters?
Yes
Hah. Great example actually. But far less common than AI afaict.
Oracle and some company wide Microsoft licenses.
These are clearly useful.
Companies love to waste money on that kind of service, before this website became everything about AI, every week someone would post how they saved a gazillion dollars by leaving vercel or AWS to self hosting as an example.
So you think AWS is a fad?
> Can you name a service that charged companies thousands/seat/month that turned out to be almost or completely useless?
The Concorde turned out to be fad (not "useless" - which was your reframing.) Touted as the future of travel, each seat cost about $20,000 of today's dollars, but it turned out even at those high prices people and companies were willing to pay per-passenger, supersonic trans-Atlantic air travel is not economically viable, and was discontinued.
All the NLP experts that companies bring in to make those seminars despite it has been debunked decades ago for example…
It's just silly to claim it has zero correlation.
There is a whole spectrum between "ai coding is a fad" and "unlimited tokens for every employees we don't even care if it actually ends up being a net positive financially"
> "unlimited tokens for every employees we don't even care if it actually ends up being a net positive financially"
That was clearly a short-term trend that would obviously get fixed. Doesn't say much about AI coding as a business model.
Because writing huge amounts of code is easy for humans too. Agents already proved that they can do it. But are agents able to maintain it? I do not know and unless I know for sure, I am not fully committing to AI generated code.
i.e. I am able to write about 1k lines of code of "acceptable" quality per week. Which means in 1 year, there will be about 5Ok LoC. I am pretty sure, that I would have to spent like 60-80% of time to maintain 1st year code and the rest to make new features in the second year so I would have to hire more people and spent time to onboard them to maintain velocity. All of that are rough estimates, probably overoptimistic and way worse in 3rd year. Good luck doing such estimates with code agents. Even worse if you already have huge amounts of legacy code.
It's cope. People desperately want to believe that AI coding is going away so that they can go back to partying like it's 2020.
So there's a huge number of HN posters claiming that the price of tokens will go UP over time rather than down (that's how Moore's Law works, right???) or that code bases that AI contributes to will spontaneously combust, or something.
I don't think it is unreasonable to say both will happen, is it?
In the long term, tokens will fall in price. Obviously. (If "tokens" continues to be the unit)
In the short to medium term, for the IPOs to succeed, people have to start actually paying for what they are using, so the price will go up, and is going up, quite a lot. Once their value is set they will slowly fall from that point (or some point maybe halfway, depending on how much the market is willing to continue to subsidise).
I am an AI cynic, but I am now an informed cynic; I am learning agentic tools so I know where they are useful and I know my enemy.
I think the "fad" here is cloud-based, metered AI being a dominant work mode.
Nothing, so far, has suggested to me that any other outcome is likely than edge- to local-scale, on-device, on-laptop, on-prem models getting good enough to the point where people use them by default and use the cloud models only when they need the extra oomph.
I cannot believe that there is anything other than an enormous incentive for companies like Uber to find local, small model and on-premises solutions to their problems, not least while pricing is so changeable and people are getting nasty surprises.
Betting on OpenAI and Anthropic being around over the long term in the form that they are now, that feels like valley hopium. Utility monopolies essentially always derive from physical/geograpical limitations, don't they?
I mean, there's an "enormous incentive" for people to run their own data centers rather than using AWS. And yet, cloud is growing and on-premise is shrinking.
While I hope local AI continues to exist, I'm skeptical that it will take over, for the same reason running your own servers hasn't taken over. It's just hard, and involves spending huge sums of money up front.
It's also not really clear how much tokens are being subsidized. The discussion reminds me of Uber. For years people on HN claimed that Uber was going to collapse once they ran out of VC money. Then... that never happened, and everyone just moved on to discussing other things.
Infrastructure is massively complex and multi cloud is super hard to do. Switching LLMs is... a drop down.
Now, that doesn't mean running your own LLM will be easy, but this will mean it's a lot more likely that there will be at least regional LLMs, in my opinion. I.e. there will be Google, whichever (if any) is left standing of OpenAI or Anthropic, and then there will be Chinese hosted LLMs, probably Indian hosted LLMs, European hosted LLMs, plus LLMs hosted on managed services (i.e. Bedrock). For sure I see large banks on the like being able to host the best OSS or even licensed LLMs on their own cloud infrastructure accounts (i.e. at AWS, Azure, etc).
And that's on top of the LLMs running on owned server infrastructure plus actual local, on device LLMs.
You're using the future tense, but all of those things already exist. Google exists, Amazon Bedrock exists, DeepSeek's cloud product exists, etc. etc. But this isn't relevant to what the post you are replying to said, which is that "cloud-based, metered AI being a dominant work mode [is a] fad". Since all of those things are cloud-based, metered AI.
I was talking more about on-premises, on private cloud and on-device stuff, as I said.
If you look at what Uber is spending per developer per month, they clearly have some headroom to consider whether more-local, unmetered AI tools on device, on premises, in private cloud, can be cost-effectively used to cut down how much money they are pouring into Anthropic and OpenAI. Not least because a bit of centralised effort might lead them to distilled models that are better for their purposes. Some of that budget could go into simply putting a bit more capacity on a developer's desk.
Can they do it now for everything? Obviously not. But IMO there is no reason at all for planning and scaffolding tasks to be done with cloud models, and there are many reasons why it might be better to do document processing without leaving the premises.
The incentives are there on the technical, operations and particularly on the business levels, and the relative disruption of the switch really small, considering that all the tooling can use different models for different tasks already. They must at least be investigating the possibility; it's irresponsible not to.
Token costs do go down over time for sure due to software optimizations (i.e. better attention kernals) but acting like hardware INFLATION isn't happening for at least a few more years is just nonsense. Objectively an A100 is more expensive to rent today than it was in 2024 (a 7 year old GPU - Big short guy is a turbo idiot) and rising. As such, over short time horizons, it's possible to see limited amounts of "price per token goes up" for the same model.
It's a mix. If the current wave of LLM businesses crater, demand for LLM specific hardware (and related hardware) will crater. GPUs were propped up by crypto currencies and now by LLMs. They're still great at doing fundamental math operations, but for their value to stay up another massive business opportunity involving matrix multiplication and the like would need to rise as soon as the current business cycle winds down.
Not impossible, not unlikely, probably 50-50.
> So there's a huge number of HN posters claiming that the price of tokens will go UP over time rather than down (that's how Moore's Law works, right???)
I mean, Github Copilot's pricing just went up considerably, so I guess they were right?
Why are there so many people who mistake simple anecdotes for actionable data? Why do the majority of businesses fail rather than succeed?
Because we have spent a lot of time and money using AI to generate code and have been unimpressed with the results.
As for why they got accepted so quickly 1) the industry's long running desperation to deskill computer programming 2) the addictive psychology baked into LLMs "That's an elegant solution! Shall I ... ?"
Also, a bucket for VC to put all that NFT, IoT, blockchain, VR investment into. VCs gonna VC and the last 15 years of bets failed so the last few years have been a transition away from those toward "the next thing".
Because the vibe coded stuff is sometimes great, sometimes it breaks stuff, sometimes it breaks things that we fixed multiple times earlier. The PRs are too large, nobody can review that mess and you better be on call for your deployment. Maybe it will get better, maybe not. I dont know yet.
Oh, it won't get any better. LLMs already trained on every bit of code ever published, they won't get any more material.
If anything the snake is eating it’s own tail because now it’s training on vast amounts of its new slop…dragging down the average bar of quality.
They can be reinforced with best practices and context windows etc will increase.
The massive PRs is something that probably has to end. You can ai generate smaller changes in reviewable PR sizes. It probably even helps the AI code review tools to break the work in to smaller logical chunks too.
What about that means AI coding is a fad?
Fear of loss to competitors embracing a technology creates a fear driven adoption.
Let me ask you this: is any technology worth so much break-neck adoption without first seeing clear evidence of ROI? No. The adoption is irrational.
What makes you think there is no clear evidence of ROI?
All of the articles and CFO’s saying so, and companies like Uber cutting back on AI spend.
Uber cutting back to ~$1,500/engineer/tool/month makes it look to me like they think there's at least $1,500 of monthly ROI to be had per engineer.
1500/Mo per engineer is such a small price considering the base salary of these employees, Maybe Uber knows something we don't (the 5X engineering ROI isn't there for them?).
Judging the ROI of an engineer is hard. Adding AI on top of that makes things worse, I think. I've heard AI makes engineers 3X, 5X, 10X and even 100X.
If I told my CEO that I was 4X more effective with AI, I am doubtful he would be willing to spend even 1X my salary on tokens. Even though he would be making out in the end.
At some point the ROI is pretty much vibes, man.
So touche, but since it's usage per task it's kind of weird.
This means that the average engineer is efficient at (say) identifying the first 10 tasks they should do but there are diminishing returns after that? That seems like a weird pattern. Wouldn't it be more likely that certain tasks have a ROI based on how efficient the task is generated?
Like I'm trying to imagine in my head, if you think an engineer is more efficient with the tool, why deny them more tokens. I guess so they think to use them more efficiently?
So, maybe I conclude that I think your conclusion that there must be $1500 per engineer is flawed. And even if it were true, I don't think the benefit would be evenly distributed. I suspect this is a first pass at figuring how to budget them and there will be a second pass.
While it certainly reeks of motivated reasoning, Jensen Huang assertion that an expensive engineer should be using at least their salary in tokens feels more logically sound to me (assuming the average engineer is efficient at using tokens, I have a feeling it's a normal distribution)
"I suspect this is a first pass at figuring how to budget them and there will be a second pass."
Completely agree with that.
Setting a cap motivates developers to invest their tokens wisely such as choosing the right models and not burning tokens for fun or side projects, same as any budget.. it’s not any deeper than that.
At my company we can ask for temporary cap limits if it’s justified, which is fairly common.
> Which other tool went from nothing to this level of acceptance so quickly?
NFTs? My company had nothing to do with blockchain but I ended up working on NFT integration regardless.
I still believe Scrum is a fad and yet companies have been spending obscene amounts on to push it down developers' throats for decades now.
Scrum spending is very rare IMO. No company I have worked at pays anything for scrum.
As a side note, I wonder when we'll hear the first reports about employees reselling (parts of) their token budget.
Probably not worth it risking your job for a 200$/month good, but at 5K, I'm sure some folks will be tempted. Especially if companies do stupid things like token usage leaderboards.
$1500/mo is $18,000/seat/annum.
Maybe Microsoft and Nvidia are on to something.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
at their scale they could also just run a large on-premise or rented (basically still cloud, but cheaper) GPU cluster and run through that. fixed costs, even license a SOTA model’s weights if you’d like
> even license a SOTA model’s weights if you’d like
Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.
That's going to stop eventually, and I think at that point we're going to see business models more like the major CAD providers.
I don't think they'll have a choice, open weights models are not far behind. At some point it's essentially a commodity game
they also already do this…
Anthropic and OpenAI license to the public clouds. Google reportedly licenses to Apple. licensing to Fortune 100 companies running on their own infra is an obvious next step
it is a race to the bottom and I’m not sure the labs win that race. we’ll see!
I'm not sure the labs will win either. I wouldn't be surprised to see OpenAI & Anthropic just get acquired, either by Microsoft or Amazon and their models just become another product offering in their public cloud and and some hybrid on-prem offering like Azure Stack HCI or Azure Stack Hub (already basically a "cloud in a black box" that could become "AI in a box")
The problem isn't really Uber, Microsoft or Nvidia, it's all the smaller none IT companies that also have developers on staff. They are screwed. $1500 per seat per month is just way to expensive, but they also can't afford to build and maintain their own on-premise solution. If Microsoft can't afford to run CoPilot for their own developer, what chance does any of their customers stand?
If the large, well founded IT companies in the world believes the current AI cost is to high, then Anthropic, OpenAI and CoPilot have no actual customer base. AI is then relegated to very profitable niche business, but that can't fund the R&D for the models.
It's an extra 18k a year for developer tools when they're paying how much a year per developer? Having software developers at all isn't cheap.
Also, I don't believe you need to spend $1500 a month on a coding agent if you optimize usage at all.
$18k a year is a non starter in most companies. Ive seen companies balk at Intellij.
That depends on where you are. $18K is the equivalent of paying around 15% more for your developer.
In hcol locations yes, but in south of spain you can get full time talent for that figure. It's also an entry-level salary in eastern europe, with ukraine and turkey even being somewhat cheaper.
In Latvia, the net salary for a Java dev is around 1729 - 4314 EUR, based on https://www.algas.lv/algu-informacija/informacijas-tehnologi... (crowd sourced data)
For the employer those employees cost between 2945 - 7736 EUR per month based on https://kalkulatori.lv/lv/algas-kalkulators (income and social taxes).
So on the lower end that's (1500 USD ~ 1300 EUR) close to half the total expenses of such a developer, on the high end here around 15-20%. That's quite significant, depends on whether their productivity also improves (if that's what the orgs care about).
And we’re not even the country with the worst pay out there, but pay the same for tokens, cause regional pricing isn’t a thing!
There's models for every price point. What was SOTA and stupid expensive to run a year ago is a cheap flash model today.
Why are smaller non-IT companies "screwed" because they can't pay out the nose for their developers' AI usage? They're non-IT companies, developers are presumably not on their critical path, or not their bottleneck. Developers can keep on writing code the old way, or doing it with a more reasonable AI spend. I don't see how this "screws" any company.
That was badly worded on my part, my intend was to indicate that there was no way they can or will pay $1500 per month per seat.
> WTF did Uber build with all of that spend?
WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using ai coding tools there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this far entrenched in these workflows and still not know whether they actually do anything useful?
I can say at least for me at a small-ish company (~40 FTE) there has been a surge in internal productivity tools. Nothing to improve the end user product directly but a lot of tools to make processes easier and less error prone.
What would previously be janky internal dashboards or excel sheets are now actually nice to use tools. That said of course the maintenance cost of all that has yet to be discovered, and the ROI is questionable.
About the same ~40 FTE team. We're doing the same thing. Smattering of internal tools, but no net gain in external revenue. Who knows which of those tools will have any value or ppl are just doing it because it's cool now to make fancy dashboards.
OK. I guess that's good, too.
Yeah this seems to be a pretty widespread story, from what I've heard as well. The thing about those janky dashboards and spreadsheets though is that somebody understood them and built them with intent to solve a particular problem. Despite the rickety appearance, they're trustworthy tools. A polished single page app might look nicer but it's harder to debug than an excel sheet, and much less transparent in its internal workings--especially if nobody actually wrote it...
More importantly, it's questionable how much extra revenue improving a design of internal tool brings.
Imo its pretty clear that anyone who is taking the issue at least somewhat seriously knows the amount of value they provide is not non-zero. However, the problems are manifold: firstly, toolchains vary wildly, from fancy autocomplete, to engineers chatting with codebases they're unfamiliar with, to people integrating them into devops and infra, to people doing spec driven development, with a thousand philosophies inbetween. Many people suspect that those above them in the ladder are on the cusp of massive failure due to losing track of the code, and many people higher on the ladder think those below them are overly cautious. I hate to be the guy saying "oh it must be somewhere in the middle", but I will say at the very least I like being able to use it to read docs for me, and to synthesize syntax and simple scripts (give me a join that works across these tables and gives me column x, y and z - give me a python script that parses a file like this example and extracts abc data - given this api spec figure out how I can get this data from this endpoint, go)
as for building actually complex software, the art of that is not in simply chaining together such scripts. Its the art of using architecture and testing to shape uncertainty, and developing requirements (and extrapolating sensibly from incomplete requirements). I don't think llms are great at this, but they arent terrible either. A lot of the more active users in the space are doing stuff where theyve realised they need more detailed specs, which like, yeah, we knew this already - better defined problems lead to better software.
I agree the most interesting use cases I've heard of are about increasing the rigor of software development practices, but there's definitely a lack of coherence in methodology.. I believe that some users and companies are successful in this effort, but the odd (and interesting!) thing is that so far we don't seem to know how to communicate how to do it successfully.
The real answer?
Software engineer quality of life.
There can be an increase in productivity without a corresponding increase in total output. The gains could be captured by software engineers doing a days work in an hour then fucking off in a variety of ways.
Yeah I think this is probably most accurate.
> doing a days work in an hour then fucking off in a variety of ways
Until companies start hiring 5x less engineers than they did before and well.. we are clearly moving towards that direction
Quite possibly. Doubftul it will happen all at once. If you can get 8 hours of work done in 1 they'd need to ramp up demand 8x. Would be interesting to see that happen over night. Happy monday. Here, take these 30 tickets.
But that's an inefficient use of dev salary. Y'all are gonna get ground to smooth well-compensated paste.
~70 FTE Engineering team. We are shipping more features, especially features that previously would not have survived the cut to make it on the roadmap. Even though we are shipping more, our total amount of escaped bugs has not increased, so our escape rate has actually lowered. On top of that we are able to triage and fix escaped bugs more quickly now. And then of course there has been an uptick in internal tooling that makes the rest of the company more efficient, and we have been able to address tech debt at a higher rate than before.
I don't think this would have been possible without having solid engineering culture and processes in place before bringing in ai coding tools.
And I don't want to sugarcoat it, this hasn't been easy, requires continued discipline, and took well over a year to get good at. And we still have to continuously learn, experiment and adapt our training, tooling, and processes.
That's not really the important question; the important question: is it generating revenue.
If you increase your spend -> ship more features -> no correlated increase in revenue, that's just burning money.
If a team of 10 spends 1 extra headcount ($180k/year) and ships features with no corresponding growth in revenue, what does that mean?
There was probably a reason it was on the backlog (because it didn't really have value).
> is it generating revenue
Yes! :)
> There was probably a reason it was on the backlog (because it didn't really have value).
There are definitely things in the backlog with low value. We don't work those items, even if we could now. The additional bandwidth we have now goes to valuable features that drive revenue and retention metrics. The reason they were on the backlog were because we just didn't have the bandwidth to execute on them well and they were just somewhat less valuable than the critical path items on the roadmap.
> it's WTF did Uber build with all of that spend?
You can ask the same for the median 330k salary in the US for Uber Engineering... and being a bit snarky, attending Uber engineers talks here and there at a few conferences, looks like. they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.
EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.
This is what all "platform engineers" have to do once things are working nicely: you have to keep inventing work.
I don't know; I'm a Ron Popeil "set it and forget it" kind of guy. Make the dumbest, simplest thing that's going to work with some clear path for scaling. Then go do valuable things instead.
But most Platform Engineering teams in smaller companies (and especially non-US) add a layer on top of existing technologies. A layer that usually maps to the specific culture and idiosyncrasies of that company; a bit like the deployment flow which is usually very specifically shaped on how a company is.
But in Uber's case, they tend to reinvent lower level pieces of platform/infra.
you don't get promotion for supporting existing things, but for "inventing" you can get promoted. also for large migrations
> You can ask the same for the median 330k salary in the US for Uber Engineering
People DO.
It's well known that most tech companies are ran incompetently. As you say, it's not the engineers' fault.
But most projects and hiring in these companies exists to juice promotion criteria. And that, depending on perspective, these companies are either massively overstaffed or massively underproductive.
The comparison to AI spending being wasteful holds up pretty well, these are companies that readily piss away billions in pointless spending.
The massive misalignment in large companies is no secret. But neither is the fact that when someone comes to cut, they also have no idea of who is doing load bearing work that matters, and who doesn't. I look at recent cuts around my large corp, and it's clear they are made at levels that have no visibility of the ground, and are uninterested in said visibility. Obvious mistakes that are worse than what claude would have told you (yes, I asked Claude to pretend to make the budget cuts in our org y looking at the same data an exec could probably get. They were better than what happened)
I think it's a general problem, but in my rare conversations with execs nowadays, they seem rather uninterested in improving their decision making there. The actual performance of the organization does not appear to be all that relevant to them.
This is a very good answer but there's a flip side too.
The idea of "if you add intelligence you make more money" is contradicted by the fact companies don't just always hire more people. Wy doesn't google just hire everyone?
Sure, but has their rate of value added increased as a result? It's a good question to ask. They added value before LLM coding, and now are more expensive than before thanks to token costs.
$1.5kpm for SOTA. 128gb you run DSV4 Flash.
What's the point of running it locally though? Inference for open models is quite cheap already. They could just selfhost, anyway. The experience of running LLMs locally will be excruciatingly bad in comparison at least for the near future.
Right - the future of LLMs is like ol' windows XP+Dell. Commercialized "things" you run locally offline, co-designed with hardware, with a known productivity suite, and large businesses building the next generation thing and suite with 18mo release cycles (ish).
XP? I can see the argument for enterprise support but in that case the latest windows OS is going to be virtually free and I dont know if MS and Dell etc. would even support an XP machine. Might even be required for hardware. If no enterprise support wouldnt Linux make a lot more sense?
I get that if it's offline the security downside of XP doesnt matter, and I assume XP is free, but being free doesnt really seem that valuable compared to alternatives (free linux and virtually free OS if buying wholesale).
"Windows XP+Dell" should have been in quotes. It's similar to the way enterprise productivity software was developed, packaged co-designed with hardware, and sold on an 18mo upgrade cycle assumption. It's not literally windows xp.
Oh gotcha. Yeah that's an interesting idea.
I don't see it. Leasing equipment and paying per seat license fees makes a lot of accounting and cash flow sense. Maybe when it gets to the point where you can run SOTA LLMs on consumer hardware. But that seems a solid decade and probably much more away.
Even then it makes more sense to rent the bigger GPU and get your answer faster.
There's waayyyy too much money betting on that not happening, to the point I feel there'll be regulations popping up for "safety reasons" etc to ensure the big players control this.
3/4 of Microsoft's BUILD conference the past two days were about local AI, foundry local and Windows ML along with a big section in the keynote about running local workloads on their new hardware with Nvidia. Say what you want about Microsoft's reputation, but they are a "big player" and seem to be moving in the direction of local AI first.
I would love this to happen of course, just paranoid it won't.
Your last question is really important. What did they accomplish with all that spend?
I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?
Never confuse movement with action.
If you believe a 128gb machine that is essentially DGX Spark in a laptop chassis can run models comparable to SOTA you either never ran open models on hard tasks, or you aren't scratching the surface of SOTA closed LLM capability in how you're using them.
Can you show me an example of a hard task that can't be achieved using light models? When we don't want the model to work on autopilot without reviewing the code at all. Even SOTA models will produce garbage code, if you don't guide them all the time.
Hard tasks require a lot of guidance and code reviewing, unless you are creating another throw away project where correctness, maintainability and code understanding does not matter.
You can't get an edge using local models, these guys may have competitors that will spend on SOTA models. They won't likely ever consider local machines even for some offloading scenarios, the complexity and costs will be even higher.
Consider rewiring your perspective: getting an edge doesn't really matter; the only thing that matters is will customers pay for this? Is this a useful, valuable problem to solve?
Coding faster doesn't really solve that.
Uber makes more money if people buy more rides, order more food, have some breakthrough in autonomous driving. They can save money if they can optimize some ops or spend somewhere. Is there any evidence that with the spend on AI that they achieved any of this? If they did, I'm sure we'd hear about it in some engineering blog.
I am wondering more and more if this becomes true as these smaller models take off. I might be old fashioned but I have yet to crack the workflows some of the hype people spout like Claude codes Boris where he and others talk about running hundreds of agents overnight.
I have still found the sweet spot for me is using LLMs but I am still in the drivers seat.
Running hundreds of agents overnight is almost certainly 99 percent waste.
That's because for some of these folks, the cost of the tokens doesn't have to match the value of the output; the hype from the story is all they need.
Normal people have to produce something of value from that spend. So starting 100 agents and then waking up to something cool but useless just means you spent a few thousand dollars and created nothing of value............
18k/yr? None of the LLMs generate anything like that in value!
I'm definitely getting that much value out of Claude Code and Copilot.
You're a content creator; you define your revenue stream.
Uber engineers do not define their revenue stream; the product leadership team does.
$1500/mo of AI spend by engineers does not equate to revenue. They need to figure out revenue first before zeroing in on AI spend.
$18K a year is a fraction of the salary of a junior engineer.
Claude has allowed me to do refactors that would have taken weeks to instead take a couple of days. It has, objectively, increased the velocity of the engineering component of greenfield features by 40% in my org. You can put a number value on that and decide if it gives you favorable ROI.
$18k a year is near half of my salary as junior verging on senior developer in the conservation field. Not everyone works in FAANG.
The point of a refactor is for you to think deeply about the code you are responsibility for, so you can make it better (faster, easier to work on, more tests, whatever).
You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
It’s a lose/lose situation for…I would say anyone employed as an engineer or programmer. I’m not taking responsible for AI output, the same way I won’t try to fix auto-generated code: because you just regenerate it.
The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
> The point of a refactor is for you to think deeply about the code you are responsibility for, so you can make it better (faster, easier to work on, more tests, whatever).
Absolutely false. Refactors (in my case) can be as simple as dropping old packages for newer packages with slightly different semantics. It can be moving legacy pages from jQuery to Vue.
> You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
I've 25 years coding, trust me, I don't lose anything by not finding out on my own that the semantics of a jQuery promise changed between major versions.
> The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
You have no idea of what you're talking about. There are entire classes of K8s networking issues that would have taken me a day to debug which Claude solved in minutes just because it can run 20 diagnostics commands in two minutes and deal with technical minutae that is time-consuming but ultimately irrelevant to my business goals.
> The point of a refactor is for you to think deeply about the code you are responsibility for, so you can make it better (faster, easier to work on, more tests, whatever).
I'm pretty pessimistic on AI and don't have access to good agentic workflows, but refactors are exactly the thing where it seems to me like agents could be really strong - once I've refactored something architecturally, I might have hundreds of instances of a thing that needs to be updated in a predictable way, but is complicated enough that it's going to be faster for me to manually update hundreds of instances rather than writing a generalizable find/replace tool.
Sure they’re fine at that sort of rote find/replace job as long as it’s relatively straightforward. But it only really works if you do the hard parts yourself then tell the agent to go and do the rote part. Even then I’ve had it turn to slop more often than not as the agent has to start contorting the code into weird shapes to try and finish the job. It’ll never stop and be like “hey maybe this was a bad idea, let’s try something else”. And by the time you get to review it, you’ve spent 20 bucks on something that needs to be thrown away.
In the old world, the refactor probably won't happen in the first place, but the effort would be put elsewhere. "Increased velocity of .. greenfield features" doesn't directly translate to additional revenue, and your number is very questionable in the first place.
Software engineers like to talk as if business and finance are as easy as pushing code out and refactoring. It's not and never has been.
Can you share some examples that you would say justify that price? Not a gotcha, I’m genuinely curious where you’re seeing a return at that level.
I've written tens of thousands of lines of tested, working code that I would not have written otherwise, and that code is useful to me.
I effectively get to operate at the rate of a small team of engineers - I know that because I've managed small teams of engineers in the past.
> that I would not have written otherwise
I think this is the part I struggle with. The code I write makes me money or is a way of teaching me something, both of which are reasons that I would write the code regardless.
I don’t think I have any projects in mind that I’d be willing to spend half of a car on that I also wouldn’t have written myself.
Obviously just a personal take though. I’m glad you get the usage you want out of it.
My "job" is building open source software for data journalism (and anyone else who needs the tools data journalists need, which is pretty much everyone else). I can build more of those tools, and better, in exchange for a fraction of the cost it would take to hire a team to help.
I reached my own productivity limit on several projects (in my case, I'm building a fully automated microscope that uses realtime computer vision to solve a number of longstanding problems with microscopes). As much as I'd want to write the code for it, I hit a wall when it came to debugging some particularly tricky issues- either I couldn't do it, or the time investment was too high.
I use Gemini/ChatGPT/Claude to do that work and it unblocked the enjoyable parts of the project while taking care of the tedium.
I also find LLMs help me learn faster because they can often take a paper and turn it into working code, which I find to be a very slow process.
I agree on the basic point, but running $1500/mo's worth of SOTA local AI is non-trivial already, and that's a figure for a single seat. That's equivalent to generating at least 20 tok/s on a 24/7 basis, in fact probably quite a bit more than that (because open-weight models are vastly cheaper than proprietary ones even when served from reputable Western providers - reaching the same spend would take around 100 tok/s or more, which is well within datacenter hardware territory).
You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
It’s non trivial now - will it get easier in 12 months though?
I think companies will eventually just buy a local AI server.
Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.
These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.
I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.
I don’t think companies will do that. Why don’t they just buy local on-premise infrastructure even though it’s cheaper than AWS?
“AI in a box” sounds a heck of a lot like “the box” from the Silicon Valley TV show. Or the Google search appliance. Or name any other on-premise thing that is equally dinosauric.
The real finding of this article is that AI tokens are direct competitors with offshoring. $1,500/month buys you a whole employee in India.
And this is before AI companies inevitably increase pricing after the conclusion of the growth phase.
> I don’t think companies will do that. Why don’t they just buy local on-premise infrastructure even though it’s cheaper than AWS?
For customer facing, production software, its worth paying a cloud tax to get the reliability guarantee. For tools that are used by engineers for code development, there is no need for such bulletproof guarantees.
That makes very little sense. SaaS/cloud tooling is overwhelmingly popular for internal tooling.
Which category of developer tool has on-premise as the more popular option?
Cloud isn’t about “reliability,” it’s about being able to focus on your core business rather than spending all your time maintaining stuff.
Local AI servers are different because they don't have to form a single system. If one AI server goes down, just use the other one.
This is unlike customer facing systems where, if your database server goes down, you probably can't just use the other one--the whole system is down.
Yep, its already quite easy to do so with tools like opencode/openrouter. Ive used some open source models and they seem … ok? Im not doing foundational math, just refactoring code, understanding existing code etc. I don’t see a future where companies blow 11% of employee compensation on a single tool; the hosted AI server + oss models will 99% win out.
How is tok/s not a bottleneck I? I assume most people still use ai agents interactively rather than leaving them to do their own thing during the night.
I find anything below 50 tps or so entirely unusable...
Regardless its Apples to oranges anyway, inference is quite cheap for open weight models its just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or various provider on OpenRouter since open models are a commodity.
I startup 4 or so projects then go do other things for 4 hours. I don’t have enough energy to steer overnight, but I’m at least “semi afk” for daytime steering. So throughput is king for me, tokens per hour. Not latency or actual tokens per second.
Running locally is even worse for this, because if you're running 4 jobs at once they just run at 1/4 speed. Not literally, you can make up some of the difference with batching, but you have limited resources instead of spreading your requests out on an API provider's nodes.
It's not a bottleneck if you care about the actual code.
I would expect the overwhelming majority of output tokens would not be the actual code but used for analysis, reasoning, testing and iteration. If you only use the agent for autocomplete then yes, the calculation is probably different.
yea, and understanding that too is important. the idea you dont need to read code or analysis seems to align with the depwndcy addiction being shoved in thw pipe.
Is interactive use for coding something that actually works today? With unsafe mode, even frontier hosted models are slow enough I end up just tabbing out to work on other tasks. It would need to be much faster if I am to sit and stare at it while it churns. Local models might be a lot slower but workflow-wise it doesn't change much for me.
I think probably the correct spend is something closer to 10x that if people can figure agent coordination problems out. It's not even really about capability at this point, it's about keeping track of what agents are doing.
Even if companies decided to move away from expensive models from the major labs, it probably much more economical to pay a cloud provider to host some open weights model which could then be amortized across all (internal) users and do inference at a substantial batch size, rather than giving everyone their own hardware -- which means the company would need to provision for peak usage and inference at batch size of one.
You’re way better to run your own on premise models. Laptops are depreciating assets, do not benefit from economy of scale, have fixed specs, result in a fragmented fleet where you need to keep models up to date. Without talking about power consumption and cooling issues. I really don’t see why companies would go that direction
Even if the laptop costs $5k and you upgrade it every year with the latest hardware and run local models (assuming your workload can tolerate smaller models at slower tok/s), you win.
You don't need to run on laptops, desktops plugged into mains power get more power consumption and better cooling. I want my laptop to work, but I can accept when I'm on an airplane at 32k feet I get less abilities.
128GB machines can't run anything locally that is even nearly as capable as a frontier model like Claude. We can get an idea from deepseek v4 pro being 1.6T model, requiring approx. 860GB VRAM to run.
I don't think it's necessarily what Uber build, but the gained productivity. If the engineers use the AI tools the correct way, it can drastically increase the productivity and that means they can actually use the LLM as a junior or an associate engineer. $1500/mo is way cheaper for that level of productivity where as they would have had to pay far more for a human engineer.
>WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
Uber (and quite a few bay area companies and startups) can afford to spend that money. There is no expectation of profit, Uber lost ~62B and growing: https://uberlosses.com/
As much as I love to hate on Uber, that website is from 2022. Uber has been profitable since 2023.
It's profit margin seems to have stabilized around 10%.
The real economic crime is losing at least $40bn over 10 years scaling a business that ended up having retail profit margins (i.e. low profit margins).
> How did it meaningfully impact their revenue in a positive direction?
It probably allowed them to avoid hiring as many people to build a certain amount of software. Even if it didn't increase revenue, it could have lowered human labor costs.
> 128 GB machines that can run local LLMs are a bargain even if priced $5-8k.
Don't forget the energy costs. Searching around, advanced models use an average of 25 Wh/1000Tok.
$1500/month gets you about 150M tokens.
At the aforementioned energy/token, that's 3750kWh.
What are your local office electricity rates/tariffs? (Hint: they are going up because of AI data centers). Even if my price and energy assumptions are wrong above, you probably aren't going to get the rates that the hyperscalers do.
Even at cheap (i.e Texas) retail electricity rates, that many tokens will probably cost you hundreds per month. In most other electricity markets, probably far more.
How much more software does Uber need?
Unless they are iteratively replacing expensive vendors and optimizing other headcount costs?
I use the $100/mo sub but my 30 day API cost is about $1700/mo.
It really depends how you use it, if you're using prompts to generate detailed designs, breaking those into lists of tasks, and then feeding those to multiple agents - it's really easy to burn through many thousands.
If you're being more deliberate and using a few agents at a time interactively, having it review PRs/resolve issues, automated clean-ups and performance optimization, etc it could be more like $1500.
If you're just throwing it one-off questions like a better stack-overflow that is well under a $100.
I've really gotten into /goal, if you can find something verifiable and leave it overnight - it's kinda like christmas morning to see where it landed.
Plenty of comparisons here between salaries and token costs. All fair but very much assumes that salaries are rational. Why do we pay some engineers 10x as much for the same role just because they are in a different location? The WFH discussion surfaced some of that. If money is cheap, all sorts of funny things are happening. Is it worth to spend 1500 USD on AI? I don’t know. Is it worth paying engineers 300k USD instead of 30k? Honestly, I don’t know
> All fair but very much assumes that salaries are rational. Why do we pay some engineers 10x as much for the same role just because they are in a different location?
Who's this "we" you're talking about? Are you a software engineer or a temporarily embarrassed billionaire? Do you think the rational thing is to pay the lowest regional salary worldwide?
If your competitors do, you likely will
> If your competitors do, you likely will
This kind of race-to-the-bottom logic needs to be rejected: by workers, business culture, and the government.
Unfortunately business culture embraces races to the bottom (for everyone but owners and executives), and uses its lobbying might to push the government into tolerating or even supporting it. And there are a lot of deluded workers who (for some reason) seem to be feel smart when they parrot the ideas of people who want to screw them.
As well as rational vs irrational they are also just different types of spending.
Hiring someone vs paying a vendor for a service:
- different level of commitment
- might tie your org to a physical location
- different legal risks
- shows investors a different picture (probably this would even influence a bank loan)
- manager has to fight a different bureaucracy
Not to mention that comparing the cost of a hire by looking at their salary is pretty dumb. ISTR hearing at Google that the overall estimated cost of employing a SWE is like 4X their compensation? Can't remember the exact figures though.
Why even pay them at all? Just lock them in a cell and give them a bowl of rice.
Just to put this in context. If every company did this, all over the world, with that same limit, we are talking about something around $45B monthly in revenue for all AI companies to share.
45 billion / 1500 $ is 30 million workers. How did we arrive at 30 million?
I think maybe he meant specifically for software engineers?
Are you saying there are only 30 million people employed in white collar jobs in the world?
About 30 million software developers. At least that's what a quick web search says.
It is not only for devs
https://openai.com/index/codex-for-every-role-tool-workflow/
So, are companies paying that amount for people at other roles to use it?
Obviously not
That's a bold assumption. Increasing costs by roughly $18 000 per employee worldwide is highly unlikely. For reference even at FAANG in Europe, that would be a 7-15% cost increase for a senior developer. More like 15-30% for non FAANG and even more for non-European markets.
I don't think it's a bold assumption, but I also don't think the assumption would lead to the conclusion.
1. Why it's not a bold assumption: it's a bit shocking now. But in two years or so, many/most companies will realize this is the cost of doing business. Just like people are ok with using Outlook, or Office 365, or (in the case of Wall Street) Bloomberg terminals, people will realize that developers will need AI coding assistants.
2. Why the conclusion does not follow from the assumption: if the limit is set at $1500/developer/month, it does not mean all developers will use it. Companies will set incentives for people to not be very wasteful. It is more likely that on average developers will consume $100-200 worth of tokens per month, and there will be some outliers who will consume 10, 100, or 1000 times as much, but they'll be few.
> Office 365
An entreprise license for 0365 is something like $75 per person per month. Totally different order of magnitude.
And regarding Bloomberg terminals, Bloomberg only has 1 million users (semi random guess).
The reality will be that some places just won't pay for any licenses or will try to set up their own, local LLMs.
There are a lot of places in Europe where 1.5k$ is more than 50% of the total cost of an employee.
And the obvious question: what it's the cost of that revenue? Because it looks huge but ...
Don't you forget about India and Latinamerica... No way I see companies paying that much for outsourced employees
One could hire a competent developer here in Brazil for that amount. I know because my workplace has hired competent developers for that amount. You can even call them senior developers, but you can't get "non-startup seniors" with actual experience, those expect a bit more.
I just wanted to take their number at face value. It's not like it needs more real information to make AI a bubble.
World bank says there are 3.7B employed humans. Putting the total addressable market at around 67T if all of us spend USD 1.5k on tokens every month. This lines up well with current forecasts from the major AI labs
> Putting the total addressable market at around 67T if all of us spend USD 1.5k on tokens every month
However, that's an absurd scenario.
well, you couldn't justify the cost if you still employed all 3.7B
Congrats, you're hired at Anthropic.
The $1500 number is less interesting than the fact that they hit a ceiling at all. Most engineering teams I've talked to have no idea what their AI spend is per developer because it's buried in a consolidated cloud bill. Having a hard cap forces two useful conversations: what workflows actually justify API calls vs local inference, and whether the output is being measured against any real productivity metric. Without that feedback loop it's just a race to see who can burn tokens fastest.
Both the Anthropic and OpenAI "Enterprise" plans include per-developer analytics:
Anthropic: https://support.claude.com/en/articles/12883420-view-usage-a...
OpenAI: https://help.openai.com/en/articles/10875114-workspace-analy...
I believe you might be replying to a bot account.
What makes it look like one? All their dead comments read pretty normal to me.
1,5k. For two months of that spend you could buy a machine that can self-host decent models, plus a year's worth of electricity. It's not up there in terms of quality, but with a bit more effort it works pretty decently. I'm completely baffled that that's not way more common, is it really just the quality?
I'd think for most companies the pace of change is too high at the moment. Give it a few years, a bit of a plateau in the improvements in frontier models and I can't see how many of these companies don't implode under the weight of competition on inference prices.
Decent vs best-money-can-buy. Further, a self-hosted LLM will be much slower.
I think we're all past the "bet-money-can-buy" stage. The most expensive models are an order of magnitude more expensive than the middle ground ones, so you need to be selective about what you run where.
And with a bit of careful routing - there isn't a lot stopping you sending the hard stuff to a cloud model and the average stuff to an on prem model.
Only people who do pay-per-use optimize this. Most heavy users have their use covered by an employer.
I have my use covered by my employer but we also have budgets and limits.
Second here. From recent Alibaba Qwen conference: the all-in-one box (DC in a box - I think I was called Apsara, 0.6x0.6x1.5m) plug and play, 1.5TB GPU RAM, capability to run in a fully air gapped environment, any open models... All of that is roughly $300k one time. And this box can do non LLM tasks as well. Performance (throughput) around 20k t/s. Delivery time - around 2 months. For any medium sized company its perhaps cheaper to just buy it once than spending 1.5k for cloud per user
I think the main thing companies should try to understand is avoiding the use of 'claude -p'.
I definitely have written a goal file, and then just ran claude in a loop over the goal in order to 'token max'... why not? I'm doing research and have some clear KPIs where research into all kinds of techniques / tuning can improve the results. I can spend my budget on a "experiment with blah blah blah to improve blah blah" or give it a list of things to try that I know will take awhile.
Its no problem hitting hundreds of $ of API spend while sitting at a computer with 3 monitors have 6 windows of useful claude code interactive sessions, while working on 2 or 3 projects and using worktrees, and it's a little weird when you hit your limit by 2 o'clock and have to wait for token budgets to reset; god forbid, I manually edit code... which I did do for the first time in months.
You can also start to generate a lot of token spend if you do something like "hey make me a stylized slide deck using internal skill / agent XYZ based on commits A through C", which as an engineer, makes presentations building much less painful.
This uber limit is not high compared to the big SV companies.
I also randomly wrote some code in a bind yesterday, while I was on the toilet, and it felt so strange. That was the first I'd written in probably 6 months.
You don't even make small tweaks by hand? There's so many things that are honestly faster to do by hand than wait for agents to do.
Nope I'm a couple levels too far removed from the code at this point for that. Closest I get is during meta-management (modularizing, complexity reduction, etc) with agents
Lock-in / switching costs are increasingly concerning me. I am using Claude for a good year now and have been accumulating so much "knowledge" in there by now. If Claude became less favorable in terms of price/performance in the future, that would worry me. I've started to think about a distributed solution, where my storage is detached from the inference, but currently Claude is still the way to go for me. Wondering if anyone has similar concerns?
Knowledge in there?
Where is the knowledge stored?
All of my knowledge typically gets stored in plans outside of the agent?
And each agent window gets archived regularly, anyways.
My favorite solution to this is to use the Cline coding agent, which is open and allows you to easily switch between different providers and models.
Isn't all the "knowledge" just text files? I've transitioned between services easily by simply copying the text files.
You can even just instruct the LLM to create a context file for you! They are surprisingly good at that as well.
Studies show that LLM-generated context files have a negative impact on LLM performance: https://arxiv.org/abs/2602.11988
What knowledge?
Unless you work in some obscure domain, chances are that any general "knowledge" Claude has "learned" is already public data somewhere.
If you don't believe me, launch Codex and immediately start working on the same project (s). You might discover that all the knowledge accumulated means almost nothing.
Claude Code definitely remembers things about you. For just one of the more obvious examples: I was recently asking it to make some suggestions on software alternatives, and part of the answer included (paraphrased) "While a hosted service may be attractive due to your small ops team size, your experience with hosting Linux container-based services puts this squarely in the realm of an option for you." My prompt mentioned nothing about this.
This isn't something that is public knowledge, in the sense that you mean it.
Just earlier today it asked me if I wanted to create a jira ticket for something I asked it about doing. My prompt mentioned nothing about jira.
If you use Claude Code, you might want to take a look at the "auto memories" files that it creates. See "/memory" for some more information.
This.^ I realized this first when moving a design spec from Claude chat to Claude Code and panicked. I literally had to build something like Notion but for agents to act as a portable memory between all cloud and local models and agents. But honestly it paid off!
If you are interested you can try it out at markbase.cloud (disclaimer and all that). I am not charging for it.
We run a "context" repository that enables us to transition pretty seamlessly from model to model (usually codex to claude and back). It has skills / plugins / connectors / tooling in relatively malleable MD files. That's what I see as the future. Rather than exporting IDE settings we'll just carry our markdown to the next best tool.
It's hedging a bet at this point, but that's why people say there's no moat. If the tools are properly used + maintained, there should be no reason we can't use a new provider even next week (maybe with a little tweaking).
that's an interesting approach and something i also considered (using git to avoid conflicts). one thing i needed was a "database" (basically a folder of markdowns) with a fixed schema so i can let the agents record their decisions in (for example when the code conflicts with product design spec). this combined with search has been a real lifesaver.
this is how it works: https://help.markbase.cloud/humans/collections/overview
Believe it or not, after writing this comment I was doing some more reading on the task. I'm planning to reorganize our context repo after finding this paper (it argues that AI generated context files can stunt the performance of models):
https://arxiv.org/abs/2602.11988
For what it's worth, if you were considering building context out.
Very interesting. Anecdotally I’ve found the opposite to be the case. But I’m very interested in understanding more. Thanks for sharing
Not worried at all. Switching is trivial. Rebuilding context isn't very difficult and harnesses are a dime-a-dozen.
When blue-collars were loosing jobs they were told to learn to code and now engineers are vilifying AI for taking jobs
Do you believe the same people were saying those things? (Were they really?) The idea that "different attitudes towards labor have been expressed by different people" doesn't feel too remarkable
Why isn't self hosting (even just renting a GPU server, not necessarily on premise) at large companies or hosting via something like together AI to run the open weight models not more common? I've tried the open weight models and the premium models like Opus and Gemini Pro, and I find that the latter are a little better, but not nearly to the degree to justify the extreme price difference, since the differences largely don't matter for what I've tried them for, and I expect that many other users likely have similar use cases.
If the premium models are just about 10% better - that could justify the price vs. self hosting a ~0.5-1T open weights model.
Remember that utilization of these huge racks will not be 24h/7, and these are usually not GPU intensive shops that would train models on the spare compute. With prices of 100-200k USD and north with ~2 years lifetime, that would be hard to justify financially.
Self hosting could easily amount to ~1000 USD a month amortized across many developers. In rush hours - there will be hard rate limits.
Would that 1500-1000=500$ monthly USD justify the 10% decrease in "AI Productivity" ? I guess not. In most cases.
For everyone that asks me around, I'd say that in short term, unless there's a really good reason to self host these coding assistant models, then the big 2/3 coding assistants providers are the better choice.
No one got fired from licensing claude code.
There’s probably plenty of money to be made in LLMs as a service - but not enough time has passed for the commodification to occur. I’m with you in that when the dust settles I don’t think any of the frontier model providers will have a moat. Just like during the dotcom boom a catchy URL and a webpage that could accept payments wasn’t a moat, either.
Where are you buying the GPUs to have enough compute to run a medium size buisness?
Why do you think it would be more common? The pooling of GPUs to serve multiple users and connecting to docs/datalakes while respecting security controls, as a start, is non-trivial. You'd end up paying a team to manage that.
I just went through a similar discussion in my $WORK (traditional finance company on NYSE with average IT expertise) and I think the thought process is as such: it's one thing to just give your stellar dev/hacker a beefy GPU server and run whatever model they can run; it's another thing to maintain such platform for company wide. You would need human resource (likely way above normal software dev paygrade) to understand and maintain such models, maintain backend, availability etc. All these extra hassle make it just easier to pay a top tier external lab + slap a reasonable spending limit on everybody.
For the same reasons companies are not building data centers for their "regular" hosting and storage needs but put things on AWS, Azure etc.
It costs money to maintain the hardware and hire experts to manage the services. For something as common as LLM models, there is absolutely no reason a company serves models on their own hardware unless they are maniac about sending bytes to AWS.
> I've tried the open weight models ...
You tried that on a personal machine for yourself once. It's completely different calculation when serving a model to 3000 employees with ever evolving hardware and software requirements. You'll need dedicated hardware in data centers and experts to run them. A company will need to figure out how to manage acquisition, assets and expenses plus 1000 other things, in addition to its actual business. Guess who has figured out all of that already? AWS/Azure/OpenAI etc.
I use Claude every day. Often for multiple hours a day. Basically doing my job not worrying how many tokens I spend (as in too many or too few). This is a pretty complex code base (database optimizer and related).
Just looked at spent for the past 30 day, didn't even come to $600. 95% of my tokens are from cache. If I were to reach even $1500 I have to let claude run unsupervised over night (and with the amount of mistakes it still makes and guidance it needs, I do not believe we are there yet.)
is this with a subscription or pure API billing?
> didn't even come to $600.
That's still in the ballpark. A modest change in your usage habits or workload could easily get you there.
> A $1,500 monthly limit per tool strikes me as a rational policy response to over-spending,...
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
This whole article seems to me like Multi level marketing "businesses" where 'Diamonds' have made their money by promoting MLM in seminars and telling hopefuls at bottom that "Buying AI subscription now is their one shot to be a winner in life"
Perhaps there is something to MLM vs LLM to create a FOMO effect.
That's just Simon Willison since LLMs came out. It's glaringly obvious that he's a paid shill.
oh come on, a paid shill?
Simon is very fascinated by AI and at times he can be a little too optimistic but he is generally balanced and his perspective evolves over time which can be seen in his writing.
Nerd who loves nerd things a little too much? Sure. Paid shill by Big LLM? Nah.
Yes, a paid shill. You can find a clear point in time where he shifted from sceptic to 1000% fully onboard non-stop praise, with no reason.
Maybe the reason is because he thought the tools became really powerful?
I'd love to know when that point was myself!
The issue is he’s not actually balanced at all. I’ve never seen him say anything negative about an AI product.
Days ago he said…
“I'm finding that coding agents can take me from a vague idea to a working solution, one with tests and documentation and that looks like a carefully considered project evolved over the course of many weeks... in less than an hour.
Even if the code is rock solid, there's a limit to how many projects like that I can sensibly care for - and if they're instantly abandoned, what value was there from creating them in the first place?”
https://simonwillison.net/2026/May/31/the-solution-might-be-...
Here is Simon questioning a fundamental belief held by the pro-LLM lobby. Would a paid shill question that?
Simon is, without question, an enthusiastic pro-LLM person. I disagree with what he says often, the product market fit post was a bad take. But I don’t believe he is shying away from sharing his thoughts when they’re not favorable to the industry.
That's not at all negative about LLMs, just negative about his own usage of LLMs. He's still very heavily and unrealistically (unless he has very poor coding standards and skills, which I won't rule out) praising LLMs in the sentences you've quoted.
Note that it's not surprising that he finds his own usage (described in the quote) negative, since his real job is as a blogger, not anything else.
Here's my AI misuse tag: https://simonwillison.net/tags/ai-misuse/ - 54 posts
My ongoing coverage of AI ethical issues: https://simonwillison.net/tags/ai-ethics/ - 308 posts
I've been the loudest voice about the fundamental insecurity of LLMs for several years: https://simonwillison.net/tags/prompt-injection/ - 150 posts
In https://simonwillison.net/2025/Aug/25/agentic-browser-securi... I said "I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely."
Literally none of those articles are critizing LLMs, only use made of them by 3rd party actors outside of the providers. It really has nothing to do with LLMs themselves.
The fact that you had to dig to August 2025 to find a single article that's actually a critic of something produced by the AI labs is just further proof.
The prompt injection stuff is very critical of both the technology and the LLM providers especially when I call out that their solution is still to say "they're getting better at avoiding the attacks" when my line has consistently been that "99% is a failing grade".
Genuine question: what would make me a "paid shill"?
Who do you think would be paying me, and what would they expect in return?
Your unwavering praise of LLMs' performance which does not match anyone's reality?
OpenAI or Anthropic would be paying you, like they pay bot farms and other influencers, and they would expect marketing in return, which you provide in boatloads.
Your job is to be an influencer, I'm not sure why anyone would be surprised that this is a possibility.
The asset I most value is my credibility.
The reason so many people read my writing and find it useful is that they see me as a credible source of information: in a world full of clickbait and misinformation, I have a reputation for providing an independent voice that occupies that rare middle ground between "AI will kill us all" doomerism and "AI will solve everything" hype.
Credibility is hard to earn and easy to squander. I've been blogging for 24 years now, which has helped me build credibility with a large array of people across many different interest areas.
The modern influencer business model is to grow an audience and then sell things to them, through partnerships and sponsored content. I refuse to do that, because it strikes directly at that credibility. The moment you say "I've partnered with X to tell you about product Y" you're no longer an independent voice.
Nilay Patel of the Verge (and the excellent Decoder podcast) refuses to read ads from sponsors himself, at significant financial cost to his publication. I've adopted the same policy - I will not let anyone else pay me to put words in my mouth, because it strikes directly at the credibility I value so much.
Until a few months ago the only money I made from my blog was an https://ethicalads.io banner which pulled in a few hundred dollars a month (more if I had a high traffic piece). It helped cover some of my hosting costs for my various projects.
That changed in February - https://simonwillison.net/2026/Feb/19/sponsorship/ - when I added a Troy Hunt-style sponsor banner to my site (no cookies, no JavaScript) - currently sold by an agency called Freeman & Forrest. Sponsored slots are sold on a weekly basis and get a mention in my email newsletter in addition to the blog banner.
I'm earning enough from those that I no longer feel the opportunity cost of not going and getting a proper Silicon Valley engineering job.
If I was a publication like the Verge I'd have a complete firewall between editorial and advertising. I don't have a team, but I've tried to replicate that as much as I can by having Freeman & Forrest sort out the sponsors while I stay hands off. I'll veto sponsors if I have to (no prediction markets etc) but thankfully that hasn't been necessary so far.
I maintain a disclosures section on my blog here: https://simonwillison.net/about/#disclosures - which was inspired by Molly White's: https://www.mollywhite.net/crypto-disclosures/
I'm currently considering extending that to more of an ethics statement like this one on the Verge: https://www.theverge.com/ethics-statement
The Verge policy I'm currently not fulfilling is "Our policy against receiving anything of value from companies we cover includes, but is not limited to, things like gifts, meals, discounted services, or paid trips and junkets. Vox Media and The Verge pay for all travel expenses to all events, including transportation, food, and hotels." - I've occasionally accepted flights, dinners, accommodation and some pretty absurd swag (Microsoft just gave me a jacket with my name stitched onto it as part of the GitHub Stars programme, and a bunch of gadgets in a pelican case) which didn't bother me so much when the blog was a side project, but I think I need to start refusing those kind of gifts.
The day after the jacket I wrote a piece about their new models - https://simonwillison.net/2026/Jun/2/microsofts-new-models/ - which I later had to update because I missed some crucial details. Was I subconsciously influenced by the freebies? I don't think so, but the whole point of "subconsciously" is you don't know for sure.
> That means each employee's AI spending cap is ~11% of that median compensation package.
when looking at costs - numbers make sense. however decisions as an org/company/solo founder - costs help you set prices, but to reach profitability you want to model around ROI.
now the question is what's the ROI for a $36K/investment per engineer or $90M for the total org ?
I bet the ROI is negative.
I'm in a similar boat - it's hard to measure, but let's say you pay an engineer 150K. Giving them a tool that costs 15K a year is effectively a 10% increase in that expense.
If we were seeing 3X, 5X etc improvement from individual engineers, that 10% increase in expense would be a fantastic investment (even 3 engineers for the price of 1.1??!). I have a feeling they are just not seeing that much of an improvement.
Do you think companies are gonna be like?:
Wait a minute. We didn’t save money by adding AI. We just added an expense.
Now we have to pay for employees AND AI.
$300/day at Apple, with an increase to $500 with manager approval.
A blanket cap makes no sense to me. There's a power distribution of AI use in my company and I'd imagine it's the same at a much greater scale at Uber.
I'd guess there should be a few people Uber is bascially allocating unlimited AI spending to and a large swath they're giving basically nothing.
I would assume that at least one of two things are true:
1. They're costs are so so out of control that they need to impose a blanket cap immediately. Figuring out an allocation mechanism that can be deployed company wide is time consuming and they need to staunch the bleeding immediately, despite it being obviously suboptimal.
2. The few people who should have unlimited tokens were given exactly that. No reason to introduce such nuance to a public PR move. The hard-cap limit is a great negotiating posture with token providers.
That's a lot. On my usual day I burn less than $1 on Opus. I could get beyond $10 only if I have a complex and well-defined problem, which is rare (the second part at least).
You must not be using coding agents. You can sneeze and spend $1 on Opus in Claude Code.
If a worker doesn't use their AI/LLM budget, can they get a raise?
probably will get fired for lack of performance.
Let's just say their performance (OKR, KPI, whatever "impact" metric you want) was indistinguishable from a peer that used the AI/LLM monthly allowance in full.
Maybe a $10k raise would be nice?
Theyd get a bad review for leaving performance on the table. When has finishing your work ever resulted in anything other than more work?
It's disturbingly anti-merotocratic. You're not allowed to prove that you're more useful without AI because they just assume that AI is a 10x multiplier on everyone.
no because it does not come from the same budget
Money spent is money spent.
These are still at currently subsidized prices. We'll see if they think they're getting $1500/month of value when that buys significantly fewer tokens.
True but they will raise prices slowly so people will optimize their workflow so they aren't just throwing as much inference as fast as possible like the current state. Right now you should do everything you wanted to try out because it is cheap (as long as you don't become dependent ... the risk).
afaik, enterprise plans are not subsidized. its 20$/seat+api pricing. Unless you are saying api pricing itself is subsidized.
This is market introductory pricing that hasn't factored in cost recovery. Most of it has been run on early investment with the assumption they will recover costs in the long run. The prices are subsidized across the board and they will need to go up signficantly to recover them.
None of what you said is true
And you know this how?
Assuming this were accurate, then presumably the AI companies would be betting that inference costs come down before the bill is due - I don't see enterprises being willing to absorb another ~10x price increase for tokens (as they've just done going from subscription prices to per-token pricing)
For claude shops this was a huge hit. But lets back this up. There are some companies that haven't even built a break-even model at this price because they are funded by investment. As soon as those investors lose patience the first dominos will fall. For those who have somewhat of a business model, will it survive a price increase? The bigger question is do the base model providers have enough runway and have a way to keep going as they need to recover costs.
It's mostly R&D though, not inference. If LLM's effectively become a commodity then they are screwed anyway.
Aren’t the Chinese labs quickly turning them into a commodity?
The open-weight models will have a steady race to the bottom on inference costs just by dint of competition between providers. They aren’t at the frontier yet, but they are rapidly eating the flash market.
Yeah, that's not going to work if you can get e.g. 80% of value by using 10-20x or more cheaper open models. At some point it would just make sense for large companies to rent compute and deploy their version of DeepSeek or whatever (if they don't trust Chinese providers)
There is no evidence that per-token inference prices (which is what Uber is setting a cap on) is subsidized.
Is there any evidence that it's not?
Yes; they ban various uses of their subscriptions but say you can do whatever if you’re paying for the API without limits
That's not evidence. Very likely though, but the only evidence we get one way or another is when they IPO.
This story isn't about those subscriptions - enterprise customers like Uber are paying the full API prices.
That's just market segmentation and them trying to maximize revenue it doesen't really say anything about their costs.
The fact that Anthropic models are offered at the same API pricing by not just themselves but AWS, Azure and Vertex despite Anthropic taking a major slice on licensing along with the cost an open weight 1T parameter model like K2.6 costs to run on any third-party provider, make it unlikely that API inference cost are subsidized by the labs.
Openrouter? i.e. Even excluding Deep Seek inference for very large open models is way cheaper. Maybe these providers are not very profitable but its highly unlikely that they are losing $4 for every $1 they make since selling inference is their only product...
AI companies have more expenses than inference.
yes, and theres no evidence that they arent (or can't) use profitable inference to subsidise those other expenses. Some companies will keep spending massively to train better models, and some other companies will not, and offer good api prices. Which will end up being used? That depends on whether the spending turns into better value models
> theres no evidence that they arent (or can't) use profitable inference to subsidise those other expenses
as far as we know there's no evidence that they can produce any profits at all
The evidence that per-token inference _is_ subsidized is (a) competition is a bloodbath (b) these companies are raising more money than any company has raised ever (c) a maybe-profitable quarter is maybe-coming for Anthropic after maybe-signing a compute deal with SpaceX that legitimizes both companies.
The evidence that per-token inference _is not_ subsidized is... a quote or two from Dario and Sam Altman
I understand current Codex $20 sub is worth about $480 GPT5 api credits.
Way more. Track with https://github.com/junhoyeo/tokscale
It's not. They recently forced enterprise customers onto API billing instead of the cheap consumer pricing. Now the pricing is brutal.
The inference prices for very large open models would indicate that Antrophic's and OpenAI's margins are quite large.
How are people using so many tokens? I'm on the $200/month enterprise plan for Claude Code (because it's a better deal than the API pricing) and I don't come close to the limits.
If you use stuff like opusplan and /advisor so you use Sonnet for most of the work and only Opus for the really complex stuff then it's quite easy to keep costs low without affecting performance.
All new/renewing enterprise contracts with Claude Enterprise and ChatGPT Enterprise no longer offer usage-based subscriptions, but instead will charge API pricing for all tokens consumed, and as you've said, the subs are better deals than raw API pricing.
BigCo's are not using the plans we are using, they can't.
I wonder what they are doing with $1500 per month. I'm on Claude Pro $20 plan and I'm doing well. That's 3 days per week. On the other 2 days I'm using a customer's Claude Max, I don't know if it's the $100 or the $200 plan, but I'm sharing it with some of its other developers.
$1500/mth is token pricing.
Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly. These plans are economical only if majority of users spend less tokens in $ than the plan's costs. This subsidizes the gap vs. power users who spend multiple k$ monthly in API tokens.
Yea, I’m sure the personal plans are subsidized. I have $200 Claude Max at home and straight API pricing at work and equivalent work would easily cost me 5x if not more on the API.
Next to no one would be using less than the subscription price given how expensive Opus API is.
> Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly.
Or the fixed cost plans reflect the real cost and the people paying API prices give them the profit.
Anyway, none of my customers will let me bill them $1500 more (about $75 per day) because I'm using AI. And what for? I'm not working to move money from the pockets of my customers to the pockets of AI companies.
No, we know from the financials of these companies that API prices are close to being at cost and the individual developer plans are heavily subsidized (because they are roughly 10% of API cost per token[1]).
If plans were at cost and API pricing was marked up that would mean there’s a 90%+ profit margin on tokens and instead of raising money and talking about revenue, Anthropic and OpenAI would be talking about their obscene profits.
[1] the caveat is that the average plan user probably doesn’t use all of their quota, I guess maybe 30% is the average across all users.
This completely ignores all the other huge costs the AI labs are paying in data center builds, researcher salaries, experiments, and training models.
The fact that Anthropic is rumoured to have a profitable quarter indicates that their margins on API priced inference are very strong.
Uber is likely on an enterprise plan - these charge tokens at API cost, which can be much more expensive than the $20 flat rate.
I'm on a $100 Claude Max plan, my usage is only about 50% of the plan limits, but in the last 30 days my usage was equivalent to API token spend of $1850. If you save all your Claude Code conversations, the saved files include API costs and you can calculate this yourself.
One of my most expensive sessions cost me over $100 in token spend in a single evening. I'd just found out that the time tracking & invoicing SaaS I use is increasing their monthly pricing by 2.4x - so I assigned Claude Opus 4.8 to recreate the entire SaaS for myself, and load in 13 years of my historical data. I've only completed a full read-only implementation so far, with adding & editing of records still to come, but I do expect Claude will have fully recreated the entire SaaS for me at an API cost less than a single 1 year seat of continued subscription to their service. And since I'm actually on a Max plan, it didn't actually cost me $200 of tokens at all.
coff i would not buy the Bending Spoons IPO coff saaspocalypse
I could ramble on about where the other $1750 of usage goes, but I imagine it's similar for most heavy Claude / AI users. Interactive coding sessions, a daily personalized podcast, some automated overnight agentic "proactive" sessions, a daemon that wakes up if I send Claude an email or voicetext to check something when I'm out. I've also noticed that if Claude's tool-use goes haywire & Claude gets confused or lost, sometimes a single email reply session that would normally be just $1 of API might spiral to $12 of API while it bangs its head against trying to run a program that's in a different folder to the one it's currently in. Sometimes a simple 'pwd' would save you a lot of headache, Claude....
It's also a useful signal for AI value. Looks like it's a max value add of $18,000 per engineer per year.
It's among a wave of fresh "non-insane" takes on AI in the enterprise. Maybe we can reel things in to a sustainable level before a giant bubble bursts.
It's not so simple to determine and generalize how much value AI adds. It's going to be different on a per-company basis and a per-engineer basis. It's also affected by the competitive market place and how many other companies are using AI for their engineers.
For example, what if you're a tiny startup and you're considering whether to hire an extra engineer or do all the coding yourself. I would estimate that AI is worth far more than $18,000 a year in that situation where you might reasonably decide to put off hiring an engineer.
I find it really doubtful anyone has managed to quantify that in any meaningful way. Seems like mostly an arbitrary number. Also the article does claim that's its actual several times more than 18k if you are fine with using Codex, Cursor or etc. when you Claude tokens run out.
Their initial budget for determining how much value AI adds is $18,000 per engineer.
Not really. There are clearly diminishing marginal returns, so it's likely that the first $2,400/engineer/year adds >>$2,400 of value, even if 18,001st $/engineer/year adds <$1 of value.
No, that's not what it means at all even if just doing it purely in math terms. Really it is just a reasonable amount to cap at to stop the long tail of super spenders (tokenmaxxers). You could also call it "the amount of AI spend after which Uber has decided there is diminishing returns for the average engineer".
I'm sure if a dev can show useful results at 1k they won't have trouble getting permission for a higher cap as well.
It means Uber thinks they can sustain that level of expense. Whether engineers at Uber are representative of the rest of the work force is an easily debatable question.
And $1500 a month is on the very high end of where most companies will land. When you run the numbers there isn’t a realistic path that connects the dots between likely market size and the claimed valuation of the AI companies. The math simply does not add up.
Its a lot when using Chinese models, less when using Opus 4.8
This week an S&P 20 company with previously unlimited Claude limits also set a $250/mo/person limit; though its unclear to me how widely the limits are being enforced, may be the case that its just non-software engineers. Do with this info what you will.
In my experience, this is far below the cost the average dev will incur per month so this seems very reasonable to me. And, no doubt there are exceptions for heavy users so they can get some extra token usage when they need it.
unless they changed something in the like 2 months (edit: besides implementing a cap for claude code specifically, since other tools already had caps) since ive left my job there im pretty sure 1500$ is the very max you can use after maxing out free calls, initial budget, then 2 extensions individually reviewed by your manager
higher ups pushed for these last 2 years to be AI focused so I don't think this restriction is a measure of "don't use too much AI" as much as it is a measure of "don't use only 'manual' AI tooling" since we had a dozen more specialized tools in-house running locally or otherwise that didn't count towards the budget
It finally puts a number on productivity gain of engineers with AI. This is probably less than 10% of the cost of an average uber developer. So they don't assume much more productivity gain from AI than 10%.
(Cost of an employee is much higher than their salary, it includes things like office space, supporting structures like HR/accounting, insurance, hardware/software, and much more)
But is it an accurate number? Does AI reach diminishing returns after $1,500/month, or is that all they are willing to risk/burn to stay in this game?
Uber engineers reported that loading their workspace and pulling recent commits exhausted that AI limit for Claude Code (4.8 x-high) immediately.
I don't think loading up a single context window costs $1,500. Which limit are you talking about?
Uber is in the business of experimenting with robotaxis and automated food delivery.
They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.
There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.
Probably they'll quietly reduce the number more soon.
Is this inside knowledge, or speculation?
1) This happened because they fundementally misunderstand how to use AI and how AI is priced 2) Most organizations are throwing everything in for analyses and not limiting the answer they want. You need to be specific of about what you analyze and what answers you want 3) People undervalue prompting or templated responses. I will have written. validated and sanity checked a prompt several times and run it across several models before I say its ready for use. But when it is, I know what it will give me and that the scope of its research and answer is as close to what I want as it can be. As little excess as I can. This all saves tokens
It's probabaly a good things that Uber-developers are now forced to do some coding on their own. Only use AI where it absolutely helps
Or be smarter about their usage. $50 on tokens per day can get you a long way.
Some people also take weekends off.
I don't think at $1,500 you're not forced to code on your own at all, in the sense of typing code. You're simply forced to not yolo-max twelve parallel agents at all times.
The big question is, will the productivity gains be absorbed by the needs? Societies don't have a need for infinite amount of luxury and laziness offered by the productivity of the machines. At some point, you would shake off things, get up from the couch and start walking again, breathing afresh.
It still probably produces better results than some junior engineers in a lot of cases.
But yeah, for a company at Uber’s scale, I can see why they would want real engineering discipline around it.
Due to recent Copilot price increase my friend was capped to $70 per month of usage. Not on a subscription…
My $100 subscription is not cheap. At the same time our product burns orders of magnitude more tokens.
The tool categories that pay for themselves fastest: (1) Anything that gets invoices out faster and makes it easier for clients to pay. (2) Scheduling links that eliminate email back-and-forth. Everything else is optimization. I keep notes on which freelancer tools hit each threshold at freelancerkit.surge.sh
I think the logical follow up will be for Uber to lay off a bunch of people so that the remaining ones can token maxx.
To the mooooon!
If you estimate 10k salary per engineer that means the moment it’s cheaper for them to hire another engineer but that doesn’t mean it’s improving productivity 15% but if 15% is the moment it stopped being better than another human we can assume 7.5%?
Probably even less because you would spend those 1500 extra per employee also if you just save 10% so 150 per employee that’s 1.5% on salary.
This is imho one of the best ranges we can assume for now how much would that be on the whole swe market?
Seems odd limit, especially since it highly dependant on Token provider used, with Opus this is not much and could easily be burnt in a week or less, but with something like deepseek the 1500 can literarily be an annual budget.
That being said, I do have to wonder why someone as bug as say Uber, simply not rollout OSS model in the cloud for their team, I'd imagine that would be cheapest & most flexible option, while also keeping all the data shared with LLM private.
It’s not just about the model but also setting up the system to create and share compute (GPUs) which is quite complicated on its own. Ubers primary business focus isn’t infrastructure.
eventually tokens will cost price of energy. and china is miles ahead.
china will be major token exporter soon. mark my words.
Technically, tokens travel both ways.
Technically, on both sides there is an intelligence producing them.
Electricity actually is only a small part of the data center costs. There are challenges in getting enough electricity that create problems, but the cost of the electricity really isn’t an issue.
If I were paying API rates this year, I would have already burned through $20k in tokens. Looking forward to the costs of this level of capability coming down.
Reading the headline
Oh that's actually really economical! I wonder if they're doing a lot on locally running models or managing a shared context or knowledge-base in some clever way, maybe just encouraging employees to be efficient and mindful.
...
> each employee
...
> per AI coding tool
...
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI
What on this godforsaken earth are all you rich idiots doing???
A lot of talk about cheaper models here. Just curios, is there any non-Anthropic model that can do UI well? GPT-5.5 is laughably bad, and I'm never restarting my Anthropic subscription after their 6-month sprint of gaslighting, even if opus was really good at UI.
Is anyone doing story point estimation in terms of tokens? If you have a token budget, does this change how you prioritize?
I think there's too much variance between what model you're using and how much you turn your brain off. If I just paste a ticket number into 4.8xHigh its going to use a lot more tokens than if I read the ticket, tell Sonnet what it needs to do, make my commit, run unit tests myself, etc.
I'm curious how much of the usage comes from vibe coding vs using agents/harnesses in internal tooling
If budgeted at $1,500/month per user, power users still can get 5-10x of that allocation if the user pool is large enough.
I think a lot of people are missing that this is $1500 _per tool_ which is still rather a lot of money.
Outside of coding what other tools expend that kind of tokens? People are not creating that many slide decks or videos are they?
If china captures the market now, well deserved. Way cheaper compared to us providers.
Related:
Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing
https://news.ycombinator.com/item?id=48268871
Uber torches 2026 AI budget on Claude Code in four months
https://news.ycombinator.com/item?id=47976415
Corporate America Is Starting to Ration AI as Cost Skyrockets
https://news.ycombinator.com/item?id=48335388
ccusage for codex tells me the medium feature I prompted in codex, with a $200 subscription, running for 72 hours and still not delivering full result would have cost ~ $2200 at API rates.
I also misconfigured something in my agent's configuration and a simple web tool request (maybe 4 turns) through OR went to GPT-5.5 accidentally and that cost me ~$0.4.
I have no idea how any business can afford API rates without having a mindset of casually setting money on fire.
the real interesting way to address the question of token effectiveness would be internal alpha vs beta testing and measuringing marginal revenue generated by similar teams using ai and at different usage levels. right now $1500 a month is not a meaningful signal of anything beyond current executive willingness to spend. in the long run executives will cut spending where it does not support income generation.
They are also beholden to enterprise pricing and can't use the subsidized consumer max plans.
Token costs rising because data center build costs must be paid down.. is not the whole picture. It is actually possible for token costs to fall despite the spending frenzy.
Naively you’d expect to always keep paying more - but growth in token usage is what changes the equation. Amortizing debt over an exponentially growing amount of spend across a growing customer base (not per customer) lets the debt be paid off & costs covered even as each individual’s spend stays steady or even goes down - but it only works if there’s growth beyond some threshold that makes the whole thing hang together. No one on the outside knows how much growth that is, and everyone chases maximum growth.
Jevons Paradox ends up being your friend as well as the friend of the inference providers as well as the friend of the inference financiers.
If it’s a strong enough effect, it has potential to cancel out all the circular financing too, and let everyone ride out the bursting of the bubble.
China will bring down the price per million tokens.
Why are people getting these high spending numbers? A 200 USD subscription for either Codex or Claude should give you plenty of usage. What am I missing? Are they just being dumb?
The subscriptions are not available to enterprise users. Enterprise users must pay per-token. A $200 subscription gives you roughly the equivalent of $1500 in per-token billing.
What is the point of allowing a developer to spend $18,000 a year on AI subscriptions? Can't they hire a decent developer who is capable of producing a quality solution faster? Clearly, these decisions are all made by high-level management team.
I was recently talking to an HR person from a European company, and she goes: 'We are forcing our developers to use AI coding agents, but they are still kind of hesitant.' This person had never written a single line of code, nor did she know what software engineering is. For these people, using AI coding agents = faster delivery without breaking anything.
It costs a lot more than $18,000 to hire a decent developer, pretty much anywhere in the world. Also using a model is better than another developer in some ways, because there aren't two independent minds trying to work with each other.
I still have never hit a ceiling with my Claude Max $100 account, much less the Max $200 account. I'm not burning tokens needlessly, nor running it all day, but I do use CC almost daily. What are these devs doing that they are burning more than $1500 in tokens a month?
Maybe it's just me, but I still find that I really have to "shepherd" the AI and work with it to get the results I want. And I read every line of code added and challenge the model's logic. So that limits my token burning. Maybe these people are just "vibe-coding" without really checking the results?
I would not be surprised if they have engineers vibecoding 2-3 projects each simultaneously, nonstop, on largely un-moderated review-suggest-iterate-test feedback loops.
All the code gets summarized and fed into their manager's agent contexts, probably duplicated several times across levels and departments, with some generated back-and-forth emails pinging around the org chart, eventually generating 2-3 long-winded reports that nobody will read chock full of generated visualizations that can all get consolidated into a generated slide deck that they'll show (maybe, at some point) to a handful of humans with more money than a human brain can conceptualize to demonstrate all of the innovation they're doing.
I am increasingly convinced that many of these companies are dead trees whose only function is to burn money lest it fall into the hands of the peasantry.
You are paying account pricing. Uber is paying API pricing.
You're $100/m plan is likely equivalent to thousands of dollars of API pricing. You are being subsidized by the companies using AI.
And this is why as the freeloader (includes me) volume goes up, they add more and more rules to constrain us.
I wasn't aware the Max $100/user plan wasn't available to Enterprise; it used to be IIRC
just don't care about the output. Produce more. Don't check the results.
I have strong conviction that companies will now choose tech stack/programming languages based on 'tokenomics'. I am vibe coding using Clojure, a language I can read but cannot write and I never hit the usage limits even when using the latest model on Claude. I have similar experience with F#, which is a bit more verbose than clojure but absolutely beats every OOP language, Python, Typescript etc.
The reason, I use F# & Clojure is they hit JVM and CLR, two popular enterprise stacks.
In my not so humble opinion Lisp(Clojure) still remains the language of AI.
Typescript is also hugely represented. My projects are TS in a big way, where I have no experience with it at all.
They want to replace employees with AI, then replace paid AI with unpaid AI.
Their wet dream was never automation. It was zero marginal cost labor. And that dream is starting to rot.
Why aren't they using Claude code 20x for 200/month?
if you have more than x seats, you have to use Enterprise pricing as far as I know which is pay as you go with a pool.
It's wild; at my shop in Silicon Valley they dropped us from unlimited use to 60% prem budget on copilot. People are walking around like zombies.
Poor people! Thinking takes calories
no....the fact that you could buy a reasonably prices MAC or AMD395+ thats AI tool pricing; it loads a big enough model and spits out tokens just fast enough that you can read what it's doing and comprehend it instead of magic.
That's the most useful signal. Pre OpenAI mafia RAM pricing, that comes out to $250/month.
A lot of things can be done with local models.
Even more things can be done without any models just as well.
Single developers seeking local models.
goated comment