Is this not just about extra credit? So what's included in the subscription doesn't change - just extra credits are now token based instead of message based? (For Plus/Pro)
Fair point. We only have clear evidence they're being more transparent about credit pricing and value, but it's unclear whether that'll make people burn through usage faster or slower.
The fuzziness is intentional. It gives them wiggle room and obscures how much "value" you actually get from $200, a 5-hour block, or a week. That keeps the tension manageable between subscription pricing and pay-per-token API pricing, especially for larger businesses on enterprise plans who want transparent $-per-MTok rates.
If they were fully transparent, like "your $200 sub gets you up to $2,000 of equivalent API usage," it would be a constant fight. People would track pennies and scream any time 5-hour blocks got throttled during peak hours. Businesses would push harder for pay-per-token discounts seeing that juicy $200 sub value.
I think this might also impact how usage is calculated for subscription plans, not just overages (using tokens instead of messages for calculating usage). But the messaging from OpenAI seems vague.
Why not just attach a real dollar amount, rather than using "credits"?
Well, I know why. I just wanted to be snarky. It's just that trying to hide the actual price is getting a bit old. Just tell me that generating this much code will cost me $10.
I hate that pattern so much. It’s also not just to obfuscate the spending - it’s also to ensure you already have some amount left over in your account, so that it feels like you’re not spending as much to just “top up” and afford that one thing you want this time.
If you have some left over that you can’t spend, it feels like you’ve “wasted” them.
The answer is so that they can charge different prices per credit. If you buy low amounts, they can charge one price. If you buy in bulk, they can offer a discount. The usage is the same, but they can differentiate price per usage to give better customers a more favorable rate.
If the expensive parts of the query happen to work iteratively (especially if agentic), you can act on those loops to bound the cost. Even if it's pure forward generation, you could pause an expensive inference and continue it seamlessly with a cheaper model, adding little to the cost.
> The price of credits in some currency could change after you bought them.
> The price of credits could be different for different customers (commercial, educational, partners, etc)
Maybe I'm missing something, but doesn't every other compute provider manage that without introducing their own token currency? Convert to the user's currency at the end of the month, when the invoice comes in. On the pricing page, have a table that lists different prices for different customers. I fail to see how tokens make it clearer. Compare:
"This action costs 1 token, and 1 token = $0.03 for educational in the US, or 0.05€ for commercial in the EU"
"This action costs $0.03 for educational in the US, or 0.05€ for commercial in the EU"
> They can ban trading of credits or let them expire
otherwise you end up with "get a $20 subscription for 1000% more value -- equivalent to $200 in API usage!!![1]; [1] -- compared to API pricing for american companies on the first weekend of the month between 18:00 and 22:00 UTC+8 during full moon"
The title is misleading and not in the article. This change is for business/enterprise accounts. Also, these are still credit based. The change is that credits now operate on tokens like the API rather than on messages as they used to.
> Customers on existing Plus, Pro and Enterprise/Edu plans should continue to use the legacy rate card. We’ll migrate you to the new rates in the upcoming weeks.
Nope, they buried the lede a bit, but this is coming for _all_ users, even pro/plus subscription plans. So you get ChatGPT pro/plus benefits, and then effectively $20/$200 in credits for Codex.
First of all, there's no dollar amount tied to how many credits you get for a subscription.
Second, if you look at the prices for bundles of _extra_ credits and then do some math on the Codex rate card, you'll see that there's no way they would work out to be the same or similar.
> First of all, there's no dollar amount tied to how many credits you get for a subscription.
I don't understand what you mean here; their official comms is:
Customers on existing Plus, Pro and Enterprise/Edu plans should continue to use the legacy rate card. We’ll migrate you to the new rates in the upcoming weeks.
To me, anyway, that means that GP was exactly right - they'll give the $20 subscriptions $20 worth of credits, and the $200 subscriptions $200 worth of credits. That is what the "New Rates" are!
I think it would be more rational to discount a subscription vs PAYG (standard is about 10% in most industries), and I agree in principle with your assertion - they haven't specified what the discount is on credits bought through a subscription plan - but there is no indication that they are going to keep allowing thousands of dollars of credits on a $200/mo plan.
My guess would be a 10% (or similar) discount if you buy a subscription.
So Anthropic bundled CC with Claude.ai cuz OAI bundled chatgpt with Codex, now OAI is unbundling, IPO must be around the corner. Writing is also on the wall for CC usage based subscriptions now that main competitor effectively got rid of it. How are the Chinese models looking?
Based on reading only, I'd say they're usable but a step below - probably somewhat behind Sonnet. I also read that some people successfully requested refunds late last year when the models shit the bed due to bugs, so if they cut limits hard, maybe you can try that.
Is this something that is likely to also change the way Github Copilot bills? Right now the billing is message-based, not token-based. And OpenAI and Microsoft are rather opaquely intertwined in the AI space.
Hard to say, but GitHub Copilot also allows access to Anthropic, Google and Grok models, so I don't know that a change from a single provider would necessarily change how they bill
We are exiting a hype cycle, well into the adoption curve. Subscriptions were never going to last.
My next step is going to be evaluating open and local models to see if they are sufficiently close to par with frontier models.
My hope is that the end of seat-based pricing comes with this tech cycle. I was looking for a document-signing provider that doesn't charge a monthly fee; I only need a few docs a year.
I recently experimented creating a Python library from scratch with Codex. After I was done, I took the PRD and Task list that was generated and fed them to opencode with Qwen 3.5 running locally.
Opencode was able to create the library as well. It just took about 2x longer.
I'm developing software in this area right now, so I try a lot of the new models. They're not even close for coding tasks. It basically comes down to 26b parameters vs 1T parameters, plus quantisation and smaller context sizes; there's no comparison. However, for agentic work, tool calling, and text summarisation, local LLMs can be quite capable. Workloads that run as background tasks, where you're not concerned about TTFB, cold starts, tok/s etc., are where local AI is useful.
If you have an M-series processor, I would recommend ditching Ollama because it performs slowly. We get double or triple the tok/s using omlx or vmlx, respectively, but vmlx doesn't have extensive support for some models like gpt-oss.
Kimi K2.5 (as an example) is an open model with 1T params. I don't see a reason it has to be local for most use cases- the fact that it's open is what's important.
That is just idealism. Being "open" doesn't get you any advantage in the real world. You're not going to meaningfully compete in the new economy using "lesser" models. The economy does not care about principles or ethics. No one is going to build a long-term business that provides actual value on open models. They can try. They can hype. And they can swindle and grift and scalp some profit before they become irrelevant. But it will not last.
Why? Because what was built with an open model can be sneezed into existence by a frontier model ran via first party API with the best practice configurations the providers publish in usage guides that no one seems to know exist.
The difference between the best frontier model (gpt-5.4-xhigh or opus 4.6) and the best open model is vast.
But that is only obvious when your use case is actually pushing the frontier.
If you're building a crud app, or the modern equivalent of a TODO app, even a lemon can produce that nowadays so you will assume open has caught up to closed because your use case never required frontier intelligence.
A model with open weights gives you a huge advantage in the real world.
You can run it on your own hardware, with perfectly predictable costs and predictable quality. You don't have to worry about how many tokens you use, whether your subscription limits will be hit at the most inconvenient moment (forcing you to wait until they reset), whether the token price will be increased, whether your limits will be decreased, or whether your AI provider will quietly swap in a worse model.
Moreover, no matter how good a "frontier model" may be, it can still produce worse results than a weaker model when the programmer who manages it does not also have "frontier intelligence". Freed from the constraints of a paid API, you may be able to use an AI coding assistant in much more efficient ways, exactly like when time-sharing access to powerful mainframes was replaced by the unconstrained use of personal computers.
When I was very young, I lived through the transition from using a mainframe remotely to using my own computer. I certainly do not want to return to that straitjacket style of work.
That's only good for the web-based UI. If you want Gemini API access, which is what this article is about, then you must go the AIStudio route, and pricing there is API-usage based. It does have a free tier, and new signups can get $300 in free credits for the paid tier, so I think it's still a good deal; just not as good as using the subscriptions would be.
No? Isn't the article about Codex, which is roughly equivalent to "Gemini CLI" and Google's Antigravity? Google's subscriptions include quotas for both of those, albeit the $20 monthly "Pro" plan has had its "Pro" model quota slashed in the last few weeks. You still get a large number of "Gemini 3 Flash" queries, which has been good enough for the projects I've toyed with in Antigravity.
I guess that's true but I find Google's models better than their public tooling. The Pro subscription includes "Gemini Code Assist and Gemini CLI" but the Gemini Code Assist plugin for IntelliJ which is my daily driver is broken most of the time to the degree that it's completely unusable. Sometimes you can't even type in the input box.
The only way I can do serious development with Gemini models is with other tooling (Cline, etc) that requires API based access which isn't available as part of the subscription.
I agree. Gemini models are held back by their segmentation of usage between multiple products, combined with their awful harnesses and tooling. Gemini cli, antigravity, Gemini code assist, Jules.... The list goes on. Each of these products has only a small limit and they must share usage.
It gets worse than that though. Most harnesses that are made to handle codex and Claude cannot handle Gemini 3.1 correctly. Google has trained Gemini 3.1 to return different json keys than most harnesses expect resulting in awful results and failure. (Based on me perusing multiple harness GitHub issues after Gemini 3.1 came out)
If you aggressively use all the buckets, Google is incredibly generous. In theory, for one AI Pro subscription on a family plan you can get a ridiculous return on investment.
You could probably cost Google literally thousands if all 6 members were spamming video and image generation and Antigravity.
I use the free Chat AIs all the time; Claude, ChatGPT, Gemini, Grok, Mistral.
In the last month they have all clamped down quite heavily. I used to be able to deep-dive into a subject, or fix a small Python project, multiple times per day on the free web UIs.
Claude, this morning, modified a small Python project for me and that single act exhausted all my free usage for the day. In the past I could do multiple projects per day without issue.
Same with ChatGPT. Gemini at least doesn't go full-on "You can use this again at 11:00 AM", but it does fall back to a model that works very poorly.
Grok and Mistral I don't really use that much, but Grok's coding isn't that bad. The problem is that it's not such a good application for deep-diving a topic, because it performs a web search before answering anything, which makes it take a long time.
Mistral tends to run out of steam very quickly in a conversation. Never tried code on it though.
I bought one of the google AI packages that came with a pile of drive storage and Gemini access.
Unfortunately gemini as a coding agent is a steaming useless pile. They have no right selling it, cheap open weight Chinese models are better at this point.
It's not stupid; it's just incompetent at tool use and makes bad mistakes. It constantly gets itself into weird dysfunctional loops when doing basic things like editing files.
I'm not sure what GOOG employees are using internally, but I hope they're not being saddled with Gemini 3.1. It's miles behind.
Are you using gemini CLI or antigravity? The former is not really comparable to the latter in terms of quality. I wouldn't say antigravity is as good as the competition but it's pretty close. Miles behind is overstating it.
Gemini CLI but also used the Gemini models via opencode. They're terrible at CLI tool use. Like I said, just editing text files, they fall over rapidly, constantly making mistakes and then mistakes fixing their mistakes.
Antigravity wants me to switch IDEs, and I'm not going to do that.
Gemini 3.1 is a good coding agent. We've been totally spoiled now. Also, if you use Antigravity you can burn up Opus 4.6 credits off your Goog account instead, before you have to switch to Gem 3.1.
Check out z.ai coder plan. The $27/mo plan is roughly the same usage as the 20x $200 Claude plan. I have both and Claude is a little better, but GLM 5.1 is much better value.
Agreed. I use Z.ai and the usage is fantastic; the only thing tempering that recommendation is that it's often unreliable. Perhaps a few times per week it's unresponsive, and maybe more often it seems to become flaky.
It's very variable, though. Recently I'm noticing it's more reliable, but there was a stretch where it was nearly unusable some days.
Agreed. They had a rough patch around the 4.7 to 5 upgrade. New architecture required hardware migration. The 5 to 5.1 upgrade was much smoother (same architecture new weights). As you say, little rough around edges, but still great value. Trick I learned is that it's max 2 parallel requests per user. You can put a billion tokens a month through it, but need to manage your parallelism.
If you're ok with a model provider that goes down all the time and has such a poor inference engine setup that once you get past 50k tokens you're going to get stuck in endless reasoning loops.
I feel they will go token-based at some point. Currently, if you only use it with precise prompts rather than random suggestions, and switch between 5.4 and 5.4 mini depending on the work, it is the best deal.
What has actually changed? It's unclear how much you can do right now, unless they've already switched you to the new plan and you're speaking from experience.
Ultimately, we need to know the true cost of this technology to evaluate how effectively or ineffectively it can displace the workforce that existed before it.
MiniMax M2.7, MiMo-V2-Pro, GLM-5, GLM5-turbo, Kimi K2.5, DeepSeek V3.2, Step 3.5 Flash (this last one is particularly cheap while still being powerful).
Absolutely not. I took on some things that would normally take 5-10 people and many months.
Some people turn out slop. I was really excited to try to make some impressive shit. My whole life has been dedicated to trying to embody what Apple preached in the early days.
I knew this was coming, but I thought I had a little more time to try and get them over the finish line, ya know?
Maintenance by hand might be achievable, but it’s extremely hard when you’ve built something really big.
I’ve only got so much savings left to live on.
I’m not saying anyone owes me anything, but we all need to pivot, and I’m a lot less sure my pivot is going to work out now.
> I took on some things that would normally take 5-10 people and many months.
Based on what, exactly?
It's very easy to claim some software would've taken you months to make, but this is ridiculous. Estimating project duration is well known to be impossible in this field. A few years ago you'd get laughed out of the room for making such predictions.
> I’ve only got so much savings left to live on.
Respectfully, what are you doing here?
Yeah sure, the Apple dream. But supposing AI did in fact make you this legendary 100x developer, it would do the same for everyone else, including those with significantly more resources. You'd still be run out of the market by those with bigger budgets or more marketing, and end up penniless all the same.
I would strongly recommend you not put all your proverbial eggs in this basket.
I’ve pivoted to writing native iOS, macOS, Windows, and Linux apps. Most of my career has been front-end web. It would take me a while just to learn and practice, vs having my visions working in hours or days.
I’m not ready to unveil the thing I alluded to, it’s important to me that it’s good and polished. But I’ve done quite well so far developing in Swift, Rust, Go, and coming up with marketing and design — things I definitely couldn’t do by hand without a lot more time and effort.
https://poolometer.com/
Is one of the things I’m almost ready to call ready. So much domain expertise or tedious math involved — I simply wouldn’t have bothered on my own, pre-AI
I agree it’s a huge existential risk that everyone else is also amazing. So far that’s not true. I get hung up on a lot of little quirks, like getting Dolby Vision to play properly on Apple Silicon without Vulkan. Something I accomplished after about 2 weeks of relentless determination.
To be clear I’m just trying to answer your questions honestly. I understand the situation. It’s almost to my benefit the harder it is for non Software Engineers. But in our current reality, when I’m not launched yet, it’s more stress
> So much domain expertise or tedious math involved — I simply wouldn’t have bothered on my own, pre-AI
This is what I was alluding to. AI did not let you write software you couldn't otherwise make, or let you write it faster. You skipped doing the research because AI gave you plausible results, but without doing the research yourself you cannot be sure of its accuracy.
That isn't faster software development, it's reckless software development, and nothing really stopped you from doing it before other than your own recognition that pulling numbers out of your ass is a bad idea.
> I agree it’s a huge existential risk that everyone else is also amazing. So far that’s not true. I get hung up on a lot of little quirks, like getting Dolby Vision to play properly on Apple Silicon without Vulkan. Something I accomplished after about 2 weeks of relentless determination.
That would be "doing the research", and as you have observed, is the slow part then and now.
It's really not. As a one-person IT department, I'm now able to build things in hours or days that previously would have taken me weeks or even months (and thus didn't get done). Things people have wanted for years that I never had the time for, I can now say "yes" to.
Yeah the ops alone is a huge win. It’s such a win I didn’t even think to mention it ha.
Dangerous too, of course. So many times I’ve had subtle unexpected side effects. But it’s all about pinning things down well, and that’s what we’re all still figuring out.
> Is writing it by hand the old-fashioned way not on the table?
Of course it is. I started a (commercial) product in Jan, on track for in-field testing at the end of April.
Of course, it's not my f/time job, so I've only been working on it a/hours, but, with the exception of two functions, everything else is hand-coded.
I rubber-ducked with AI, but they never wrote the product for me (other than those two functions which I felt too lazy to copy from an existing project and fixup to work in the new project).
If my math is right, assuming a mix of around 70% cached tokens, 20% input tokens, and 10% output tokens, it breaks even with the old pricing at around 130k tokens per message, or about 13k output tokens per message.
With the hidden reasoning tokens and tool calls, I have no idea how many tokens I typically use per message. I would guess maybe a quarter of that, which would make the new pricing cheaper.
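To make the arithmetic above reproducible, here's a minimal sketch of the blended-rate calculation. The $/MTok figures are placeholder assumptions in the spirit of API-style tiers, not OpenAI's actual rate card; swap in real numbers to redo the break-even.

```python
# Back-of-envelope break-even between per-message and per-token billing.
# All rates below are ASSUMED placeholders, not an official rate card.
CACHED_RATE = 0.125e-6   # $/token for cached input (assumed)
INPUT_RATE = 1.25e-6     # $/token for fresh input (assumed)
OUTPUT_RATE = 10.0e-6    # $/token for output (assumed)

# The 70/20/10 cached/input/output mix from the comment above.
MIX = {"cached": 0.70, "input": 0.20, "output": 0.10}

def blended_cost_per_token() -> float:
    """Weighted-average $/token for the assumed traffic mix."""
    return (MIX["cached"] * CACHED_RATE
            + MIX["input"] * INPUT_RATE
            + MIX["output"] * OUTPUT_RATE)

def breakeven_tokens(per_message_cost: float) -> float:
    """Tokens per message at which token billing matches message billing."""
    return per_message_cost / blended_cost_per_token()
```

With these placeholder rates the blend works out to roughly $1.34 per million tokens, so a hypothetical effective per-message cost of about $0.17 lands near the ~130k-token break-even quoted above.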
The only catch is that you’ve spent many $1s, and you don’t get any of those $10s unless you get over the finish line.
In that sense your analogy is kinda good. I totally agree the current situation is like getting my solo start up funded and subsidized … but with only like 4 months runway now that the prices are skyrocketing, vs ~2+ for a typical YC venture
Yeah, but... it's rocketing for everyone at the same time on all the providers at once.
IOW, you are no further behind nor further ahead than your competitors compared to 1 week ago, 1 month ago, 1 year ago and 1 decade ago.
Everyone has the same tools you have. The only advantage you get is if you make your own tools (I did that; pre-AI, I was able to modify my LoB web apps at a rate of one new API endpoint, tested and pushed to production, every 15 minutes).
My comment was about the rapid and sudden cost spike of something happening unexpectedly.
They announced the 2x-tokens change with months of notice. This was announced with no notice.
Me as an individual making a go solo is not the same as thousands of funded businesses having free credits, subsidized plans and bottomless AI budgets.
For a short period this was a massive equalizer. Now it’s a tool for those who can afford it. That’s a big shift.
—
Why is it that a person cannot express their own circumstances or opinions on this site without it turning into an argument? It’s so deflating.
But it was well understood that the subscription was heavily subsidized. Whether or not it was a "separate product" doesn't matter as much as the fact that pricing was not sustainable.
It was not well understood that it would stop being subsidized without notice.
Does that just not matter in modern society? Am I an asshole for expecting the product I pay for on day 1 to be the same on days 8 and 29 of a 30-day subscription?
Although I have to say I am sometimes surprised how much people burn through their usage. I was briefly on a Claude Max plan and then switched to a pro plan and still almost never hit my limit.
Literally every VC funded consumer product has switched from a "growth at all costs" phase to a "Now we hike prices, make money, and generally enshittify" phase, and tons of those companies are still around (e.g. Uber), so I'm not sure why anyone thinks it would be much different for AI.
I think the situation we'll end up in is having closed models that are fast and near perfect but expensive, and a lot of cheap open-source models that are good enough for most people.
> No moat --> It's basically OpenAI, Google, and Anthropic left at the SOTA. Maybe soon, we'll have 2 left.
Yeah, but do we even need them? Non-SOTA is still pretty damn good; remember last year, pre-SOTA? How many people were boasting 10x - 100x productivity increases using the end-2025 models?
So the non-sota models support doing 10 hours of work in 1 hour. Many people would be fine with that. Fine enough that they aren't going to spring for a SOTA model that cuts the 10 hours to 0.5 hours, they're just going to use the cheap models to cut the 10 hours down to 1 hour.
Ok, ok, so they can't keep up with "demand". Now let's go parse what that demand is:
Is it: We want to use this to "Kill a bunch of people"
Is it: I'm very lonely, and need something to tell me suicide is ok
Is it: Google is so filled with ads, I'm just going to ask the LLM what to buy
Is it: A useful coding tool to improve work flow towards end products.
Cause, if we ignore ethics, some of that demand will generate revenue to pay for its existence; the rest will do nothing of the sort.
Just because there's demand doesn't mean that demand equates to the value of the product. There's lots of demand for LED light bulbs, but once those light bulbs are sold, that demand disappears into the night. This isn't an analogy of AI, but to demonstrate you can't just wave your hands and say "demand leads to a sustainable business model".
As I see it, the only thing close to a moat is CC for Anthropic, and since it is a big ol' fucking mess that is a) apparently now beyond the ability of any current SOTA LLM to fix, and b) understood by absolutely no human, I'd say it's not much of a moat. The other agents will catch up sooner rather than later.
The other providers? I don't see a moat. We jump ship at the drop of a hat.
Every time an Ed Zitron article is posted on HN, it is met with a torrent of vitriol and personal attacks. The articles are okay, if overly wordy, but I don’t see how the subject matter elicits that strong of a response.
At any rate, this observation is not unique to Ed; lots of people have reached the same conclusion that the math doesn’t add up from a business-profitability perspective.
> The articles are okay if not overly wordy but I don’t see how the subject matter elicits that strong of a response.
Hot take, but really it's more of an observation than a take: We saw this exact response in Blockchain & crypto circles a few years ago. (Though HN wasn't quite as culturally "central" to those)
Economic bubbles are subject to the Tinkerbell Effect. They exist so long as people believe in them, and collapse when either 1) they become so financially unsustainable as to collapse, having consumed all the money the economy could possibly give them, or 2) people stop believing in the bubble and stop feeding it money.
In this regard, the statement "NFTs are stupid" was not merely ridiculing those who bought them, but a direct attack on the bubble and those invested in it. And this is something the people involved in the bubble understand instinctively, even if they aren't consciously aware of it. (There's a psychological mechanism to that, but it's not relevant here.)
So consequently, they react aggressively to dissent. They seek to enforce their narrative, because not doing so is a threat to the bubble and their financial interests.
---
AI's not much different to that. It's clearly a bubble to everyone including the AI execs saying it out loud.
And people react aggressively to dissent like Ed's, because if the wider public stops believing in AI's future, the bubble bursts. They'll stop tolerating datacenter construction, they'll sell their Nvidia shares, they'll demand regulators restrict AI.
(And to those who can feel their aggression rising reading this comment. Hi, yes. I see you. If I were wrong, nothing I said would matter. You'd be wasting your time engaging with it, history would simply prove me wrong. But by all means, type up that reply or click that button.)
I agree with Ed Zitron more often than not, but I do think he Flanderized himself into being the aggressively-anti-AI-guy to the point where he now makes claims about the capabilities of "AI" that are incorrect regularly. I see people on HN doing the same thing, making claims about capabilities that were true as recently as 6 months ago, but aren't true anymore.
[I'm an AI-doomer myself, but I am an AI-doomer because by and large this stuff increasingly works, not because it doesn't.]
That said, Ed Zitron still does a lot of useful research into the economics of the industry and I also believe that continued progress in AI can disrupt the world (for better and for worse) while the economics propping up all the frontier model providers can also implode spectacularly.
Some people talk about how AI doom comes about either way because it could take all of our jobs OR crash the economy when the current bubble bursts. But as an uber-AI-doomer I happen to think there is a very real possibility of a double downside (for the labor class, at least) where both of those things can happen at the same time!
Yeah I've noticed this phenomenon as well. Frankly, I expected to be downvoted into oblivion just for mentioning him. But Zitron's commentary on the financial implications of AI usually reads as common sense to me and checks out (granted I'm not really capable of refuting his points.)
Billions of USD in debt, a business model bleeding cash with no profit in sight, a highly competitive environment, a sub-par product, free-to-use offline models taking off, potential regulatory issues, some investors pulling out of commitments... tricky.
But let's not cry for the founders, they managed to get away with tons of money. The problem is for the fools holding the bag.
How is it a subpar product? I've been very happy with GPT 5.4 and the Codex CLI tooling, as well as ChatGPT web. I'd say product is one of their strengths.
I don't use anywhere near $1000/mo of inference. But yes, the question of what to do when prices go up a lot does concern me. However, with respect to product alone, Codex is still very good.
Yeah, you guys have to pay attention to the state of the overall economy. We are in the credit-crunch phase of a recession. The funny money has run out and infinite loans are no longer available. These companies have to find a way to pay their debt now.
In my Codex dashboard, I can buy 1000 extra credits for $40. The credit cost for GPT-5.4 is 375 credits / 1M output tokens which translates to $15 / 1M output tokens which exactly equals the API rate.
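Spelling that conversion out, using the bundle price and credit rate as quoted above (these are the numbers shown in one dashboard and may differ for you):

```python
# Converting Codex extra-credit pricing into an effective $/MTok rate.
# Numbers are as quoted in the comment above, not an official rate card.
BUNDLE_PRICE_USD = 40.0        # cost of one extra-credit bundle
BUNDLE_CREDITS = 1000          # credits in that bundle
CREDITS_PER_MTOK_OUTPUT = 375  # quoted GPT-5.4 output rate, credits per 1M tokens

dollars_per_credit = BUNDLE_PRICE_USD / BUNDLE_CREDITS
effective_usd_per_mtok = CREDITS_PER_MTOK_OUTPUT * dollars_per_credit

print(dollars_per_credit)      # 0.04
print(effective_usd_per_mtok)  # 15.0 -- matches the quoted $15/MTok API rate
```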
5.4 is great. I use it for python professionally and for typescript/front-end games and educational apps recreationally. In my experience it's roughly as good as opus, just a lot cheaper. It's amazing how much usage you get for $20/mo
I'm really curious about how you use it, because for me it was braindead. I tried tasking it to update my personal workout app and it created so many bugs I had to clean up with Opus or be left with spaghetti. It also keeps asking for confirmation of doing basic things.
> I tried tasking it to update my personal workout app and it created so many bugs I had to clean up with Opus or be left with spaghetti.
I find it sad that some people are already at the point where "My only options are to leave it as spaghetti or pay for another LLM to fix it". Already their skills are atrophied.
I also don’t think the vast majority of SWEs ever had the skill to read and truly comprehend other people’s code and then work diligently to “fix” it. People with such skills, in my experience, are often highly compensated contractors. Every codebase that has survived the test of time has numerous “absolutely do not touch this code, everything will break and no one knows why” parts…
They’re basically the same. Codex is better at some things, Claude is better at other things. It’s honestly a wash; just pick the one that gives you a warmer fuzzy feeling in your tummy.
this is indicative to me that the exponential is slowing down. tool and model progress was huge in 2025 but has been pretty stale this year. the usage changes from anthropic, gemini, and openai indicate it's just an economy-of-scale issue now, so unless there's a major breakthrough they're just going to settle down as vendors of their own particular, similar flavor of apis.
I think it signals that they’ve been so successful that they need to ensure there is some direct financial back pressure on heavy users to ensure that their heavy token use is actually economically productive. That’s not a bad thing. Giving away stuff for free - or even apparently for free - encourages a poor distribution of value.
> I think it signals that they’ve been so successful that they need to ensure there is some direct financial back pressure on heavy users to ensure that their heavy token use is actually economically productive.
Jesus, the spin on this message is making me dizzy.
They finally try to stop running at a loss, and you see that as "they've been so successful"?
Here's how I see it: they all ran out of money trying to build a moat, and now realise that they are commodity sellers. What sort of profit do you think they need to make per token at current usage (which is served at below cost)?
How are they going to get there when less-highly-capitalised providers are already getting popular?
What makes you think that progress has stopped? Anecdotally, it seems to me that it's accelerated: I am having conversations with ambitious non-tech people, and they now seem to be excited and are staying up late learning about CLIs and GitHub. They seem to have moved beyond lovable and are actually trying to embed some agents in their small businesses, etc.
> They seem to have moved beyond lovable and are actually trying to embed some agents in their small businesses, etc.
That's the problem - these small businesses are writing code, models from last year are good enough for them, and as a small business they can easily shell out for hardware to self-host.
The minute businesses take up AI for their business processes, the will to buy each employee a subscription is going to go the way of the dodo.
Honestly? It was the claude code leak that did it. There was a lot more smoke and mirrors than I anticipated: the poisoning tool calls, how their prompting works, how "messy" a lot of it was, etc.
I meant that I thought the exponential with the models is slowing down (AGI, etc). The application though for regular people will continue to go forward.
I don't think there has been any exponential in terms of inference costs in the last couple of years. In fact, they have worsened, as the same relevant hardware is more expensive and so is energy - and to top it off, to stay SotA companies are using larger models with higher running costs. But for some reason people are conflating the improvements in models with the cost of inference.
This pricing only really makes sense if users can predict their usage; if they can't, heavy users are just going to be hamstrung and start rationing their usage.
What if the goal was to draw us away from building our own AI data centers with their cheaper prices then eventually make us pay up for the difference?
from what they wrote, they're just changing how they measure the usage; might even be a good thing if you manage your context right:
> This format replaces average per-message estimates for your plan with a direct mapping between token usage and credits. It is most useful when you want a clearer view of how input, cached input, and output affect credit consumption.
Qwen has also been improving recently, in fact most have, so depending on when you last tried them you can try again and see how they work for you
My local Qwen is decent for some things, Kimi is decent for most things and occasionally it has been able to do better than Opus and GPT 5.4 on particular tasks
it will soon be very costly to stay in just one provider
The current pricing model (for plus) feels deliberately confusing to me, I can never really tell if I'm nearing any kind of limit with my account since nothing really seems to tell me.
Does this mean there’s no such thing as a “subscription” to ChatGPT for businesses? I thought they offered businesses a subscription with some amount of built in quota previously, including for the side products like codex and sora.
There are still subscriptions that give access to both ChatGPT and Codex, but with a much smaller usage quota than before the change (which came at the same time as the end of the 2x promo). I couldn't find the equivalent in terms of credit for the usage included with these $20/25 seats...
If you use Google's tooling, yes - but not if you need API access. API access is not included in the subscriptions and uses token-based pricing. For development I find that the Gemini IDE plugins, which have good free usage and are included in the subscriptions, aren't great: the Gemini plug-in under IntelliJ is often broken, etc. The best experience is with other tools like Cline, where you have to use a developer account, which is API-usage-based already.
But Gemini's API-based usage also has a free tier, and if that doesn't work for you (they train on your data), new signups get several hundred dollars in free credits that expire after 90 days. Three months of free access is a pretty good deal.
For home projects, I almost exclusively use the web chat interface to code. I haven't done anything large yet so I will iterate and get the web chat to update code, print out the code that I copy and paste.
How does this differ in terms of pricing from Codex?
Token-based usage accounting is more accurate and therefore more sustainable than message-count-based usage accounting. It should've been this way to begin with.
> I would prefer if it actually explodes sooner rather than later
The idea, as far as I can tell from all the pro-AI developers, was that it will never explode, and the performance will continue increasing so the slop they write today doesn't need maintenance, because when that time comes around there will be smarter models that can clean it up.
If the providers are tightening the screws now (and they are all doing it at the same time), it tells me that either:
1. They are out of runway and need to run inference at a profit.
or
2. They think that this is as good as it is going to get, so the best time to tighten the screws is right now.
They could also do a plan 3 where they discourage others so they can use it to, say, rapidly build many new products but competitors would have to pay a fortune for the same luxury
> They could also do a plan 3 where they discourage others so they can use it to, say, rapidly build many new products but competitors would have to pay a fortune for the same luxury
Unlikely that they all decided to do this within weeks of each other. Still, like you said, you were spit-balling, not asserting :-)
Is this not just about extra credit? So what's included in the subscription doesn't change - just extra credits are now token based instead of message based? (For Plus/Pro)
Yes.
> This format replaces average per-message estimates with a direct mapping between token usage and credits.
It's to replace the opaque, per-message calculation, not the subscription plan.
It does feel like it also impacts the usage meter for subscription plans?
Usage meter has always been completely opaque anyway. They could (and probably did) shrink the limit whenever they like.
Ostensibly this makes usage meter rate changes more transparent?
It is a bit insidious that the price hike coincides with the end of the 2x promotion, which makes the usage change a bit more obscure.
It's not a price hike, it's actually making it easier to understand relative usage for different models/features.
I have no idea what I’m getting for $200/mo at this point. Maybe that’s on me, idk.
I have no idea what I'm getting for $20/mo, either. (But I do know that it's at least $180 less than what I could be spending, I suppose.)
Fair point. We only have clear evidence they're being more transparent about credit pricing and value, but it's unclear whether that'll make people burn through usage faster or slower.
The fuzziness is intentional. It gives them wiggle room and obscures how much "value" you actually get from $200, a 5-hour block, or a week. That keeps the tension manageable between subscription pricing and pay-per-token API pricing, especially for larger businesses on enterprise plans who want transparent $-per-MTok rates.
If they were fully transparent, like "your $200 sub gets you up to $2,000 of equivalent API usage," it would be a constant fight. People would track pennies and scream any time 5-hour blocks got throttled during peak hours. Businesses would push harder for pay-per-token discounts seeing that juicy $200 sub value.
God every single title I read about AI on this site ends up being a straight up lie.
I miss “BREAKING NEWS” as it is used at X /s
I think this might also impact how usage is calculated for subscription plans as well, not just overages (using tokens instead of messages for calculating usage). But the message from OpenAI seems vague.
Why not just attach a real dollar amount, rather than using "credits"?
Well, I know why. I just wanted to be snarky. It's just that trying to hide the actual price is getting a bit old. Just tell me that generating this much code will cost me $10.
Pay 100 Gold or 15 Gems to generate this feature
You joke but as a parent, I’m so sick of the gem packs, etc. they try to push on the kids to obfuscate your actual spend on games in real world money.
And now it feels like they are gamifying the compute we use for work, for all the same reasons.
I hate that pattern so much. It’s also not just to obfuscate the spending - it’s also to ensure you already have some amount left over in your account, so that it feels like you’re not spending as much to just “top up” and afford that one thing you want this time.
If you have some left over that you can’t spend, it feels like you’ve “wasted” them.
I refuse to play games where you pay real money for consumables.
Board games do not have this problem.
What is snarky about that?
The answer is so that they can charge different prices per credit. If you buy small amounts, they can charge one price. If you buy in bulk, they can offer a discount. The usage is the same, but they can differentiate the price per usage to give a more favorable rate to better customers.
Is there anything wrong with that?
A fundamental architectural problem is that they genuinely do not know what a query will cost ahead of time.
Even for a single standalone LLM that's the case, and the 'agentic' layers thrown on top just make that problem exponentially worse.
One would need to switch away from LLMs entirely to fix this problem.
Isn't this an orthogonal issue that doesn't affect whether billing is done with credits or money?
If the expensive parts of the query happen to work iteratively (especially if agentic), you can act on those loops to bound the cost. Even if it's pure forward generation, you could pause an expensive inference and continue it seamlessly with a cheaper model, adding little to the cost.
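As a hypothetical sketch of that kind of cost bounding (the step structure and token estimates here are invented for illustration, not any real agent framework):

```python
# Bounding the cost of an iterative/agentic loop by checking a token
# budget between steps. Steps and estimates are made-up placeholders.
def run_with_budget(steps, budget_tokens):
    spent = 0
    results = []
    for step in steps:
        est = step["est_tokens"]           # pre-run estimate for this step
        if spent + est > budget_tokens:    # stop before exceeding the cap
            break
        results.append(step["name"])       # stand-in for actually running the step
        spent += est
    return results, spent

steps = [{"name": "plan", "est_tokens": 2_000},
         {"name": "edit", "est_tokens": 30_000},
         {"name": "test", "est_tokens": 50_000}]
done, spent = run_with_budget(steps, budget_tokens=40_000)
print(done, spent)   # the 50k-token "test" step would blow the budget, so it is skipped
```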
I can think of a few other reasons:
- Not everyone uses dollars.
- The price of credits in some currency could change after you bought them.
- The price of credits could be different for different customers (commercial, educational, partners, etc)
- They can ban trading of credits or let them expire
> Not everyone uses dollars.
> The price of credits in some currency could change after you bought them.
> The price of credits could be different for different customers (commercial, educational, partners, etc)
Maybe I'm missing something, but doesn't every other compute provider manage that without introducing their own token currency? Convert to the user's currency at the end of the month, when the invoice comes in. On the pricing page, have a table that lists different prices for different customers. I fail to see how tokens make it clearer. Compare:
"This action costs 1 token, and 1 token = $0.03 for educational in the US, or 0.05€ for commercial in the EU"
"This action costs $0.03 for educational in the US, or 0.05€ for commercial in the EU"
> They can ban trading of credits or let them expire
That sounds extremely user-hostile to me
otherwise you end up with "get a $20 subscription for 1000% more value -- equivalent to $200 in API usage!!![1]; [1] -- compared to API pricing for american companies on the first weekend of the month between 18:00 and 22:00 UTC+8 during full moon"
in any case, better than what anthropic does
> user-hostile
credits do expire (I thought they always do?), apparently it's not really up to them: https://news.ycombinator.com/item?id=46230848
Taximeter effect
The title is misleading and not in the article. This change is for business/enterprise accounts. Also, these are still credit based. The change is that credits now operate on tokens like the API rather than on messages as they used to.
> Customers on existing Plus, Pro and Enterprise/Edu plans should continue to use the legacy rate card. We’ll migrate you to the new rates in the upcoming weeks.
Nope, they buried the lede a bit, but this is coming for _all_ users, even pro/plus subscription plans. So you get chatgpt pro/plus benefits, and then effectively $20/$200 in credits for codex.
> effectively $20/$200 in credits for codex
That's not true.
First of all, there's no dollar amount tied to how many credits you get for a subscription.
Second, if you look at the prices for bundles of _extra_ credits and then do some math on the Codex rate card, you'll see that there's no way they would work out to be the same or similar.
> First of all, there's no dollar amount tied to how many credits you get for a subscription.
I don't understand what you mean here; their official comms is:
To me, anyway, that means that GP was exactly right - they'll give the $20 subscriptions $20 worth of credits, and the $200 subscriptions $200 worth of credits. That is what the "New Rates" are!
I think it would be more rational to discount a subscription (standard is about 10% in most industries) vs PAYG, and I agree in principle with your assertion - they haven't specified what the discount is on credits bought in a subscription plan - but there is no indication that they are going to continue allowing thousands of dollars of credits on a $200/mo plan.
My guess would be a 10% (or similar) discount if you buy a subscription.
1. Look at the new rate card for how many credits are used for each category (that's what the discussion is about).
2. Look at some of your typical sessions for token counts and calculate how many credits that would have been.
3. Look at the rates for extra credits (that's the only place credits have a price).
4. See that you are getting more than $200/mo worth of credits where we have evidence of the value of a credit.
If that doesn't clear it up, then I can't help, sorry.
> effectively $20/$200 in credits for codex
So, 1.3ish million output tokens for Codex? Going by the token pricing here: https://openai.com/api/pricing/
So Anthropic bundled CC with Claude.ai cuz OAI bundled chatgpt with Codex, now OAI is unbundling, IPO must be around the corner. Writing is also on the wall for CC usage based subscriptions now that main competitor effectively got rid of it. How are the Chinese models looking?
> Writing is also on the wall for CC usage based subscriptions now that main competitor effectively got rid of it.
And I just subscribed for a year's worth of Claude... Terrible timing I guess. Do you know if the open models are viable?
Based on reading only, I think they're usable but a step below, probably somewhat behind Sonnet. I also read that some people successfully requested refunds late last year when their models shit the bed due to bugs, so if they cut limits hard maybe you can try that.
Is this something that is likely to also change the way Github Copilot bills? Right now the billing is message-based, not token-based. And OpenAI and Microsoft are rather opaquely intertwined in the AI space.
Hard to say, but GitHub Copilot also allows access to Anthropic, Google and Grok models, so I don't know that a change from a single provider would necessarily change how they bill
For the past month, I've been claiming that $20/mo codex is the best deal in AI.
Now I'm going to have to find the new best deal.
We are exiting a hype cycle, well into the adoption curve. Subscriptions were never going to last.
My next step is going to be evaluating open and local models to see if they are sufficiently close to par with frontier models.
My hope is that the end of seat-based pricing comes with this tech cycle. I was looking for a document-signing provider that doesn't charge a monthly fee; I only need a few docs a year.
I recently experimented creating a Python library from scratch with Codex. After I was done, I took the PRD and Task list that was generated and fed them to opencode with Qwen 3.5 running locally.
Opencode was able to create the library as well. It just took about 2x longer.
Which version of Qwen 3.5 did you use?
which quant as well
Not at my computer now; it was either the 27B or 35B, not quantized.
Next week I will be trying qwopus 27b.
I'm developing software in this area right now, so I try a lot of the new models. They're not even close for coding tasks. It basically comes down to 26B parameters vs 1T parameters / quantisation / smaller context sizes; there's no comparison. However, for agentic work, tool calling, and text summarisation, local LLMs can be quite capable. Workloads that run as background tasks where you're not concerned about TTFB, cold starts, tok/s etc. - this is where local AI is useful.
If you have an M processor then I would recommend that you ditch Ollama because it performs slowly. We get double or triple tok/s using omlx or vmlx, respectively, but vmlx doesn't have extensive support for some models like gpt-oss.
Kimi K2.5 (as an example) is an open model with 1T params. I don't see a reason it has to be local for most use cases- the fact that it's open is what's important.
That is just idealism. Being "open" doesn't get you any advantage in the real world. You're not going to meaningfully compete in the new economy using "lesser" models. The economy does not care about principles or ethics. No one is going to build a long-term business that provides actual value on open models. They can try. They can hype. And they can swindle and grift and scalp some profit before they become irrelevant. But it will not last.
Why? Because what was built with an open model can be sneezed into existence by a frontier model run via a first-party API with the best-practice configurations the providers publish in usage guides that no one seems to know exist.
The difference between the best frontier model (gpt-5.4-xhigh or opus 4.6) and the best open model is vast.
But that is only obvious when your use case is actually pushing the frontier.
If you're building a crud app, or the modern equivalent of a TODO app, even a lemon can produce that nowadays so you will assume open has caught up to closed because your use case never required frontier intelligence.
A model with open weights gives you a huge advantage in the real world.
You can run it on your own hardware, with perfectly predictable costs and predictable quality, without having to worry about how many tokens you use, or whether your subscription limits will be reached in the most inconvenient moment, forcing you to wait until they will be reset, or whether the token price will be increased, or your subscription limits will be decreased, or whether your AI provider will switch the model with a worse one, and so on.
Moreover, no matter how good a "frontier model" may be, it can still produce worse results than a weaker model when the programmer who manages it does not also have "frontier intelligence". When liberated from the constraints of a paid API, you may be able to use an AI coding assistant in much more efficient ways, exactly like when time-sharing access to powerful mainframes was replaced by the unconstrained use of personal computers.
When I was very young, I passed through the transition from using a mainframe remotely to using my own computer. I certainly do not want to return to that straitjacket style of work.
first session with gemma4:31b looks pretty good, like it may actually be up to coding tasks at gemini-3-flash levels
you can tell gemma4 comes from gemini-3
Already paying for Google photo storage, AI pro for an extra $7 is a steal with anti-gravity.
That's only good for the web-based UI. If you want Gemini API access, which is what this article is about, then you must go the AIStudio route, and pricing is API-usage-based. It does have a free usage tier, and new signups can get $300 in free credits for the paid tier, so I think it's still a good deal - just not as good as using the subscriptions would be.
No? Isn't the article about Codex, which is roughly equivalent to "Gemini CLI" and Google's Antigravity? Google's subscriptions include quotas for both of those, albeit the $20 monthly "Pro" plan has had its "Pro" model quota slashed in the last few weeks. You still get a large number of "Gemini 3 Flash" queries, which has been good enough for the projects I've toyed with in Antigravity.
I guess that's true but I find Google's models better than their public tooling. The Pro subscription includes "Gemini Code Assist and Gemini CLI" but the Gemini Code Assist plugin for IntelliJ which is my daily driver is broken most of the time to the degree that it's completely unusable. Sometimes you can't even type in the input box.
The only way I can do serious development with Gemini models is with other tooling (Cline, etc) that requires API based access which isn't available as part of the subscription.
I agree. Gemini models are held back by their segmentation of usage between multiple products, combined with their awful harnesses and tooling. Gemini cli, antigravity, Gemini code assist, Jules.... The list goes on. Each of these products has only a small limit and they must share usage.
It gets worse than that though. Most harnesses that are made to handle codex and Claude cannot handle Gemini 3.1 correctly. Google has trained Gemini 3.1 to return different json keys than most harnesses expect resulting in awful results and failure. (Based on me perusing multiple harness GitHub issues after Gemini 3.1 came out)
Google is by far the best deal for AI, they give you so many 'buckets' of usage for a variety of products, and they seem to keep adding them.
If you aggressively use all buckets, Google is incredibly generous. In theory, for one AI Pro subscription you can get a ridiculous return on investment with a family plan.
You could probably be costing Google literally thousands if all 6 members were spamming video and image generation and Antigravity.
The family sharing is the real hack lol. I don't think any other provider does that.
Good luck sticking within limits, I have been burning up my baseline limits insanely fast within a few prompts, a marked change from a few weeks ago.
There's a few complaints online about the same happening to multiple users.
Otherwise anti-gravity has been great.
I use the free Chat AIs all the time; Claude, ChatGPT, Gemini, Grok, Mistral.
In the last month they have all clamped down quite heavily. I used to be able to deep-dive into a subject, or fix a small Python project, multiple times per day on the free web UIs.
Claude, this morning, modified a small Python project for me and that single act exhausted all my free usage for the day. In the past I could do multiple projects per day without issue.
Same with ChatGPT. Gemini at least doesn't go full-on "You can use this again at 11:00 AM", but it does fall back to a model that works very poorly.
Grok and Mistral I don't really use that much, but Grok's coding isn't that bad. The problem is that it is not such a good application for deep-diving a topic, because it will perform a web search before answering anything, making it take long.
Mistral tends to run out of steam very quickly in a conversation. Never tried code on it though.
I bought one of the google AI packages that came with a pile of drive storage and Gemini access.
Unfortunately gemini as a coding agent is a steaming useless pile. They have no right selling it, cheap open weight Chinese models are better at this point.
It's not stupid, it's just incompetent at tool use and makes bad mistakes. It constantly gets itself into weird dysfunctional loops when doing basic things like editing files.
I'm not sure what GOOG employees are using internally, but I hope they're not being saddled with Gemini 3.1. It's miles behind.
Are you using gemini CLI or antigravity? The former is not really comparable to the latter in terms of quality. I wouldn't say antigravity is as good as the competition but it's pretty close. Miles behind is overstating it.
Gemini CLI but also used the Gemini models via opencode. They're terrible at CLI tool use. Like I said, just editing text files, they fall over rapidly, constantly making mistakes and then mistakes fixing their mistakes.
Antigravity wants me to switch IDEs, and I'm not going to do that.
Gemini 3.1 is a good coding agent. We've been totally spoiled now. Also, if you use Antigravity you can burn up Opus 4.6 credits off your Goog account instead, before you have to switch to Gem 3.1.
Check out z.ai coder plan. The $27/mo plan is roughly the same usage as the 20x $200 Claude plan. I have both and Claude is a little better, but GLM 5.1 is much better value.
Agreed. I use Z.ai and the usage is fantastic; the only thing tempering that recommendation is that it's often unreliable. Perhaps a few times per week it's unresponsive, and maybe more often it seems to become flaky.
It's very variable, though recently I'm noticing it's more reliable; there was a stretch where it was nearly unusable some days.
I guess I won't complain for the price, and YMMV.
Agreed. They had a rough patch around the 4.7 to 5 upgrade. New architecture required hardware migration. The 5 to 5.1 upgrade was much smoother (same architecture new weights). As you say, little rough around edges, but still great value. Trick I learned is that it's max 2 parallel requests per user. You can put a billion tokens a month through it, but need to manage your parallelism.
If you're ok with a model provider that goes down all the time and has such a poor inference engine setup that once you get past 50k tokens you're going to get stuck in endless reasoning loops.
GH Copilot is still the best deal, while it lasts
Yeah, it's really good. Probably going to be the next best deal until they cut back.
I need to try the command line version.
> I need to try the command line version.
Is there any other?
I feel they will go token-based at some point. Currently, if you only use it with precise prompts (not random suggestions) and switch between 5.4 and 5.4-mini depending on the work, it is the best deal.
What has actually changed? It's unclear how much can you do right now, unless they've already switched you to the new plan and you're speaking from experience.
The days of subsidized access are rapidly coming to an end.
Good!
It’s kind of a rug pull to effectively raise the price like 10x. I can’t afford to finish some of my projects with this change
That is okay.
Ultimately, we need to know the true cost of this technology to evaluate how effectively or ineffectively it can displace the workforce that existed before it.
Agreed, this has to happen and the sooner the better.
There are plenty of good models on Openrouter that are very cheap, maybe it's time to experiment with alternatives.
what are some of them?
Kimi K2
MiniMax M2.7, MiMo-V2-Pro, GLM-5, GLM5-turbo, Kimi K2.5, DeepSeek V3.2, Step 3.5 Flash (this last one is particularly cheap while still being powerful).
Can't judge on the quality of the comparison but I'd start from https://arena.ai/leaderboard/code and maybe from OpenRouter's ranking.
Is writing it by hand the old-fashioned way not on the table?
Not really. Many scenarios where that would mean spending 50x the time or hiring a team.
Absolutely not. I took on some things that would normally take 5-10 people and many months.
Some people turn out slop. I was really excited to try and make some impressive shit. My whole life has been dedicated to trying to embody what Apple preached in the early days.
I knew this was coming, but I thought I had a little more time to try and get them over the finish line, ya know?
Maintenance by hand might be achievable, but it’s extremely hard when you’ve built something really big.
I’ve only got so much savings left to live on.
I’m not saying anyone owes me anything, but we all need to pivot, and I’m a lot less sure my pivot is going to work out now.
> I took on some things that would normally take 5-10 people and many months.
Based on what, exactly?
It's very easy to claim some software would've taken you months to make, but this is ridiculous. Estimating project duration is well known to be impossible in this field. A few years ago you'd get laughed out the room for making such predictions.
> I’ve only got so much savings left to live on.
Respectfully, what are you doing here?
Yeah sure, the Apple dream. But supposing AI did in fact make you this legendary 100x developer, so it would to everyone else including those with significantly more resources. You'd still be run out of the market by those with bigger budgets or more marketing, and end up penniless all the same.
I would strongly recommend you not put all your proverbial eggs in this basket.
I’ve pivoted to writing native iOS, macOS, Windows, and Linux apps. Most of my career has been front-end web. It would take me a while just to learn and practice, vs having my visions working in hours or days.
I’m not ready to unveil the thing I alluded to, it’s important to me that it’s good and polished. But I’ve done quite well so far developing in Swift, Rust, Go, and coming up with marketing and design — things I definitely couldn’t do by hand without a lot more time and effort.
https://poolometer.com/ Is one of the things I’m almost ready to call ready. So much domain expertise or tedious math involved — I simply wouldn’t have bothered on my own, pre-AI
I agree it’s a huge existential risk that everyone is also amazing. So far that’s not true. I get hung up on a lot of little quirks, like getting Dolby Vision to play properly on Apple Silicon without Vulkan. Something I accomplished after about 2 weeks of relentless determination.
To be clear I’m just trying to answer your questions honestly. I understand the situation. It’s almost to my benefit the harder it is for non Software Engineers. But in our current reality, when I’m not launched yet, it’s more stress
> So much domain expertise or tedious math involved — I simply wouldn’t have bothered on my own, pre-AI
This is what I was alluding to. AI did not let you write software you couldn't otherwise make, or let you write it faster. You skipped doing the research because AI gave you plausible results, but without doing the research yourself you cannot be sure of its accuracy.
That isn't faster software development, it's reckless software development, and nothing really stopped you from doing it before other than your own recognition that pulling numbers out of your ass is a bad idea.
> I agree it’s a huge existential risk that everyone is also amazing. So far that’s not true. I get hung up on a lot of little quirks, like getting Dolby Vision to play properly on Apple Silicon without Vulkan. Something I accomplished after about 2 weeks of relentless determination.
That would be "doing the research", and as you have observed, is the slow part then and now.
I guess it’s my bad for trying to do more than I could otherwise do alone.
Poolometer looks cool! I will say your smiley-face icons look a little odd and AI-generated, but otherwise I love the graph tracking and suggestions.
> I’ve only got so much savings left to live on.
This confuses me - did you leave your job to cosplay as an EM, using LLMs to build your products? If not, then your savings don't matter.
It's really not. As a one-person IT department I'm now able to build things in hours or days that it previously would have taken my weeks or even months to build (and thus they didn't get done). Things people have wanted for years that I didn't ever have the time for, I can now say "yes" to.
Then I would say they judged the situation correctly when they decided to raise prices.
That said: competition will soon kick in.
Yeah totally. I’m just surprised they did this AND ended the 2x promo simultaneously.
I had my hopes up to switch to local but my first few passes didn’t pan out with that so far. But I’m optimistic it’ll land soon.
I think I need to lower my ambitions too. I got my hopes up since AI can do everything, but how long it takes to do it right can really drag on.
Yeah the ops alone is a huge win. It’s such a win I didn’t even think to mention it ha.
Dangerous too, of course. So many times I’ve had subtle unexpected side effects. But it’s all about pinning things down well, and that’s what we’re all still figuring out.
Sounds like, in the words of Douglas Adams, a SEP.
This isn't your problem; this is management's problem for cutting headcount, or not caring about the things that people wanted.
As it isn't your problem, paint it bright pink and move on.
What am I an assembler programmer now?!? Am I to plug wires and flip switches!?!
/s
> Is writing it by hand the old-fashioned way not on the table?
Of course it is. I started a (commercial) product in Jan, on track for in-field testing at the end of April.
Of course, it's not my f/time job, so I've only been working on it a/hours, but, with the exception of two functions, everything else is hand-coded.
I rubber-ducked with AI, but they never wrote the product for me (other than those two functions which I felt too lazy to copy from an existing project and fixup to work in the new project).
If my math is right, assuming a mix of around 70% cached tokens, 20% input tokens, and 10% output tokens, it breaks even with the old pricing at around 130k tokens per message, or about 13k output tokens per message.
With the hidden reasoning tokens and tool calls, I have no idea how many tokens I typically use per message. I would guess maybe a quarter of that, which would make the new pricing cheaper.
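For reference, here's that back-of-envelope math as a quick script. The per-MTok rates and the old per-message cost below are hypothetical placeholders, not official prices; the only input carried over from above is the assumed 70/20/10 cached/input/output mix:

```python
# Back-of-envelope break-even: token-based vs per-message billing.
# All dollar figures are placeholder assumptions, NOT official prices.
RATE_PER_MTOK = {"cached": 1.50, "input": 5.00, "output": 15.00}  # $ per 1M tokens
MIX = {"cached": 0.70, "input": 0.20, "output": 0.10}             # assumed token mix
OLD_COST_PER_MESSAGE = 0.50                                        # $ per message (placeholder)

# Blended dollars per single token under the assumed mix
blended_per_tok = sum(MIX[k] * RATE_PER_MTOK[k] for k in MIX) / 1_000_000

# Messages larger than this token count get cheaper under per-message billing
break_even_tokens = OLD_COST_PER_MESSAGE / blended_per_tok
print(f"blended rate: ${blended_per_tok * 1_000_000:.2f}/MTok")
print(f"break-even:   {break_even_tokens:,.0f} tokens per message")
```

With these placeholder rates the break-even lands around 140k tokens per message; plug in the real numbers from your plan to get your own figure.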
Sounds like saying my plan to get rich buying up $10 bills for $1 hit kind of a rug pull in that people aren't selling them for that price anymore.
The only catch is that you’ve spent many $1 and you don’t get any of those $10s unless you get over the finish line
In that sense your analogy is kinda good. I totally agree the current situation is like getting my solo start up funded and subsidized … but with only like 4 months runway now that the prices are skyrocketing, vs ~2+ years for a typical YC venture
Yeah, but... it's rocketing for everyone at the same time on all the providers at once.
IOW, you are no further behind nor further ahead than your competitors compared to 1 week ago, 1 month ago, 1 year ago and 1 decade ago.
Everyone has the same tools you have. The only advantage you get is if you make your own tools (I did that, and pre-AI, was able to modify my LoB WebApps at a rate of 1x new API endpoint, tested and pushed to production, every 15m).
My comment was about the rapid and sudden cost spike of something happening unexpectedly.
They announced the 2x tokens promo with months of notice. They announced this with no notice.
Me as an individual making a go solo is not the same as thousands of funded businesses having free credits, subsidized plans and bottomless AI budgets.
For a short period this was a massive equalizer. Now it’s a tool for those who can afford it. That’s a big shift.
—
Why is it that a person cannot express their own circumstances or opinions on this site without it turning into an argument? It’s so deflating.
I don't think you can call it a rug pull when everybody saw it coming from miles away
I avoided Claude code and such the first few months because I thought it was all billed by the API. Which I knew was not worth it to me at all.
Then I realized I was an idiot and this was magic.
But it now seems more like an introductory offer to use the API, as opposed to an alternative product / way to use their API product.
I thought it would get increasingly expensive, like say the $200 plan becomes $400.
Switching these plans to API metering doesn’t feel like it’s a separate product anymore?
But it was well understood that the subscription was heavily subsidized. Whether or not it was a "separate product" doesn't matter as much as the fact that pricing was not sustainable.
It was not well understood that it would stop being subsidized without notice.
Does that just not matter in modern society? I’m an asshole for expecting the product I pay for on day 1 to be the same on day 8 and 29 of a 30-days subscription?
So many folks are just burning tokens just to burn them.
The infrastructure build out just can't keep up with it.
Management demands it
Almost as though selling below cost or over capacity will backfire if people find unexpected uses for your product.
subsidies always lead to waste.
This is false.
Two examples:
- https://www.msn.com/en-us/money/other/three-years-after-tria...
- https://record.umich.edu/articles/public-school-investment-r...
Although I have to say I am sometimes surprised how much people burn through their usage. I was briefly on a Claude Max plan and then switched to a pro plan and still almost never hit my limit.
They changed the limits out from under us, and bugs cause usage to spike like crazy.
I just hit my weekly Max limit 3 days in...
It’s Joever.
Sounds like a death knell to me.
If I recall correctly, Ed Zitron noted in a recent article that one of the horsemen of his AI-pocalypse would be price hikes from providers.
Literally every VC funded consumer product has switched from a "growth at all costs" phase to a "Now we hike prices, make money, and generally enshittify" phase, and tons of those companies are still around (e.g. Uber), so I'm not sure why anyone thinks it would be much different for AI.
yes, but how many succeed without any kind of moat or having destroyed the existing companies?
I'm still running local LLMs and finding perfectly acceptable code gen.
I think the situation we'll end up in is having closed models that are fast and near perfect but expensive, and a lot of cheap open-source models that are good enough for most people.
No moat --> It's basically OpenAI, Google, and Anthropic left at the SOTA. Maybe soon, we'll have 2 left.
> No moat --> It's basically OpenAI, Google, and Anthropic left at the SOTA. Maybe soon, we'll have 2 left.
Yeah, but do we even need them? Non-SOTA is still pretty damn good; remember last year, pre-SOTA? How many people were boasting 10x - 100x productivity increases using the end-2025 models?
So the non-sota models support doing 10 hours of work in 1 hour. Many people would be fine with that. Fine enough that they aren't going to spring for a SOTA model that cuts the 10 hours to 0.5 hours, they're just going to use the cheap models to cut the 10 hours down to 1 hour.
Despite this, OpenAI and Anthropic and Google can't keep up with demand. That should tell you about what people want.
Ok, ok, so they can't keep up with "Demand". Now let's go parse what that demand is:
Is it: We want to use this to "Kill a bunch of people"
Is it: I'm very lonely, and need something to tell me suicide is ok
Is it: Google is so filled with ads, I'm just going to ask the LLM what to buy
Is it: A useful coding tool to improve work flow towards end products.
Cause, if we ignore ethics, some of that demand will generate revenue to pay for its existence; the others will do nothing of the sort.
Just because there's demand doesn't mean that demand equates to the value of the product. There's lots of demand for LED light bulbs, but once those light bulbs are sold, that demand disappears into the night. This isn't an analogy of AI, but to demonstrate you can't just wave your hands and say "demand leads to a sustainable business model".
Which ones, if you don't mind sharing?
Those companies at least had somewhat of a moat.
As I see it, the only thing close to a moat is CC for Anthropic, and since it is a big ol' fucking mess that is a) apparently now beyond the ability of any current SOTA LLM to fix, and b) understood by absolutely no human, I'd say it's not much of a moat. The other agents will catch up sooner rather than later.
The other providers? I don't see a moat. We jump ship at the drop of a hat.
That guy has his own form of AI psychosis
I'd say he's allowed his "mostly correct" opinion on the financial situation to color his "mostly incorrect" opinion on actual AI usefulness.
I wouldn't call it psychosis though. He's committing a natural fallacy where expertise in one area doesn't lend itself to expertise in another.
Every time an Ed Zitron article is posted on HN, it is met with a torrent of vitriol and personal attacks. The articles are okay if not overly wordy but I don’t see how the subject matter elicits that strong of a response.
At any rate, this observation is not unique to Ed, lots of people have made the same conclusion that the math doesn’t add up from a business profitability perspective.
> The articles are okay if not overly wordy but I don’t see how the subject matter elicits that strong of a response.
Hot take, but really it's more of an observation than a take: We saw this exact response in Blockchain & crypto circles a few years ago. (Though HN wasn't quite as culturally "central" to those)
Economic bubbles are subject to the Tinkerbell Effect. They exist so long as people believe in them, and collapse when either 1) they become so financially unsustainable as to collapse, having consumed all the money the economy could possibly give them, or 2) people stop believing in the bubble and stop feeding it money.
In this regard, the statement "NFTs are stupid" was not merely ridiculing those who bought them, but a direct attack on the bubble and those invested in it. And this is something the people involved in the bubble understand instinctively, even if they aren't consciously aware of it. (There's a psychological mechanism to that, but it's not relevant)
So consequently, they react aggressively to dissent. They seek to enforce their narrative, because not doing so is a threat to the bubble and their financial interests.
---
AI's not much different to that. It's clearly a bubble to everyone including the AI execs saying it out loud.
And people react aggressively to dissent like Ed's, because if the wider public stops believing in AI's future, the bubble bursts. They'll stop tolerating datacenter construction, they'll sell their Nvidia shares, they'll demand regulators restrict AI.
(And to those who can feel their aggression rising reading this comment. Hi, yes. I see you. If I were wrong, nothing I said would matter. You'd be wasting your time engaging with it, history would simply prove me wrong. But by all means, type up that reply or click that button.)
I agree with Ed Zitron more often than not, but I do think he Flanderized himself into being the aggressively-anti-AI-guy to the point where he now makes claims about the capabilities of "AI" that are incorrect regularly. I see people on HN doing the same thing, making claims about capabilities that were true as recently as 6 months ago, but aren't true anymore.
[I'm an AI-doomer myself, but I am an AI-doomer because by and large this stuff increasingly works, not because it doesn't.]
That said, Ed Zitron still does a lot of useful research into the economics of the industry and I also believe that continued progress in AI can disrupt the world (for better and for worse) while the economics propping up all the frontier model providers can also implode spectacularly.
Some people talk about how AI doom comes about either way because it could take all of our jobs OR crash the economy when the current bubble bursts. But as an uber-AI-doomer I happen to think there is a very real possibility of a double downside (for the labor class, at least) where both of those things can happen at the same time!
A lot of HN posters are fighting for their employer and/or investments.
> The articles are okay if not overly wordy
Did you mean instead "The articles are okay if overly wordy"?
Probably!
Yeah I've noticed this phenomenon as well. Frankly, I expected to be downvoted into oblivion just for mentioning him. But Zitron's commentary on the financial implications of AI usually reads as common sense to me and checks out (granted I'm not really capable of refuting his points.)
> Every time an Ed Zitron article is posted on HN, it is met with a torrent of vitriol and personal attacks.
It's why I started to pay attention to what he says.
Dude is a bit verbose, but his rationale is solid. If it gets the panties of some people here in a bunch, he may be on to something.
Things must be bad if they're doing this before their IPO
Billions of USD in debt, a business model bleeding cash with no profit in sight, a high-competition environment, a sub-par product, free-to-use offline models taking off, some investor commitments pulling out, potential regulatory issues... tricky.
But let's not cry for the founders, they managed to get away with tons of money. The problem is for the fools holding the bag.
Unfortunately the fools holding the bag are going to be those who own index funds when these companies are inserted into them.
How is it a subpar product? I've been very happy with GPT 5.4 and the Codex CLI tooling, as well as ChatGPT web. I'd say product is one of their strengths.
Will you be as happy when your $1000/mo of inference you’ve been getting for $30/mo is gonna cost $1000/mo?
I don't use anywhere near $1000/mo of inference. But yes, the question of what to do when prices go up a lot does concern me. However, with respect to product alone, Codex is still very good.
It's heavily subsidized.
I pay for it, but I don't think it's worth much more than the 20 bucks a month I have been paying.
Once they start charging something that makes sense, I doubt it will be as good.
Yeah you guys have to pay attention to the state of the overall economy. We are in the credit-crunch phase of a recession. The funny money has run out and infinite loans are no longer available. These companies have to find a way to pay their debt now.
In my Codex dashboard, I can buy 1000 extra credits for $40. The credit cost for GPT-5.4 is 375 credits / 1M output tokens which translates to $15 / 1M output tokens which exactly equals the API rate.
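That equivalence checks out with the numbers above ($40 for 1000 credits, 375 credits per 1M output tokens):

```python
# Sanity-check that the credit price matches the quoted API rate.
dollars_per_credit = 40 / 1000                 # $40 buys 1000 extra credits
credits_per_mtok_output = 375                  # GPT-5.4 output cost in credits per 1M tokens
dollars_per_mtok = credits_per_mtok_output * dollars_per_credit
print(f"${dollars_per_mtok:.2f} per 1M output tokens")  # → $15.00 per 1M output tokens
```

So the "credit" layer is a straight 4¢-per-credit wrapper around the API rate, at least for output tokens.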
Any takes on how Codex compares to Claude? I mostly use it to run ahead, document, investigate and prep the actual implementation for Claude.
Gemini burned me too many times but maybe the situation has improved since.
5.4 is great. I use it for python professionally and for typescript/front-end games and educational apps recreationally. In my experience it's roughly as good as opus, just a lot cheaper. It's amazing how much usage you get for $20/mo
gpt-5.4 is unmatched. Claude is possibly better in web UI tasks, but not much else.
I'm really curious about how you use it, because for me it was braindead. I tried tasking it to update my personal workout app and it created so many bugs I had to clean up with Opus or be left with spaghetti. It also keeps asking for confirmation of doing basic things.
> I tried tasking it to update my personal workout app and it created so many bugs I had to clean up with Opus or be left with spaghetti.
I find it sad that some people are already at the point where "My only options are to leave it as spaghetti or pay for another LLM to fix it". Already their skills are atrophied.
Or I just don't wanna spend any decision capital on that? There's many apps I would never have been able to do time wise before.
I also don’t think the vast majority of SWEs ever had the skill to read and truly comprehend other people’s code and then work diligently to “fix” it. People with such skills, in my experience, are often highly compensated contractors. Every codebase which has survived the test of time has numerous “absolutely do not touch this code, everything will break and no one knows why” part(s) of the codebase…
They’re basically the same. Codex is better at some things, Claude is better at other things. It’s honestly a wash; just pick the one that gives you a warmer fuzzy feeling in your tummy.
Codex has better quality and way more usage, but Claude Code is more pleasant to interact with and use in a lot of tiny ways
this is indicative to me that the exponential is slowing down. tool and model progress was huge in 2025 but has been pretty stale this year. the usage changes from anthropic, gemini, and openai indicate it's just an economies-of-scale issue now, so unless there's a major breakthrough they're just going to settle down as vendors of their own particular similar flavor of apis.
I think it signals that they’ve been so successful that they need to ensure there is some direct financial back pressure on heavy users to ensure that their heavy token use is actually economically productive. That’s not a bad thing. Giving away stuff for free - or even apparently for free - encourages a poor distribution of value.
> I think it signals that they’ve been so successful that they need to ensure there is some direct financial back pressure on heavy users to ensure that their heavy token use is actually economically productive.
Jesus, the spin on this message is making me dizzy.
They finally try to stop running at a loss, and you see that as "they've been so successful"?
Here's how I see it: they all ran out of money trying to build a moat, and now realise that they are commodity sellers. What sort of profit do you think they need to make per token at current usage (which is served at below cost)?
How are they going to get there when less-highly-capitalised providers are already getting popular?
What makes you think that progress has stopped? Anecdotally I personally seem to think that it's accelerated, I am having conversations with ambitious non tech people and they now seem to be excited and are staying up late learning about cli and github. They seem to have moved beyond lovable and are actually trying to embed some agents in their small businesses, etc.
> They seem to have moved beyond lovable and are actually trying to embed some agents in their small businesses, etc.
That's the problem - these small businesses are writing code, models from last year are good enough for them, and as a small business they can easily shell out for hardware to self-host.
The minute businesses take up AI for their business processes, the willingness to buy each employee a subscription is going to go the way of the dodo.
Honestly? It was the claude code leak that did it. There was a lot more smoke and mirrors than I anticipated, the poisoning tool calls, how their prompting is, how "messy" a lot of it was etc.
I meant that I thought the exponential with the models is slowing down (AGI, etc). The application though for regular people will continue to go forward.
> this is indicative to me that the exponential is slowing down
I've also heard that, we're near the end of the exponential.
I don't think there has been any exponential in terms of inference costs in the last couple of years. In fact, they have worsened: the same relevant hardware is more expensive and so is energy, and to top it off, to stay SotA companies are using larger models with higher running costs. But for some reason people are conflating the improvements in models with the cost of inference.
Not only do I not keep up with the tech itself, I don’t even keep up with how to pay for it.
This pricing only really makes sense if users can predict their usage; if not, people who use this heavily are just going to be hamstrung and start rationing their usage.
What if the goal was to draw us away from building our own AI data centers with their cheaper prices then eventually make us pay up for the difference?
from what they wrote, they're just changing how they measure usage; it might even be a good thing if you manage your context right:
> This format replaces average per-message estimates for your plan with a direct mapping between token usage and credits. It is most useful when you want a clearer view of how input, cached input, and output affect credit consumption.
I wish the Chinese would release a model comparable with 5.4 and free me from this pain
which ones have you tried? some are not far off, but it depends on what you do
I've tried z.ai, Qwen, DeepSeek, MiniMax... they've all been barely half as capable as a middling Codex model.
i would try kimi
Qwen has also been improving recently, in fact most have, so depending on when you last tried them you can try again and see how they work for you
My local Qwen is decent for some things, Kimi is decent for most things and occasionally it has been able to do better than Opus and GPT 5.4 on particular tasks
it will soon be very costly to stay in just one provider
Makes sense. Right now the subscriptions are like Uber as I remember it in NYC in 2014.
Can a "tip your code assistant" button be far behind?
The current pricing model (for plus) feels deliberately confusing to me, I can never really tell if I'm nearing any kind of limit with my account since nothing really seems to tell me.
5h and weekly resets remain, but the quotas are now ‘filled’ differently?
Does this mean there’s no such thing as a “subscription” to ChatGPT for businesses? I thought they offered businesses a subscription with some amount of built in quota previously, including for the side products like codex and sora.
There are still subscriptions that give access to both ChatGPT and Codex, but with a much smaller usage quota than before the change (which came at the same time as the end of the 2x promo). I couldn't find the equivalent in terms of credit for the usage included with these $20/25 seats...
So migrate to gemini now?
If you use Google's tooling, but not if you need API access. API access is not included in the subscriptions and uses token-based pricing. For development I find that the Gemini IDE plugins, which have good free usage and are included in the subscriptions, aren't great: the Gemini plug-in under IntelliJ is often broken, etc. The best experience is with other tools like Cline, where you have to use a developer account, which is API-usage-based already.
But Gemini's API based usage also has a free tier and if that doesn't work for you (they train on your data) and you've never signed up before you get several hundred dollars in free credits that expire after 90 days. 3 months of free access is a pretty good deal.
wouldn’t it be “usage based pricing” not “pricing based usage”
I'm confused as to how pricing works.
For home projects, I almost exclusively use the web chat interface to code. I haven't done anything large yet so I will iterate and get the web chat to update code, print out the code that I copy and paste.
How does this differ in terms of pricing than Codex?
Token-based usage accounting is more accurate and therefore more sustainable than message-count-based usage accounting. It should've been this way to begin with.
good. just like the Claude model. getting the pricing to be in line with costs is the only way this remains sustainable.
I would prefer if it actually explodes sooner rather than later
> I would prefer if it actually explodes sooner rather than later
The idea, as far as I can tell from all the pro-AI developers, was that it will never explode, and the performance will continue increasing so the slop they write today doesn't need maintenance, because when that time comes around there will be smarter models that can clean it up.
If the providers are tightening the screws now (and they are all doing it at the same time), it tells me that either:
1. They are out of runway and need to run inference at a profit.
or
2. They think that this is as good as it is going to get, so the best time to tighten the screws is right now.
They could also do a plan 3 where they discourage others so they can use it to, say, rapidly build many new products but competitors would have to pay a fortune for the same luxury
Just spitballing.
> They could also do a plan 3 where they discourage others so they can use it to, say, rapidly build many new products but competitors would have to pay a fortune for the same luxury
Unlikely that they all decided to do this within weeks of each other. Still, like you said, you were spit-balling, not asserting :-)