points by bastawhiz 11 hours ago

This isn't a good analysis, and it's because it keeps rounding everything up. He rounds up the cost of electricity by 10%. He has a range of power use, takes the high end (which is 2x the low end) and multiplies it by the inflated electricity cost.

But then they talk about using a newly purchased Mac to do the inference, running at full capacity, 24/7. Why would you do that? Apple silicon is fast but the author points out: you're only getting 10-40 tokens per second. It's not bad, but it's not meant for this!

It's comparing apples to oranges. Yeah, data centers don't pay residential electricity rates. Data centers use chips that are power efficient. Data centers use chips that aren't designed to be a Mac.

Apple silicon works out pretty good if you're not burning tokens 24/7/365 and you're not buying hardware specifically to do it. I use my Mac Studio a few times a week for things that I need it for, but I can run ollama on it over the tailnet "for free". The economics work when I'm not trying to make my Mac Studio behave like an H100 cluster with liquid cooling. Which should come as no surprise to anyone: more tokens per watt on hardware that's multi-tenant with cheap electricity will pretty much always win.
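
A rough way to see how the rounding compounds: electricity per million tokens is just power draw times generation time, priced per kWh. The sketch below assumes the draw is constant while generating; the wattages, tokens-per-second figure, and $/kWh rate are hypothetical placeholders, not the article's numbers.

```python
def electricity_cost_per_mtok(watts: float, tokens_per_sec: float, usd_per_kwh: float) -> float:
    """USD of electricity to generate one million tokens at a constant power draw."""
    seconds = 1_000_000 / tokens_per_sec   # wall-clock time to emit 1M tokens
    kwh = watts * seconds / 3_600_000      # watt-seconds (joules) -> kWh
    return kwh * usd_per_kwh

# Hypothetical inputs: a 60-120 W draw range, 40 tok/s, $0.30/kWh base rate.
low = electricity_cost_per_mtok(watts=60, tokens_per_sec=40, usd_per_kwh=0.30)
# Take the high end of the power range AND round the rate up by 10%.
high = electricity_cost_per_mtok(watts=120, tokens_per_sec=40, usd_per_kwh=0.30 * 1.10)
print(f"${low:.3f} vs ${high:.3f} per million tokens ({high / low:.1f}x)")
```

Taking the top of a 2x power range and a 10% higher rate multiplies the estimate by 2 × 1.1 = 2.2 before anything else is considered; using the slow end of the 10-40 tok/s range would multiply it by another 4.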

datadrivenangel 10 hours ago

Rounding everything down in the most optimistic setting got me to $0.40 per million tokens, and openrouter has the same model at $.38/mtok.

  • 650REDHAIR 10 hours ago

    I’ll keep my data local over a $.02/mtok difference.

    • quietsegfault 10 hours ago

      It’s more than just data locality. OpenRouter is faster, no? I have an M4 pro, and anything but the smallest dumbest models are unusably slow for interactive use. I personally haven’t yet found a good use case for offline/non-interactive LLM work locally.

      • datadrivenangel 9 hours ago

        Yeah. The speed is the biggest issue. The intelligence of open models is good enough for serious work (though still worse than the frontier models), but the cloud models are often 3-7 times faster, and you can get more parallelization and so get speeds on the order of hundreds of tokens per second, which makes things fast!

        • freeopinion 7 hours ago

          Even extremely slow LLMs can generate Part B faster than I can audit Part A. So the LLM can generate Part A while I look over my email. Then it can worry over Part B while I look over Part A.

          It can worry over Part C while I have my 10:30 group meet. And it can worry over Part D while I do whatever other silly, time-wasting thing all humans do in almost all organizations. Then I still haven't reviewed Part B, yet, so the extremely slow AI is waiting on me.

          Maybe someday I'll be good enough to need faster AI so I can rewrite something like Bun in a few days. Right now, slow and local fits my use case very well.

          • quietsegfault 6 hours ago

            I don’t think it matters if you’re “good enough” or not. Much of AI development is iterative. If you context switch between A from project 1 to B from project 2 back to check A, then maybe C while B finishes up, you will lose the flow state that AI assistance can enable with speed for those who are not fluent coders.

            Sure, I can wait hours for my local model to finish, or I can spend basically as much and get the answer right away.

            There’s a lot of exciting stuff with local LLMs despite the speed, but for me I don’t have the discipline and working memory to jump from project to project.

      • threatofrain 7 hours ago

        And continuing the argument of "more than just...", if you stopped inferencing on your Mac you still have a generally nice computer. The difference between rent vs buy.

      • novok 5 hours ago

        I played with classifying and summarizing my entire email history (per email) with small models, but that only took about 12h of GPU time at most. Using a coding agent cli wrapper in that case is far slower because of all the spin up cost and the system prompt they inject even if you want to turn it all off.

        If I used an actual direct API it probably would've been much faster, but I'm doing it for hobby / fun reasons. You also get to fiddle with a lot more params.

      • PAndreew 5 hours ago

        I’m running a local Whisper + Gemma 4 pipeline with a cheap USB mic to extract health-related data and potential todos from ambient speech. It doesn’t have to be fast or 100% correct: if it captures at least a few bits of interesting information that would otherwise go unnoticed, it’s still a win.

        • 650REDHAIR 2 hours ago

          I run whisper through openwebui to gemma4 moe and use kokoro TTS back to me.

          I use a 5060ti 16gb and a minipc.

          I tunnel in via Tailscale and access it with my phone or laptop from anywhere. It’s pretty good and will only get better as I optimize.

  • formerly_proven 8 hours ago

    What is it with AI SaaS naming themselves "openxyz" when there is 0% open about them?

    • em500 8 hours ago

      They learnt from OpenAI that naming yourself open-xyz doesn't actually require opening anything.

    • debugnik 7 hours ago

      It's the next co-opted buzzword after "democratize".

  • nativeit 7 hours ago

    But once all that is done you still own a Mac in one case, and you don’t in the other, correct?

    • odo1242 7 hours ago

      Yea this; it’s the same reason why mortgaging is cheaper than renting

      • ericpauley 7 hours ago

        This is far from a universal truth: https://www.nytimes.com/interactive/2024/upshot/buy-rent-cal...

        Real estate is only a clearly good investment if you ignore opportunity cost.

        • seanmcdirmid 7 hours ago

          You also need to pay close attention to rent-vs-purchase ratios. A lot of cities are cheap to rent in but expensive to buy in (e.g. Beijing 10 years ago).

          • mantas 6 hours ago

            Key word being „ago“.

            • deaux 6 hours ago

              Such cities still exist and have been in such a state for decades. They can change but that's meaningless as they can also change the other way around.

            • seanmcdirmid 4 hours ago

              I’m covering my bases because Chinese real estate has been volatile recently and I’m not sure where the market is at now. It could be that renting is still way cheaper than buying, I just don’t have any direct experience to back that up. If I bought while I was living in Beijing I would probably be underwater with my investment right now, renting for 9 years was the right call and my rent was pretty affordable anyways.

        • sgt 7 hours ago

          Articles like that still miss a bit of the nuance. Imagine having your house paid for, and you grow old and you have no rent to pay. Yes, you could have invested but likely you would have spent some of that money on something else, or your investments might have not worked out so well, or any other reason. Human reasons, to be specific. Owning property is like a lock.

          • orangecat 6 hours ago

            Imagine having your house paid for, and you grow old and you have no rent to pay.

            My home is "paid for". Except for the HOA and property taxes that are not that far off from what I was previously paying in rent, the ongoing maintenance costs with random large spikes, and the opportunity cost of having a large chunk of money in the house and not in the market. It was still probably the right decision, but it's not at all a free lunch.

            • sgt 5 hours ago

              Surely though, the HOA and all that would likely be baked into a renter's price.

              And you didn't need to go live in a HOA. I don't, and it's much cheaper.

              • orangecat 4 hours ago

                Surely though, the HOA and all that would likely be baked into a renter's price.

                Sure, the same way that the benefits of a fixed mortgage payment are baked into sale prices. The efficient market hypothesis would say that neither renting nor buying should be obviously superior in the long term, because if either was then people would bid up rents/prices until it wasn't.

                And you didn't need to go live in a HOA

                I pretty much did, unless I wanted to significantly compromise on other factors.

                • PunchyHamster 3 hours ago

                  > The efficient market hypothesis would say that neither renting nor buying should be obviously superior in the long term, because if either was then people would bid up rents/prices until it wasn't.

                  Buying has a much higher entry point: you need a bunch of cash up front, then a ton of paperwork.

                  It is absolutely possible that a local buying market is inflated precisely because the area is so desirable that buying to rent is (or was) a good investment, but that's rarely true for a bigger market.

            • ffsm8 5 hours ago

              And it's gonna be interesting to see whether this narrative shifts over the next 5 years.

              I keep hearing that US property is in its biggest bubble yet, and that the affordable housing shortage is a red herring: real estate managers and boomers are unwilling or unable to reduce their prices, even when they can't find renters or buyers, because doing so would kick off a death spiral, with their interest rates going up as a consequence (because of lower security). Add the AI layoffs etc. on top of that.

              I'm not American so I only hear the occasional interview so don't have any idea if it's really as pressing as these industry professionals keep saying but I'm definitely at the edge of my seat watching...

        • hadlock 6 hours ago

          It never fails, there's always someone who trots this thing out. We had bought our house, and then had to move and decided to rent. I was APPALLED that they wanted me to fill out an APPLICATION form, where they would decide my worth, and let me know if we would be allowed to live there. When buying a house, my cash was as good as anyone else's. And then the management company would come inside my house to inspect that I wasn't running a meth lab or something. Thankfully that only lasted two years. I will never rent again. Majority owner-occupied neighborhoods have different characteristics as well.

          • loeg 6 hours ago

            > I was APPALLED that they wanted me to fill out an APPLICATION form, where they would decide my worth, and let me know if we would be allowed to live there. When buying a house, my cash was as good as anyone else's.

            House sellers receive offers from buyers, sometimes including letters, and can choose to sell to any of them (or none of them), whether or not those offers are higher than the listed price. It's not so different.

            > And then the management company would come inside my house to inspect that I wasn't running a meth lab or something.

            Yeah that part is different. I also prefer owning.

        • PunchyHamster 3 hours ago

          It is very close to a universal truth, aside from some small areas with very warped markets.

          Even if you move out after 5 years, you still own the place and can rent it out so it pays for itself, rather than paying the cost of selling it back to the market.

        • shaewest 3 hours ago

          Real estate is generally a "good" investment as it's considered a relatively safe way to get significant leverage. 5x leverage in the case of a 20% deposit, or even up to 20x leverage with countries that allow for 5% deposits (New Zealand).

          In addition, the interest portion almost always ends up near the rent the owner would otherwise have paid. The total mortgage payment is higher, but that difference goes to principal (and the principal share grows quickly), while also counteracting rent inflation.

      • BoorishBears 6 hours ago

        Except one day the hype will catch up to the reality that was always true: people will realize their $20,000 Mac has less utility as a "way to learn AI" than some kid's 3090 Fortnite machine, and it'll be back below MSRP.

    • teekert 6 hours ago

      Plus your privacy.

    • stusmall 3 hours ago

      Not always. The calculations take its useful life expectancy as an input. If they estimate it correctly, you have a high likelihood of it breaking, burning out, or being woefully out of date by the end. At the 10-year mark you are looking at losing security updates.

      So if you are lucky you might end up with something that still runs, but most folks won't find it particularly useful.

    • jmalicki 1 hour ago

      Even at just the electricity cost, OpenRouter will be both:

      1) Roughly break-even to a little bit cheaper per token
      2) Much, much faster

      So the cost of the Mac barely even matters; it's just an extra cost on top.

      Sure, data center providers can pay lower rates.

      The point of this article is that LLMs at home really don't make a ton of sense, unless you are willing to pay through the nose for privacy. There is absolutely no cost saving to be had.

      If you're looking at your own datacenter as a larger corporate client, that could change.

      There are also some providers that will contractually keep your data private, like AWS Bedrock or parts of Google/Azure (I don't know their stack names).

      AWS even has AWS Secret Region and AWS Top Secret Region if you want to use LLMs on classified data.

      You have to value privacy at a roughly absurd level to not want to use LLMs run efficiently at scale by someone else. For the home user, just the extra efficiency produced by batching requests from a large number of users in a datacenter is a real win.

      Some of these companies are even selling tokens below cost to get market share. If someone will sell you the same service for either a dollar bill or three quarters, why wouldn't you pay the three quarters?

  • novok 5 hours ago

    Also, many have even cheaper power, or even free surplus power from solar.

    I don't do local inference other than hobby & learning reasons because electricity is so expensive where I am at.
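
For scale on the $0.40 vs. $0.38 per million tokens comparison upthread: the yearly cost of that $0.02/Mtok gap depends only on how many tokens you actually generate. A minimal sketch, assuming a steady generation rate; the 40 tok/s figure and the duty cycles are illustrative, not measured.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def yearly_mtok(tokens_per_sec: float, duty_cycle: float = 1.0) -> float:
    """Millions of tokens generated per year at a given fraction of time spent generating."""
    return tokens_per_sec * SECONDS_PER_YEAR * duty_cycle / 1_000_000

def yearly_gap_usd(gap_usd_per_mtok: float, tokens_per_sec: float, duty_cycle: float = 1.0) -> float:
    """What a per-million-token price difference adds up to over a year."""
    return gap_usd_per_mtok * yearly_mtok(tokens_per_sec, duty_cycle)

# 40 tok/s around the clock vs. 8 hours a day.
print(f"24/7:   ${yearly_gap_usd(0.02, 40):.2f}/yr")
print(f"8h/day: ${yearly_gap_usd(0.02, 40, duty_cycle=8 / 24):.2f}/yr")
```

Under these assumptions, even running flat out the gap is on the order of $25 a year, which fits the thread moving on to speed and privacy rather than per-token price.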

ikidd 33 minutes ago

Actually, figuring it on generating tokens 24/7 is the best-case scenario. If you figure it at 8 hours a day of actual use, the fixed cost of the hardware is still the largest portion of the budget, but now you generate 1/3 of the tokens, so you triple that per-token cost.
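
The point above in numbers: the hardware share of the cost per token scales inversely with hours of use, while the electricity share doesn't change. A sketch assuming straight-line amortization over a fixed lifetime; the $4,000 price, 3-year lifetime, 120 W draw, 40 tok/s, and $0.30/kWh are hypothetical inputs.

```python
def cost_per_mtok(hardware_usd: float, lifetime_years: float, hours_per_day: float,
                  tokens_per_sec: float, watts: float, usd_per_kwh: float) -> tuple[float, float]:
    """Return (hardware, electricity) cost in USD per million tokens."""
    # Total tokens generated over the machine's life, in millions.
    lifetime_mtok = tokens_per_sec * 3600 * hours_per_day * 365 * lifetime_years / 1_000_000
    hardware = hardware_usd / lifetime_mtok
    # Energy is only spent while generating, so per-token electricity ignores duty cycle.
    kwh_per_mtok = watts * (1_000_000 / tokens_per_sec) / 3_600_000
    return hardware, kwh_per_mtok * usd_per_kwh

for hours in (24, 8):
    hw, power = cost_per_mtok(4000, 3, hours, 40, 120, 0.30)
    print(f"{hours:>2} h/day: hardware ${hw:.2f} + electricity ${power:.2f} per Mtok")
```

Going from 24 to 8 hours a day triples the hardware term and leaves the electricity term untouched, so light use makes local inference look worse per token, not better. This ignores idle power and any value the machine has for other work.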

avidphantasm 59 minutes ago

Not sure where 40 tokens per second is coming from. I’ve seen 95-100 tokens per second on M5 Max 128GB running Gemma 4 31B. I’ve done experiments where it is faster than Claude Opus 4.5 for the same prompts.

faitswulff 10 hours ago

The article makes no sense. I can't use OpenRouter as a general purpose computing device. Why are we comparing a whole computer to a single purpose SaaS?

  • tuwtuwtuwtuw 10 hours ago

    I think it's because there are a lot of people writing articles about the benefits of running local models. I think it's fair to say that there are daily threads on HN singing the praises of local inference. I also see people buying new hardware where the main trigger is the ability to run local models.

    • FuckButtons 9 hours ago

      But the people who want to do local inference are putting some value on privacy that isn't captured by the raw monetary cost, so just comparing prices is somewhat beside the point. It's also true that if you have, e.g., a Mac and use it as your main computing device, you would have spent that money anyway, so you can't really compare its cost to spending on something that isn't general purpose.

      • datadrivenangel 8 hours ago

        My overall opinion is that the smart thing is not to upgrade to the maximum memory for AI purposes. It's worth quantifying how much extra we pay for privacy.

      • tuwtuwtuwtuw 8 hours ago

        I replied to a comment asking why the article exists.

        As for privacy, I'm sure there are many people that are not so interested in that aspect.

      • apf6 7 hours ago

        That's a lot of assumptions. I think there are also people buying new hardware specifically for this purpose, and their motivation to do it is thinking it will be cheaper in the long run. Privacy is not necessarily the motivation.

  • mpyne 9 hours ago

    They're responding to the people doing things like buying the most expensive Mac they can find specifically to do local inference for their AI agents.

    Some do it to have control over their ability to use AI. Some do it because they think it will be cheaper to not have to pay a SaaS to generate tokens for them.

    But for those interested in the latter case, it seems like it's not actually cheaper after all, at least at current prices. But then I don't expect prices to drastically jump because of how much competition there is in model development.

    • datadrivenangel 8 hours ago

      It's worth paying a premium for the privacy (assuming that llama.cpp and ollama aren't sending my sessions back to the cloud regardless...), and for the concerns about not getting a surprise bill.

      • nomel 4 hours ago

        > not getting a surprise bill.

        Correct me if I'm wrong, but I believe this is a feature that only Google has figured out how to implement. All of the other pay-as-you-go token services have a cap you can set, some by monthly spending, some with API key resolution, others by how much you put into the account. I use many, and if configured with auto-purchase disabled, it's not possible to have a "surprise" bill (except for Google!)

    • dcrazy 7 hours ago

      You also have control over your costs. It is reasonable to assume that tokens will cost significantly more in the near to medium future as the market consolidates and subsidies decline.

  • sheepscreek 8 hours ago

    No, that’s not the point. I think this is to help people who are thinking about getting a beefier Mac so they can run their LLMs on it too. Some in particular want a dedicated Mac Mini or Studio for this purpose. The breakdown, even if slightly flawed, offers a good insight into the economics of it.

    For most people, they might be better off with OpenRouter models and providers supporting Zero Data Retention. On the cloud, that’s as good as it gets for privacy - your data is never retained beyond the life of the request.

statestreet123 9 hours ago

Rounded up, yes, and oddly inefficient for someone obsessed with inefficiency. One could buy a brand new 64gb M5 macbook for well over 4k. Another could buy a scratched up but functioning M1 Max 64gb off of ebay for a little over 1k—and somehow get the same 10-20 t/s with 31b that the author does with an M5. Or better yet, have a frontier model do the planning and judging, and have a local MOE model execute at 50 t/s. All of this achievable by a former English major with too much free time.

  • novok 5 hours ago

    I have an M1 Pro, and an M4 Max and M5 Max to play with at work, and the speed difference between all 3 machines is very significant: the M1 Pro is far slower, and the M5 is significantly faster than the M4. A Windows machine with a 3090 beats all of them but eats twice the power per token. This is all running the same 24GB-memory-friendly model in LM Studio.
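
A common back-of-envelope for the speed differences in this subthread: single-stream decoding is usually memory-bandwidth bound, because every generated token has to stream the active weights from memory. The ceiling therefore scales with memory bandwidth and inversely with the bytes of *active* parameters, which is also why MoE models run faster. A sketch of that ceiling, assuming each token reads all active weights once and ignoring KV-cache traffic and compute; the 4-bit quantization, 400 GB/s bandwidth, and 3B-active MoE are illustrative assumptions.

```python
def decode_ceiling_tok_per_sec(active_params_billions: float, bits_per_weight: float,
                               bandwidth_gb_per_s: float) -> float:
    """Upper bound on single-stream decode speed if each token streams all active weights once."""
    bytes_per_token = active_params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gb_per_s * 1e9 / bytes_per_token

# Hypothetical: 400 GB/s of memory bandwidth, 4-bit weights.
dense = decode_ceiling_tok_per_sec(31, 4, 400)  # dense 31B model
moe = decode_ceiling_tok_per_sec(3, 4, 400)     # MoE with ~3B active params per token
print(f"dense 31B: <= {dense:.0f} tok/s, 3B-active MoE: <= {moe:.0f} tok/s")
```

Real throughput lands below these ceilings, but the linear dependence on bandwidth and active parameter bytes is a reasonable first-order guide when comparing machines or model architectures.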

giancarlostoro 4 hours ago

Honestly, I don't even see my MacBook Pro costing me anywhere near as much as using any of these AI services, but maybe the increase in my power bill just isn't big enough to notice? I'm the power user who uses Claude Max pretty much all the time to prototype ideas and build things I actually use, and it has given me a lot of value. I work full time and have a family to raise and care for, so my free coding time is mostly limited to ideas. Now I can draft a detailed plan, review the code, run it, test it, and use software custom-tailored to my needs.

make3 1 hour ago

The real reason this comparison makes no sense is that only a vanishingly small fraction of people seriously using AI to code would use a model so far from the top models (including open-source ones).

He should compare his MacBook to OpenRouter on Kimi 2.6 1.1T or GLM 5.1 (754B) at bfloat16 precision, which of course he can't run.

But it furthers his point that things like open router are a better idea, which is not surprising.
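
Why running those comparisons locally is out of reach: the weights alone take roughly parameters × bits-per-weight / 8 bytes. A sketch using the parameter counts from the comment (1.1T and 754B), ignoring KV cache and runtime overhead; the 128 GB comparison point is the M5 Max configuration mentioned in this thread.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes); excludes KV cache and activations."""
    return params_billions * bits_per_weight / 8

UNIFIED_MEMORY_GB = 128  # Mac configuration mentioned in this thread

for name, params in (("1.1T model", 1100), ("754B model", 754)):
    for bits in (16, 4):  # bfloat16 vs. an aggressive 4-bit quantization
        gb = weights_gb(params, bits)
        print(f"{name} @ {bits}-bit: {gb:,.0f} GB ({gb / UNIFIED_MEMORY_GB:.1f}x of 128 GB)")
```

At bfloat16 that is about 2.2 TB and 1.5 TB respectively, and even at 4 bits neither fits in 128 GB.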

dist-epoch 10 hours ago

using it 24/7 brings the average cost down, not up.

the less you use local LLM, the less sense it makes since you paid a lot for hardware you don't use

  • groundzeros2015 10 hours ago

    The hardware has multiple uses for the same cost. The pay-per-use server does not.

    • bastawhiz 3 hours ago

      The author isn't pricing in the multiple uses. You either compare it apples to apples or you don't. If you're using the machine for general purpose computing on top of inference then the amortized hardware costs are pointless to measure. This is exactly what I said.

  • bastawhiz 9 hours ago

    That's the point: why would you buy a device that's specifically not optimized to be used for 24/7 inference? It's expensive hardware that's not designed to be used in that situation! The power use for inference isn't especially good and you're not getting even a fraction of the benefit from the hardware that you're paying for.

    • dist-epoch 7 hours ago

      > why would you buy a device that's specifically not optimized to be used for 24/7 inference

      because it costs $1k-$2k instead of $10k-30k+ for optimized devices

      • bastawhiz 3 hours ago

        Nobody is suggesting you buy a pair of A100s, which is what 15k gets you these days. Get a used 5090. And the author specifically priced the hardware at over 4k, which is double the 1-2k you're noting.

    • apf6 7 hours ago

      Good question but people are doing it anyway. It's a fact that right now tons of people are buying Mac Minis specifically for this use case, to treat them as their personal data center for agents. The concept of "power use for inference" is foreign. Those people are the ones that motivated this blog post I think.

PunchyHamster 3 hours ago

> Yeah, data centers don't pay residential electricity rates.

There are 2 caveats here:

Some places have higher prices for industrial than residential power, as residential power might be subsidized by the government.

And DCs also pay for cooling, which residential users only effectively pay for if they have AC and it's hot outside. So the effective power rate is some multiple of industrial pricing.

  • bastawhiz 3 hours ago

    Generally you don't build a data center in a place that doesn't sell you electricity for cheap
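
The cooling point is usually expressed as PUE (power usage effectiveness): total facility energy divided by the energy delivered to the IT equipment, so the effective price per kWh that reaches the chips is the grid rate times the PUE. A sketch with hypothetical rates and PUE values, not figures for any particular facility or utility; treating home AC as a 1.3 multiplier is likewise an assumption.

```python
def effective_usd_per_kwh(grid_usd_per_kwh: float, pue: float) -> float:
    """Price per kWh delivered to the compute, including cooling/overhead via PUE."""
    if pue < 1.0:
        raise ValueError("PUE is total facility energy / IT energy, so it cannot be below 1.0")
    return grid_usd_per_kwh * pue

# Hypothetical inputs for illustration only.
datacenter = effective_usd_per_kwh(0.08, 1.3)   # industrial rate plus cooling overhead
home_no_ac = effective_usd_per_kwh(0.30, 1.0)   # residential rate, no extra cooling
home_summer = effective_usd_per_kwh(0.30, 1.3)  # residential rate, AC removing the heat
print(f"DC ${datacenter:.3f}, home ${home_no_ac:.3f}, home+AC ${home_summer:.3f} per kWh")
```

With these inputs the overhead does not close the gap; it would only flip if local industrial rates were well above residential ones, as in the subsidized case mentioned above.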

llm_nerd 9 hours ago

Your post makes sense if you bought the hardware for other reasons, and maybe run models occasionally as a novelty.

That isn't the case for many, though, and there is a whole social media space where people are hyping up the latest homebrew options for running models, believing it frees them from the yoke of big AI.

Millions of people are buying big $ maxed-out hardware like the Mac Studios or DGX specifically to run LLMs. Someone rationally running the numbers is a good thing.

  • atq2119 7 hours ago

    Let's not get ahead of ourselves. Millions, really? I can believe there are a lot of enthusiasts doing this, but "millions" needs a citation.

    • filleduchaos 4 hours ago

      This is HN; it has probably never occurred to half the people here that the average person even in first world countries doesn't even have the financial capacity to make an impulse five-figure USD purchase, even if on credit.

      • llm_nerd 1 hour ago

        No one said anything about this being an "impulse" purchase -- it would usually be perceived as a "career investment" purchase, given that many feel they need to race to be with "it" or be left behind -- nor does "five figure" have anything to do with this -- DGX options are available at $3000 -- there's a certain irony that you are posting a comment that is basically "people in my country/circle can't, therefore no one can", while dismissing my comment that many people can.

        Millions of people are paying thousands of dollars a year to buy a slightly upgraded entertainment package in their car. There are 60 million or so millionaires alone, including 6 million+ in China.

        There are a lot of people with a lot of wealth on the planet. A lot. Millions...it isn't that unfounded, friend.

        So doing this "this is HN" snide jerk act, and then basically projecting your lot on the planet is...I don't know if you intended it, but it's rather amazing.

  • curt15 3 hours ago

    > Millions of people are buying big $ maxed-out hardware like the Mac Studios or DGX specifically to run LLMs.

    What's your source for this?

    • llm_nerd 1 hour ago

      Not just one, but two replies completely hung up on that. Like, why even reply when you already saw the other guy doing the same tired pedantry? Just wanted to feel like you contributed?

      Ignoring that it was just tosser hyperbole (that absolutely zero reasonable people need to question), yes, enormous numbers of people are buying GPUs or hardware with the explicit goal of running local LLMs, and social media is full of people hyping various setups and models. Mac Minis are almost impossible to find, and that alone is selling at a clip of about 300,000 every four months. Large memory GPUs are basically a myth at this point. All so people can pay more to get a worse result than commercial options, which is precisely the point of the submission.

      These local setups only ever make sense if you have something that confidential, or you're doing something that ToS of the majors would ban you for.

      Now given this pedantry horseshit, you'll probably demand that I specifically show a citation on DGX or Studio sales, which...rofl.

cyanydeez 10 hours ago

nothing about the current data center craze looks efficient.

  • bastawhiz 9 hours ago

    Whether or not you think building data centers is a good idea, it's inarguable that the per-token efficiency (power, hardware, etc.) is FAR higher in a data center. That's literally what it's designed for.

    • cyanydeez 7 hours ago

      I'm talking per value. Look at the efficiency of Chinese open-source models; then look at SOTA sucking gigawatts, then at the proposals.

      America is basically proposing AI using the equivalent bloatware of Windows 11.

      • bastawhiz 3 hours ago

        I run two 49B parameter models on a pair of used A100s full time and it sucks down 250 watts at peak utilization. That's not gigawatts, and it's completely within the realm of comparison to what the author is describing.

  • trollbridge 9 hours ago

    Probably because lots of data centres are being built (or half-built) which are sitting idle.

    • mpyne 8 hours ago

      If there are datacenters sitting idle right now then you could probably make a lot of money selling that capacity to Anthropic at this point...

    • bastawhiz 3 hours ago

      If you have racks of idle H100s, you are doing a terrible job of running a business.