Ask HN: How would you build a budget CPU compute cluster in 2023?
So lately I've been dabbling in work that needs a lot of CPU compute, zero GPU, and scales basically linearly across threads and across nodes in a cluster.
Using only my own box is proving to be a bottleneck, so I've been thinking of either using AWS spot instances or building my own mini-cluster (2-3 machines + switch) at home. Does it make sense to go cloud (even spot) when I'd be aiming for high utilization?
As for the potential node spec:
- Ryzen 4500/5500 (seems best perf/$)
- Some mATX AM4 mobo with integrated GPU
- 2x8GB RAM
- mATX case, the smaller the better. ITX seems pricier.
All the box does is basically run the k8s pod(s).
Honestly, look on ebay for used Lenovo ThinkCentre M/P "Tiny" or Dell Optiplex "Tiny" computers. They can be purchased by the lot for most of them as businesses get rid of the old ones.
I use 4 Lenovo M910x's as a Kubernetes cluster and home lab, all connected through a Netgear switch. The whole setup cost about the same as a single new quality workstation. Each has an i7-8700 (6c/12t), 32 GB of memory, a 1 TB SSD, and a <1L case, and they're practically silent. Parts are easy to find, and they even use Lenovo laptop chargers, so if one dies I can easily buy and swap in a replacement within a few days.
You can go even cheaper if you don't need the absolute fastest CPUs. Some of these older tiny computers can be had for around 100 bucks if you look for them. It has worked like a charm for me. Not sure how much horsepower you really need, but this is a cheap way to build a home cluster. I think they hover around 40 W most of the time, so power isn't really too big a cost either.
>I use 4 Lenovo M910x's as a kubernetes cluster and home lab.
Just asking out of curiosity: why not a Ryzen 16-core CPU + 128GB RAM and then VMs / containers everywhere? To me that seems much less hassle than having to pet four machines, as I am way too lazy to write automation stuff for myself. Also, such a setup needs less space and probably less power.
>Just asking out of curiosity: Why not a Ryzen 16-Core CPU + 128GB RAM and then VMs / containers everywhere?
With a single big machine it's harder to simulate failure and see how the system behaves. With separate boxes you can just pull an Ethernet cable or a power cable and watch how the cluster reacts.
I have been wanting to do something similar. But there's a snag:
How do you tell which of these available boxes has Intel AMT (or its AMD equivalent) fully enabled?
No listing reports this, and sellers seem to not be able to answer the question either.
How did you figure that out? Or are you managing without remote management? If so, how?
Worth noting that a lot of the ultra-tiny PCs have laptop CPUs, which aren’t as fast as the equivalent desktop part.
Happen to have a write up on this? It sounds really cool. I’m in the process of retiring old machines for a client and I was thinking of doing that.
> As for the potential node spec:
> - Ryzen 4500/5500 (seems best perf/$)
Price/performance is great for the CPU, but you have to spend hundreds of dollars on the motherboard, RAM, power supply, and case for each one.
You need to look at the overall system cost. If you’re building new, it could be cheaper overall to put 12-core or 16-core CPUs into a smaller number of machines than it would be to put a lot of $100 budget CPUs into many machines.
Unless your goal is to build a cluster for the sake of building a cluster, you might have a better price/performance ratio by building a single 16-core 7950X box than you would with three separate Ryzen 4500 or 5500 machines.
Even with perfect scaling, you would need at least 4 separate Ryzen 5500 machines to have a chance at beating a single 7950X for CPU-bound tasks. The 7950X CPU alone is barely more than 4X the cost of a Ryzen 5500, but you only need to buy one motherboard, one power supply, and so on.
> Even with perfect scaling, you would need at least 4 separate Ryzen 5500 machines to have a chance at beating a single 7950X for CPU-bound tasks.
According to Geekbench  you only need 3.  https://browser.geekbench.com/processor-benchmarks/
That’s still 3 separate motherboards, 3 separate cases, 3 separate power supplies, 3 separate sets of RAM, 3 machines idling instead of one, etc.
It's also way easier to use one bigger machine than a few networked ones for a lot of tasks.
- Cases are overrated. You may not need them. Corkboard is underrated. Just make sure you keep your hardware clean from dust and debris.
- Have you found a mobo? Due to the existence of APUs (I assume), there aren't many AM4 mobos with an integrated GPU. There are the ASRock Rack boards, which are great (I operate a few) but maybe over budget if you're on a shoestring. You may not need a GPU at all, but then you probably want an APU or a dGPU on hand for troubleshooting and potentially firmware flashing (not all mobos and firmware versions boot headless, from what I hear).
- General rule of thumb: if you're going to use something for prod, buy at least 2 of each. It's great to have a spare for experiments, and you'll be grateful in case of hardware failure.
- in case you plan to run your host OS straight on the metal (as opposed to VMs): It's recommended to separate your control plane from your workers. Use Pis or similar for this; whatever you can find cheap.
- Rather than HN, I highly recommend you check out ServeTheHome (forums/blog/yt). Lots of great stuff there. The "tinyminimicro" (that would be the small dell/lenovo/HP units other commenters mention) and ali-router-board tracks can be worth considering as well. You should be able to get good ideas about switches here too, maybe even score something on the trade board if you live in US or EU.
- Screw AWS. You should be able to run the money numbers on that yourself.
Unixsurplus. Just like with cars, the cost savings of buying used are difficult to beat. Who cares if the hardware is a few years old if you're getting it 90% off original MSRP?
As an example, visiting the site now, the first thing I see is a box with 2x E5-2667 v2 (8c/16t, 3.3 GHz base clock, 22nm). These were $2300 each when new. It also comes with 128 GB RAM, case, PSU, 1U rail kit, and two 500 GB SATA SSDs to partially fill its 10 caddies.
The entire thing is $260 + $65 shipping. You can't even get 16c/32t of 3.3ghz compute alone for that price these days, let alone a whole bootable system.
This entire system is about 7% of the price of those two CPUs when new, so you're getting at least 93% off MSRP there (in reality more, after subtracting the cost of the RAM, case/chassis, PSU, disks, etc).
Sure, 4x R5 5500 does give you a passmark of around 76k compared to the 24k you're going to get with 2 of those xeons, but then again, you couldn't even buy four of those R5 5500 CPUs alone (let alone 4 cases, mobos, PSU's, coolers, HDDs, RAM, etc) for the cost of that system on Unixsurplus.
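To put rough numbers on that trade-off, here's a quick perf-per-dollar sketch using the approximate Passmark figures quoted in this thread; the ~$450 per complete Ryzen node is my own ballpark assumption, not a quote.

```python
# Perf-per-dollar sketch using the rough figures from this thread.
# Scores are approximate Passmark multicore numbers; the ~$450 per
# complete Ryzen 5500 node is an assumption -- plug in real prices.

options = {
    "4x Ryzen 5500 nodes": (76_000, 4 * 450),
    "used dual E5-2667 v2 server": (24_000, 260 + 65),  # listing + shipping
}

for name, (score, cost) in options.items():
    print(f"{name}: ~{score / cost:.1f} Passmark points per dollar")
```

By this (crude) metric the used server wins on perf per dollar even though the Ryzen cluster wins on absolute throughput.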
I am not affiliated with Unixsurplus and don't personally know anyone who is, but man do I love their store. It's the technology hardware implementation of "one person's trash is another person's treasure"
If you need compute there is nothing worse than old hardware.
It's just not worth it at all.
RAM is much slower, energy consumption is higher, and core counts and caches are smaller.
Caches alone can hurt you a lot. I had to upgrade my CPU after I got a new DSLR because the new raw files just killed it, and I'm pretty sure it was the cache sizes.
I've done that before (using cheap used 1U boxes). The problem is listening to them, and then paying the power bill.
That's what the neighbor's outside outlet is for! /s
Personally I live somewhere that is very cold for most of the year, and the cost of heating is going to be incurred for me whether I use servers, a furnace, or space heaters. The server fans also help block out noise from the freeway right next to my house too, so it's not a horrible trade off for me, but good call out, YMMV from mine.
The power bill wasn't the worst part. It was the constant screeching from the fans. I had them in a closed basement room, but the noise penetrated the rest of the house.
It's bad enough with 12 drives spinning plus the fans in the 2U enclosure. Sometimes I want to just throw it all away.
What do you install on one of those? Can you link a guide for setting one of these up?
Probably Debian, but it depends on the workload.
I'm not sure why you'd need a guide to "setting up" a complete, prebuilt 1U server or what it would even consist of. "Plug in the power cable and press the power button, then use your USB keyboard and USB mouse to get into the BIOS, where you rearrange the boot order to select from your USB flash drive"? Same as any other prebuilt computer.
Unless I'm misunderstanding the question somehow?
PXE Boot over the network seems like a better way, no ?
Any harm in mounting such an example server vertically?
Probably not? Make sure your hard drive caddies aren't facing down I guess, lol.
Cloud vs. on-prem is basically going to come down to your duration of use and your average-to-peak utilization ratio. For instance, if you want to run large parallel experiments 1 hour a day while the compute sits idle the other 23 hours, that will favor cloud. Fundamentally, on-prem you incur capital costs proportional to peak use, whereas cloud opex is proportional to average use.
Once you have your intended compute lifecycle figured out you can compute the cloud cost and hardware cost and compare. Given you’re mentioning k8s I’m assuming this might be a continuous load in which case you’d amortize your hardware capital costs much faster.
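As a sketch of that comparison, the break-even point where owning beats renting can be estimated in a few lines; every price below is a placeholder assumption, not a real quote.

```python
# Rough cloud-vs-on-prem break-even sketch. All prices are placeholder
# assumptions -- substitute your real quotes.

def break_even_hours(hw_cost, power_watts, power_price_kwh, cloud_price_hr):
    """Hours of utilization at which owning beats renting.

    On-prem cost for h hours: hw_cost + (power_watts / 1000) * power_price_kwh * h
    Cloud cost for h hours:   cloud_price_hr * h
    """
    hourly_power = (power_watts / 1000) * power_price_kwh
    if cloud_price_hr <= hourly_power:
        return float("inf")  # cloud wins even before counting hardware cost
    return hw_cost / (cloud_price_hr - hourly_power)

# Example: $600 node drawing 100 W, $0.30/kWh power, vs a $0.20/hr spot instance
hours = break_even_hours(600, 100, 0.30, 0.20)
print(f"break-even after {hours:.0f} hours ({hours / 24:.0f} days of 24/7 use)")
```

If your workload really is continuous, the hardware pays for itself within months; at 1 hour a day the same break-even point is nearly a decade out.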
Along with that, if you can find other users inside the company/research area/whatever that need compute power too, splitting that cost and getting higher usage numbers will also probably be favorable compared to cloud offerings then. It'd also let you split up the hardware and maintenance costs of keeping it on premises too.
Another really important factor here is workload size. If you NEED massive, parallel compute 1 hour a day, public cloud is probably your best option.
But if you can break that up and batch process it over 24h, and you don't have a need to scale up and down, a VPS from OVH is going to give you almost an order of magnitude lower cost than Azure, AWS, and GCP will for a box running 24h/day. What $5/mo gets you on OVH will run you close to $50/mo running 24/7 in a public cloud.
Note these are on-demand costs. Spot pricing or savings plans narrow the gap to more like 3-4x cost difference.
What are you trying to optimize for? Solely the initial cost of hardware? Power consumption? Is there some performance target, task that needs to complete in X minutes, etc?
For example, I might suggest buying used Lenovo Tiny M75q's on eBay. The Ryzen 3400GE is significantly slower than your Ryzen 4500, but also lower TDP and very cheap procurement cost. Also fits your "smaller the better" wish. No ECC, though.
How long is this going to be up and running? Cases may not be needed. If you're optimizing for cheap, then motherboards and CPUs from a couple of generations back may be a better bet. "Refurb" deals can save a lot of money there.
Of course "cheap" can cost too much: if you need reliability and want it to run first time after assembly, then it might pay to spend more.
One problem to keep in mind with refurb though is power consumption. Ask me how I know (hint, there’s a rack of Dell r720’s in my basement).
I aim mainly for perf/cost, where cost is the TCO; this includes power consumption, which is a big part of the cost in the long run. So I would aim for lower TDP under load: these newer Ryzens seem to have 2x lower TDP than considerably less performant older parts.
There are a few more things to consider:
- Do you need shared storage? If so, how fast? Read- or write-heavy?
- Do you need a performant interconnect (for, say, MPI)? Used IB cards are cheap on eBay.
- Is your software limited by memory bandwidth? (If so, maybe go with more than 2 memory channels.)
I'd rent a few different configurations from Hetzner to benchmark before buying.
If you don't need more than a few nodes and you are not limited by memory bandwidth, you could consider a single, faster node. But the sweet spot is probably consumer-grade Ryzen.
As for cloud, as long as you know you'll actually use everything you buy for a long enough time period, buying will be cheaper.
Get as many cores as you can; use dual-socket boards. You can find 12-core Xeon E5-2670 v3 chips for about $12 on AliExpress and dual-socket X79 boards for about $75. Used ECC RAM is cheap.
Build your own boxes.
You can use Kerrighed or OpenSSI for the software side.
Wow they really cost peanuts. A bit high TDP but still...
Why is this so cheap? I always did software so my hardware knowledge is pretty basic but still this kind of chip can really offer a lot in terms of performance even now?
Perf/watt is bad
Perf/$ is only decent because they’re dirt cheap on the used market as datacenters are dumping them
Yeah, from what I see the single-thread perf is 2x lower than the Ryzen 5500, so even with 2x the cores you get more or less the same throughput. It's just that the Ryzen has 2x lower TDP.
Still, considering it is like 4-5x cheaper and uses a mobo with two CPUs (which saves cost because you don't need to build 2 boxes), this might be a good idea that I'll strongly look into. Thanks!
Well all the Xeon Ex-xxxx vY stuff is more or less ten years old by now.
They’re a good value for playing around, but CPU performance has come a long way since those were released. A mid-range AMD desktop CPU could beat even a dual-socket server with two of those in most tasks, and a used 16-core AMD desktop chip will walk away from it in almost every benchmark. It will also be much quieter and consume less power while doing it.
If you’re on a shoestring budget, don’t mind hunting down and testing used parts, and power and noise aren’t an issue then those old server parts are great fun though.
Also, don't forget: no hardware Spectre/Meltdown mitigations. V2 Xeons are EOL and get no more updates or support, and V3 isn't far behind. Once these go EOL (or get close), companies start dumping them.
Large companies and data centers upgrade their hardware, so there's a huge amount of gear that either gets recycled or sold to hobbyists.
Graphics. You may not need a graphics card in the nodes once they are installed, so you may find you can get a single low-end card and use it just to install each host.
Power and heat. Will you have enough power for the nodes? What is the power trade-off if you get low-end chips vs higher-end chips? Have a look at the Ryzen page on wikipedia to get a feel for power use of each chip. How will you understand how much cooling you need? (more cooling takes more power)
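To get a feel for the power numbers, annual electricity cost is just average draw times hours times your rate; the $0.30/kWh used below is a placeholder, and the example wattages are the rough figures mentioned elsewhere in this thread.

```python
# Annual electricity cost for a node running 24/7 at a given average draw.
# The $0.30/kWh rate is a placeholder -- substitute your local price.

def annual_power_cost(avg_watts, price_per_kwh=0.30):
    kwh_per_year = avg_watts / 1000 * 24 * 365
    return kwh_per_year * price_per_kwh

# tiny PC / modern Ryzen under load / old dual-socket Xeon (rough figures)
for watts in (40, 150, 300):
    print(f"{watts:>3} W -> ${annual_power_cost(watts):.0f}/year")
```

Over a couple of years the gap between a 40 W tiny PC and a 300 W old server can exceed the purchase price of the hardware itself.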
RAM. How much does accuracy matter? Should you use ECC RAM? You can get UDIMMs to work in Ryzen kit, but not with the chips with integrated graphics card (i.e. avoid APU chips if you want ECC). Get Asrock or Asus AM4 motherboards, then get RAM like this - Samsung M393A4K40DB3-CWE. If you go cloud, you may find the hardware has ECC.
IO. Once the grid-of-nodes is in place will you be moving data to functions, or functions to data? How much data are you moving over the network per-job? Might there be IO bottlenecks when you scale up? How detailed a model of IO can you build before you settle on hardware?
AWS / GCP / Azure don't make a lot of sense if you're looking for cheap compute and don't need the rest of the cloud.
Take a look at dedicated servers at Hetzner. They're very cheap, have enough bandwidth to transfer the things you calculate into and out of there (at no extra cost, unlike the three big cloud providers), and come with some serious CPU power if you pick the right model.
You can even email their support staff to get you a couple of machines in the same rack so you get fast network between them.
And contracts are month to month so at the end of the project you can easily cancel.
Edit: but do note that these are consumer-type machines, with no dual power supply, no ECC, etc. That's why the cost is low. Treat them like a somewhat more durable version of spot instances, but definitely not datacenter-grade stuff.
> Does it make sense to go cloud (even spot) when I would aim at high utilization?
Hybrid cloud ?
"combines and unifies public cloud, private cloud and on-premises infrastructure to create a single, flexible, cost-optimal IT infrastructure."
Hetzner has a dedicated cheap server: ( monthly pricing )
- AMD Ryzen™ 5 3600 ( € 37.30 + VAT) / month
- AMD Ryzen™ 7 7700 ( € 59.00 + VAT ) / month + setup
- AMD Ryzen™ 9 5950X ( € 103.30 + VAT ) / month
- AMD EPYC™ 7502P ( € 119.80 + VAT ) / month + setup
I just upgraded compute hardware at home and went with: ASRock Deskmeet X300 (itx case, motherboard, and 500w power supply), Ryzen 5600g, 32 gb ddr4, 512 gb m.2 for around $400. Pretty good bang for the buck.
As others have said, this is vague.
For my home server, I pick the smallest case that can fit a desktop CPU, so just a bit bigger than an Intel NUC. Those have laptop CPUs, so you'd be overpaying. I am willing to pay extra for it to be small.
The two best contenders for me are the ASRock DeskMini barebones system and the In Win Chopin case, for which you have to buy an ITX motherboard.
I use the Chopin with an Intel CPU; they work for my use case.
Also, some motherboards can boot a Ryzen without any GPU at all. ASRock boards usually will. If you are willing to deal with a totally headless system, go for it.
I'd go with a rack and rack cases, if you can. There's a reason why the industry uses it.
I find that once you have a bunch of equipment piled up, it makes a huge, hard-to-manage mess, and that happens a lot faster than you'd expect. Before finally getting a rack I ended up with a bunch of hardware caked in dust because it was all lying in such a precarious way that I was afraid to touch anything in there.
Have you checked out the resources in https://www.reddit.com/r/homelabsales/
There are usually some good deals on used gear and things suitable for selfhosting if you want to go that route. I was able to build a 3 node cluster with lots of CPU/RAM (~100 vCPU/256G RAM) and storage (30+TB) on systems with redundant power supplies made for datacenters for under $500.
Upside: one time cost and usually cheaper than cloud-hosting costs.
Downside: power consumption (energy bill) increase unless you go with something like a Pi cluster, and you need to setup security well if you intend to expose any services to the Internet.
If it's just for fun and learning purposes you can build a RaspberryPi cluster: https://www.youtube.com/watch?v=X9fSMGkjtug
If 2-3 Ryzen machines is all you need, you should just go for a Xeon/Epyc server. You can build a decent Epyc workstation/server with 64 cores for as little as $2k USD when using 2nd gen Epyc processors.
I have a couple similar machines in my homelab, running in ASRock X300 boxes w/Ryzen APUs (cpu+gpu), in a similar role (k3s).
The reason I have two is I started with a 3400g (4c/8t) due to supply limitations, then upgraded to a 4750g (8c/16t) when it became feasible. Over time I upgraded memory and storage, so eventually I had everything but the case for a second “half-power” machine.
Having multiple medium-power machines can be useful for rolling upgrades (and for learning purposes), but otherwise it’s very uneconomical.
If your goal is to maximize cores/$ then a single beefy machine will do best.
The server of Theseus! Did you make a new hostname for the server made of the old parts, or did you give it back its old hostname and rename the new-parts one?
It inherited the name of another machine, from where its primary storage was pulled (despite an OS replacement).
You make such a great point that I’m going to rename it: paradox
Thank you, Lithium!
If your workload is able to be split up and run efficiently on different machines, just buy as many cheap (or free) 1-to-10 year old used computers as you can, and run HTCondor on them for scheduling the jobs.
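A minimal HTCondor submit description for that kind of embarrassingly parallel batch might look like the following; the script name and job count are illustrative, not from the thread:

```
# sweep.sub -- submit 100 independent single-core jobs
universe     = vanilla
executable   = worker.sh        # your per-task script (hypothetical name)
arguments    = $(Process)       # task index, 0..99
request_cpus = 1
output       = out/task.$(Process).out
error        = out/task.$(Process).err
log          = sweep.log
queue 100
```

Submit it with `condor_submit sweep.sub` and HTCondor farms the tasks out across whatever machines are in the pool, retrying on ones that go away.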
I built my own CPU cluster with mATX AM3s connected via a switch, powerful processor/RAM and everything else was as cheap as possible. At the time, it easily saved money compared to AWS.
The biggest issue I had was overheating. The small mATX cases don't fit fans sufficient to cool powerful CPUs running at 100% 24/7. So, you may have to get midsize cases, or leave the cases open with fans sticking out, which is louder.
You can buy complete Dell Workstations with e5 v4/v3 CPUs or even Xeon Gold CPUs for a pittance on ebay. They already have power management, etc built in and tend to run whisper quiet.
Have you thought about Mac Minis? Compact, cheap, and super powerful.
I use AWS c7g instances (64x Arm Graviton3 CPUs) to run some simulations. They are the fastest instances for our work.
If I had to run simulations daily, it would be cheaper to have 8x M1 Mac Minis at a purchase cost of around 5200 EUR.
I'll reply to myself: the cost of the 8 M1 Mac Minis is less than that. You can probably find them second-hand for less than $500 each = $4000 for the complete cluster, 64 cores total.
If you run them 24/7 another idea will be to rent cheap dedicated servers. Hetzner has some: https://www.hetzner.com/dedicated-rootserver
They even have auctions: https://www.hetzner.com/sb
Apart from the small Lenovo machines, Mac Minis can be a good proposition. I think an M1/M2 is hard to beat. You could also buy refurbished, like at https://eshop.macsales.com/configure-my-mac/mac-mini
This will heavily depend on power costs in your area, but if your tasks are not just CPU-heavy but also require a lot of RAM, it's totally worth considering old Xeons from China with used Samsung RAM.
This used hardware can easily be 2-4 times cheaper than building with modern CPUs, but power usage is also much higher.
This. I would look at the total cost (including power) for a year of the amount of compute you need. I suspect starting with a single modern many-core Epyc will look more attractive long-term than tens of old Xeon boxes.
There's also a very good resource for home labs, but I must warn you it can be a rabbit hole: https://www.reddit.com/r/homelab/
It may not be the best fit for your use case, but you can use the Oracle Cloud Always Free tier to run a basic Kubernetes setup. If you don't want the limitations on the networking, then you can always pay extra for another public IP.
Mobos these days don't come with iGPUs. You'll need CPUs with graphics or buy cheap graphic adapters on eBay.
You can buy a couple of 16 to 22 core Xeons on AliExpress and a dual CPU motherboard for them. Plenty of reviews on YouTube.
> You'll need CPUs with graphics or buy cheap graphic adapters on eBay.
I'd recommend something like a GT 700-series card for this purpose. I've got one I stick into my server when I need it to have a screen. Costs like 50 bucks.
Thousands and thousands of RP2040s at 50p or so a pop, configured as a massive transputer.
It wouldn't be all that fast, really, but it sure would be elegant.
I would investigate buying used servers, workstations, desktops. Often you can find them for very cheap and in large quantities.
How much data do you need to move in/out? How much variation is there in demand?
If you want high throughput, you may want to consider 10gbe and nvme.
CPU cost is only a small fraction of overall cost: for every CPU, there is a power supply, RAM, a motherboard, and a case. You also haven't mentioned anything about data: do the nodes need local storage, or will data only be stored locally temporarily?
It sounds like mini PCs could be an excellent solution for you.
For how long will you be using this? AWS may be preferable in the short term while local hardware may be cheaper in the long term/a lot of cpu hours.
There is also the question of your application's performance on different CPUs. There are older servers available at very cheap prices, but is it worth buying a 12/20-core Xeon that consumes 200-300 W if its performance is similar to a 5900 at 150 W?
Beefy dedicated servers go for 50-100 EUR per month; you can use one for a few months and return it at any time (monthly contract).