spiritplumber 2 years ago

I helped write parallelknoppix when I was an undergrad - our university's 2nd cluster ended up being a bunch of laptops with broken displays running it. Took me a whole summer.

Then the next semester I was denied the ability to take a parallel computing class because it was for graduate students only, and the prof. would not accept a waiver even though the class was being taught on the cluster a buddy and I built.

That I still had root on.

So I added a script that would renice the prof.'s jobs to be as slow as possible.
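
A minimal sketch of that kind of script, assuming root (needed to renice another user's processes) and a hypothetical account name "prof"; it just shells out to pgrep and renice:

    # De-prioritize every process owned by a given user (hypothetical name "prof").
    # Needs root, since only root may renice other users' processes.
    import subprocess

    def renice_user_jobs(user: str, niceness: int = 19) -> None:
        """Push all of `user`'s processes to the weakest scheduling priority (19)."""
        pids = subprocess.run(["pgrep", "-u", user],
                              capture_output=True, text=True).stdout.split()
        for pid in pids:
            subprocess.run(["renice", "-n", str(niceness), "-p", pid],
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    if __name__ == "__main__":
        renice_user_jobs("prof")  # run from cron for the full BOFH effect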

BOFH moment :)

  • sidewndr46 2 years ago

    The school I went to had similar but more insane policies

    * I frequently took computer science graduate courses and received only undergrad. credit because they could not offer the undergrad course

    * Other majors were default prohibited from taking computer science courses under the guise of a shortage of places in classes. Even when those majors required a computer science course to graduate

    I would like to point out that 300 and 400 level courses in the CS program usually had no more than 8 students. I distinctly remember meeting in a closet for one of my classes, because we had so few students they couldn't justify giving us a classroom.

    Contrast that with the math department where I wanted to take some courses in parallel rather than serial. After a short conversation with the professor he said "ok sure, seems alright to me".

    • jollyllama 2 years ago

          Other majors were default prohibited from taking computer science courses under the guise of a shortage of places in classes. Even when those majors required a computer science course to graduate
      
      I went to an institution that did the opposite; seats were reserved for non-cs majors despite a shortage of sections. This resulted in CS undergrads waiting for courses just so they could graduate. It was frustrating because it felt like the department was taking care of outsiders over its own.
      • andrewf 2 years ago

        I might be jumping to conclusions, but orgs which value revenue often prioritize new customers over existing (perhaps captive) customers.

    • brightball 2 years ago

      Part of that is because “average class size” is used in metrics for the university ranking systems and every university out there wants to game those rankings.

    • bionsystem 2 years ago

      Why do you guys think such things happen?

      • pitterpatter 2 years ago

        My guess is they wanted you to pay the higher CS tuition to take the CS classes.

        Math departments also tend to be a lot more lax in my experience. Case in point: I got sign-off to take a 300-level pure math class without its 300-level prereq, and to just replace one required course with another to graduate.

      • whatshisface 2 years ago

        Students aren't the business most universities think they are in.

      • KRAKRISMOTT 2 years ago

        Academia being inelastic and refusing to adapt in the face of market forces.

  • anonymousDan 2 years ago

    Parallel Knoppix sounds cool. Did the OSes on each machine coordinate in any way at the kernel level? Or was it all user-level libs/apps/services?

  • throwaway1777 2 years ago

    Prof might’ve done you a favor. Seems like you didn’t need that class anyway.

  • eecc 2 years ago

    A “Snoop on to them, as they snoop on to us” moment. Good for you

  • havnagiggle 2 years ago

    That was nice of you!

    • yjftsjthsd-h 2 years ago

      One might even say that it had a very high nice value;)

  • lynx23 2 years ago

    Abusing admin privileges like this confirms that it was probably a good idea to deny you access to the course; you were obviously too young.

    • archontes 2 years ago

      Interesting. It's likely the point of constraining the graduate course to grad students had nothing to do with their maturity. But you see what you think is a sign of immaturity, and turn the constraint into a maturity filter.

      I'm quite grown, and I wonder about the ownership/control of the cluster and why he didn't simply lock the professor out entirely, contingent on the approval of his waiver.

      If anything, doing something as small as lowering the priority of his jobs instead of brazenly stonewalling him might be the sign of immaturity.

      • spiritplumber 2 years ago

        The computers were mostly the school's, the carpentry was ours (mine and a friend's), the cabling and network switches were ours (scavenged, eventually they bought a nice big switch), the labor was ours.

        It's not really much of a prank war if you do a small prank and the other guy tries like hell to pretend he didn't notice and also tries like hell to pretend he didn't see you derping around in the building... escalating would have been evil on my part :)

        • archontes 2 years ago

          Machiavellian... evil... semantics.

ilyt 2 years ago

I always wanted such a thing for various "plumbing" services (DHCP/DNS/wifi controller etc) but lack of ECC and OOB management kinda disqualifies it for anything serious.

>He's running forty Blades in 2U. That's:

    >
    >      160 ARM cores
    >      320 GB of RAM
    >      (up to) 320 terabytes of flash storage
    >
>...in 2U of rackspace.

Yay that's like... almost as much as normal 1U server can do

Edit: I give up, HN formatting is idiotic

  • xattt 2 years ago

    But does anyone remember the Beowulf trope(1) from Slashdot? Am I a greybeard now?

    (1) https://hardware.slashdot.org/story/01/07/14/0748215/can-you...

    • bradleyy 2 years ago

      So, I once met a guy named Don.

      We were hanging out in the garage of a mutual friend, chatting. Got to the "what do you do" section of the conversation, and he says he works in massively parallel stuff at XYZ corp. Something something, GPUs.

      I make the obvious "can you make a Beowulf cluster?" joke, to which he responds (after a pregnant pause), "you... do know who I am?"

      Yep. Donald Becker. A slightly awkward moment, I'll cherish forever.

      • rfrey 2 years ago

        Ew. "Do you have any idea who I am?" is never a good look.

    • mometsi 2 years ago

      I like user big.ears' speculation on what someone could possibly do with that much parallel compute:

        I don't think there's any theoretical reason someone couldn't build a fairly realistic highly-complex "brain" using, say, 100,000,000 simplified neural units (I've heard of a guy in Japan who is doing such a thing), but I don't really know what it would do, or if it would teach us anything that is interesting.
      • xattt 2 years ago

        Simplified neural unit? Less capacity than a human brain? Lame.

        • dylan604 2 years ago

          Lame? That just gives the opportunity to release the Neural Unit Pro, Neural Unit Max, and Neural Unit ProMax at a higher price. You consider it lame because your acumen in business nuance is lame ;-)

    • flyinghamster 2 years ago

      You and me both. The funny thing is, I wound up writing a program that would benefit from clustering, and felt my way around setting up MPICH on my zoo. I laughed out loud when I realized that, after all these years, I'd built an impromptu Beowulf cluster, even though the machines are scattered around the house.

      Installing MPICH from source instead of from your distribution is best if you can't have all your cluster members running the same version of the same distro and/or have multiple architectures to contend with. But it takes forever to compile, even on a fast machine.
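
      A minimal sanity check that the zoo really behaves as one MPI cluster might look like the mpi4py sketch below (assuming mpi4py is built against the same MPICH on every node, and a hypothetical hostfile listing the machines):

          # hello_mpi.py - print one line per MPI rank in the ad-hoc cluster.
          # Assumes mpi4py built against the same MPICH on each node.
          from mpi4py import MPI

          comm = MPI.COMM_WORLD
          rank = comm.Get_rank()            # this process's index in the job
          size = comm.Get_size()            # total number of processes
          host = MPI.Get_processor_name()   # usually the node's hostname

          print(f"rank {rank} of {size} on {host}")

          # Launch across the house (MPICH's Hydra launcher reads -f <hostfile>):
          #   mpiexec -f hostfile -n 8 python3 hello_mpi.py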

      • dredmorbius 2 years ago

        That's more a COW than a (Beo)wulf, no?

        <https://www.oreilly.com/library/view/high-performance-linux/...>

        A cluster of workstations (COW) is usually opportunistic, exploiting existing systems, and lower density than a dedicated (usually rack-based or datacentre-based) cluster.

        In practice, COWs usually turn out to be not especially useful, though there are exceptions.

      • ilyt 2 years ago

        Well, since you already have a cluster you could distcc the compilation. I remember doing that on a bunch of machines when we were building Gentoo.

    • neilv 2 years ago

      To go along with "Imagine a Beowulf cluster of those!", don't forget "Take my money!"

      • Koshkin 2 years ago

        You can get off my lawn now.

    • wrldos 2 years ago

      I remember building a 4 node Beowulf cluster out of discarded Compaq desktops and then having no idea what to do with it.

      • mkj 2 years ago

        75 MHz, yeah! Stacked on top of each other! With 10 Mbit Ethernet! I think we got OpenMosix going even.

        But then 5 years later I was working on them for a living in HPC, but they were no longer called Beowulf Clusters then.

      • red-iron-pine 2 years ago

        Did kinda the same thing but with Raspberry Pis. Neat, a cluster of r-pi's... now what?

        • ipsin 2 years ago

          If you want to continue the chain of specific goals in service of no specific purpose: run Kubernetes on it.

          • Sebb767 2 years ago

            Then add ArgoCD for deployment and Istio for a service mesh!

            While you are at it, also set up Longhorn for storage. With that solved, you might as well start hosting Gitea and DroneCI on the cluster, plus an extra Helm and Docker repo for good measure. And in no time you will have a full modern CI/CD setup to do nothing but updates on! :-)

            Seriously, though, you will learn a lot of things in the process and get a bottom up view of current stacks, which is definitely helpful.

            • wrldos 2 years ago

              I did this. I am still dead inside. Thank goodness all my production shit has a managed control plane and network.

          • worthless-trash 2 years ago

            Alternatively, if you don't like the kubes, learn yourself some Erlang and make a super fault-tolerant application.

        • mr_toad 2 years ago

          Step 1: run a page ranking algorithm using Naive Bayes on a bastardised HPC framework

          Step 2: add advertising

          Step 3: make more money than God.

    • faichai 2 years ago

      Natalie Portman says yes, and instructs you to put some hot grits down your pants.

    • edoloughlin 2 years ago

      I do and you are. I’m also imagining one covered in hot grits…

    • pdpi 2 years ago

      Beowulf clusters were those lame things that didn’t have wireless, and had less space than a nomad, right?

    • trollied 2 years ago

      Don't forget the Hot Grits & Natalie Portman.

    • Quequau 2 years ago

      I do but in all fairness, I have an entirely grey beard.

    • pjmlp 2 years ago

      I do. And cool research OSes that did process migration.

      • nine_k 2 years ago

        Ah, Plan 9.

        • zozbot234 2 years ago

          I'm not sure that Plan 9 does process migration out of the box. It does have complete "containerization" by default, i.e. user-controlled namespacing of all OS resources - so snapshotting and migration could be a feasible addition to it.

          Distributed shared memory is another intriguing possibility, particularly since large address spaces are now basically ubiquitous. It would allow users to seamlessly extend multi-threaded workloads to run on a cluster; the OS would essentially have to implement memory-coherence protocols over the network.

          • nine_k 2 years ago

            If not Plan 9, then likely Inferno. (A pretty different system, of course.)

        • wiredfool 2 years ago

          And God help us, OS/2 Warp.

          • pjmlp 2 years ago

            With much better tooling for OO ABI than COM/WinRT will ever get (SOM).

    • b33j0r 2 years ago

      Sure do! It is interesting that these technologies evolve more slowly than it seems, sometimes.

      On the graybearding of the cohort, here’s a weird one to me. These days, I mention slashdot and get more of a response from peers than mentioning digg!

      In 2005, I totally thought digg would be around forever as the slashdot successor, but it’s almost like it never happened (to software professionals… er, graybeards)

    • pbronez 2 years ago

      New to me - found the source article in the Wayback Machine:

      https://web.archive.org/web/20010715201416/http://www.scient...

      • pbronez 2 years ago

        Found the definition:

        Sterling and his Goddard colleague Donald J. Becker connected 16 PCs, each containing an Intel 486 microprocessor, using Linux and a standard Ethernet network. For scientific applications, the PC cluster delivered sustained performance of 70 megaflops--that is, 70 million floating-point operations per second. Though modest by today's standards, this speed was not much lower than that of some smaller commercial supercomputers available at the time. And the cluster was built for only $40,000, or about one tenth the price of a comparable commercial machine in 1994.

        NASA researchers named their cluster Beowulf, after the lean, mean hero of medieval legend who defeated the giant monster Grendel by ripping off one of the creature's arms. Since then, the name has been widely adopted to refer to any low-cost cluster constructed from commercially available PCs.

    • lemper 2 years ago

      yeah, wanted to replicate something like that by proposing it to a hardware vendor who visited my uni decades ago. It didn't go anywhere because I was intimidated by the red tape.

    • iamflimflam1 2 years ago

      But does it run Doom?

      • mejutoco 2 years ago

        Crysis

        • cptnapalm 2 years ago

          What does Natalie Portman need to imagine a Beowulf cluster of Dooms running Crysis? Grits?

  • marginalia_nu 2 years ago

    I do think this is sort of fool's gold in terms of actual performance. Even though the core count and RAM size are impressive, those cores are talking over Ethernet rather than a system bus.

    Latency and bandwidth are atrocious in comparison, and you're going to run into problems like no individual memory allocation being able to exceed 8 GB.

    For running a hundred truly independent jobs, sure, maybe you'll get equivalent performance, but that's a pretty unusual scenario in the real world.

    • varispeed 2 years ago

      I built such a toy cluster once to see for myself and gave up. It is too slow to do anything serious. You can be much better off just buying an older post-lease server. Sure, it will consume more power, but you will finish more tasks in less time, so the advantage of using ARM in that case may be negligible. If it were Apple's M1 or M2, that would be a different story, but the RPi4 and clones are not there yet.

      • marginalia_nu 2 years ago

        I overall think people tend to underestimate the overhead of clustering. It's always significantly faster to run a computation on one machine than spread over N machines with hardware of (1/N) power.

        That's not always a viable option because of hardware costs, and sometimes you want redundancy, but those concerns are on an orthogonal axis to performance.

        • convolvatron 2 years ago

          The lines get blurred when you are on a supercomputer interconnect and a global address space, or even RDMA.

          • dekhn 2 years ago

            The fastest practical interconnects are roughly 1/10th the speed of local RAM. Because of that, if you use an interconnect, you don't use it for remote RAM (through virtual memory).

            I don't think anybody in the HPC business really pursued mega-SMP after SGI because it was not cost-effective for the gains.

            • p_l 2 years ago

              Both Single System Image and giant NUMA machines were and are still pursued because not everything scales in shared-nothing message passing well (some stuff straddles it by doing distributed shared memory over MPI but using it mostly for synchronisation).

              It's just that there's a range of very well paying problems that scale quite well in message passing systems, and this means that even if your problem scales very badly on them, you might have an easier time brute forcing the task on a larger but inefficient supercomputer rather than getting funding for a smaller, more efficient one that fits your problem better.

            • convolvatron 2 years ago

              Cray did some vector machines that were globally addressed but not coherent. That’s an interesting direction. So is latency hiding.

              The really important thing is that the big ‘single machine’ you’re talking about already has numa latency problems. Sharing a chassis doesn’t actually save you from needing to tackle latency at scale.

        • jbverschoor 2 years ago

          Well, a complete M1 board, which is basically about as large as half an iPhone mini, is fast enough. It's also super efficient. So I'm still waiting for Apple to announce their cloud.

          They're currently putting Mx chips in every device they have, even the monitors. It'll be the base system for any electric device. I'm sure we'll see more specialized devices for different applications, because at this point, the hardware is compact, fast, and secure enough for anything, as well as the software stack.

          Hello Apple Fridge

    • PragmaticPulp 2 years ago

      >I do think this is sort of fool's gold in terms of actual performance.

      It’s a fun toy for learning (and clicks, let’s be honest).

      It’s not a serious attempt at a high performance cluster or an exercise in building an optimal computing platform.

      Enjoy the experiment and the uniqueness of it. Nobody is going to be choosing this as their serious compute platform.

      • analognoise 2 years ago

        In TFA, isn't JetBrains using it as a CI system?

        • bee_rider 2 years ago

          Tangential, but it is so funny to me that “TFA” has become a totally polite and normal way to refer to the linked article on this site. Expanding that acronym would really change the tone!

          • OJFord 2 years ago

            I'm not sure it is 'totally polite'? I usually read it as having a 'did you even open it' implication that 'OP' or 'the submission' doesn't. Maybe that's just me.

            • bee_rider 2 years ago

              Maybe it isn’t totally polite, but it IMO it reads in this case more like slight correction than “In the fucking article,” which would be pretty aggressive, haha.

            • cyberpunk 2 years ago

              I always thought it meant “the featured article”; I must be a lot more wholesome than previously indicated!

        • lmz 2 years ago

          Unless they need something Pi specific I don't understand why this would be preferable versus just virtualizing instances on a "big ARM" server. I'm sure those exist.

    • DrBazza 2 years ago

      It probably lends itself to tasks where CPU time is much greater than network round trips. Maybe scientific problems that are massively parallel. Way back in the 90s I worked with plasma physics guys who used a parallel system on "slow" Sun boxes. I can't remember the name of the software though.

  • singron 2 years ago

    It's actually 3U, since the 2U of 40 Pis will need almost an entire 1U 48-port PoE switch instead of plugging into the ToR. The switch will use 35-100W for itself depending on features and conversion losses. If each Pi uses more than 8-9W or so under load, then you might actually need a second PoE switch.
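
    As a rough back-of-the-envelope check (the per-blade wattage and the switch's PoE budget below are assumptions, not measurements):

        # Rough PoE budget math for a 2U, 40-blade shelf; both figures are assumed.
        blades = 40
        watts_per_blade = 9        # assumed worst-case draw per blade under load
        poe_budget_watts = 370     # a common budget for a 48-port PoE+ switch

        total = blades * watts_per_blade   # 360 W
        print(f"{total} W drawn of a {poe_budget_watts} W budget")
        # Anything much above ~9 W per blade blows past a 370 W budget,
        # hence the possible need for a second switch (or a 740 W-class one).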

    If you are building full racks, it probably makes more sense to use ordinary systems, but if you want to have a lot of actual hardware isolation at a smaller scale, it could make sense.

    In some colos, they don't give you enough power to fill up your racks, so the low energy density wouldn't be such a bummer there.

  • 2OEH8eoCRo0 2 years ago

    > Yay that's like... almost as much as normal 1U server can do

    Hyperscale in your Homelab. Something to hack on, learn, host things like Jellyfin, and have fun with.

    • jeffbee 2 years ago

      I agree but can't you get the same effect with VMWare ESXi? If I just wanted to "have fun" managing scores of tiny computers, and I emphasize that this sounds like the least amount of fun anyone could have, I can have as many virtual machines as I want.

      • fishtacos 2 years ago

        I can understand why some people want something physical/tangible while testing or playing in their hobby environment. I'm still a fan of virtualization - Passmark scores for an RPi4 (entire SoC/quad core) are 21 times lower than a per-single-core comparison with a 14-core i5-13600K (as a point of reference, my current system), and while I'm running 64GB of RAM, I can easily upgrade to 128GB or more on a single DDR4 node.

        Hard to see an advantage given the obvious limitations, although it may make it more fun to work within latency and memory constraints, I guess.

    • gaudat 2 years ago

      Haha, Jellyfin would eat through all your memory and CPU time transcoding or remuxing on an SBC. I'm seriously thinking of getting another home server just to run that.

  • guntherhermann 2 years ago

    It's ~~four~~ two spaces to get the "code block" style.

        like
        this
    
    and asterisk for italics (I don't think there is a 'quote' available, and I'm not sure how they play together).

    * does this work? * Edit: No! Haha

        *how*
        *about*
        *this*
    
    Edit: No, no joy there either.

    I agree, it's not the most intuitive formatting syntax I've come across :)

    I guess we're stuck with BEGIN_QUOTE and END_QUOTE blocks!

    • ilyt 2 years ago

      The code block was a workaround.

      I wanted to do this

          > * list element 1
          > * list element 2
      
      without the indent.

      I don't get why it doesn't just use CommonMark or something; it's just some inept, half-assed clone of it anyway.

  • PragmaticPulp 2 years ago

    > but lack of ECC and OOB management kinda disqualifies it for anything serious.

    > Yay that's like... almost as much as normal 1U server can do

    It’s a fun toy. Obviously it isn’t the best or most efficient way to get any job done. That’s not the point.

    Enjoy it for the fun experiment that it is.

  • LeonM 2 years ago

    > I always wanted such a thing for various "plumbing" services (DHCP/DNS/wifi controller etc)

    You don't need a cluster for that, even a 1st gen Pi can run those services without any problem.

    • guntherhermann 2 years ago

      I can only speak for Raspi 3B+, but I agree.

      I have multiple services running on it (including pihole, qbittorrent, vpn) and it's at about 40% mem usage right now.

    • ilyt 2 years ago

      I need multiple nodes for redundancy, not because it can't fit on one. And preferably at least 3 for quorum too.

  • sys42590 2 years ago

    Indeed, that box here next to my desk draws 50W of electricity continuously despite being mostly idle. Why? Because it has ECC.

    Having some affordable low power device with ECC would be a game changer for me.

    I added affordable to exclude expensive (and noisy) workstation class laptops with ECC RAM.

    • Maakuth 2 years ago

      There are Intel Atom CPUs that support ECC. I had a Supermicro motherboard with a quad core part like that and I used it as a NAS. It was not that fast, but the power consumption was very low.

      • smartbit 2 years ago

        Do you remember how many Watts it was using with idle disks?

        • MrFoof 2 years ago

          I personally have at 43-45W idle…

              >Corsair SF450 PSU
              >ASRock Rack X570D4U w/BMC
              >AMD Ryzen 7 Pro 5750GE (8C 3.2/4.6 GHz)
              >128GB DDR4-2666 ECC
              >Intel XL710-DA1 (40Gbps)
              >LSI/Broadcom 9500-8i HBA
              >64GB SuperMicro SATA DOM
              >2 SK Hynix Gold P31, 2TB NVMe SSD
              >8 Hitachi 7200rpm, 16TB HDD
              >3 80mm fans, 2 40mm fans, CPU cooler
          
          That was an at the time modern "Zen 3" (using Zen 2 cores) system on an X570 chipset. The CPU mostly goes in 1L ultra SFF systems. TDP is 35W, and under stress testing the CPU tops out around 38.8-39W. The onboard BMC is about 3.2-3.3W of power consumption itself.

          Most data ingest and reads comes from the SSD cache, with that being more around 60W for high throughput. Under very high loads (saturating the 40Gbps link) with all disks going, only hits about 110-120W.

          By comparison, a 6-bay Synology was over double that idle power consumption, and couldn’t come close to that throughput.

          • sys42590 2 years ago

            thanks for the parts list, especially because I think ASRock Rack paired with a Ryzen Pro offers better performance than a Supermicro in the same price range.

            • MrFoof 2 years ago

              There’s reasons for that though.

              I could drop a few more watts if ASRock could put together a decent BIOS where disabling things actually disables things.

              SuperMicro costs what it does for a reason.

              If you’re looking for a chassis, I’m using a SilverStone RM21-308, with a Noctua NH-L9a-AM4 cooler, and cut some SilverStone sound deadening foam for the top panel of the 2U chassis.

              Aside from disks clicking, it’s silent, runs hilariously cool (I 3D printed chipset and HBA fan mounts at a local library) and it’s more usable storage, higher performance (saturates 40Gbps trivially) and lower power consumption than anything any YouTuber has come remotely close to. That server basically lets me have everything else in my rack not care much about storage, because the storage server handles it like a champ. I really considered doing a video series on it, but I’m too old to want to deal with the peanut gallery of YouTube comments.

              • philsnow 2 years ago

                If you don't mind me asking, how do your other workloads access the storage on it, NFS? The stumbling block for NFS for me is identity and access management.

          • doublepg23 2 years ago

            Wow I just picked up an ASRock Rack X570D4U and put my 5950X into it.

            Do you know how to make the BMC not a laggy mess when using the "H5Viewer"? I'm getting basically unusable latency when the system is two yards away, compared to an RDP server 1,000 miles away.

          • Maakuth 2 years ago

            That's impressively low, considering the amount of storage capacity and the performance potential for the time you need it. It goes a long way towards paying for itself if you replace some old Xeon server with it.

    • namibj 2 years ago

      Most AMD desktop platforms support ECC, and if you don't use overclocking facilities, they are pretty efficient (though their chiplet architecture causes idle power draw to be a good fraction of active power draw, still much less than 50W though).

    • nsteel 2 years ago

      > Why? Because it has ECC

      Sorry if I am missing the obvious here, but why would ECC consume so much power?

      • growse 2 years ago

        It's not that ECC consumes power, it's that systems that support ECC tend to consume more idle power (because they're larger etc.)

    • stordoff 2 years ago

      How much RAM is that with? My home server idles at ~25-27W, but that's with only 16GB (ECC DDR4). However, throwing in an extra 16GB as a test didn't measurably change the reading.

    • aidenn0 2 years ago

      Xeon-D series?

    • walterbell 2 years ago

      Epyc Embedded and possibly some Ryzen Embedded devices.

  • imtringued 2 years ago

    Didn't AMD announce a 96 core processor with dual socket support?

    As usual this is either done for entertainment value or to simulate physical networks (not clusters).

    • adrian_b 2 years ago

      Intel also has now up to 480 cores in an 8-socket server with 60 cores per socket, though Sapphire Rapids is handicapped in comparison with AMD Genoa by much lower clock frequencies and cache memory sizes.

      However, while the high-core-count CPUs have excellent performance per occupied volume and per watt, they all have extremely low performance per dollar, unless you are able to negotiate huge discounts, when buying them by the thousands.

      Using multiple servers with Ryzen 9 7950X can provide a performance per dollar many times higher than that of any current server CPU, i.e. six 16-core 7950X with a total of 384 GB of unbuffered ECC DDR5-4800 will be both much faster and much cheaper than one 96-core Genoa with 384 GB of buffered ECC DDR5-4800.

      Nevertheless, the variant with multiple 7950X is limited for many applications by either the relatively low amount of memory per node or by the higher communication latency between nodes.

      Still, for a small business it can provide much more bang for the buck, when the applications are suitable for being distributed over multiple nodes (e.g. code compilation).

      • cjbgkagh 2 years ago

        This is the exact space I’m in, high cpu low network. By my estimates it’s about 1/4 the cost per CPU operation to use consumer hardware instead of enterprise. The extra computers allow for application level redundancy so the other components can be cheaper as well.

      • bee_rider 2 years ago

        One problem with 480 cores in a single node: 480 cores is a shitload of cores. Who needs more than a single node at this point? The MPI programmer inside me is having an existential breakdown.

  • FlyingAvatar 2 years ago

    I think the hardware isolation would be a selling point in some cases. Granted, it's niche.

    • ilyt 2 years ago

      In most cases it is the wrong choice, unless your business is selling Raspberry Pi hosting, I guess.

  • goodpoint 2 years ago

    > Yay that's like... almost as much as normal 1U server can do

    ...but the normal server is much cheaper.

  • metalspot 2 years ago

    It's a nice hobby project, but of course a commercial blade system will have far higher compute density. Supermicro can do 20 EPYC nodes in 8U, which at 64 cores per node is 1280 cores in 8U, or 160 per 1U: double the core density, with far more powerful cores, so way higher effective compute density.

  • timerol 2 years ago

    Also not noted: 320 TB in 40 M.2 drives will be extremely expensive. Newegg doesn't have any 8 TB M.2 SSDs under $1000. $0.12/GB is about twice as expensive as more normally-sized drives, to say nothing of the price of spinning rust.

  • guntherhermann 2 years ago

    > Yay that's like... almost as much as normal 1U server can do

    What about cost, and other metrics around cost (power usage, reliability)? If space is the only factor we care about then it seems like a loss.

    • betaby 2 years ago

      What about them? 1U servers from vendors are reliable and efficient - people use them in production for years. As for the cost, those hobby-style boards are very expensive in dollars/performance. Indeed, I'm not getting why one would want a cluster of expensive, low-spec nodes.

  • bee_rider 2 years ago

    Just the Pi’s are $35 a pop, right? So that’s $1400 of Pi’s, on top of whatever the rest of the stuff costs. Wonder how it compares to, I guess, a whatever the price equivalent AMD workstation chip is…

    • philsnow 2 years ago

      It seems they're the ones with 8 GB of ram, so probably closer to $75 each.

      • bee_rider 2 years ago

        I’d be interested to see if anyone had any application other than CI for Raspberry Pi programs, I really can’t see one.

  • actually_a_dog 2 years ago

    What are you talking about "lack of ECC?" The Pi4 has ECC.

    • ilyt 2 years ago

      Didn't know that! Would still prefer one that actually reports the errors tho

  • mayli 2 years ago

    That would be 40x (Rpi4 8GB $75 + 8TB nvme $1200 + psu and others) ~ $51000.

  • MuffinFlavored 2 years ago

    > lack of ECC and OOB management kinda disqualifies it

    Can you expand on this please?

    • geerlingguy 2 years ago

      ECC RAM is more robust (fewer crashes due to random bit flips), and OOB management means if a server has issues, you can remotely view it as if you were jacked in, and force reboot, among other things (like installing an OS remotely).

  • zaarn 2 years ago

    The 1U server, however, is likely to use more than the 200 watts of power that the 40-blade 2U setup would use.

    • logifail 2 years ago

      > The 1U server is however likely to use more than 200 Watts of power

      Q: Why would a 1U server need more than 200W if you're doing nothing more than basic network services?

      I have mini tower servers that draw a fraction of that at idle.

      • zaarn 2 years ago

        The Pis will be using those 200 watts at near full tilt. The main use here would be larger computational tasks that you can easily split up among the blades. Or you run a very hardware-failure-tolerant software service on top.

      • bluedino 2 years ago

        I have some idle Dell R650's that draw 384W. A couple drives, buncha RAM, two power supplies, 2 CPU's (Xeon 8358)

        • livueta 2 years ago

          Hrm, interesting to see how the TDP of those 8358s drives overall power consumption. I'm looking at the idrac consoles of a couple R720XDs with 12 3.5" hdds, >128gb ram, two E5-2665s per, and they're all currently sipping ~150W at < 1 load average. The E5s have a TDP of 115W to the 8358's 250W, so I assume that's what's most of it. I admittedly do some special IPMI fan stuff, but that only shaves off tens of watts.

        • logifail 2 years ago

          > Dell R650's that draw 384W

          Umm, I'm not sure I can afford the electricity to run kit like that :)

          I'm currently awaiting delivery of an Asus PN41 (w/ Celeron N5100) to use as yet another home server, after a recommendation from a friend. Be interesting to see how much it draws at idle!

alex_suzuki 2 years ago

Ah, do you feel it too? That need to own some of these, even though you have zero actual use for them.

  • petesergeant 2 years ago

    Nothing generates that feeling for me like seeing these things:

    https://store.planetcom.co.uk/products/gemini-pda-1

    I absolutely can't imagine what I'd use it for, and yet, my finger has hovered over "buy" many many times over the last few years

  • fy20 2 years ago

    I think I could justify the world's most secure and reliable Home Assistant cluster with automatic failover...

    • criddell 2 years ago

      Frankly, the bar for that is pretty low...

    • pbronez 2 years ago

      Yeah that’s my thought. The main benefit to this is High Availability. You’re not going to get compelling scale-out performance, but you can protect yourself from local hardware failures.

      Of course, then you have to ask if you need the density. There are lots of ways to put RPis in a rack, and this approach gives up HAT compatibility for density.

      For example, I’m considering a rack of Rpi with hifi berry DACs for a multi-zone audio system. This wouldn’t help me there.

  • Hamuko 2 years ago

    I don't feel like I have zero actual use for them. The amount of Docker containers I have running on my NAS is only ever going up. These could make for a nice, expandable Kubernetes cluster.

    Whether that's a good use case is a whole other thing.

ChuckMcM 2 years ago

That is a neat setup. I wish someone would do this but just run RMII out to an edge connector on the back. Connect them to a jelly-bean switch chip (8-port GbE are like $8 in qty); signal integrity on at most 4" of PCB trace should not be a problem. You could bring the network "port status" lines to the front if you're interested in seeing the blinky lights of network traffic.

The big win here would be that all of the network wiring is "built in" and compact. Blade replacement is trivial.

Have your fans blow up from the bottom and stagger "slots" on each row, and if you do 32 slots per row, you could probably build a kilocore cluster in a 6U box.

Ah, the fun I would have with a lab with a nice budget.

  • zokier 2 years ago

    > That is a neat setup. I wish someone would do this but just run RMII out to an edge connector on the back

    That stuck out to me too; they are making custom boards and a custom chassis, so surely it would be cleaner to route the networking and power through a backplane instead of having a gazillion tiny patch cables and a random switch just hanging in there. Could also avoid the need for PoE by just having power buses in the backplane.

    Overall imho the point of blades is that some stuff gets offloaded to the chassis, but here the chassis doesn't seem to be doing much at all.

  • themoonisachees 2 years ago

    Couldn't you do 1ki cores / 4U with just EPYC CPUs in normal servers? At that point surely for cheaper, also significantly easier to build, and faster since the cores don't talk over Ethernet?

  • sitkack 2 years ago

    > jelly bean switch chip

    What do you have in mind? I couldn't find this part. Really am asking.

    • ChuckMcM 2 years ago

      Exemplar: https://octopart.com/ksz9477stxi-microchip-80980651

      The Chinese made ones are even cheaper, open up a TP-Link "desktop 8 port Gigabit Switch" and you will find the current "leader" in that market. Those datasheets though will be in Chinese so it helps to be able to read Chinese. (various translate apps are not well suited to datasheets in my experience)

      • sitkack a year ago

        Thank you.

        Yeah, I found the ones on mouser and digikey. $20 is a bit much (not for a one off, but if you are aggregating low end processors you will need a lot of them).

        I'd love something like a 12-20 port 1Ge with a 10Ge uplink. If you find a super cheap 1Ge switch chip and docs (I suppose you could just reverse engineer the pcb from a tp-link switch), please post it.

        • ChuckMcM a year ago

          No worries, the key though is cross-section bandwidth. The "super cheap" GbE switch chips can have as little as 2.5 Gbps of cross-section bandwidth, which makes them ill-suited for cluster operations.

  • nine_k 2 years ago

    What kind of fun might that be?

    • ChuckMcM 2 years ago

      Well for one, I'd build a system architecture I first imagined back at Sun in the early 90's which is a NUMA fabric attached compute/storage/io/memory scalable compute node.

      Then I'd take a shared nothing cluster (typical network attached Linux cluster) and refactor a couple of algorithms that can "only" run on super computers and have them run faster on a complex that costs 1/10th as much. That would be based on an idea that was generated by listening to IBM and Google talk about their quantum computers and explaining how they were going to be so great. Imagine replacing every branch in a program with an assert that aborts the program on fail. You send 10,000 copies of the program to 10,000 cores with the asserts set uniquely on each copy. The core that completes kicks off the next round.
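
      A toy sketch of that fan-out idea (the three-branch "program" below is entirely made up, purely to illustrate one copy per branch-outcome vector, with the wrong copies aborting):

          # Speculative execution by brute force: run one copy of the program per
          # combination of branch outcomes; copies whose guesses are wrong abort.
          from itertools import product
          from multiprocessing import Pool

          def program(guesses, x):
              # Three data-dependent branches, each replaced by an assert that the
              # pre-assigned guess matches reality (a failed assert = abort).
              assert (x > 10) == guesses[0]
              assert (x % 2 == 0) == guesses[1]
              assert (x % 3 == 0) == guesses[2]
              return f"survivor: branch vector {guesses}"

          def run_copy(args):
              guesses, x = args
              try:
                  return program(guesses, x)
              except AssertionError:
                  return None   # this copy's guesses were wrong; it "aborts"

          if __name__ == "__main__":
              x = 42   # the actual input data
              jobs = [(g, x) for g in product([True, False], repeat=3)]  # 2**3 copies
              with Pool() as pool:
                  winners = [r for r in pool.map(run_copy, jobs) if r is not None]
              print(winners[0])   # the completing copy kicks off the next round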

thejosh 2 years ago

These would be awesome for build servers, and testing.

I really like Graviton from AWS, and Apple Silicon is great; I really hope we move towards ARM64 more. ArchLinux has https://archlinuxarm.org, and I would love to use these to build and test arm64 packages (without needing to use qemu hackery, awesome though it is).

Aissen 2 years ago

Multiple server vendors now have Ampere offerings. In 2U, you can have:

* 4 Ampere Altra Max processors (in 2 or 4 servers), so about 512 cores, and much faster than anything those Raspberry Pi have.

* lots of RAM, probably about 4TB ?

* ~92TB of flash storage (or more ?)

Edit: I didn't want to disparage the compute blade, it looks like a very fun project. It's not even the same use case as the server hardware (and probably the best solution if you need actual Raspberry Pis), the only common thread is the 2U and rack use.

  • dijit 2 years ago

    those things are insanely expensive though, I priced a 2core machine at 20,000 EUR without much ram or SSDs.

    I'm keeping my eyes open though.

    • Aissen 2 years ago

      An open secret of the server hardware market: public prices mean nothing and you can get big discounts, even at low volume.

      But of course the config I talked about is maxed-out and would probably be more expensive than 20k. It would be interesting to compare the TCO with an equivalent config, and I wouldn't be surprised to see the server hardware still win.

    • aeyes 2 years ago

      Try the HPE RL300, should be more reasonably priced but I couldn't get a quote because availability seems to be problematic at the moment.

davgoldin 2 years ago

This looks very promising. I basically could print an enclosure to specifically fit my home space. And easily print a new one when I move.

More efficient use of space compared to my current silent mini-home lab -- also about 2U worth of space, but stacked semi-vertically [1].

That's 4 servers each with AMD 5950x, 128GB ECC, 2TB NVMe, 2x8TB SSD (64c/512GB/72TB total).

[1] https://ibb.co/Jm1SX7d

  • LolWolf 2 years ago

    Wait this is pretty sick! What's the full build on that? How do you even get started on finding good cases that aren't just massive racks for a home build?

    • davgoldin 2 years ago

      The case is "LZMOD A24 V3" - found it on caseend.com - there are smaller ITX cases, but I wanted to fit in standard components only, and not to mess with custom PSUs (for example).

      The rest of the components are:

      Board: AsRock Rack X570D4I-2T (2x 10GBe and IPMI!)

      NVMe: 2TB Transcend TS2TMTE220S TLC

      SSD: 2x 8TB Samsung 870 QVO

      PSU: Seasonic SSP-300SUB (overkill, went for longevity)

      CPU Cooling: Thermalright AXP-100 Series All-Copper Heatsink with Noctua NF-A12x15 PWM

      Exhaust fans: 2x INEX AK-FN076 Slimfan 80mm PWM

      On the air intake side, there's a filter sheet that I replace (or vacuum) once in a blue moon - the insides are still pristine after running for over a year now.

      Interesting thing about cooling: one of those cases has a PSU with custom-made cabling (reduced cables by about 90%). I was hoping it would reduce the temperatures a bit. Surprisingly, there was basically no change. At full load all keep running at around 70°C.

      Important: in such a small case, if you want silence you'd better disable AMD's "Core Performance Boost". This will make the CPU run at its nominal frequency, 3.4GHz for the 5950X; otherwise it'll keep jumping to its max potential, 4.9GHz for the 5950X, which will result in more heat and more fan noise.

blitzar 2 years ago

The blade has arrived, but can you get a compute unit to go in it? The non-availability of the whole Pi ecosystem has done a lot of damage.

  • bombcar 2 years ago

    The Rock5B is whipping the Pi on compute power and availability. Only use a Pi if you absolutely have to.

    • blitzar 2 years ago

      At $150+ I would just buy an old small form factor dell / hp from ebay and have a whole machine.

      • bogwog 2 years ago

        I bought a retired dual-socket Xeon HP 1U server with 128GB of ECC RAM for like $50 on eBay a while back. It only had one CPU, but upgrading it to two would be very cheap.

        Sure, it's a hulking, obsolete, and very loud beast, but it's hard to beat the price-to-performance ratio there... just make sure you don't put anything super valuable on it, because HP's old ProLiant firmware likely has a ton of unpatched critical vulnerabilities (and you'd need an HP support plan to download patches even if they exist)

      • celestialcheese 2 years ago

        100% this.

        I picked up an HP 705 G4 mini on Backmarket for $80 shipped the other day to run Home Assistant and some other small local containers. 500GB storage, Ryzen 5 2400GE, 8GB DDR4, with a valid Windows license.

        Sure, it's not as small or silent, but there's no way to beat the prices of these few-years-old enterprise mini PCs.

  • preisschild 2 years ago

    There are other CM-compatible SoMs.

    Like the Pine64 SOQUARTZ

    • russelg 2 years ago

      Geerling covers this in the accompanying video for this post. He couldn't get it running due to no working OS images being obtainable.

      • fivesixzero 2 years ago

        I spent some time last week tinkering with a SOQuartz board and ended up getting it working with a Pine-focused distro called Plebian[1].

        Took awhile to land on it though. Before that I tried all of the other distros on Pine64's "SOQuartz Software Releases"[2] page without any luck. The only one on that page that booted was the linked "Armbian Ubuntu Jammy with kernel 5.19.7" but it failed to boot again after an apt upgrade.

        So there's at least one working OS, as of last week. But it's definitely quite finicky and would probably need some work to build a proper device tree for any carrier board that's not the RPi CM4 Carrier Board.

        [1] https://github.com/Plebian-Linux/quartz64-images

        [2] https://wiki.pine64.org/wiki/SOQuartz_Software_Releases

      • blitzar 2 years ago

        So those compute units are obtainable, but a functioning image remains unobtainium. What a mess.

        • geerlingguy 2 years ago

          You can usually get an image that functions at least partially, but it's up to you to determine whether the amount it functions is enough for your use case. A K3s setup is usually good to go without some features like display output.

          • blitzar 2 years ago

            I like to tinker, but there is a limit.

            The killer feature for their crowdfunding would be if they sourced a batch of Pi compute modules ...

            • geerlingguy 2 years ago

              I've asked about that. There's a small possibility, but the earliest it would be able to happen (a batch of CM4 to offer as add-ons) would be summer, most likely :(

      • fellowmartian 2 years ago

        I don’t think these boards are meant for the way people are trying to use them. Mainline Linux support is actually great on RK3566 chips, but you have to build your own images with buildroot or something like that.

wildekek 2 years ago

I have this cycle every 10 years where my home infra gets to enterprise-level complexity (virtualisation/redundancy/HA) until the maintenance is more work than the joy it brings. Then, after some outage that took me way too long to fix, I decide it is over and I reduce everything down to a single modem/router and WiFi AP. I feel the pull to buy this and create a glorious heap of complexity to run my doorbell on and be disappointed. Can't wait.

aseipp 2 years ago

I love the form factor. But please. For the love of god. We need something with wide availability that supports at least ARMv8.2.

At this rate I have so little hope in other vendors that we'll probably just have to wait for the RPi5.

walrus01 2 years ago

If you want "hyperscale" in your homelab, the bare metal hypervisor needs to be x86-64 because unless you literally work for Amazon or a few others you are unlikely to be able to purchase other competitively priced and speedy arm based servers.

There is still near-zero availability in the mass market for CPUs you can stick into motherboards from one of the top ten Taiwanese vendors of serious server-class motherboards.

And don't even get me started on the lack of ability to actually buy raspberry pi of your desired configuration at a reasonable price and in stock to hit add to cart.

  • vegardx 2 years ago

    Supermicro launched a whole lineup of ARM-based servers last fall. They seem to mostly offer complete systems for now, but as far as I understand that's mostly because there's still some minor issues to iron out in terms of broader support.

onphonenow 2 years ago

I’ve been getting good price/perf just using the top AMD consumer CPUs. I wish someone would make an AM5 platform motherboard with out-of-band / remote console management; that really is a must if you have a bunch of boxes and have them somewhere else. The per-core speeds are high on these. 16 cores / 32 threads per box gets you enough for a fair bit.

  • trevorstarick 2 years ago

    Have you taken a look at any of AsrockRack's offerings? They've got some prelim 650 mATX boards: https://www.asrockrack.com/general/productdetail.asp?Model=B...

    • onphonenow 2 years ago

      It’s a fantastic board spec. Timing with them on availability can take longer. If they can get it VMware compatible, that would be great. Because they have dual 10G you need no network card in most cases. AM5 integrated graphics allows bring-up / troubleshooting with no additional card (if the remote console isn't working well). For my use case I’d trade an extra M.2 slot for a PCI slot, but I can see their approach. These boards fit in nice compact setups as a result, because you can run no-PCI and no-HD setups.

Havoc 2 years ago

I’ve built a small Raspberry Pi k3s cluster with Pi 4s and SSDs. It works fine, but one can ultimately still feel that they are quite weak. Or put differently, deploying something on k3s still ends up deploying on a single node in most cases, and this gets single-node performance under most circumstances.

  • nyadesu 2 years ago

    I've been running a cluster like that since some years ago and definitely felt that, but it was easy to fix by adding AMD64 nodes to it

    Modifying the services I'm working on to build multi-arch container images was not as straightforward as I imagined, but now I can take advantage of both ARM and AMD64 nodes on my cluster (plus I learned to do that, which is priceless)

eismcc 2 years ago

It's amazing to see how far these systems have come since my coverage in The Verge in 2014, where I built a multi-node Parallella cluster. The main problem I had then was that there was no off-the-shelf GPU-friendly library to run on it, so I ended up working with the Cray Chapel project to get some distributed vectorization support. Of course, that's all changed now.

https://www.theverge.com/2014/6/4/5779468/twitter-engineer-b...

lars-b2018 2 years ago

It's not clear to me how to build a business based on RPi availability. And the clones don't seem to be really in the game. Are Raspberry Pis becoming more readily available? I don't see that.

  • nsteel 2 years ago

    Businesses and consumers don't see the same availability, apparently. And yes, they are very slowly becoming more available. But still no Pi 4 about.

  • goodpoint 2 years ago

    Correct. These are for hobbyists and there is no market.

exabrial 2 years ago

I really want something like NVidia's upcoming Grace CPU in blade format, but something where I can provision a chunk of SSD storage off a SAN via some sort of PCI-E backplane. Same form factor like the linked project.

I'm noticing that our JVM workloads execute _significantly_ faster on ARM. Just looking at the execution times, our lowly first-gen M1 MacBooks are significantly better than some of the best Intel or AMD hardware we have racked. I'm guessing it all has to do with memory bandwidth.

sroussey 2 years ago

Apple should go with a blade design for the Mac Pro. Just stick in as many M2 Ultra blades as you need to up the compute and memory.

Will need to deal with NUMA issues on the software side.

  • geerlingguy 2 years ago

    I would be all over any server like form factor for M-series chips. The efficiency numbers for the CPU are great.

nubinetwork 2 years ago

I have a few Armada 8040 boards and a couple of Raspberry Pis, but let's be real...

They're not going to get maximum performance from an NVMe disk, the CPUs are too slow, and gigabit isn't going to cut it for high-throughput applications.

Until manufacturers start shipping boards with ~32 cores clocked faster than 2ghz and multiple 10gbit connections, they're nothing more than a fun nerd toy.

robbiet480 2 years ago

Been waiting for this for over a year, was the first person to buy a pre-purchase sample. Planning to set up a PXE k3s cluster.

pnathan 2 years ago

This looks cool!

I would, however, say that while I'm in the general target audience, I won't do crowdfunded hardware. If it isn't actually being produced, I won't buy it. The road between prototype and production is a long one for hardware.

(Still waiting for a very cool bit of hardware, 3+ years later - suspecting that project is just *dead*)

dehrmann 2 years ago

> 160 ARM cores

> 320 GB of RAM

Depending on how you feel about hyperthreading, there are commodity dual-CPU Xeon setups that can do this as well.

bogwog 2 years ago

$60 per unit sounds pretty good. Does anyone have experience cross compiling to x86 from a cluster of Pis and can say how well it performs? A cheap and lower-power build farm sounds like an awesome thing to have in my house.

  • 0x457 2 years ago

    Most likely, your amd64 CPU is much faster than all those tiny Pi cores. Add to that network latency...

    • geerlingguy 2 years ago

      Especially for x86 cross compiling from ARM. Typically people do the reverse, because outside of Graviton and M-series, X86 is generally a lot faster.

      • 0x457 2 years ago

        Yeah, I remember compiling for the Pi with QEMU on my amd64 machine was infinitely faster than compiling on the Pi itself. I think people don't understand a few things:

        - running self-hosted services for 3.5 users doesn't take many resources, and a Pi can often handle multiple services

        - compilation is a CPU-heavy and I/O-heavy operation; the more memory you have on a single machine, the better.

        I use a Pi as my on-the-go computer in places where I can't ssh to my home server. Sometimes I can't even get projects indexed without the language server being killed by the OOM killer 20 minutes later (on my PC it takes <20 seconds to index).

jdoss 2 years ago

I think these are fantastic, but I really wish it had a BMC so one could do remote management. I'd love for version 2 to have it so I could buy a bunch for my datacenter.

bashinator 2 years ago

There's no backplane - all power and communication goes through a front-facing ethernet port. Kind of defeats the purpose of a blade form factor IMO.

Saris 2 years ago

It's too bad ARM boards are so expensive; it makes them nearly pointless for projects unless you need the GPIO.

robotburrito 2 years ago

This is cool. But it's super hard to compete with a computer you bought off Craigslist for $25.

ultra_nick 2 years ago

I'd like to buy a laptop that's also a fault tolerant cluster.

1MachineElf 2 years ago

Love it, however, I'm skeptical of Raspberry Pi Foundation's claims that the CM4 supply will improve during 2023. It might improve for some, but as more novel solutions like these come up, the supply will never be enough.

amelius 2 years ago

How do we measure the performance of these kinds of systems?

  • robbiet480 2 years ago

    The Blade is just a carrier for a Raspberry Pi CM4, so the performance will be that of a normal CM4.

    • amelius 2 years ago

      Ok, still it would be nice to have a line that says this system can do X1 threads of X2 GFLOP/s and has a memory bandwidth of X3 MB/s, or something like that.

      • ZiiS 2 years ago

          Unfortunately, if you are asking that question, the answer for all the Pis and clones is "Not enough, by more than an order of magnitude".

        • geerlingguy 2 years ago

          The clones based on the RK3588 are approaching last-Gen Qualcomm speeds, so they're not as much of a let down as the 2016-era chips the Pi is based on.

          And efficiency is much better than the Intel or AMD chips you could get in a used system around the same price.

          • 0x457 2 years ago

            "last-Gen Qualcomm speeds" that's a pretty low bar...

            • geerlingguy 2 years ago

              Heh... but at least it's a lot higher than "current-Gen Pi"

      • znpy 2 years ago

        You can look at benchmarks for the rpi cm4 for that

atlgator 2 years ago

Why only 1 Gbps ethernet?

  • geerlingguy 2 years ago

    That's the speed of the NIC built into the CM4. If you want 2.5 or 5 Gbps, you'd have to add a PCIe switch, adding a lot more cost and complexity—and that would also remove the ability to boot off NVMe drives :(

    Hopefully the next generation Pi has more PCIe lanes or at least a faster lane.