IBM Open-Sources Power Chip Instruction Set

587 points by Katydid 6 years ago

bem94 6 years ago

I have so many questions:

- Where can I get the ISA specification?[1]

- Where can I get a compiler?

- Is there a link to the "softcore model"?

With RISC-V you can start very simple and small (micro-controller) and work your way up in understanding and implementation to a very large core (application class). POWER is a monster of an architecture, designed more for "big iron". I guess that might limit the "hobbyist" factor RISC-V has.

1. This I think, all 1200 pages of it: https://openpowerfoundation.org/?resource_lib=power-isa-vers...

kop316 6 years ago

I own a Talos II (https://www.raptorcs.com/TALOSII/) computer. It actually runs an official port of Debian (https://wiki.debian.org/PPC64) on it, which includes a compiler.
- cipherboy 6 years ago
  
  Fedora [0], Red Hat [1], Ubuntu [2], and SUSE [3] all have their own ppc64le ports as well so there are lots of choices out there if anyone is interested.
  Even Gentoo has one [4][5]!
  [0]: https://alt.fedoraproject.org/alt/
  [1]: https://access.redhat.com/documentation/en-us/red_hat_enterp...
  [2]: https://ubuntu.com/download/server/power
  [3]: https://www.suse.com/products/power/
  [4]: https://wiki.gentoo.org/wiki/Handbook:PPC64
  [5]: https://www.gentoo.org/downloads/
  
  classichasclass 6 years ago
  
  Fedora 30 on this Talos II. Works well.
  
  _emacsomancer_ 6 years ago
  
  Void Linux as well: https://www.talospace.com/2019/01/void-linux-goes-power9.htm... Although it's not official at this point I don't think.
  
  classichasclass 6 years ago
  
  No, though my impression is it's progressing pretty well, so I think it will get there.
- voldacar 6 years ago
  
  I have drooled over the Talos II for quite some time...
  Do you have a particular use case that makes POWER make sense over x86, or do you share my paranoia and love of non-mainstream ISAs?
  
  einpoklum 6 years ago
  
  Use of GPUs. Not Talos II it seems (?), but with POWER, GPUs are first-class citizens on the system, with the same NVLink 2 access to main memory as the CPU - 150 GB/sec in each direction! (simultaneously!)
  
  bubblethink 6 years ago
  
  Actual GPU use on Talos seems to be problematic, judging from their wiki page. The CUDA use case is supported, but that bandwidth seems too high. Or are you quoting some future number? The current bandwidth on a P9 system with NVLink is closer to 30 GB/s. And I don't think Talos supports NVLink.
  
  madez 6 years ago
  
  Are all accesses to the memory from the GPU still checked for permissions at the hardware level by an IOMMU?
  
  rrss 6 years ago
  
  Yeah, checked for permissions in hardware, but not by an IOMMU. Requests from the GPU are forwarded to the "standard" SMMU. See http://www.ieee-hpec.org/2018/2018program/index_htm_files/13...
  
  classichasclass 6 years ago
  
  I don't especially, because my Talos II is "just" my desktop. I want a computer I can trust and that I know what it's doing from the ground up. It was already the best choice for that and today's announcement made the choice even better.
  
  Koshkin 6 years ago
  
  > I know what it's doing from the ground up
  Do you now? There is not even a hidden embedded micro-core running a "secure operating system"?
  
  classichasclass 6 years ago
  
  You can audit the firmware and build it yourself. I did it. Raptor even encourages it: https://wiki.raptorcs.com/wiki/Compiling_Firmware
  The biggest problem remaining is whatever blobs are in devices. That's being rapidly worked on.
  
  nickpsecurity 6 years ago
  
  It's worse than that:
  https://lobste.rs/s/noed0h/day_2_keynote_openpower_blows_doo...
  You can't trust any modern computer to not be subverted. So, you have to change how you use them. True secrets should be kept out of computers or rooms with technology. Go old school.
  
  tempguy9999 6 years ago
  
  > Go old school
  OK. How?
  
  _delirium 6 years ago
  
  Interesting. How do you find it as a desktop? I'd read in reviews that it's incredibly loud, so more suited for datacenter than office or home use, but maybe it's not as bad as I'd gathered?
  
  classichasclass 6 years ago
  
  This is a very early unit (#12) and the initial firmware was indeed deafeningly loud. However, the current firmware is whisper quiet, certainly much quieter than the Quad G5 next to it (and the G5 is throttled down), and I also have super-quiet power supplies installed. I find it perfectly liveable.
  
  avhception 6 years ago
  
  I've had a system with two quad-core CPUs running at 100% load under my desk for many days, whisper-quiet.
  
  neop1x 6 years ago
  
  For me, the biggest problems are the long boot time of Hostboot and the lack of "suspend to RAM".
  
  kop316 6 years ago
  
  I share and respect your paranoia. I have a love of inspecting code and not having backdoors in my processor.
  
  voldacar 6 years ago
  
  For sure. It's nice that open source firmware replacements have been making progress, especially since the Intel ME fiasco, but it never ceases to amaze me that right now you can go out and get a modern, ultra high performance workstation with every single chip running auditable firmware. Hopefully we will start seeing more affordable POWER systems now that it is a fully open architecture
  
  acqq 6 years ago
  
  > with every single chip running auditable firmware
  But disks? Isn’t their firmware closed?
  
  tpearson-raptor 6 years ago
  
  What you can do with a trusted CPU domain is use FDE. FDE is standard practice for anyone even remotely concerned about security in the first place.
  So the firmware that matters -- the firmware that can subvert the system due to privilege level, etc. -- is open. No other vendor aside from some lower end ARM toy SoCs can say that.
  
  justinclift 6 years ago
  
  Maybe OpenSSD would be functional enough to use:
  http://openssd.io
  http://www.openssd-project.org/wiki/The_OpenSSD_Project
  
  throw0101a 6 years ago
  
  > But disks? Isn’t their firmware closed?
  Encrypt your data in-memory with a file system feature (or something like LUKS/dm-crypt) before it's sent down the SATA cable to the disk.
  The NSA has gone after disk firmware:
  * https://www.theregister.co.uk/2015/02/17/kaspersky_labs_equa...
  
  voldacar 6 years ago
  
  Ugh I guess there always has to be an exception. Maybe you could run everything in a ramdisk? It supports up to 2TB of ram after all
  
  marmaduke 6 years ago
  
  Isn't the price tag pretty amazing too?
  
  voldacar 6 years ago
  
  Well yeah it is hardly cheap but for the people who can afford it, I totally get how having a computer you can trust is worth the price tag
  
  Annatar 6 years ago
  
  Most people won't pay that kind of money just to tinker with it.
  
  wolfgke 6 years ago
  
  > Most people won't pay that kind of money just to tinker with it.
  The interesting question rather is: how many of these simply cannot afford it and how many think that this is not worth it?
  
  marmaduke 6 years ago
  
  I looked at the Talos website. They're asking 2-4k$ for a 4 core (4 way SMT, so let's say 8 core when comparing to x86_64 just to be nice) dev desktop, with 8 to 16 GB ram. The same spend on a Dell Xeon workstation nets quite a bit more hardware.
  
  avhception 6 years ago
  
  While not exactly cheap in absolute terms, there is the Blackbird from Raptor. It's a single-CPU board, cheaper than Talos.
  
  shaklee3 6 years ago
  
  I wouldn't drool over it. See the latest benchmarks comparing it to epyc and Intel. Power9 does pretty poorly throughout almost every test:
  https://www.phoronix.com/scan.php?page=article&item=rome-pow...
  
  tpearson-raptor 6 years ago
  
  Both of those processors insist on you ceding full system control to the vendor in perpetuity, with a literal "skeleton key" that lets the vendor in and keeps you out (the centrally signed, unremovable ME/PSP). If this doesn't concern you, then why are you looking at a local machine at all when a cloud system may very well be less expensive to lease than to purchase and keep current, not to mention run, local hardware? Unless you're loading the local machine 24/7, you're leaving a resource sitting idle for parts of the day without any real increase in security or control, meaning the cloud vendor can give you a cheaper experience overall by keeping hardware utilization over time high.
  And no, ME cleaner does NOT (and cannot) fully remove a modern ME. The PSP "disable" toggle in the UEFI configuration does NOT disable the PSP from running during startup.
  
  wolfgke 6 years ago
  
  > why are you looking at a local machine at all when a cloud system may very well be less expensive to lease than to purchase and keep current, not to mention run, local hardware?
  Because a cloud machine is rented and not owned. And because of the ping latency: there is a reason why there is for example still hardly any cloud gaming.
  
  tpearson-raptor 6 years ago
  
  And what exactly do you call a machine that you are, by design, cryptographically locked out of, but a third party has access to?
  Put another way, would you call a car that I kept duplicate keys and retained title for, but said you could use and maintain at your sole expense for a single upfront payment, rented or owned?
  Latency is being solved; Google etc. are working on that problem. I'm playing devil's advocate here, but fundamentally if you don't care about actually controlling or being able to modify something, and pricing is cheaper to rent, why own?
  
  wolfgke 6 years ago
  
  > And what exactly do you call a machine that you are, by design, cryptographically locked out of, but a third party has access to?
  Not a perfect solution, but such a problem can be mitigated by a firewall that blocks such ingoing/outgoing packets.
  > I'm playing devil's advocate here, but fundamentally if you don't care about actually controlling or being able to modify something, and pricing is cheaper to rent, why own?
  Since I love to tinker with my computers, the answer is obvious to me.
  
  shaklee3 6 years ago
  
  I think this is apples and oranges. Sure, if those kinds of things are that important to you, then POWER9 is your only option. But if performance is important, POWER9 is a longshot from being the best. Most companies likely don't care about the things you're suggesting.
  
  beezle 6 years ago
  
  My understanding is that it is not really useful to compare Power9 to other cpus in these types of benchmarks, that Power9 is all about computation with massive datasets, not how fast it can zip a file.
  
  Recurecur 6 years ago
  
  > My understanding is that it is not really useful to compare Power9 to other cpus in these types of benchmarks, that Power9 is all about computation with massive datasets, not how fast it can zip a file.
  Your understanding is wrong. For instance, running Java workloads on servers is a major Power9 use case.
  The thing to remember, though, is that the Talos is only two four-core CPUs, for eight total. These benchmarks are comparing it to the Epyc 7742, which is a 64 core chip.
  Naturally the Epyc will kill it on most highly threaded benchmarks. The individual cores on Power9 are quite fast, though.
  
  fluffything 6 years ago
  
  > The individual cores on Power9 are quite fast, though.
  Are there any single-thread performance benchmarks that I could look at?
- mshook 6 years ago
  
  If I may ask why did you get it, which reasons? I like the cool non x86 factor but it's quite expensive...
  EDIT: Forgot to mention the open argument which is quite amazing as well (I've followed what Talos does).
  
  kop316 6 years ago
  
  I tend to buy server level hardware for my own usage. It tends to last a LOT longer. With that in mind, it was roughly comparable to what I would have paid for a comparable Intel Xeon, and I like the fact that I know all of the code that runs on it (the only code that I can't actually change is the OTP memory that it first executes when it boots up, and even then you can inspect it!).
  
  Avamander 6 years ago
  
  Thanks for supporting the development of open hardware and software. Very few of us can afford to do so.
  
  kop316 6 years ago
  
  They actually came out with a Blackbird, and I have been considering getting one of them to replace my server (it runs FreeNAS, but I have gotten my Debian system to run an encrypted ZFS drive).
  
  dragontamer 6 years ago
  
  The 18-core has decent performance. If it weren't for AMD EPYC Rome chips coming out a month ago, I would have considered a Talos II.
  18-cores with 4x SMT == 72 threads per Power9. That's a lot of threads, no matter how you look at it.
  
  slovenlyrobot 6 years ago
  
  That's a whole lot of SMT. Can anyone comment on how it behaves compared to hyperthreading? I'm assuming at 4x each core must have a ton more execution units to go around
  
  dragontamer 6 years ago
  
  Power9 is basically "Bulldozer done right". Each SMT4 Power9 core is incredibly fat, with 4x load/store units, 4x ALUs, and 2x Vector units. Bulldozer probably would have called each SMT4 core a collection of 4 cores.
  But only 1x divider, 1x crypto unit per SMT4 core.
  The chief downside to Power9 is that it only supports 128-bit vectors, and these 128-bit vectors are executed by ganging-together the ALU units. (so 4x 64-bit ALUs == 2x 128-bit vectors processed per clock tick). Compared to AMD Zen (4x 128-bit pipelines), AMD Zen 2 (4x 256-bit pipelines), and Intel Skylake-X (3x 512-bit pipelines), Power9's SIMD capabilities are tiny.
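To put the SIMD comparison in numbers, here is a minimal sketch that multiplies pipeline count by vector width. The pipeline counts and widths are simply the figures quoted in this comment (they are not independently verified), and the function names are made up for illustration.

```python
# Vector pipelines per core as (pipeline count, width in bits).
# These are the figures quoted in the comment above, not verified specs.
SIMD_PIPES = {
    "Power9 (SMT4 core)": (2, 128),
    "AMD Zen": (4, 128),
    "AMD Zen 2": (4, 256),
    "Intel Skylake-X": (3, 512),
}


def vector_bits_per_clock(pipes, width_bits):
    """Peak vector bits processed per clock tick under these assumptions."""
    return pipes * width_bits


def relative_to(baseline, table=SIMD_PIPES):
    """Peak vector throughput of each chip as a multiple of the baseline."""
    base = vector_bits_per_clock(*table[baseline])
    return {
        name: vector_bits_per_clock(*spec) / base
        for name, spec in table.items()
    }


if __name__ == "__main__":
    for name, ratio in relative_to("Power9 (SMT4 core)").items():
        print(f"{name}: {ratio:.1f}x the Power9 peak vector width per clock")
```

This only compares peak width per clock. It says nothing about clock speed, latency, or how often real code keeps every pipeline busy.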
  Another oddity: most instructions take 2-clock ticks to execute, even simple instructions like XOR or Add. This increased latency is likely the reason why it performs so poorly with Python / PHP code.
  But when code is written for Power9, it works quite well. Stockfish chess seems to work extremely well on Power9, likely because Stockfish scales to many "cores" well (fully taking advantage of SMT4), and only has 64-bit operations.
  One more wildcard: Power9 has 10MB (!!!) L3 cache for every 2-cores. That's 90MB L3 cache on the 18-core. I presume that real-life database applications would benefit greatly from this oversized L3 cache.
  EDIT: It should be noted that the L3 caches serve as victim-caches of other L3 caches. So Power9 core-pair 01 can have its 10MB L3 cache serve as a "L3.1 cache" of core-pair 23. AMD Zen / Zen2 L3 cache CANNOT use this functionality. So AMD Zen2 64-core may have 128MB of L3 cache, but each core only "really" can go up to 16MB of L3 cache (because the other 112MB of L3 cache is only for other cores/module)
  EDIT: Also note, Power9 came out a few years ago at 14nm, while Zen2 came out on the 7nm node a month ago. I think a new 7nm Power9 update is planned, but I don't know what its timeframe is.
  In effect, you could have 1-program using the entire 90MB L3 cache for itself on Power9. While AMD Zen2 requires (at minimum) 8-programs, each program using only 16MB L3. This design decision is clear in the intended use of the chips: Zen2 is clearly targeted at the cloud-market, while Power9 is big-iron / databases.
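The cache arithmetic in this comment can be sketched as follows. The sizes (10MB per core pair, 128MB total and 16MB reachable per core on Zen 2) are the numbers quoted above, taken at face value; the function names are hypothetical.

```python
def power9_total_l3_mb(cores, mb_per_core_pair=10):
    """Total L3 if every pair of cores contributes one slice of L3."""
    return (cores // 2) * mb_per_core_pair


def programs_needed_to_use_all_l3(total_l3_mb, reachable_per_program_mb):
    """Minimum number of programs needed before every MB of L3 is in use,
    if a single program can only reach `reachable_per_program_mb` of it."""
    # Ceiling division without importing math.
    return -(-total_l3_mb // reachable_per_program_mb)


if __name__ == "__main__":
    p9_total = power9_total_l3_mb(18)
    # Victim-cache sharing lets one program reach the whole L3 on Power9.
    print("Power9 18-core L3:", p9_total, "MB, programs needed:",
          programs_needed_to_use_all_l3(p9_total, p9_total))
    # Zen 2 figures as quoted in the comment: 128MB total, 16MB per core.
    print("Zen 2 (as quoted) programs needed:",
          programs_needed_to_use_all_l3(128, 16))
```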
  --------
  Unfortunately, most of the benchmarks these days show that AMD EPYC / Rome is just the better overall processor. Still, 18-core Power9 is relatively cheap: a complete 18-core / 72-thread system for $4000ish: https://secure.raptorcs.com/content/TLSDS3/purchase.html
  Cheap for Power9 anyway. AMD EPYC is also relatively cheap. You can get a 16-core / 32-thread / 32MB L3 cache AMD Ryzen 9 3950x for only $700 these days (and maybe a complete system build for only $2500).
  
  phire 6 years ago
  
  I don't think "bulldozer done right" is the correct way to describe POWER9.
  I see it more as a single big massively wide OoO core with 23 execution units (putting skylake's 10 execution units to shame). The slices are more there for design reasons, to simplify the design process by making it more symmetrical.
  Bulldozer is clearly two integer cores sharing some execution units between them, a thread can only exist on one of the two integer units.
  In contrast, a thread on POWER9 can simultaneously use all 4 slices, all 23 execution units. The dispatcher can dynamically mix and match which slice it's sending a thread's instruction stream to based on slice utilization.
  That single difference puts it in a completely different class of CPU architecture from Bulldozer.
  
  dragontamer 6 years ago
  
  > In contrast, a thread on POWER9 can simultaneously use all 4 slices
  My reading of the documentation is different.
  > The most significant partitioning related to threads occurs when more than two threads are active, placing the core in SMT4 mode. In SMT4 mode, the decode/dispatch pipeline, shown in the blue shaded area in Figure 25-1 on page 321, is split into two pipelines, each pipeline is three iops wide and each pipeline serves two threads. The split decode/dispatch pipes each feed one of the two superslices, shown in the green shaded box in Figure 25-1, providing two execution slices for each pair of threads. The branch slice and LS-slices are shared between all threads.
  Page 322 of 496: https://ibm.ent.box.com/s/8uj02ysel62meji4voujw29wwkhsz6a4
  -------
  The left superslices serves 2-threads, while the right superslice serves 2-threads. All 4 threads are "behind" the singular decoder.
  It seems very "Bulldozer-esque" to me, especially in SMT4 mode.
  ---------
  You are correct in that there is an SMT1 mode where one-thread could potentially utilize the entire processor. But with 2-latency on even Add / XOR instructions (see Appendix A), I don't foresee SMT1 code to be very useful on Power9. The processor is clearly designed to run most effectively on SMT2 or SMT4 modes.
  I'm not even sure how easy or hard it is to switch into SMT1 to SMT2 or SMT4 modes. I don't think Linux can switch cores while running, and may need to reboot for instance. Maybe AIX can switch between the modes on the fly?
  I guess if your code has enough Instruction Level Parallelism (ILP) available in its code stream, it could benefit from SMT1 mode. But I'd imagine that most 64-bit CPU-code wouldn't have much ILP.
  
  phire 6 years ago
  
  It's worth noting that in SMT2 mode, it's still 2 threads dynamically scheduled across all 4 slices.
  It's only in SMT4 mode that it starts statically partitioning the threads onto superslices. Even then, it's two threads sharing two slices.
  I assume the static partitioning is an optimisation, and that performance increases due to the split L1d caches (and I'm guessing there is a delay cycle when one slice depends on data from another; I haven't read the documentation that closely).
  It's the fact that threads can be dynamically scheduled across all four slices which makes it "not bulldozer" in my mind, and I don't think the presence of a mode that does statically partition superslices should make it "like bulldozer", even if that is the most common mode. It's just an optimisation.
  > I'm not even sure how easy or hard it is to switch into SMT1 to SMT2 or SMT4 modes.
  Ideally, the CPU core would dynamically drop down to SMT1 or SMT2 mode whenever the extra threads are executing idle instructions.
  
  dragontamer 6 years ago
  
  > It's the fact that threads can be dynamically scheduled across all four slices which makes it "not bulldozer" in my mind, and I don't think the presence of a mode that does statically partition superslices should make it "like bulldozer", even if that is the most common mode. It's just an optimisation.
  Well, it's certainly a Bulldozer-like mode of operation :-)
  Power9 is obviously a very different chip than Bulldozer. So I guess it all comes down to opinion, whether or not the chip is similar enough to warrant a comparison.
  
  ivl 6 years ago
  
  > EDIT: Also note, Power9 came out a few years ago at 14nm, while Zen2 came out on the 7nm node a month ago. I think a new 7nm Power9 update is planned, but I don't know what its timeframe is.
  I believe 7nm POWER10 will be the next move, they had announced Samsung as the partner for their next chips back in December if I remember right.
  
  shaklee3 6 years ago
  
  The Power9 deceptively "came out a few years ago". But in reality, it didn't. The only ones available for a year or so were demo units at IBM. The rest were being promoted as part of the Summit supercomputer. Just like AMD's MI50/60 has been available since November 2018. But try to search/buy one. Good luck...
  
  cptnapalm 6 years ago
  
  I have a 2009 Mac Pro with dual 3.2 GHz hexcore Xeons (so 24 threads) and 2 older GPUs and 48 GB RAM that cost less than $700 for the whole thing. I'm beginning to think I lucked out on it more than I already thought I did.
  
  dragontamer 6 years ago
  
  Each Nehalem hexcore Xeon is (EDIT) ~120 Watts of power, so your computer will be drawing well over 300W under load, maybe over 500W. (I mean, Mac Pro 2009 has a 1200W PSU. I presume it's expecting to use around half of that power)
  The Power9 18-core / 72-thread is going to come in at under 150W total.
  The main advancement the past decade has been in power-efficiency. Cloud-scale providers keep their computers running at max load as well, so 500W does add up over months / years into a sizable amount of money.
  Especially when you consider that 500W computer needs 500W of Air-conditioning, so the "True cost" of a 500W computer is roughly ~1200W or so (500W from the computer, 700W to power an air-conditioner to move 500W of heat)
  ----------
  A 12-core / 24-thread AMD Ryzen 3900x is just $500, with a total system cost under $1500. The big advantage of a Ryzen 3900x would be a max clock-rate of 4.7 GHz, while your Nehalem 2009 computer is... what? 2.5 GHz? Probably? And computers of that age didn't have deep sleep capabilities, wasting even more power than usual. Modern computers idle at 20W, even servers and desktops. Tons of power-saving features these days which add up.
  I think a typical $1500 computer these days would be more than twice as fast with 1/4th the power usage. I don't think anybody seriously in this hobby should be using anything as old as Nehalem these days.
  IMO, the price/performance "old computers" seems to be Haswell (~2014 era servers), if people want to buy old equipment. But 2009 is definitely too old, there are lots of used servers that are a little bit more expensive but a LOT more power efficient / faster in practice.
  
  gpm 6 years ago
  
  > Especially when you consider that 500W computer needs 500W of Air-conditioning, so the "True cost" of a 500W computer is roughly ~1200W or so (500W from the computer, 700W to power an air-conditioner to move 500W of heat)
  I thought air conditioners/heat pumps were supposed to be substantially better than 1w of heat moved outside per watt of electricity?
  
  dragontamer 6 years ago
  
  Hmm... a typical home Air Conditioner is 15 to 20 SEER, which apparently means 15 to 20 BTU of cooling per Watt-hour of input, i.e. 15 BTU/hr per Watt at the low end.
  15 BTU/hr is about 4.4 Watts, so that is roughly 4 to 5 Watts of cooling per Watt of input.
  So it appears you are correct. To move 500W of heat, you only need roughly 100W of air conditioner power.
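A quick sketch of that conversion, for anyone who wants to plug in their own numbers. It assumes SEER can be treated as a steady ratio of BTU of cooling per watt-hour of input (it is really a seasonal average), and the function names are invented for this example.

```python
BTU_PER_WATT_HOUR = 3.412  # 1 Wh is about 3.412 BTU


def seer_to_cop(seer):
    """Convert SEER (BTU of cooling per Wh of input) to a dimensionless
    ratio of watts of heat moved per watt of electricity."""
    return seer / BTU_PER_WATT_HOUR


def ac_input_watts(heat_watts, seer):
    """Electrical power needed to move `heat_watts` of heat, treating SEER
    as a constant efficiency (a simplifying assumption)."""
    return heat_watts / seer_to_cop(seer)


if __name__ == "__main__":
    for seer in (15, 20):
        print(f"SEER {seer}: {seer_to_cop(seer):.2f} W moved per W of input, "
              f"{ac_input_watts(500, seer):.0f} W to cool a 500 W computer")
```

With these assumptions a 500W machine costs somewhere around 85W to 115W of extra air-conditioner power, which is close to the rough 100W figure above and far below 700W.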
  
  cptnapalm 6 years ago
  
  The Mac Pro is a 4,1 flashed to a 5,1 and uses Westmere 3.3 GHz CPUs, but your point of power consumption is taken. As I can't possibly afford a $1500 PC, I'm still happy with what I've got. A multiseat desktop/server I could afford that is pretty happy with whatever I've thrown at it is a lot better than a bare CPU sitting idly on my desk.
  
  dragontamer 6 years ago
  
  If $600 or $700 is your budget, my main point was to look for Haswell (2014-era) systems.
  For example, the Dell PowerEdge R630 (2014-era) server is in and around $600 to $1000 on Ebay, and will be more power-efficient and faster than any 2009-era system.
  I think 2014-era servers are where the price/performance point is for the home-server enthusiast, especially if we're talking about sub $1000 price points.
  https://www.ebay.com/itm/Dell-Poweredge-R630-2x-Xeon-E5-2640...
  2x8 core dual socket Intel Xeon E5-2640 v3 (Haswell) with 64GB of RAM. It's an auction, so it will probably go up another $100 or $200 from there, but I would expect it to sell well south of $1000.
  2014-era equipment is the current price/performance king for home hobbyists. Obviously, a modern desktop with all the bells and whistles is a bit more expensive at $1500, but for $600 to $700, you can get a pretty good 2014-era system.
  -------
  My rule of thumb is to buy something 5-years out of date. That's roughly the time when businesses get rid of old equipment and upgrade. So 5-years old equipment tends to win in price/performance.
  
  cptnapalm 6 years ago
  
  I did, oddly enough, look at used PowerEdge servers, but I wanted a multiseat desktop too, so the step-son and I could play games together at the same time. Less than $700 bought the Mac Pro, 2 video cards, 48 GB of RAM (3 sticks) and, not included in my original equipment tally, a 4 TB SSD and 24" AOC monitor. The bare Mac Pro was $250. As I got it early last year, the 5 year rule of thumb almost applied as a 2009 and a 2012 Mac Pro are nearly identical, the former being able to just be flashed to the latter. In another couple of years, if I have any cash to spare, I'll likely get a used PowerEdge, though. The cost of those things, for what you get, is exceedingly good.
  
  dragontamer 6 years ago
  
  Ah right, multiseat desktop.
  Well, I guess the Mac Pro is fine for that, as long as you're fine with the Mac OSX operating system. The Mac Pro line hasn't really had many updates, so maybe the 5-year heuristic doesn't really apply.
  
  cptnapalm 6 years ago
  
  Linux all the way! OSX doesn't actually do multiseat. So, have a Linux Mac Pro that I can ssh into, or if that's blocked, get a shell or even my desktop in a web browser among other things. All in all, rather happy with it, though I really would like one of those Raptor Power9 boards for the hell of it.
  
  jsjohnst 6 years ago
  
  > The Power9 18-core / 72-thread is going to come in at under 150W total.
  The TDP on the 18 core (and 22 core as well) is 190W as listed on Raptor’s website.
  
  CrystalGamma 6 years ago
  
  That's an IBM TDP (i. e. maximum ever power), not an Intel TDP (i. e. maximum power at some arbitrary power state declared as 'base clock speed').
  
  zrm 6 years ago
  
  It's going to be highly dependent on the workload. For some it's counterproductive because the working set of fewer threads will fit into a given cache level when more threads won't, and then it slows things down -- but then you can turn it off or run fewer threads per core.
  Where it's a big win is for pointer chasing workloads or big databases, where the working set isn't going to fit in cache anyway and then it's effectively like having really fast context switches. You have four threads and three of them are waiting on main memory while you keep the core busy with the fourth, then that thread has a cache miss but by then one of the other threads has the data it was waiting on.
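A toy model of that effect, as a rough sketch rather than a description of any real core: assume each thread alternates a fixed burst of compute with a fixed memory stall, that stalls from different threads overlap perfectly, and that compute from different threads does not overlap at all. The function name and the cycle counts are illustrative assumptions.

```python
def core_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles the core spends doing useful work.

    Each thread runs `compute_cycles` of work, then waits `stall_cycles`
    on memory. Other threads fill the stall time until the core saturates.
    """
    busy_fraction_per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * busy_fraction_per_thread)


if __name__ == "__main__":
    # Pointer chasing: short bursts of work, long waits on main memory.
    for n in (1, 2, 4):
        print(f"memory-bound, SMT{n}: {core_utilization(n, 50, 150):.2f}")
    # Compute-bound: the core is nearly saturated by a single thread.
    for n in (1, 2, 4):
        print(f"compute-bound, SMT{n}: {core_utilization(n, 190, 10):.2f}")
```

In the memory-bound case each thread keeps the core busy a quarter of the time, so four threads are needed to fill it, matching the "three waiting, one running" picture above. In the compute-bound case extra threads add almost nothing, and this model ignores the cache contention that can make them a net loss.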
  
  slovenlyrobot 6 years ago
  
  That pointer chasing benefit has been my experience on Intel, especially on anaemic low power designs. I'm more curious how/why Intel stops at 2 whereas Sparc/Power can manage much higher numbers. Maybe it's not architectural, but more just about product fit or something
  
  zrm 6 years ago
  
  It's probably a combination of target market and trade offs.
  To make SMT-4 perform well you want to have larger caches so that cache contention between the threads doesn't become the bottleneck, but that eats a lot of transistors. It's essentially a brute force trade off between performance and manufacturing cost and IBM is more willing to say "damn the cost" than Intel.
  There's also the matter of who needs a machine like that. There is a lot of ugly pointer-chasing code in the world, but to take advantage of SMT-4 it has to be well-threaded ugly pointer-chasing code. You basically need a customer that needs their application to scale and is willing to do the bare minimum necessary to make that possible, but not spend a lot of resources actually optimizing the code once they get it to the point that throwing more hardware at it is a viable alternative. That's the enterprise market in a nutshell right there, and that's where IBM lives.
  
  stingraycharles 6 years ago
  
  That sounds fascinating. Do you have any examples / study material that describes these programming techniques?
  
  slovenlyrobot 6 years ago
  
  My hyperthreading enlightenment came from discovering a parallelized XML parsing task (using libxml2) running on Atom N2800 (2 cores) absolutely trouncing a similar run on a much beefier Xeon with HT disabled. It came very close to a 2x speedup.
  This is what the parent comment means when referring to pointer chasing -- XML documents are a big random access graph in memory, CPU cache and prefetch is close to useless in that environment, so when walking the DOM as part of some parsing task, much of the time is spent waiting on memory, with the execution units lying idle.
  OTOH many 'genuinely computational' jobs like say, an ffmpeg encode have very noticeable slowdowns with hyperthreading enabled. In those kinds of jobs where the code is already highly optimized to keep the CPU pipeline busy, there will be contention for the single set of execution units shared by both threads, and so the illusion is destroyed.
  As to why it results in a measurable slowdown, someone else would need to answer that, but it is at least conceivable that software overheads to manage the increased task partitioning might account for some of it
  
  flukus 6 years ago
  
  > This is what the parent comment means when referring to pointer chasing -- XML documents are a big random access graph in memory, CPU cache and prefetch is close to useless in that environment, so when walking the DOM as part of some parsing task, much of the time is spent waiting on memory, with the execution units lying idle.
  Bear in mind that this is only true if you parse with the DOM model. If you care about efficiency and it's at all possible, then the SAX model is much faster: you won't be bound by pointer chasing as there's very little in memory at once. IME the next big gain comes from replacing string comparisons with hash values. By that point XML parsing is entirely limited by how fast you can stream the documents.
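For illustration, a minimal SAX-style sketch using Python's standard `xml.sax` module. The document shape (`<item cents="..."/>` elements) and the class name are made up for this example. Tag dispatch goes through a dict, so the lookup is by hash rather than a chain of string comparisons, and no tree is ever built.

```python
import xml.sax


class OrderTotals(xml.sax.ContentHandler):
    """Accumulates totals from <item cents="..."/> elements as they stream by."""

    def __init__(self):
        super().__init__()
        self.item_count = 0
        self.total_cents = 0
        # Dispatch table keyed by tag name: one hash lookup per element.
        self._on_start = {"item": self._item}

    def startElement(self, name, attrs):
        handler = self._on_start.get(name)
        if handler is not None:
            handler(attrs)

    def _item(self, attrs):
        self.item_count += 1
        self.total_cents += int(attrs.get("cents", "0"))


def total_order(xml_bytes):
    """Parse a document held in memory and return (item count, total cents)."""
    handler = OrderTotals()
    xml.sax.parseString(xml_bytes, handler)
    return handler.item_count, handler.total_cents


if __name__ == "__main__":
    doc = b'<order><item cents="150"/><note/><item cents="250"/></order>'
    print(total_order(doc))
```

For a large file, `xml.sax.parse` accepts a filename or file object, so the whole document never has to be in memory at once.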
  
  slovenlyrobot 6 years ago
  
  You can achieve a similar (although I guess not nearly as efficient) effect with DOM, without sacrificing convenience given a suitable library. For example the Python lxml library grants access to the tree as it is being constructed, if you are careful not to delete a node it will later modify, it's entirely safe to e.g. parse one element at a time from a big serialized array, then deleting the element from its parent container, so memory usage remains constant. By the end of the parse, you're left with a stub DOM describing an empty container.
  The advantage is not losing access to lovely tooling like XPath for parsing
  (If anyone had not seen this trick before, the key to avoid deleting elements out from under the parser is to keep a small history of elements to be deleted later. For an array, it's only necessary to save the node describing the previous array element)
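A rough sketch of the trick, using the standard library's `xml.etree.ElementTree.iterparse` as a stand-in for lxml (an assumption made to keep the example dependency-free). It also assumes the array elements are direct children of the root, and the `item`/`value` names are invented for this example.

```python
import io
import xml.etree.ElementTree as ET


def sum_items(stream):
    """Sum the `value` attribute of every <item> child of the root element,
    deleting each finished item one step late so memory stays flat.

    Returns (total, number of children still attached to the root).
    """
    total = 0
    root = None
    previous = None  # finished item that is now safe to delete
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start" and root is None:
            root = elem  # the first start event is the root element
        elif event == "end" and elem.tag == "item":
            total += int(elem.get("value"))
            if previous is not None:
                # Never remove the element the parser touched most recently.
                root.remove(previous)
            previous = elem
    return total, len(root)


if __name__ == "__main__":
    body = "".join(f'<item value="{i}"/>' for i in range(1000))
    doc = io.BytesIO(f"<items>{body}</items>".encode())
    print(sum_items(doc))
```

At the end only the last item is still attached to the root, which is the "stub DOM" described above. With lxml the same loop could call XPath on each finished element before it is deleted.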
  
  sbierwagen 6 years ago
  
  I'm not sure I would describe an IO-bound problem as "genuinely computational".
  
  imtringued 6 years ago
  
  Video encoding is one of the most CPU intensive problems that your average user will encounter.
  
  ddorian43 6 years ago
  
  https://m.youtube.com/watch?v=j9tlJAqMV7U
  This is an extreme version of yield on memory access
- jammygit 6 years ago
  
  Have you had any issues with it, or has it required any additional configuration? I'm extremely curious about the real world use of Power chips
- martin1975 6 years ago
  
  can you run AIX on it?
  
  CrystalGamma 6 years ago
  
  No. AFAIK AIX only runs on PowerVM systems, which none of the OpenPOWER systems are.
  
  Annatar 6 years ago
  
  Then what's the point if I can't run AIX on it?!?
- Annatar 6 years ago
  
  "Talos™ II 2U Rack Mount Server TL2SV1 Talos™ II 2U Rack Mount Server Starting at $6,089.00"
  Not at $6,089.00; they can forget that. It has to cost no more than $500 USD or this will be a repeat of the same mistake Sun Microsystems made. Will these companies ever learn?
  One cannot charge enterprise prices if one wants to build an upward spiral. Intel systems dominate because they are dirt cheap and convenient to buy.
  
  fluffything 6 years ago
  
  You can't find a modern Intel Xeon Gold CPU for less than 2000$. If you buy 2, and a motherboard, you are already in the 6000$ ballpark, and then you still need to buy everything else (PSU, RAM, SSDs, GPGPU, etc.).
  
  Annatar 6 years ago
  
  I can build (and have) a fully decked-out intel-based 1U server for $1,800 USD, so this Talos thing can't compete: it's not cost-effective no matter how one slices and dices it.
  This company is repeating the same mistake IBM, hp, SGI and Sun before it made.
  Those who do not learn from history are doomed to repeat mistakes of those who came before them.
  Have you bought one of those Talos systems?
  
  fluffything 6 years ago
  
  I'll believe you if you are able to provide a link to two Xeon Gold CPUs costing the same or less than the 1800$ you claim you are able to build a full 1U rack with two of them.
blattimwind 6 years ago

PowerPC and POWER are relatively mainstream. It's supported by IBM XL, GCC, Clang and most major JITs (including luajit).
- squarefoot 6 years ago
  
  Exactly. I can't speak for big iron and server usage because the last time I used a POWER based server at work it was still AIX restricted (though IBM was already aiming at Linux for the future). However, on PPC about everything user level was available 15 years ago, including USB device support and compilers. When the very first PPC Mac Mini came out I purchased one to be used as a living room media PC connected to a projector and running a customized Debian which would load a media player (Freevo) just after boot. Worked like a charm for years, no complaints at all save for the atrociously loud "boonnnngggg" sound at power up that I was never able to turn off:^)
- floatboth 6 years ago
  
  > including luajit
  Well, a fork of luajit. LuaJIT proper has been abandoned for months…
  
  ksec 6 years ago
  
  I thought somebody picked up as new maintainer and then Mike Pall was back. What happened? He lost interest?
  
  jpfr 6 years ago
  
  No new commits since January.
  https://github.com/LuaJIT/LuaJIT
floatboth 6 years ago

> all 1200 pages of it
Sounds weak, one of the versions of ARMv8 has a spec that's exactly 6666 (!) pages. I would expect IBM to be more detailed lol
- mhh__ 6 years ago
  
  Spec or manual? The ARM spec I've seen is also 1200ish pages, whereas the programmer's manual is indeed thousands of pages.
  
  floatboth 6 years ago
  
  Ah, they indeed have a shorter spec, but it's at 2611 pages now https://static.docs.arm.com/ddi0596/d/ISA_A64_xml_v85A-2019-...
  It's actually a document generated from machine-readable XML files https://alastairreid.github.io/ARM-v8a-xml-release/
- CrystalGamma 6 years ago
  
  IBM doesn't do wacky stuff like Pointer authentication that creeps into every corner of the spec, making everything more complicated …
ajross 6 years ago

> POWER is a monster of an architecture
FWIW: the original RS/6000 devices were 20-40 MHz in-order CPUs with architectures objectively simpler than a RISC-V microcontroller like the E310.
shawnz 6 years ago

> POWER is a monster of an architecture, designed more for "big iron".
It's the same architecture as PowerPC, designed for desktops, isn't it? Have things really changed so much since then?
- floatboth 6 years ago
  
  Yes, in fact if you run FreeBSD on POWER9 currently, it's compiled with ancient gcc 4.x.whatever (the last GPLv2 version) :D (The switch to clang and ELFv2 ABI is going to happen in the coming months)
- gpderetta 6 years ago
  
  IIRC PowerPC as a separate architecture doesn't exist anymore. All extensions were folded back into POWER.
- yellowapple 6 years ago
  
  I wouldn't be surprised if they have, given that PowerPC desktops haven't been mainstream for more than a decade now, and in the meantime IBM's servers have been marching on.
  
  close04 6 years ago
  
  The last truly mainstream PowerPC desktop was Apple's 2005 Quad-G5 PowerMac. There were other PowerPC machines after this, the PS3 being the most notable. But they were either not desktops or not mainstream.
  I'm a hardware nostalgic and have both gathering dust in my basement. So I can't wait for a PowerPC revival of any kind.
  
  yellowapple 6 years ago
  
  Yeah, I've got my share of PowerPC Macs, too (one Powerbook G4, one PowerMac G5, one XServe G5, one eMac G4, all running various versions of OpenBSD). They're really fun machines, and it's a shame Apple decided instead to be yet another x86 vendor.
  I also can't wait for a PowerPC revival. Saving up for one of them Talos workstations as my next major hardware purchase (but it's really hard to pull the trigger when the motherboard or CPU alone costs as much as I paid for the entire Threadripper rig I built last year...).
  
  floatboth 6 years ago
  
  Raptor Talos/Blackbird is a niche, expensive revival, but a revival nonetheless :)
  
  close04 6 years ago
  
  I may have been overly generous with the "of any kind" :). That's a bit on the expensive side and the ecosystem and platform flexibility in terms of upgrade are still pretty slim/locked in. Something that's open and cheap enough to spark general interest would be much more interesting.
  
  yellowapple 6 years ago
  
  The Blackbirds don't look too expensive, and Talos' hardware in general is about as open as it gets (putting even the x86 market to shame, let alone the PowerPC Apple desktops).
  
  classichasclass 6 years ago
  
  It's already here. Get yourself a Blackbird (or a Talos, if you're really going to jump in).
  
  yellowapple 6 years ago
  
  Out of curiosity: is there any limit on the CPU I can stick in one of the Blackbird boards (i.e. can I stick with the lower-end CPU for initial purchase and upgrade to the 22-core monstrosity later)? If so, then that might push me over the edge into investing my next paycheck ;)
benchaney 6 years ago

In response to your second question, I believe that gcc and llvm both support power.
ajdlinux 6 years ago

The toy soft-core VHDL model that is referred to there will be available at https://github.com/antonblanchard/microwatt at some point in the next couple of days.
circuit 6 years ago

> Where can I get a compiler?
PGI has a free POWER compiler https://www.pgroup.com/products/community.htm
- zie 6 years ago
  
  LLVM and GCC both support POWER.
ksec 6 years ago

And more questions; correct me if I am wrong:
So this is an opening up of the POWER ISA. Since there are quite a few different versions or revisions of it, I assume this is the one being used in POWER9 and, in the future, POWER10?
And it is more like RISC-V ISA open source rather than MIPS open source? (I believe POWER was previously opened, but with corporate protection speak all over it.)
And this does not include implementations, like POWER9?
I mean, if all of these were true, then without an implementation, or at least licensing one for cheap, it still doesn't change the market one bit.

cipherboy 6 years ago

I'd like to recommend the friendly people at Oregon State University Open Source Labs [0] who host POWER resources for open source projects. If you're looking to see what the ISA can do on P8 or P9 system, I'd definitely contact them and see if you can get a VM.

There's also a cool vector library [1] that bridges the gap between different versions of the ISA and different compiler versions.

[0]: https://osuosl.org/services/powerdev/ [1]: https://github.com/open-power-sdk/pveclib

tpearson-raptor 6 years ago

Shameless plug, but you can also grab a POWER9 micro VPS (and large ones too) without any human intervention at integricloud.com. Those are commercial / paid though, not free.
ecnahc515 6 years ago

As someone who previously worked as a student at the OSUOSL, thanks for promoting it!
fluffything 6 years ago

Note that some of these also have nVidia GPGPUs, so you can test your open source software on both.

andyjpb 6 years ago

An open, high end CPU design is really going to change the cloud market. An ISA like this is a first step in that direction.

Facebook and Google already have their own compute projects and, like Amazon, have access to custom versions of silicon from a variety of vendors.

With a properly open CPU design we'll start to see the first tightly integrated, vertical "cloud" products that maybe still have a "commodity" API on the top (or maybe not?) but are custom all the way down from there.

With the end of Dennard Scaling, if not Moore's Law, Open ISAs and Open CPU designs will radically change both the hardware and compute markets and ecosystems over the next 5 to 15 years, similar to what we saw with Open Source in the 1990s.

Of course, it's not clear that POWER will be the one to do that, and RISC-V isn't going to be making a grab for Intel's crown any time soon, but this looks like IBM's bid to lead in that area.

When the cloud vendors start building systems like this they'll not look too much different from mainframes and IBM wants to continue to own that market.

mlyle 6 years ago

It's a far, far cry from an open ISA to having multiple competing vendors, let alone open CPU designs.
It was much earlier, but OpenSPARC's impact was limited-- and that was full RTL.
If POWER is open, does anyone really want to make competing high-performance designs-- let alone open them? Better to take something like RISC-V and come up with the first high performance design.
This is especially true when you consider IBM's vertical integration: IBM is the only real POWER OEM and the only real POWER semiconductor vendor.
(If we really assume a reduction of innovation in processors, and a 15 year time horizon... expiration of IP becomes a significant factor, too. Why not just make generic ARM?)
- Annatar 6 years ago
  
  "Better to take something like RISC-V and come up with the first high performance design."
  The problem is that RISC-V mnemonics and programming model is so retarded (as compared to MC68000 or UltraSPARC) that one needs a compiler to abstract and hide that mess away. The other problem is that in several years in which RISC-V has been hyped, nobody came up with a 19" rack server design, let alone sold one priced competitively with a 1U P. C. tin bucket server. RISC-V is all hype, but without serious hardware, its impact will be and remains questionable at best.
  
  nickik 6 years ago
  
  People have made really fast implementations of RISC-V and universally praised it as being very nice.
  And the fact that an ISA that new doesn't have an off-the-shelf server has nothing to do with problems of the ISA; rather, making mass-market products for a new ISA is incredibly difficult.
  RISC-V has barely been out of the lab for a couple of years, and the growth of software and hardware has been impressive so far. Saying it is 'all hype' is serious nonsense and says more about your expectations than about RISC-V.
  
  Annatar 6 years ago
  
  I should hope it speaks of my expectations: can't run server workloads on it, worse to program for than OpenSPARC or M68000. I actually want a nice processor and server hardware to use it in to do work. RISC-V ISA and the hardware around it provide neither and yet here we are, it's constantly being paraded as the non-plus-ultra of central processing units.
  
  pjc50 6 years ago
  
  > mnemonics and programming model is so retarded
  Could you provide some examples instead of a slur?
  
  mlyle 6 years ago
  
  First, it's not like the objection even matters: how nice the assembly interface is doesn't really matter for adoption at all.
  And it's not too bad; it's basically very close to a modernized MIPS. There are legitimate complaints, though.
  Probably the most controversial is that integer divide by zero can't be made to raise an exception.
  Similarly, omitting condition codes is something that will be distasteful to many.
  Also, there are so many combinations of legal instruction subsets that compatibility may suffer. Most everything is in a large set of optional extensions (and some important optional extensions aren't really finished yet).
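  To make the divide-by-zero point concrete: the M extension defines a result instead of trapping. The quotient is all ones and the remainder is the dividend, and the signed-overflow case (most negative value divided by -1) returns the dividend as well. A rough Python model of the 32-bit case (the function names are mine, not from any toolchain):

```python
XLEN = 32
MASK = (1 << XLEN) - 1
INT_MIN = -(1 << (XLEN - 1))


def to_signed(x):
    """Interpret the low XLEN bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def rv_div(a, b):
    """Signed DIV; returns the XLEN-bit result pattern, never raises."""
    a, b = to_signed(a), to_signed(b)
    if b == 0:
        return MASK  # all bits set, i.e. -1
    if a == INT_MIN and b == -1:
        return a & MASK  # overflow: the result is the dividend
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q  # round toward zero, unlike Python's floor division
    return q & MASK


def rv_divu(a, b):
    """Unsigned DIVU; division by zero yields all bits set."""
    a &= MASK
    b &= MASK
    return MASK if b == 0 else a // b


def rv_rem(a, b):
    """Signed REM; division by zero yields the dividend."""
    a, b = to_signed(a), to_signed(b)
    if b == 0:
        return a & MASK
    r = abs(a) % abs(b)
    if a < 0:
        r = -r  # the remainder takes the sign of the dividend
    return r & MASK
```

  Software that wants a trap has to add its own compare-and-branch before the divide, which is the part people find controversial.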
  
  Annatar 6 years ago
  
  move dst, src, src -- I could stop right here, but wait, there is more!
  lui, auipc -- because two instructions are better than a simple move.b or move.w. Really, what nonsense.
  sx, ux - I'm speechless at that nonsense.
  bltu, bgeu -- because blt and bge just weren't enough -- who designs a processor like this?
  lb, lh, lhu, lbu, sltiu instead of move.b, why? I challenge the sales pitch of making more nonsensical instructions amounting to a simpler processor design! (Boy does this make me mad.)
  It's not a slur, it really is utterly retarded, especially if one used to program an elegant microprocessor like the UltraSPARC or the Motorola 68000; even the MOS 6502 is more elegant.
  But to each his own, live and let live, right? Well why then must this botched processor constantly be sold and paraded as the greatest thing since sliced bread, a non plus ultra of processors, when it isn't?
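  For reference, the `lui` pairing above is how a 32-bit constant gets built on RV32: `lui` sets the upper 20 bits and a following `addi` adds a sign-extended 12-bit immediate, so the upper part has to be bumped by one whenever bit 11 of the constant is set. A small sketch of that split (the helper names are made up for illustration):

```python
def split_hi_lo(value):
    """Split a 32-bit constant into (hi20, lo12) for a lui/addi pair."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000  # addi sign-extends its 12-bit immediate
    hi = ((value - lo) >> 12) & 0xFFFFF  # compensates for a negative lo
    return hi, lo


def materialize(hi, lo):
    """Value left in a 32-bit register by: lui rd, hi ; addi rd, rd, lo."""
    return ((hi << 12) + lo) & 0xFFFFFFFF
```

  For instance, `split_hi_lo(0x12345678)` gives `(0x12345, 0x678)`, and feeding any split back through `materialize` reproduces the original constant.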
  
  DanBC 6 years ago
  
  Plenty of HN readers have children with severe learning disability. Using the word "retard"[1] is likely to attract downvotes.
  [1] Unless you're talking about progress or watch mechanisms.
  
  Annatar 6 years ago
  
  That's exactly what I'm writing about, progress. RISC-V is not an advancement. What is opposite of advancement? In a system, it's either regression or retardation.
  And expecting people outside of the Puritan U. S. to abide by the same political correctness norms is extremely rude, inconsiderate and exclusionist -- using those same politically correct norms no less, which is to say, the U. S. should ban political correctness, and do so yesterday for the benefit of everyone.
  
  DanBC 6 years ago
  
  I don't care what words you use. I'm just telling you that when you describe people as retards you're going to get downvotes, and I'm telling you why that is.
  I'm not American and I don't live in the US.
  
  Annatar 6 years ago
  
  I didn't describe people as retarded, but their work. Even very smart people often do dumb things.
  
  DanBC 6 years ago
  
  When you say things like this...
  > mnemonics and programming model is so retarded
  ...you are going to get downvoted. This is because people who speak English as a first language understand you to mean "this is stupid, like a retard". They don't understand you to mean "this is delayed, like a watch mechanism would be adjusted".
  You can keep arguing that you didn't mean what you said, but at least two people are telling you how your words are being interpreted.
  
  Annatar 6 years ago
  
  > ...you are going to get downvoted.
  I would be a sad excuse of a being if I feared what some people on a random forum will think of me, or "downvote" me in some arbitrary, imaginative system. The entire thing is a delusion.
  Not singling out anyone in particular but I'm a formed adult and have been for several decades, and I do not require upbringing, id est, anyone telling me how to behave or what not to write.
  I will write it how I want and I shall not fear arbitrary decisions based on some arbitrary policies someone somewhere thought up. If that gets me down-voted or even banned, I will not let it bother me, as life does not revolve around arbitrary websites trying to tell one how to behave and think and I will damn myself into oblivion before I allow someone to impose such a thing on me. Lest we forget: I'm the only one who decides that, and I'm not allowing anyone to control my thinking or writing.
pjc50 6 years ago

"Tightly vertically integrated" and "open" are somewhat at odds with each other.
I think far too many people seem to think that the instruction set is something you can just drop in to a chip and start stamping it out, without any appreciation for the amount of device-specific engineering that has to happen. The reason things like a "true open source" Raspberry Pi haven't happened is the $5m - $10m of work required. And for high end devices that would be required to be competitive in the cloud, that number goes up a lot.
I've not heard of Facebook, Google or Amazon doing significant custom silicon projects themselves, as opposed to just working with vendors for some customisation. The only FAANGM in that space are Apple.
IBM are like the pastoralists living in the ruins of Rome in ~1000AD. They're a consulting firm with a grand name and history.
verall 6 years ago

I'm not sure about this - there are many open processor designs in academia if a fb/google wanted to pick them up - the difficulty is integration and software. They could more easily just work on ARM; the reference designs are available if you are fb or google.
I guess what I'm saying is, even if a relatively modern, 2-issue, OoO proc with SMT and 256b vectors came out open source, would anybody really bother to integrate it and fab it?
From what I see, fb and Google work with silicon vendors because they don't want to be silicon vendors.
- andyjpb 6 years ago
  
  Google have been experimenting with POWER in their datacenters for a while now: https://www.forbes.com/sites/patrickmoorhead/2018/03/19/head...
  More historically, Google have been building their own networking gear for some time https://www.wired.com/2015/06/google-reveals-secret-gear-con...
  I'm focussing on Google in particular because they have always had a strong preference for Open components wherever possible, and they've traditionally taken advantage of that openness wherever they think they need to, even if that goes against common practice. (There's a story I can't find the link to where, in the very early days, they wrote their own patches to Linux to work around some bad RAM chips that they'd scavenged from somewhere.)
  If Google can get an advantage then they will take it. They will also invest heavily, over years, to research these advantages and opportunities.
  Their attitude to things like ARM is still fairly accurate at the scale of their datacenters: https://research.google.com/pubs/archive/36448.pdf
jart 6 years ago

The patents have expired on i486. Does that mean x86 qualifies as a free/open ISA? Patents will expire on 64-bit soon.
danrl 6 years ago

> An open, high end CPU design is really going to change the cloud market.
I agree. It's only that POWER does not appear to be very high end to me. At best it is performing acceptably for the energy it consumes. Lowering energy consumption is what drives the margins. As a cloud vendor I would stay as far away from POWER as possible.
- tpearson-raptor 6 years ago
  
  As a cloud services consumer, what guarantee (financial, legal, indemnification) will you grant me that your systems will not leak or otherwise tamper with my data, given that you use machines that I know for a fact you have no control over and have not audited prior to the handoff from UEFI to the hypervisor/OS? For that matter how have you mitigated the persistent x86 rogue DMA problem?
  POWER9 still has two advantages -- security and speed. Yes, speed -- the core is quite weak on some tasks and very strong on others. If you're buying this to primarily run an AVX intensive type workload, don't (unless you need the security aspects). Those massively wide, vector dependent workloads aren't exactly common in multitenant cloud though, unless you're using GPU offload where POWER again beats even the newest AMD chips for pure GPU offload performance.
  So much for the good...the ugly is that POWER9 was fundamentally late and not at the performance levels we wanted, but that's a transient state. Every CPU vendor puts a chip like that out from time to time, and IBM is acutely aware of the problems here. I see no reason to go to even more problematic architectures (x86 duopoly with master vendor keys, RISC-V with fragmentation and weak cores / immature toolchains) when we now have a better option available.
mrtweetyhack 6 years ago

Do you really think you'll see any benefit from Facebook, Amazon, or Google building their own chips? I mean they would benefit, but their only goal is to get more money out of you for themselves.

bryanlarsen 6 years ago

Will this do any better than open source SPARC, which was open sourced in 1999?

https://www.eetimes.com/document.asp?doc_id=1140292

blihp 6 years ago

I can't see why it would. This would have mattered 20 years ago when there weren't more compelling ISAs out there. But that's not today's world: ARM is fairly ubiquitous and dirt cheap while RISC-V is a promising and open source up-and-comer. This seems like a relatively non-event (or worse: confirming that it's effectively a dying/dead platform) unless one has a significant investment in Power.
- Nelson69 6 years ago
  
  I tend to agree, I think if IBM were to release some core designs to go with it then they could potentially spur something interesting.
  I really see it two ways, the fact that Talos has real hardware that isn't priced up in the stratosphere (it's not cheap, but it's not insane) and then the ISA being opened. Those are giant steps for a company like IBM. At the same time, as big as those steps are for IBM, they seem like pretty small steps in terms of taking on the world with this stuff.
  Throw in something like the full G5 design? We might be talking about something different.
  
  orbifold 6 years ago
  
  They are more than willing to hand out old designs to university groups (you just have to ask nicely). In fact someone in our group spent his PhD developing a custom embedded Power processor in 65nm, just to be told that when he was hired by IBM (whether he would have gotten the job is a different question of course).
- dooglius 6 years ago
  
  Why does RISC-V being "promising" and an "up-and-comer" make it more compelling than an ISA that has been around for a long time?
  
  blihp 6 years ago
  
  It's more compelling because a number of research groups and companies are investing in designing and releasing hardware based on it. It has mindshare in the space that Power does not and is not likely to have.
  Just opening the ISA doesn't mean that new players can start spitting out processors based on it tomorrow or even next year. And why would they want to? Power was never in remotely the same position that x86 is/was re: binary compatibility so being able to say 'Power compatible' doesn't carry much weight. An ISA which has been a minority player but around for a long time is more likely a liability than an asset.
  
  fluffything 6 years ago
  
  For RISC-V to be a more compelling option than Power, it would need to be an option first, but if I need to buy a CPU today, I can't buy a RISC-V one.
  I can, however, buy a wide range of PowerPC CPUs, for a wide range of applications. From embedded applications, like routers, to laptops, desktops, workstations, high-end servers, up to super-computer class CPUs.
  
  gumby 6 years ago
  
  It already has a small and growing ecosystem. I agree there are no RISC-V barn burners but there are already more people with RISC-V design experience than there are for POWER.
  I think all the major non-IBM POWER folks are at Apple these days and you know which architecture they are working on!
  
  tpearson-raptor 6 years ago
  
  Having observed both sides of the industry first hand, I don't agree with that statement at all. Without going into debates on relative merit, RISC-V does its development in public while IBM until very recently has done it all behind closed tightly sealed doors. This might be giving a slightly unfair comparison on size of teams from a public perspective.
  The IBM folks really, REALLY understand how to design a secure core and chip, plus the decades learning how to make a fast and relatively efficient core. RISC-V is simply in a far more nascent state, trying to push it to POWER9 performance (let alone AMD performance) is like saying a toddler just learning to walk will win a 10k marathon tomorrow. Eventually that may happen, but not in one day, more like 20 years. ;). And when you start chasing performance, who is doing the actual hard, tedious work of verification and making sure security flaws aren't being accidentally introduced into the implementation?
  POWER is interesting to me because we get mature tooling on a proven ISA that can be built and run on high performance chips today. No more cross compiling, no more pure emulation required to do soft core work. That in and of itself is huge in the embedded space, and honestly I'd love to see the experimental and interesting cores currently decoding RISC-V ported to decode ppc64 -- all of a sudden real comparisons on performance etc. for identical binaries become possible, allowing proper comparison of core design ideas under real world loading. No more guessing and having to take on pure faith that the performance difference is down to ISA or compiler performance -- either your core is faster / more efficient on the same binary, or it's not!
  
  gumby 6 years ago
  
  Oh I don’t disagree with what you wrote except on one crucial point. Certainly agree that RISC-V is still in its baby shoes.
  But IBM’s announcement is simply the ISA itself being open sourced (with some patent IP). Apart from an FPGA soft core there’s no RTL/VHDL. If you want to make silicon you’ll be starting basically from scratch.
  
  tpearson-raptor 6 years ago
  
  Sure. While I would also like to see a real hard core or three released, here's my frame challenge:
  What if the existing RISC-V and other academic cores are already good enough for a lot of people? The instruction decoder is a relatively small part of the CPU, swap that out and you suddenly get new SoCs that can run the existing POWER software base (that means proven toolchains, vector accelerated applications, etc.). Right now RISC-V doesn't even have vector instructions per se; adding all that support to the entire tooling seems like a lot of effort for not much gain when you can simply implement VSX in the hardware and use the existing tooling for it.
  I keep hearing the open RISC-V cores are going to be very fast very soon. If that's true, how would an IBM provided core help versus an instruction decoder swap on one of those and some tuning?
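  To give a feel for what the decoder front end deals with, here is a small Python sketch that pulls the fields out of the same add-immediate operation in both encodings. It only illustrates field extraction (the function names are invented for the example) and says nothing about the rest of what a real port would need:

```python
def decode_power_addi(word):
    """Decode a 32-bit Power ISA D-form addi (primary opcode 14).

    Note: in Power, RA == 0 in addi means the literal value 0, not r0.
    """
    if word >> 26 != 14:
        return None
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    si = word & 0xFFFF
    if si & 0x8000:
        si -= 0x10000  # sign-extend the 16-bit immediate
    return ("addi", rt, ra, si)


def decode_riscv_addi(word):
    """Decode a 32-bit RISC-V I-type addi (opcode 0x13, funct3 0)."""
    if word & 0x7F != 0x13 or (word >> 12) & 0x7 != 0:
        return None
    rd = (word >> 7) & 0x1F
    rs1 = (word >> 15) & 0x1F
    imm = (word >> 20) & 0xFFF
    if imm & 0x800:
        imm -= 0x1000  # sign-extend the 12-bit immediate
    return ("addi", rd, rs1, imm)
```

  Both `decode_power_addi(0x38640005)` and `decode_riscv_addi(0x00520193)` come out as `("addi", 3, 4, 5)`, i.e. the same internal operation from two different bit layouts.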
- classichasclass 6 years ago
  
  Except neither of those are in the same performance ballpark as Intel, while Power ISA is.
  
  adwn 6 years ago
  
  I might be misunderstanding you, but performance isn't in the ISA, it's in the implementation. In fact, the x86 ISA is the best example for this: It's really difficult to get competitive performance out of an ISA designed in the 70s, yet billions upon billions of USD in R&D and optimization make it work.
  
  ecnahc515 6 years ago
  
  The ISA matters, otherwise we wouldn't care about SIMD. If your ISA is missing SIMD functionality, then it doesn't matter how good your implementation is, it will be slower than an implementation of an ISA that supports SIMD when it comes to anything that can leverage SIMD.
  
  chasil 6 years ago
  
  Fujitsu begs to disagree:
  https://www.theregister.co.uk/2018/08/22/fujitsu_post_k_a64f...
  Fujitsu had already built SuperSPARC-based supercomputers, and Oracle recently ported their Red Hat clone to AArch64 (no legacy 32-bit or Thumb) and produced an ISO for the PI-3.
  
  classichasclass 6 years ago
  
  I won't dispute SPARC (Fujitsu has done some impressive work on it), but the post was for ARM and RISC-V.
  
  chasil 6 years ago
  
  The link I posted was Fujitsu's new ARM server, which bundles 32GB of RAM and over 30 cores on a single die. These are deployed in dual-socket blades attached to the "tofu" routing that they scavenged from their SPARC supercomputer.
  Fujitsu is saying that their ARM implementation is the fastest server processor available, ahead of Intel.
  
  imtringued 6 years ago
  
  I don't know why you compare processor implementations and then talk as if that generalizes to all processor implementations of a given ISA. Intel Atom chips definitely aren't in the same performance ballpark as POWER9.
  I mean it should be obvious. The ISA does not dictate memory performance, micro architecture, clock frequencies, manufacturing processes, number of cores, maximum allowed power consumption, etc. All of those affect performance but are independent of the ISA.
- UncleOxidant 6 years ago
  
  Don't forget that MIPS also open sourced their ISA earlier this year as well.
  
  mindcrime 6 years ago
  
  Sort of. From what I've read, they "open sourced" their ISA in only a very weak sense of what it means to be "open source". Apparently there are a ton of restrictions on what you can do with their ISA even now.
- chx 6 years ago
  
  Yes -- the target for POWER is basically competition with AMD Rome and Intel Cascade Lake and you need to have extremely deep pockets to compete there.

PaulHoule 6 years ago

I like it.

Back in the day IBM ran a "System on Chip" factory based on PowerPC that gave us the Bluegene/L supercomputer, the GameCube/Wii/Wii U, the Playstation 3 and the Xbox 360. Each of these combined one or more cores, coprocessors and tweaks to hold its own against x86.

RISC-V is meant to be used like that, but memory management support is not yet finalized. They are sampling prototype RISC-V chips with an MMU you can put in a dev box to develop Linux on. Other than that you are not using Linux or Windows.

If you think mainstream OS is bloated, then RISC-V has your number. If you want very low cost it would be exciting to cut RISC-V down to have fewer and less wide registers. The other day I saw an article about a guy who wants to build RISC-V out of vacuum tubes and thought... 'cripes with all of those wide registers it is a lot of tubes.

POWER is good-to-go right now for high end applications and can stay relevant against ARM and x86 by staying open.

Annatar 6 years ago

They are misunderstanding why RISC-V and Raspberry Pi are popular: it's not so much that they're freeware but that they are cheap. Very few people in IT know how to implement processors in hardware even with an FPGA. What makes a processor popular are cheap, affordable systems people can easily acquire in an online shop at prices which compete with or are below contemporary P. C. tin bucket hardware.

If IBM wants an uptake of POWER systems and people to develop on them and for them, the only thing which might make a dent are sub-$500 USD complete workstations and rack mountable servers. Otherwise, they will repeat the same mistake which Sun made, that is, they open sourced their UltraSPARC T1 under GNU GPL but the uptake was nil, because few had the knowledge to design systems around the processor. People want cheap, ready made toys they can tinker with immediately.

gumby 6 years ago

Not a problem with the article, but when it lists the various past contenders, MIPS (with many times the lifetime installed base of, say, SPARC, and still in active production) doesn't get a mention. MIPS's ISA has also been recently open sourced. It's a cautionary tale.

I don't see the point of this effort for IBM. These things need communities, and POWER simply doesn't have the community: it was a proprietary architecture for so long that nobody really decided to buy POWER; rather, they wanted some device/ecosystem/price point and POWER was how IBM could deliver it.

The article mentions RISC-V, which still has a nascent ecosystem and no significant design wins (yet!!). But if you want to design a chip with it you can find designers with some experience with it, people developing some IP you might want to use, etc. Even that has more momentum.

olivierduval 6 years ago

Naive questions:

- Will Huawei be able to use this processor design (now that it is open sourced) to build its own chips, bypassing ARM restrictions & US IP?

- Are these processor designs usable in mobile devices, or only in workstations and servers (using too much power, for example)?

Symmetry 6 years ago

What's being released here is the instruction set architecture, not the microarchitecture for any particular processor design. As RISC ISAs go Power is relatively pragmatic, though not to the extent 32 bit ARM is, so it has relatively good code density compared to SPARC and MIPS. Plus it doesn't have annoying misfeatures like branch delay slots or register windows.
For mobile processors it seems about as good as 64 bit ARM but with a bit less software support in the mobile world, though a good history of software support in general.
- tpearson-raptor 6 years ago
  
  An interesting offset for the latter is that you could develop your software on a high end system that matches the mobile architecture. Anyone who has had to fight with (/slow/) pure emulation of e.g. Android on ARM knows the pain this causes, multiplied by the thousands of developers. That's a lot of wasted man hours vs. the develop-on-same-architecture model.
f00zz 6 years ago

About the first question, UltraSPARC has been open source for a while. You can even download the Verilog code. We haven't seen any UltraSPARC-based processors, so I don't see why they would use this.
- baybal2 6 years ago
  
  There were quite a number of Japanese SPARCs from PEZI, Fujitsu and some others, but they were all purpose-made HPC products, not mass market.
  
  f00zz 6 years ago
  
  Ah, didn't know that, just checked out the Wikipedia page. Thanks.
blattimwind 6 years ago

IBM targets the scale-up market (few big, fast machines) with POWER instead of scale-out (many small, slower machines). Consequently they are high performance but not particularly tuned for high efficiency, because performance is the more important design goal of the system.
- dragontamer 6 years ago
  
  https://en.wikipedia.org/wiki/PowerPC_e6500
  Freescale hasn't made a new low-power Power chip in a while, but... historically speaking, there were a lot of low-wattage / efficiency-focused embedded POWER designs.
  I don't know what happened politically between the companies to use ARM instead. But I would imagine that ARM's instruction set was cheaper (or maybe easier) to engineer than Power ISA. Hopefully Freescale engineers can chime in on the discussion, because I'm really just shooting from the hip here.
  I would expect most issues to come down to business politics. IBM open sourcing the PowerISA is also a business politics move (I guess they hope to recapture the lost ground in the embedded space).
  PowerISA means operating with IBM's ecosystem: GCC, Linux, etc. etc. Remember IBM has acquired Red Hat, so there's a lot of promise for Linux support that ARM and RISC-V don't necessarily provide. I think this is a good move.
  
  MaxBarraclough 6 years ago
  
  > historically speaking, there were a lot of low-wattage / efficiency-focused embedded POWER designs.
  Including radiation-hardened chips appropriate for satellites (if we count PowerPC).
  https://en.wikipedia.org/wiki/RAD750
  
  gpderetta 6 years ago
  
  ARM sells, in addition to ISA licenses, complete designs. I guess that's a big advantage for those that can't afford to develop a full CPU from scratch.
  
  floatboth 6 years ago
  
  > promise for Linux support that ARM and RISC-V don't necessarily provide
  uh, ARMv8 and RISC-V were developed with Linux in mind from the beginning, they didn't even have anything other than Linux/BSD/various-RTOSes, like IBM did with AIX.
  
  dragontamer 6 years ago
  
  That's not the kind of support I'm talking about.
  Who is writing the RISC-V compiler? If the RISC-V compiler for GCC or CLang messes up, who do you call?
  If the Power9 GCC / CLang compilers mess up, you call Red Hat for support. Red Hat / IBM are now the same company, so they'll offer end-to-end services.
  -----------
  ARM has okay support: the ARM foundation seems to be taking care of their compiler kits / Linux patches / etc. etc. pretty well. But I don't think you can buy an ARM support package from anybody... really.
  I think the ARM / Linux ecosystem is still nascent. You get good support through the Rasp. Pi community, and maybe the occasional Android Phone gets a big community around it. But ARM / Linux ecosystem is quite poor outside of Rasp. Pi.
  ARM, as a company, is clearly designed as an "embedded" company. It provides the documentation and compilers, but doesn't provide too many OS-level services above that.
  
  floatboth 6 years ago
  
  > If the Power9 GCC / CLang compilers mess up, you call Red Hat for support
  uh, where and when exactly did they offer that? Actually I don't remember anyone anywhere offering commercial support for GCC or LLVM/clang.
  Well, I'm not the type of person to look for commercial support for anything ever, but I've heard of several companies that provide support for DBMSes like PostgreSQL. Not so for compilers.
  I just googled "gcc commercial support" and the results are the GCC FAQ, a mailing list post about it from 2005 (!), GCC on Wikipedia, "Office 365 GCC" (lol) and so on. Looks like it's just not a thing at all.
  
  dragontamer 6 years ago
  
  Sorry, not GCC / Clang. You're right.
  But IBM's XL Compiler: https://www-01.ibm.com/support/docview.wss?uid=swg21110831
  -------
  I think I confused it with ARM: ARM has a CLang-based compiler with official ARM support IIRC. https://developer.arm.com/tools-and-software/server-and-hpc
  I think the hobbyist (who won't get much support even if they're a paying customer) benefits from free tools / free support / communities.
  But it seems like a number of professionals prefer having a degree of professional support in the products they use.
  
  the_why_of_y 6 years ago
  
  IIRC commercial support for GCC is included with some RHEL and SLES subscriptions (probably all except the lowest-cost "desktop"). Red Hat also has a Developer Toolset product that includes a (more recent) complete native toolchain, and SUSE has a similar SLES 12 Toolchain Module / SLES 15 Development Tools Module.
  https://developers.redhat.com/products/developertoolset/over...
  https://www.suse.com/c/suse-linux-essentials-where-are-the-c...
  I haven't heard of commercial clang support though.
  
  justincormack 6 years ago
  
  Red Hat supports arm64 and recently joined the RISC-V Foundation.
- floatboth 6 years ago
  
  IBM targets both markets. The two variations of POWER9 are literally called scale-up (uses buffered memory) and scale-out (regular DDR4 DIMMs). (Now there's a third one for huge I/O needs…)
  The scale-out POWER9 scales down to 4-core.
somepig 6 years ago

POWER is about as far from a good fit for most ARM applications as you can possibly get.
It's all about shoving a ton of hot power hungry multithread cores as close together as you can and running them at full bore.
- monocasa 6 years ago
  
  The Freescale/NXP 4xx/75x PowerPC cores are fairly common embedded CPUs. These days POWER and PowerPC are the same ISA.
  
  somepig 6 years ago
  
  No. POWER and PPC are decidedly not the same. The closest they ever came together was the G5's 970.
  4xx and 75x were OK for embedded a decade ago, but today they're hot and power hungry. You can use them in devices where you can burn 10+ watts to maintain backwards compatibility with existing PPC code, but they're way the fuck too hot for a phone.
  
  classichasclass 6 years ago
  
  That's true for 32-bit, but 64-bit PowerPC is pretty much synonymous with Power ISA.
  
  shawnz 6 years ago
  
  But is that due to ISA differences or just microarchitectural differences?
  
  temac 6 years ago
  
  The ISA is mostly the same. I mean, look at https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/as...
  There are differences in details about uncommon instructions, irrelevant assembly language changes, some instructions privileged for one arch and not the other, that kind of thing.
  But for the bulk of the ISA, it's the same. You probably can create a single userspace binary compatible with both? Not sure but seems doable.
  The microarch is likely different, but then it is also different between several members of each category, so the word does not really designate the micro-arch, but really the ISA. And then you have other brand names using that, and they are so similar that e.g. Freescale switched from PowerPC to Power while incrementing PowerQUICC II to III. I remember Linux has an eieio macro that just emits the aforesaid instruction for PPC, and actually the opcode does something similar on Power (mbar) and IIRC the assembler is happy to emit it regardless of the ISA.
  So it was kind of messy when you reached the differences, but everything was quickly workable and you got used to it. The reference manuals of Freescale are very good and the "[...]Programmer’s Reference Manual for Freescale Power Architecture Processors" EREF_RM often directly points at the few differences with PowerPC.
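
To make the eieio/mbar point concrete, here is a minimal sketch of how the two mnemonics end up as the same 32-bit word. It assumes the usual X-form layout (primary opcode 31, extended opcode 854, and for mbar a 5-bit MO field in bits 6-10, IBM bit numbering); the helper names are made up for illustration, so check the field positions against your reference manual before relying on them.

```python
# Sketch: pack an X-form instruction word. IBM bit numbering is used in the
# comments (bit 0 is the most significant bit of the 32-bit word).

PRIMARY_OPCODE = 31   # bits 0-5
XO_BARRIER = 854      # bits 21-30, assumed shared by eieio and mbar


def encode_x_form(primary_opcode: int, field_6_10: int, extended_opcode: int) -> int:
    """Return the 32-bit word for the given opcode, 5-bit field, and 10-bit XO."""
    if not 0 <= primary_opcode < 64:
        raise ValueError("primary opcode must fit in 6 bits")
    if not 0 <= field_6_10 < 32:
        raise ValueError("field must fit in 5 bits")
    if not 0 <= extended_opcode < 1024:
        raise ValueError("extended opcode must fit in 10 bits")
    return (primary_opcode << 26) | (field_6_10 << 21) | (extended_opcode << 1)


def eieio() -> int:
    """PowerPC eieio: bits 6-10 are zero."""
    return encode_x_form(PRIMARY_OPCODE, 0, XO_BARRIER)


def mbar(mo: int = 0) -> int:
    """Embedded-style mbar: same opcode pair, MO carried in bits 6-10."""
    return encode_x_form(PRIMARY_OPCODE, mo, XO_BARRIER)


if __name__ == "__main__":
    print(f"eieio   = 0x{eieio():08X}")
    print(f"mbar 0  = 0x{mbar(0):08X}")
    print(f"mbar 1  = 0x{mbar(1):08X}")
```

With MO = 0 the two encodings are bit-for-bit identical, which is consistent with an assembler accepting either mnemonic; what the core does with that word is still up to the implementation.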
  
  monocasa 6 years ago
  
  On the same process node, PPC chips aren't any hotter than ARM chips.
- amock 6 years ago
  
  POWER already has a place next to ARM: https://en.wikipedia.org/wiki/QorIQ
  
  somepig 6 years ago
  
  POWER and Power (formerly PowerPC) are similar but quite different. PPC has been in embedded (but generally not mobile) for quite a long time, but even then, the cores are still hot, power hungry, and poorly suited for mobile.
  
  StillBored 6 years ago
  
  Because the designs predate the big push into extremely high efficiency processors. Like the big "server class" processors, the investment required to create a truly high efficiency processor is quite large. Small in-order cores with limited functional units, and lacking much of what makes a modern processor fast (vector units, specialty instructions/etc) can fool people into thinking that minimal clock domains/gating is sufficient to create a high efficiency design.
pkaye 6 years ago

> Will Huawei be able to use this processor design (now that it is open sourced) to build it's own chips, bypassing ARM restriction & US IP ?
Which operating system would they use? Is that supported with the Power instruction set?
- cmrdporcupine 6 years ago
  
  Power is totally mainstream and well established. You have your choice of a bunch of operating systems and compilers. Linux, llvm/clang, GCC, etc. all there for a really long time.
  Windows also ran on the PowerPC architecture at various points.
  In 1996 I worked briefly at a company that had a bunch of "PReP" ("PowerPC Reference Platform") machines lying around, given to them by IBM for application porting to Windows NT on PowerPC. For kicks, I put Linux on them, so they'd actually be useful for something.
  PowerPC is not strictly identical to Power architecture, but is related and most tools and OSes can be made to work with either.
xvilka 6 years ago

RISC-V is a better fit for mobile devices. But Huawei does produce network hardware that might benefit from a good POWER-based platform.
- Symmetry 6 years ago
  
  What aspects of RISC-V do you think make it a better fit?
  
  als0 6 years ago
  
  I'd also like to know, but I imagine that the compressed version of the RISC-V ISA would be a significant factor as code density is important for a phone.
  
  Iwan-Zotow 6 years ago
  
  Really?
  For the last three or four years phones have had roughly 2-4 GB of RAM and 32-64 GB of storage.
  And if we are talking about a RISC-V phone, it would be produced no earlier than 2020, so I'm not sure code density is a significant factor.
  I don't think so.
  
  monocasa 6 years ago
  
  Code density matters for I$ thrashing reasons, but it's probably not a significant enough factor to overtake the market supremacy of ARM.
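
A back-of-the-envelope sketch of why density shows up in the I$ rather than in storage: the cache holds a fixed number of bytes, so a smaller average instruction size means more instructions stay resident. The 32 KiB cache size and the 3-byte average for a mixed 16/32-bit encoding are assumptions picked for illustration, not measurements of any real core or workload.

```python
# Sketch: how many instructions fit in an instruction cache of a given size.
# All constants below are illustrative assumptions.

ICACHE_BYTES = 32 * 1024   # assumed 32 KiB L1 instruction cache
FIXED_WIDTH_BYTES = 4.0    # fixed 32-bit encoding
MIXED_WIDTH_BYTES = 3.0    # hypothetical average with some 16-bit instructions


def instructions_in_cache(cache_bytes: int, avg_instruction_bytes: float) -> int:
    """Whole instructions that fit in the cache at the given average size."""
    if avg_instruction_bytes <= 0:
        raise ValueError("average instruction size must be positive")
    return int(cache_bytes // avg_instruction_bytes)


if __name__ == "__main__":
    fixed = instructions_in_cache(ICACHE_BYTES, FIXED_WIDTH_BYTES)
    mixed = instructions_in_cache(ICACHE_BYTES, MIXED_WIDTH_BYTES)
    print(f"fixed 32-bit encoding: {fixed} instructions resident")
    print(f"mixed 16/32-bit encoding: {mixed} instructions resident")
    print(f"ratio: {mixed / fixed:.2f}x")
```

Under these assumptions the denser encoding keeps about a third more instructions in the same cache, regardless of how much RAM or flash the phone has.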

the_trapper 6 years ago

Interesting move by IBM.