I really don't see what they can do. It seems like in the last year they pivoted hard into "ok we'll build chips in the US again!", but it's going to be years and years before any of that pays off or even materializes. The only announcements I've heard from them are just regular "Here's the CEO of Intel telling us how he's going to fix Intel" PR blurbs and nothing else. Best case maybe they just position themselves to be bought by Nvidia...
No. Intel worked out it needs to open its production capacity to other vendors. They will end up another ARM fab with a legacy x86-64 business strapped on the side. That's probably not a bad place to be really. I think x86-64 will fizzle out in about a decade.
I don't feel like ARM has serious technical advantages over x86-64 as an ISA, although it is cleaner and has more security features which is good. Isn't the main advantage just that it's easier to license ARM?
Once enough patents expire all ISAs are eventually equal, I'd think.
I feel much the same way. I've used both pretty extensively at this point, and I'm not sure if I'm a believer in either mentality. I'm hoping that RISC-V will be the one to blow my mind, though.
Yes and no. Look at some of the loop optimisations possible on ARM compared to x86-64. I've seen x86-64 need 8 instructions for something ARM does in 1.
I remember PPC and its rlwinms and co. My ARM isn’t that good, though I can read it.
But some of those x86 instructions take 0.5 cycles, and some of them take 0 if they're removed by fusion or register renaming. It has worse problems, like loop instructions you can't actually use but which take up the shortest encodings.
Of course they have a chance to catch up. Only a fool would count Intel down & out. Intel is still larger by revenue than AMD, NVidia, and ARM combined.
This will probably cost them some market share, but they have plenty of cash to weather their current manufacturing issues, they still have world-class CPU design talent which they've proven over and over and over again, and they have some very interesting products & technologies on the roadmap.
ARM offering a fight for the first time ever is not going to be a 1-hit KO against the goliath that is Intel.
Intel will never catch up because Arm's business model is much better. Intel is not competing with Arm, they're competing with every large tech company, who are all sharing many design costs via Arm and mostly sharing manufacturing costs via TSMC.
Arm has a much more efficient and also much less profitable business model, and Intel will never catch up unless they adopt it. They'll never do that so they'll fade away like IBM.
Err sorry, I thought we were talking about the Apple M1 as in another comment subthread here, but that wasn't this one actually.
But my point still stands, I think: isn't this CPU designed by Nvidia, also just with an ARM-licensed ISA? Similarly, AMD, which you mentioned in your list, shares its ISA with Intel, and yet the CPUs are completely different.
There are some hints that they are redesigning some server processors to double core count but that may not be visible for 2-3 years. Also keep in mind that Intel has 75% server market share and is only losing ~5 points per year.
There is a very good chance that Intel will catch up. They have money, they have capacity, and from what I understand they still have several more designs researched and those will enter production over the next few years. They are also working on RISC-V stuff (AMD is too).
Anyone have a sense for how much these will cost? Is this more akin to the Mac Studio that costs 4k or an A100 gpu that costs upward of 30k? Looking for an order of magnitude.
The top-end datacenter GPUs have been slowly creeping up from $5k a few generations back to about $15k for the A100s now. So this one will probably continue the trend, to $20k or maybe $30k, but likely not beyond that.
That would be a real shame. I really want someone to make a high core count ARM processor in the price range of an AMD threadripper that can work with Nvidia gpus.
Amazon has at least two generations of their own homebrew ARM chip, the Graviton. They offer it for people to rent and use in AWS, and publicly stated they are rapidly transitioning their internal services to use it too. In my experience Graviton 2 is much cheaper than x86 for typical web workloads--I've seen costs cut by 20-40% with it.
AWS has their own CPU. Microsoft is an investor in Ampere, but I am not sure if they will make one themselves or simply buy from Ampere. Google has responded with faster x86 instances, still no hint of their own ARM CPU. But judging from the past I don't think they are going to go with Nvidia.
That is only the CPU though; they might deploy it as a Grace + Hopper config.
AWS and Azure (and I believe GCP) installed the previous generations, and are having huge GPU shortages in general... so probably!
An interesting angle here is that these support partitioning even better than the A100s. AFAICT, the cloud vendors are not yet providing partitioned access, so everyone just exhausts worldwide g4dn capacity for smaller jobs / devs / etc. But partitioning can solve that...
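As a rough sketch of the tenant's side of that (illustrative only; it assumes the operator has already carved up the partitions and the job talks to the CUDA runtime): a job typically just enumerates whatever devices it is handed, so the same code runs whether it got a whole card or a slice of one.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);
        // Under MIG-style partitioning a job usually sees only its slice,
        // reported here as a single device, so nothing in the application
        // needs to know whether that "device" is a whole A100 or a fraction.
        for (int d = 0; d < count; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("device %d: %s, %zu MiB\n",
                   d, prop.name, (size_t)(prop.totalGlobalMem >> 20));
        }
        return 0;
    }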
The CPU complex on the M1 series doesn't have anything close to the full bandwidth to memory that the SoC has (like, half). The only thing that can drive the full bandwidth is the GPU.
Who bets that the amount of detailed information they'll officially[1] release about it is "none" or close to that? I still think of Torvalds' classic video whenever I hear about nVidia. The last thing the world needs is more proprietary crap that's probably destined to become un-reusable e-waste in less than a decade.
I think we're all missing the forest because all the cores are in the way:
The contention on that memory means that only segregated, non-cooperative work, as in not "joint parallel on the same memory atomics", will scale better per watt on this hardware than on a 4-core vanilla Xeon from 2018.
So you might as well buy 20 Jetson Nanos and connect them over the network.
Let that sink in... NOTHING is improving at all... there is ZERO point to any hardware that CAN be released for eternity at this point.
Time to learn Java SE and roll up those sleeves... electricity prices are never coming down (in real terms) no matter how high the interest rate.
As for GPUs, I'm calling it now: nothing will dethrone the 1030 in Gflops/W in general and below 30W in particular; DDR4 or DDR5, doesn't matter.
Memory has been the latency bottleneck since DDR3.
Please respect the comment on downvote principle. Otherwise you don't really exist; in a quantum physical way anyway.
Nope, 1030 has 37 Gflops/W... G13 786/20W = 40... and that's 14nm vs 5nm... still I'm pretty sure there are things the 1030 can do that the A13 will struggle with.
G13 (in the 8-core/1024 ALU config as in M1) delivers 2.6TFLOPS with sustained power consumption of 10W. That's almost an order of magnitude better than 1030. Sure, node definitely matters, but going from 14nm to 5nm cannot explain the massive power efficiency difference alone.
What are the things that 1030 can do that G13 will struggle with?
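For what it's worth, the arithmetic behind the "almost an order of magnitude" claim, taking both sides' own figures at face value (37 Gflops/W at ~30W for the 1030, 2.6 TFLOPS at ~10W sustained for the G13):

    \frac{2.6\ \mathrm{TFLOPS}}{10\ \mathrm{W}} = 260\ \mathrm{GFLOPS/W},
    \qquad
    37\ \mathrm{GFLOPS/W} \times 30\ \mathrm{W} \approx 1.1\ \mathrm{TFLOPS},
    \qquad
    \frac{260}{37} \approx 7

So roughly 7x per watt on those numbers - close to, but not quite, a full order of magnitude.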
There is not a single page on the whole internet that states Gflops and Watts on the same page. I did two Google searches: "apple g13 gflops" and "apple g13 watt"... the results were completely disturbing, seeing as this info should be clearly available. When you google 1030 gflops and watt, you get all links on Google pointing to pages stating both figures, and they are the same...
The M1 comes in MANY flavours with different watts and Gflops.
And for CPU Gflops I had to get friends to measure it themselves: 2.5 Gflops/W compared to the Raspberry Pi 4's 2 Gflops/W, and this time it's 5nm vs 28nm.
Please give me official Gflops and Watt sources and we can discuss.
The page you link does NOT state watts in a clear way.
> What are the things that 1030 can do that G13 will struggle with?
In real life, when you develop games for real hardware, you notice their real limitations, like fill rates, and how they scale with different behaviours because they either have enough registers to do things in one blow or they have to remember things. It's complicated, but eventually you realize you can just benchmark things for your own needs, and for me the 1030 is for all purposes as good as the 1050 so far: 2000 non-instanced animated characters on the 1030 at 30W vs. something like 2500 on the 80W 1050!
Without knowing, I'm pretty sure the M1 cannot do more than 1000 at whatever watt it uses... not that I would ever compile anything for a machine where I need to sign the executable.
“Official sources” in this case is testing done by me personally. I am the author of the post on RWT linked previously. I would be happy to share my benchmarking code for your scrutiny if you want. The M1 variant tested was base M1 in a 13” MacBook Pro.
I don’t know what your friends have tested but the results make zero sense. Firestorm reaches 1700 points in GB5 at 5W. Pi 4 is under 300 at similar wattage.
First answer on google: "maximal power consumption is around 50 watts"
Firestorm is a GPU (again, Google has little info); I'm talking about the CPU for the Raspberry.
The Raspberry 4 GPU uses 1W. You are conflating because of sunk cost.
You need to compare the same things, apples to apples (no pun): one CPU core on the Raspberry consumes 1W; on the M1 they are 4W.
GPU is 1W vs 5W (if you are correct, which I HIGHLY doubt; I suspect 20W for the GPU alone, and Wikipedia states 39 watts at maximum load, so yes, 20W for the GPU)!
You need to start looking at the world objectively and understand how it really works, because peak energy is not going to be forgiving if you don't.
By using the provided system tools that report power usage of the GPU cluster? Also, I am telling you the system diagnostics output of an actual physical machine. What are you quoting Wikipedia for? You can literally go measure these things. Should I go edit Wikipedia so that you get correct information?
Anyway, power usage of M1 variants has been studied in detail. It's 5 watts peak for a single performance core, 20W peak for a CPU cluster of four cores, 10W for an 8-core GPU (128 FP32 ALUs per core). Bigger M1 variants have respectively higher power consumption because of the larger interconnects/caches etc. DRAM is also a factor. Running at full bandwidth, it can consume over 10W of power.
Well, it seems like you made up your mind without doing any testing or educating yourself, so I am not quite sure what I can do to help you. Already your entirely nonsensical comment of “needing to sign the executable” speaks volumes. Why did I never need to sign anything despite building software on M1 machines daily for the last year? I wonder…
All code that runs on Apple silicon must be signed. If you don't explicitly sign your executable, the linker will inject an ad-hoc signature into your binary for you.
Sure. But it does not affect you as a developer in any form or fashion. It's just a thing the linker does. You can still distribute your binaries, disassemble them etc. as you ever would.
I wouldn’t say it’s all over. People have been saying that it’s all over for longer than I can remember, and there is always someone smarter and more clever. The GPU space is ripe for disruption, the memory space is ripe for disruption, and the CPU space is being disrupted presently. For all I know, some genius has it worked out now and is going to launch a new startup sometime this month.
Aren't you ignoring use cases where all cores read shared data, but rarely contentiously write to it? You should get much better read bandwidth and latency than over a network.
Sure, but my point is: why cram more and more cores into the same SoC if they can't talk to each other more efficiently than separate computers over ethernet?
This point feels like arguing why any organization would seek density in computing if they can just buy more of something and spread it out. I don't know about you but I've saved a ton of effort on design complexity by not distributing workloads when it can be avoided (but distributed computing is a solved problem).
I recognize what you are calling out: that performance will be the same on some workloads whether you distribute or not. I would just point out that less manufacturing causes less e-waste; I would rather live in a world where Nvidia sells 50 million 10*0 cards than 500 million 1030 cards to create the same amount of compute in the world. It's not just the power costs to consider (though it could be there is a reality where running 500 million 1030s for their lifetime wastes so much less power that the manufacturing costs to the planet are worth it).
Your point is rooted in wrong facts. On-chip fabrics are much more efficient than separate computers over ethernet. More energy efficient and lower latency.
Not only that. On-chip gives you high precision synchronous time (all cores observe the same time) so you can use synchronous distributed algorithms that are unsuitable for ethernet networks.
This type of hardware allows for much better solutions to some problems.
Performance per watt isn’t so useful for a GPU. People training ML algorithms would gladly increase power consumption if they could train larger models or train models faster.
And that's exactly my point: they can't. Power does not solve contention and latency! It's over, permanently... (or at least until some photon/quantum alternative, which honestly we don't have the energy to imagine, let alone manufacture, anymore)
After 13 microarchitectures given the last names of historical figures, it's really weird to use someone's first name. Interesting that Anandtech and Wikipedia are both calling it Hopper. What on Earth are the marketing bros thinking?
The GPU is Hopper, which is in line with their naming scheme up till now. The CPU is called Grace. Clearly they are planning to continue the tradition of naming their architectures after famous scientists, and the CPUs will take on the first name while the GPUs will continue to use the last.
So expect a future Einstein GPU to come with a matching Albert CPU.
NVIDIA continues to vertically integrate their datacenter offerings. They bought Mellanox to get InfiniBand. They tried to buy ARM - that didn't work. But they're building & bundling CPUs anyway. I guess when you're so far ahead on the compute side, it's all the peripherals that hold you back, so they're putting together a complete solution.
Nvidia's been making their own CPUs for a long time now. IIRC the first Tegra was used in the Zune HD back in 2009. Hell, they've even tried their hand at their own CPU core designs too.
https://www.anandtech.com/show/7621/nvidia-reveals-first-det...
https://www.anandtech.com/show/7622/nvidia-tegra-k1/2
Maybe even more importantly: Tegra powers the Nintendo Switch.
Note the CPU cores in that design aren't designed by NVidia.
Which is (EDIT: NOT) the most widely sold console ever.
Not by a long shot.
PS2 and DS outsell by about 50 million units.
"PS2? That can't possibly be right..."
https://www.vgchartz.com/analysis/platform_totals/
Holay molay.
It was the most affordable DVD player. I think Sony owned patents on some DVD player tech? Same with the PS4/5 and Blu-ray, if I'm remembering correctly.
This was also kind of the case with the PS3. Its sales weren't fantastic at release, partially because of its... $600 (?) price tag. But even at that price, at its release, it was one of the cheapest ways to get a Blu-ray player, and many people bought it for that.
Not just a Blu-ray player, but one that is guaranteed to be able to play practically all Blu-ray discs for as long as Blu-ray discs are made, or until the console hardware fails.
Sony pushed updates to the firmware. Most commodity Blu-ray players don't have an (easy) way to update.
"Five Hundred Ninety Nine US Dollars!"
But for both the PS2 and PS3, getting folks to adopt the new formats was definitely a factor.
In the case of the PS2, I think less so; it wasn't the cheapest way to get a DVD player, but IIRC it wasn't that much more than a DVD player with component out at the time (note: all PS2s can do component out, but only later models can play DVDs at 480p), and that made it a lot easier for families to buy in.
A lot of early Blu-ray players had terrible load times. Long enough to be pretty annoying. The PS3 had the CPU horsepower to play discs quickly.
If memory serves, there was less than 1 game per PS3 sold at launch.
I think it has more to do with the fact they managed to reduce its price down to $99. They haven't been able to do that with subsequent consoles.
(clicks link) time to get /sad/ about being a SEGA fan again.
More seriously, I wish some of the old consoles were officially opened up, because the absolute install base of PS1- and NES-compatible hardware must be insane. Indie NES games specifically have become popular lately, but I don't think any of the 3D-capable consoles are popular or open targets.
There were eight new Dreamcast games just last year.
https://en.wikipedia.org/wiki/List_of_Dreamcast_homebrew_gam...
Indeed, wow.
Tegra X2 and Xavier are still sold today and contain NVIDIA-designed CPU cores. The team behind those is building new designs too, I wonder when they’re going to announce something.
Orin
Orin uses the Cortex-A78AE core for the CPU complex instead of NVIDIA-designed cores.
Ah, you meant like that. I assumed you meant they're bringing out a new module architecture.
This leads me to wonder about the microprocessor shortage.
So many computing devices such as the Nvidia Jetson and Raspberry Pi are simply not available anywhere. I wonder what's the point of bringing out new products when existing products can't be purchased? Won't the new products also simply not be available?
The products don't get produced in order. The high value products get priority and continuously bump out low value chips like those on the RPI. Not sure what the cost of this Grace chip is but it looks to be targeting high value users so it gets priority. Notice how there is no shortage of chips for iPhones, because Apple just buys the capacity at whatever cost it takes.
Though, there is a shortage of m1 MacBooks. Is it really because they are low value (margin?) products relative to iPhone? I'm not sure.
Not much of a shortage. I just checked and they are all available for pickup right now at my local small city store. Compared to other products they are still extremely available.
Interesting, I see nothing available from store.apple.com until 6 April earliest, and 29 April for m1 max and even later depending on options.
Specifying the M1 max = you are ordering a custom machine = there is a delay because a factory has to build it for you. The machines that are available for immediate pickup are the base level specs as shown on apple.com.
I bought an M1 Macbook Air mid-February. The site gave me a ship date of four weeks later, but actually shipped from China in eight days, and arrived at my office a couple days later.
Given what a mess shipping has been for the last two years, they appear to be taking the "underpromise, overdeliver" route on shipping quotes.
A month is not a long time these days..
That's pretty common I think. The retailers get their orders in and planned well before consumers can.
In some cases you're finding far flung regions relative to the source having better availability, because they were allocated a percentage of the original supply but are also too far away for scalpers to be interested, so they just haven't gone through it as fast. Australia hasn't had too many troubles with some items that are pretty hard to get elsewhere for example. Getting a 3080 in NYC is probably a real challenge, but I can walk to my local parts store and pick one up no dramas.
I was pretty surprised by the low prices of M1 MacBooks when even the lowest end models perform so much better than the high end of previous models. I'm sure Apple is spending less money on manufacturing them now that they're not going through Intel, but I would have expected them to just keep charging the same and keep the extra margin themselves.
They are trying to establish the new architecture. Also, you still need to shell out $2-3k to get something decent, and prices practically start at $1.5k. I wouldn't call that cheap or even cheaper. What difference from the past do you see?
Starting price for the air is $999, which gets you a very fast computer (albeit one a bit anemic in memory). A couple of years ago, the starting price for the air was still $999, but you got a... much less fast computer.
> still need to shell out $2-3k to get something decent and practically start at $1.5k
They're all using the exact same CPU, in fact you can make the air perform (almost) just as well as the pro/mini by opening it up and adding a thermal pad: https://www.cultofmac.com/759693/thermal-mod-m1-macbook-air/
Current MacBook Pro 14"/16" use the M1 Pro/M1 Max instead of the M1 that the Air and 13" MacBook Pro have, so definitely a different (and later iteration) CPU: https://www.apple.com/macbook-pro/
It's the same CPU with a different core distribution.
No, that's incorrect. Here's an image that illustrates the physical differences between the M1, M1 Pro, and M1 Max, including their vastly different die sizes: https://images.anandtech.com/doci/17019/Die-Sizes.jpg
Let's call it an oversimplification, the SoCs are definitely different. OP sounded like they were thinking about MacBook Air and the 13" MacBook Pro, and not taking the newer models into account.
> you still need to shell out $2-3k to get something decent
Honestly 16GB Air is pretty epic for $1200, though you probably want to spend the extra $200 for a storage bump as well. I'm very happy with the performance for dev tasks, and with my (displaylink) dock it runs multiple screens just fine too.
I bought the 16GB air for 1000 and it's easily the best laptop I've ever owned. Fantastic value
Picked up a refurb 16GB air just last week. I'm astounded at the battery life. It's my dream laptop. I wanted to buy a framework for the repairability, but I make some iOS apps and also really wanted the battery life. I've been floored by how long it lasts, even running npm installs and compiling Angular applications, things that used to burn my lap and drain the battery in 4-6 hours on my Intel air.
The MacBooks are starting to become somewhat more repairable. The latest one has pull-strip adhesive for the batteries, which makes user replacement massively easier.
There was a few months back. Now they seem to have more reasonable timetables.
Custom build Mac Studio, on the other hand, takes 10-12 weeks.
> Notice how there is no shortage of chips for iPhones, because Apple just buys the capacity at whatever cost it takes.
Apple bought out the entire capacity of TSMC's 3nm node [1]. I would not be surprised if the deal actually was for Apple to fund the construction of the fab in exchange for this level of priority.
[1] https://www.heise.de/news/Bericht-Apple-schnappt-sich-komple...
> The high value products get priority
So GPUs are not high priority? Because they are out of stock pretty much everywhere too.
It's been pretty easy to buy a 3090 for a while now, and the rest of the 30 series is finally starting to stabilize thankfully
At MSRP? If so, where?
Probably not consumer-grade GPUs. NVIDIA's enterprise GPUs are basically the same silicon sold at 10x the price, and bought in batches of tens to thousands at a time.
I know which SKUs I would be prioritising.
>Won't the new products also simply not be available?
The shortages are at the low end, on high-nm, mature nodes. This is on a leading 4nm node.
Chip production is not completely fungible.
What? They are sold out, not "can't be purchased".
What's the difference? If they are perpetually sold out, then they cannot be purchased.
There is constant production and deliveries being made, just no standing inventory.
Can you enter a queue to purchase them? If not it's just a cat and mouse game to get one.
Direct retail / individual sales are always the least important and the first to get restricted amounts of supply so that large orders can be filled. There is a queue and lots of orders are moving through it, you personally just don't see this.
Depending on the product, volume orders for high-end ICs are typically running between 52 and 72 weeks of lead time at the present, and it's been this way for many months now. So the orders that are getting filled today for parts were placed in early 2021 in most cases.
This is generally very difficult for retailers, because they have had to come up with capital to have a year's worth of orders in the pipeline. So they've been having to stock fewer things -- only what they are absolutely sure will sell -- and can't use real-time sales data to estimate the next month's order.
Welcome to the new normal; it'll be this way for at least another year or two, minimum (until new factories get built and productivity returns to pre-pandemic levels, for the most part).
As a consumer you may not be able to, but volume customers and distributors are ordering them and waiting for them.
I guess you don't know the world of shoes and hypebeasts. The primary "value" of that industry is scarcity alone. The sneaker side of the market alone is worth $100 billion.
Nvidia is fabless. They don't make anything. They are primarily R&D. This is the fruit.
This is interesting. Without actually targeting the specific cloud / server market for their CPU, which often ends in a chicken-and-egg problem with hyperscalers making their own designs or chips, Nvidia managed to enter the server CPU market by leveraging their GPU and AI workloads.
All of a sudden there is a real choice of ARM CPUs on servers. (What will happen to Ampere?) The LPDDR5X used here will also be the first to come with ECC. And they can cross-sell these with Nvidia's ConnectX-7 SmartNICs.
Hopefully it will be price competitive.
Edit: Rather than downvoting, maybe explain why or what you disagree with?
AWS Gravitons aren't toys; they work pretty well for a wide range of workloads.
I wonder if Apple also intends to introduce ECC LPDDR5 on the Mac Pro. Other than additional expansion, I’m struggling to see what else they can add to distinguish it from the Mac Studio.
In order for an Apple Silicon Mac Pro to make any sense whatsoever, its SoC will need to have support for off-package memory and substantially more PCIe lanes than the M1 Ultra. Therefore it seems all but certain to me that it will debut the M2 chip family.
Apple isn't going to give up the substantial performance benefits of on-package unified memory in order to support DIMMs. Therefore I predict that we'll see a two-tier memory architecture with the OS making automated decisions based on memory pressure, as well as new APIs to allocate memory with a preference for capacity or performance.
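Purely as a sketch of what such a capacity-vs-performance preference API could look like - this is hypothetical, not an actual or announced Apple API, and every name below is made up for illustration:

    #include <cstdlib>

    // Hypothetical two-tier allocator: the caller states a preference and the
    // OS decides whether the pages land in on-package RAM or expansion RAM,
    // migrating them later based on memory pressure.
    enum class MemoryTier { PreferFast, PreferLarge };

    void* allocate_preferring(std::size_t bytes, MemoryTier tier) {
        // Sketch only: a real implementation would tag the allocation so the
        // kernel could place and migrate its pages; here it is plain malloc.
        (void)tier;
        return std::malloc(bytes);
    }

    int main() {
        void* hot  = allocate_preferring(64u << 20, MemoryTier::PreferFast);   // latency-sensitive working set
        void* cold = allocate_preferring(512u << 20, MemoryTier::PreferLarge); // big, rarely-touched buffer
        std::free(hot);
        std::free(cold);
        return 0;
    }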
The chassis design is new enough that it was designed with an eventual Apple Silicon Mac Pro in mind, so I expect to see minimal change to the exterior. It might shrink and have fewer slots (particularly since most users won't need a slotted GPU), though I think that's unlikely given that its height and width were defined by 5U rack dimensions.
Currently the performance and power benefits of having tightly packaged RAM are taken full advantage of by the M1 family. A less tightly coupled memory system will likely have significant performance implications. There's a reason why all GDDR memory for GPUs is soldered: there are signaling issues caused by things like longer traces and the electrical behavior of the sockets themselves.
People also often seem to forget that interconnects are a significant part of modern power budgets - look at the Epyc IO die often using more power than the cores in many workloads. It may be that the M1 family looks less attractive when you actually have to add these requirements.
Perhaps there's some possibility of having both a tightly-coupled RAM package and an extensible memory system - though that has significant management complexity if you try to treat it like a cache, or likely needs app support if it's some NUMA system where they're mapped separately. But possible, at "just" the cost of the extra memory controller.
I would love to be wrong, but I don't think DIMMs will be a thing on the Mac any more. Not only does it not make economic sense for Apple, with the additional DDR5 memory controller support and testing, they can now also charge a substantial premium for memory.
DIMMs likely won't be a thing anywhere before too long. They're too large and problematic to deal with for the latest memory interfaces (which previously had only been found on GPUs). There's only so much you can miniaturize before the connectors become a real problem - and we can now put an entire "computer" in a single package.
I'm interested to see when the PC form factor goes away completely, likely 1-2 generations of product from now.
More cores and more RAM is really kind of it. I guess PCIe but I’m kind of wondering if they’ll do that.
And more worryingly, will a GPU function in the slots?
The questions everyone has: RAM and GPU.
I think support for GPU compute has reasonable odds. I'd place worse odds on GPU support for back-end graphics rendering or display driving. And of course support for any Nvidia card would continue to have very poor odds.
Heck, we're talking about a company that put an A13 into a monitor. I wouldn't put it past Apple to put an M2 Ultra onto MPX modules and have that GPU/ANE compute performance automatically available through existing APIs. (Would be a great way to bin M2 Ultra chips with a failed CPU core.)
Reading this makes a veteran software developer want to become a scientific researcher.
Way too late for me. I think adding machine learning to my toolbox at least gets me knowledgeable.
https://www.kaggle.com/
When Jensen talks about Transformers, I know what he’s talking about because I follow a lot of talented people.
https://www.kaggle.com/code/odins0n/jax-flax-tf-data-vision-...
> I know what he’s talking about
Robots in disguise?
IKR? Imagine a Beowulf cluster of these...
I don't think you'll have to imagine. It says on the box it's designed for HPC. and every supercomputer in the Top 500 has been a Beowulf cluster for years now.
We call it a "SuperPOD" now apparently.
https://www.nvidia.com/en-us/data-center/dgx-superpod/
Slashdot flashbacks from 2001! Well played. Well played.
Maybe it's just me, but it's cool to see the CPU market competitive again for the first time since the late 90s.
I wonder why Intel never had a really good go at GPUs? It seems strange, given the demand.
Intel also announced a new GPU offering, supposed to drop in 8 days:
https://www.intel.com/content/www/us/en/architecture-and-tec...
https://en.wikipedia.org/wiki/Intel_Arc
Discrete GPUs have historically been a relatively small and volatile niche compared to CPUs, it's only in the last few years that the market has seen extreme growth.
edit: the market pretty much went from gaming as the primary pillar to gaming + HPC, which makes it far more attractive since you'd expect it to be much less cyclical and less price-sensitive. Raja Koduri was hired in late 2017 to work on GPU-related stuff, and it seems like the first major products from that effort will be coming out this year. That said, they've obviously had a lot of failures in the accelerator and graphics area (consider Altera), and Koduri has stated on Twitter that Gelsinger is the first CEO to actually treat graphics/HPC as a priority.
CUDA came out in 2007. Wikipedia puts the start of the GPU-driven 'deep learning revolution' in 2012 [1] and people have been putting GPUs into their supercomputers since 2012 as well [2]
I find it strange that Intel has basically just left the entire market to nvidia, despite having 10-15 years warning and running their own GPU division the whole time.
[1] https://en.wikipedia.org/wiki/Deep_learning#Deep_learning_re... [2] https://en.wikipedia.org/wiki/Titan_(supercomputer)
Competing with Nvidia on gaming GPUs wasn't something Intel was keen to do after their failure with the i740. The gaming market wasn't as big, and you are ultimately competing on driver optimisation, not on actual hardware.
CUDA and deep learning may have started in 2007 and 2010, but their usage, and their revenue potential, was unclear back then. Even in 2015, datacenter revenue was less than one eighth of gaming revenue. And rumours of Google's AI processor (now known as the TPU) started back in 2014 when they started hiring. In 2021, datacenter revenue is roughly equal to gaming revenue, and is expected to exceed it in 2022.
Intel sort of knew GPGPU could be a threat by 2016/17 already. That is why they started assembling a team, and hired Raja Koduri in late 2017. But as with everything Intel in the pre-Pat Gelsinger era, Intel was late to react - from smartphones to the foundry model and now GPGPU.
They created the Xeon Phi[1] for that niche. It was spun out of Larrabee[2]. I presume they will be taking advantage of their coming GPU architecture for more going forward.
[1]: https://en.wikipedia.org/wiki/Xeon_Phi
[2]: https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)
They tried to check many, maybe most, of the boxes with the Xeon Phi, and it kinda seems like things simply didn't go their way.
CUDA wasn't as flexible, and the payoff wasn't as big in 2010 or so as it is now.
I've never used a Phi, but I can see where they were coming from, I think. No need for a full rewrite like CUDA (maybe). The hardware is also more flexible than a GPU, but that turned out to be less important than they thought it might be.
This isn't true. The Phi was extremely complex to program for, and it was not simply a port of standard x86 code. It required you to pay attention to multiple levels of memory hierarchy, just as the GPU did.
Intel produced good, as in "cheap and always working", integrated GPUs. For a great many tasks, they are adequate. I'm not a gamer, and if I needed to run some ML stuff, my laptop's potential discrete GPU wouldn't be much help anyway.
Also, Intel has a history of producing or commissioning open-source drivers for its GPU. I like the peace of mind I get from knowing I'm not going to have to fight dirty for the privilege of getting my own GPU to do the work I bought it to perform.
Two of the three major GPU vendors have fully-supported open source drivers, arguably it's nvidia being the odd one out rather than anything else.
While I view my Intel iGPU as a backup, I don't have any negative impressions about its performance like many gamers do. I have the 11900K, which has an iGPU capable of 720p gaming, which is quite remarkable to be honest considering it's integrated into my CPU. Cheap and "just works" is exactly how I view it, and they've been getting better over the last 2 generations.
I can't find a new dGPU at MSRP so I'm going to see if the Intel Arc cards are more readily available, and if not, I'm probably going to part out my desktop and move permanently to using Intel NUCs. Mostly for the GPU contained within. It seems like the days of getting your hands on a dGPU are over, and I'm not fighting over them.
GPU shortages are nearing an end, and with next-generation products from Nvidia, AMD and Intel on deck, we'll probably be in a really good spot for GPU consumers come Q4 2022.
It's been so long now, 2.5 years, that I now view GPUs like I do gas prices. You can't trust in a stable market. It's not like GPUs didn't skyrocket in price in the years leading up to the shortage anyway.
The best long-term decision is to get off any dependency on either of them. I'm looking at electric cars and Intel NUCs. A lot of people that I know moved to laptops for the same reason. A lot of us gave up, and many like me no longer trust the market.
Besides integrated GPUs for actual graphics usage that other comments mentioned, Intel did make some attempts at the GPGPU market. They had a design for a GPU aimed primarily at GPGPU workloads, Larrabee, that was never released [1], and adapted some of the ideas into Xeon Phi, a more CPU-like chip that was intended to be a competitor to GPUs, which was released but didn't gain a lot of market share [2].
[1] https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)
[2] https://en.wikipedia.org/wiki/Xeon_Phi
The space has two competitors, but NVidia makes most of the GPUs and most of the money. If there's barely room for a second player, there's no room for a third. That being said, they are releasing a GPU soon so we'll see how that goes. Unless the market continues to be insane I'm going to guess it won't go over very well.
You're not alone.
What are people's experience of developing with NVIDIA? I know what Linus thinks: https://www.youtube.com/watch?v=iYWzMvlj2RQ
I had a laptop with NVIDIA GPU that crashed Xorg and had to be rebooted whenever Firefox opened WebGL. Just to complement the positive sibling comments :-)
Are you using nvidia's driver or nouveau?
Linus might know his way around UNIX clones and SCM systems; however, he doesn't do graphics.
NVidia tooling is the best among all GPU vendors.
CUDA has been polyglot since version 3.0; you get a proper IDE and GPGPU debugging tools, and a plethora of libraries for most use cases one could think of using a GPGPU for.
OpenCL did not fail only because of NVidia not caring; Intel and AMD have hardly done anything with it that could compete at the same tooling level.
I like CUDA; that stuff works and is rewarding to use. The only problem is the tons and tons of hoops one must jump through to use it on servers. Because a server with a GPU is so expensive, you can't just rent one and keep it running 24x7 if you don't have work for it to do, so you need a serverless or auto-scaling deployment. That increases your development workload. Then there is the matter of renting a server with a GPU; that's still a bit of a specialty offering. Until recently, even the major cloud providers (e.g. AWS and Google) offered GPUs only in certain datacenters.
Luckily, you can run CUDA code on even a cheap GTX 1050, so you can test locally and run the full size job on a big V100/A100/H100 system.
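For what it's worth, the "develop on a cheap card, deploy on a big one" workflow really is just a recompile. Here's a minimal sketch (the kernel and the nvcc flags are illustrative, not from any particular project): the same source builds for a Pascal GTX 1050 and an A100, with only the target architectures changing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial SAXPY kernel: y = a*x + y. The same source runs unchanged on a
// GTX 1050 at home and a V100/A100 in the datacenter.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Managed memory keeps the example short; explicit cudaMalloc/cudaMemcpy
    // works everywhere too.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Build with something like `nvcc -gencode arch=compute_61,code=sm_61 -gencode arch=compute_80,code=sm_80 saxpy.cu` and the one binary carries code for both the desktop Pascal card and the datacenter A100.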
I had an Ubuntu 18.04 install that "randomly" started dying (freezing) with my GTX 1080 at some point. I pinpointed it to the combination of that GPU + Linux. I didn't want to bother with reconfiguring my water-cooling loop or buying an expensive GPU, so I just gave up and switched to a perfectly stable Windows + WSL.
Nvidia's AI APIs are well documented and supported. That's why everyone uses them.
Over the past two decades that I've used Nvidia products for OpenGL and other related things, my experience has been largely positive, although I find installing both the dev packages and the runtimes I need to be cumbersome.
Soooo... would something like this be a viable option for a non-Mac desktop similar to the Mac Studio? It definitely seems targeted at the cloud vendors and large labs... but it'd be great to have a box like that which could run Linux.
It's viable in the sense that you can just stick a server motherboard inside of a desktop case. It certainly won't be cheap though.
This has been done as a commercial product with the Ampere ARM server chips. The base model is about $8k.
https://store.avantek.co.uk/arm-desktops.html
It’s a server CPU that runs any OS really (Arm SystemReady with UEFI and ACPI).
However, the price tag will be too high for a lot of desktop buyers.
(There are smaller Tegras around though)
It probably won't run Windows. But other operating systems, probably yes. Maybe Microsoft will come up with some sort of Windows Server DC Arm edition in the future so they can join in as well.
Modern Tegras can boot arm64 Windows. But yeah without a licensable Windows Server arm64 SKU, practical uses are quite limited.
Nvidia Orin would be a better fit for an ARM desktop/laptop but Nvidia seemingly isn't interested in that market.
As long as your application workload is a good match for the 144 ARM cores.
Time to sell Intel shares?
That time was years and years ago. If you're just thinking about it now, you're already in a world of pain.
Intel stock is up 37% from 5 years ago. Though this past year they took quite a beating.
This is really not that much considering how much every stock has gone up over the last couple of years. Nvidia and AMD are up 887% and 737% respectively from 5 years ago.
Given that larger non-mobile chips are jumping to the LPDDR standard, what is the point of having a separate DDR standard? Is there something about LPDDR5 that makes upgradable DIMMs not possible?
AFAIK the higher speed of LPDDR is directly because it avoids signal degradation caused by DIMM connectors.
> Is there something about LPDDR5 that makes upgradable dimms not possible?
It's theoretically possible, but there's no standard for it.
heh, does Intel have any chance to catch up? They fell so far behind.
I really don't see what they can do. It seems like in the last year they pivoted hard into "ok we'll build chips in the US again!", but it's going to be years and years before any of that pays off or even materializes. The only announcements I've heard from them are just regular "Here's the CEO of Intel telling us how he's going to fix Intel" PR blurbs and nothing else. Best case maybe they just position themselves to be bought by Nvidia...
No. Intel has worked out that it needs to open its production capacity to other vendors. They will end up as another ARM fab with a legacy x86-64 business strapped on the side. That's probably not a bad place to be, really. I think x86-64 will fizzle out in about a decade.
I don't feel like ARM has serious technical advantages over x86-64 as an ISA, although it is cleaner and has more security features which is good. Isn't the main advantage just that it's easier to license ARM?
Once enough patents expire all ISAs are eventually equal, I'd think.
I feel much the same way. I've used both pretty extensively at this point, and I'm not sure if I'm a believer in either mentality. I'm hoping that RISC-V will be the one to blow my mind, though.
Spend some time looking at optimised compiler output on godbolt for both architectures. ARM has some really nice tricks up its sleeve.
I’ve been using ARM since about 1992 though so I may be biased.
So does Intel ;)
Yes and no. Look at some of the loop optimisations possible on ARM compared to x86-64. I've seen x86-64 take 8 instructions for something ARM does in 1.
I remember PPC and its rlwinms and co. My ARM isn’t that good, though I can read it.
But some of those x86 instructions take 0.5 cycles, and some of them take 0 if they're removed by fusion or register renaming. It has worse problems, like loop instructions you can't actually use but which take up the shortest encodings.
This applies equally well, or dare I say even better, to x86. (Arm tends to catch up because of higher IPC.)
Of course they have a chance to catch up. Only a fool would count Intel down & out. Intel is still larger by revenue than AMD, NVidia, and ARM combined.
This will probably cost them some market share, but they have plenty of cash to weather their current manufacturing issues, they still have world-class CPU design talent which they've proven over and over and over again, and they have some very interesting products & technologies on the roadmap.
ARM offering a fight for the first time ever is not going to be a 1-hit KO against the goliath that is Intel.
Intel will never catch up because Arm's business model is much better. Intel is not competing with Arm; they're competing with every large tech company, all of whom share design costs via Arm and mostly share manufacturing costs via TSMC.
Arm has a much more efficient, and also much less profitable, business model, and Intel will never catch up unless they adopt it. They'll never do that, so they'll fade away like IBM.
The CPUs are designed and made by Apple; the ISA is licensed from ARM. They are not like the ARM Cortex CPUs that are actually designed by ARM.
Where do you see these are Apple designed CPUs? There doesn't seem to be anything indicating that, and that would be massive news.
Err, sorry, I thought we were talking about the Apple M1, as in another comment subthread here, but that wasn't this one.
But I think my point still stands: isn't this CPU designed by Nvidia, also with just an ARM-licensed ISA? Similarly, AMD, which you mentioned in your list, shares its ISA with Intel, and yet the CPUs are completely different.
The CPU is custom but the cores are off-the-shelf Arm cores.
Alright, then my point is indeed gone. I misread the author, thinking I was still in another subthread.
There are some hints that they are redesigning some server processors to double the core count, but that may not be visible for 2-3 years. Also keep in mind that Intel has 75% server market share and is only losing ~5 points per year.
There is a very good chance that Intel will catch up. They have money, they have capacity, and from what I understand they still have several more designs already researched, and those will enter production over the next few years. They are also working on RISC-V stuff (AMD is too).
Anyone have a sense of how much these will cost? Is this more akin to the Mac Studio that costs $4k, or an A100 GPU that costs upwards of $30k? Looking for an order of magnitude.
Considering that the URL is "/data-center/grace-cpu/", assume much more than a Mac Studio.
Compare a 72-core Grace against an 80-core Ampere Altra, which is priced at $4K (without RAM).
The top-end datacenter GPUs have been slowly creeping up from $5k a few generations back to about $15k for the A100s now. So this one will likely continue the trend, to $20k or maybe $30k, but probably not beyond that.
This is definitely not a consumer-grade device, like a Mac Studio.
Probably on the order of $100k.
That would be a real shame. I really want someone to make a high core count ARM processor in the price range of an AMD threadripper that can work with Nvidia gpus.
Look into Ampere; they have 256-core and 160-core dual-socket systems for decent prices: https://store.avantek.co.uk/ampere.html
Ampere Altra?
How likely is it that one of AWS / GCP / Azure will deploy these? Nvidia has some relationships there for the A100 chips.
Amazon has at least two generations of their own homebrew ARM chip, the Graviton. They offer it for people to rent and use in AWS, and publicly stated they are rapidly transitioning their internal services to use it too. In my experience Graviton 2 is much cheaper than x86 for typical web workloads--I've seen costs cut by 20-40% with it.
> their own homebrew ARM chip
are they going through TSMC like NVIDIA or are they using Samsung?
AWS has their own CPU. Microsoft is an investor in Ampere, but I am not sure if they will make one themselves or simply buy from Ampere. Google has responded with faster x86 instances; still no hint of their own ARM CPU. But judging from the past, I don't think they are going to go with Nvidia.
That is only the CPU, though; they might deploy it in a Grace + Hopper config.
With names like that, I assume that was the intention
AWS + Azure (and I believe GCP) installed previous generations, and are having huge GPU shortages in general... so probably!
An interesting angle here is that these support partitioning even better than the A100s. AFAICT, the cloud vendors are not yet providing partitioned access, so everyone just exhausts worldwide g4dn capacity for smaller jobs / devs / etc. But partitioning can solve that...
Pretty sure they all will; they all already have the past gens of these things, and it's a simple upgrade.
Interesting that this has 7x the cores of a M1 Ultra, but only 25% more memory bandwidth. Those will be some thirsty cores!
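For scale, taking the 144-core Grace Superchip against the M1 Ultra's 20 CPU cores (the Ultra's CPU core count and Grace's roughly 1 TB/s aggregate bandwidth figure are my additions, not stated in this thread):

```latex
\frac{144\ \text{cores}}{20\ \text{cores}} \approx 7\times
\qquad\text{vs.}\qquad
\frac{\sim 1\,000\ \text{GB/s}}{800\ \text{GB/s}} = 1.25\times
```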
M1 Ultra bandwidth is for CPU and GPU (800 GB/s). Grace is just the CPU. Hopper, the GPU, has its own memory and bandwidth (3 TB/s).
https://twitter.com/benbajarin/status/1506296302971334664?s=...
396MB of on-chip cache… (198MB per die)
That’s a significant part of it too.
The M1 memory bandwidth is mostly for the GPU but Grace does not include an (on-chip) GPU.
The CPU complex on the M1 series doesn't have anything close to the full bandwidth to memory that the SoC has (like, half). The only thing that can drive the full bandwidth is the GPU.
Who bets that the amount of detailed information they'll officially[1] release about it is "none" or close to that? I still think of Torvalds' classic video whenever I hear about nVidia. The last thing the world needs is more proprietary crap that's probably destined to become un-reusable e-waste in less than a decade.
[1] https://news.ycombinator.com/item?id=30550028
> NVIDIA Grace Hopper Superchip
Finally, a computer optimised for COBOL.
So they're adding decimal floating point this generation too then ;)
You win the internet today
I think we're all missing the forest because all the cores are in the way:
The contention on that memory means that only segregated, non-cooperative workloads, as in not "jointly parallel on the same memory with atomics", will scale on this hardware better per watt than on a 4-core vanilla Xeon from 2018.
So you might as well buy 20 Jetson Nanos and connect them over the network.
Let that sink in... NOTHING is improving at all... there is ZERO point to any hardware that CAN be released for eternity at this point.
Time to learn JavaSE and roll up those sleeves... electricity prices are never coming down (in real terms) no matter how high the interest rate goes.
As for GPUs, I'm calling it now: nothing will dethrone the 1030 in Gflops/W in general and below 30W in particular; DDR4 or DDR5, doesn't matter.
Memory has been the latency bottleneck since DDR3.
Please respect the comment-on-downvote principle. Otherwise you don't really exist; in a quantum-physical way, anyway.
The 1030 was dethroned a while ago. The Apple G13 delivers 260 GFLOPS/W in a general-purpose GPU. I mean, their phone has more GPU FLOPS than a 1030.
Nope, the 1030 does 37 GFLOPS/W... the G13 is 786/20 W = ~40... and that's 14nm vs 5nm... still, I'm pretty sure there are things the 1030 can do that the G13 will struggle with.
Game Over!
G13 (in the 8-core/1024 ALU config as in M1) delivers 2.6TFLOPS with sustained power consumption of 10W. That's almost an order of magnitude better than 1030. Sure, node definitely matters, but going from 14nm to 5nm cannot explain the massive power efficiency difference alone.
What are the things that 1030 can do that G13 will struggle with?
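Spelling out the per-watt arithmetic in dispute (all figures as claimed in this subthread; the 1030's ~1.1 TFLOPS is just what 37 GFLOPS/W at 30 W implies), the disagreement is entirely about which TFLOPS and wattage to attribute to the G13:

```latex
\text{GT 1030:}\quad \frac{\approx 1.1\ \text{TFLOPS}}{30\ \text{W}} \approx 37\ \text{GFLOPS/W},\qquad
\text{G13 at}\ \frac{2.6\ \text{TFLOPS}}{10\ \text{W}} = 260\ \text{GFLOPS/W},\qquad
\text{G13 at}\ \frac{786\ \text{GFLOPS}}{20\ \text{W}} \approx 39\ \text{GFLOPS/W}
```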
Your numbers are completely wrong. The claim of 2.6 TFLOPS for the M1 was independently verified.
https://www.realworldtech.com/forum/?threadid=197759&curpost...
There is not a single page on the whole internet that states GFLOPS and watts on the same page. I did two searches, "apple g13 gflops" and "apple g13 watt"... the results were completely disturbing, since this info should be clearly available. When you google the 1030's GFLOPS and watts, you get links to pages stating both figures, and they all agree...
The M1 comes in MANY flavours with different wattages and GFLOPS.
And for CPU GFLOPS I had to get friends to measure it themselves: 2.5 GFLOPS/W, compared to the Raspberry Pi 4's 2 GFLOPS/W, and this time it's 5nm vs 28nm.
Please give me official GFLOPS and watt sources and we can discuss.
The page you link does NOT state watts clearly.
> What are the things that 1030 can do that G13 will struggle with?
In real life, when you develop games for real hardware, you notice their real limitations, like fill rates, and how they scale with different behaviours, because they either have enough registers to do things in one blow or they have to remember things. It's complicated, but eventually you realize you can just benchmark things for your own needs, and for me the 1030 is for all practical purposes as good as a 1050 so far: 2000 non-instanced animated characters on a 1030 at 30W vs maybe 2500 on an 80W 1050!
Without having tested it, I'm pretty sure the M1 cannot do more than 1000 at whatever wattage it uses... not that I would ever compile anything for a machine where I need to sign the executable.
“Official sources” in this case means testing done by me personally. I am the author of the post on RWT linked previously. I would be happy to share my benchmarking code for your scrutiny if you want. The M1 variant tested was the base M1 in a 13” MacBook Pro.
I don’t know what your friends have tested but the results make zero sense. Firestorm reaches 1700 points in GB5 at 5W. Pi 4 is under 300 at similar wattage.
How do you measure watts?
First answer on Google: "maximal power consumption is around 50 watts".
Firestorm is a GPU (again, Google has little info); I'm talking about the CPU for the Raspberry Pi.
The Raspberry Pi 4 GPU uses 1W. You are conflating things because of sunk cost.
You need to compare the same things, apples to apples (no pun intended): one CPU core on the Raspberry Pi consumes 1W; on the M1 they are 4W each.
GPU is 1W vs 5W (if you are correct, which I HIGHLY doubt; I suspect 20W for the GPU alone, and Wikipedia states 39 watts at maximum load, so yes, 20W for the GPU)!
You need to start looking at the world objectively and understand how it really works, because peak energy is not going to be forgiving if you don't.
By using the provided system tools that report power usage of the GPU cluster? Also, I am telling you the system diagnostics output of an actual physical machine. What are you quoting Wikipedia for? You can literally go measure these things. Should I go edit Wikipedia so that you get correct information?
Anyway, the power usage of M1 variants has been studied in detail. It’s 5 watts peak for a single performance core, 20W peak for a CPU cluster of four cores, and 10W for an 8-core GPU (128 FP32 ALUs per core). Bigger M1 variants have correspondingly higher power consumption because of the larger interconnects/caches etc. DRAM is also a factor: running at full bandwidth, it can consume over 10W of power.
Firestorm is not a GPU, it's a microarchitecture for the large cores on M1.
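For what it's worth, the 2.6 TFLOPS figure falls straight out of the ALU count if you assume the commonly reported ~1.28 GHz M1 GPU clock (the clock is my assumption, not something measured in this thread), with each FP32 ALU doing one fused multiply-add (2 FLOPs) per cycle:

```latex
1024\ \text{ALUs} \times 2\ \tfrac{\text{FLOPs}}{\text{cycle}} \times 1.28\ \text{GHz} \approx 2.6\ \text{TFLOPS}
```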
Well, it seems like you made up your mind without doing any testing or educating yourself, so I am not quite sure what I can do to help you. Already your entirely nonsensical comment of “needing to sign the executable” speaks volumes. Why did I never need to sign anything despite building software on M1 machines daily for the last year? I wonder…
All code that runs on Apple silicon must be signed. If you don't explicitly sign your executable, the linker will inject an ad-hoc signature into your binary for you.
Sure. But it does not affect you as a developer in any form or fashion. It's just a thing the linker does. You can still distribute your binaries, disassemble them, etc., as you always could.
I wouldn’t say it’s all over. People have been saying that it’s all over for longer than I can remember, and there is always someone smarter and more clever. The GPU space is ripe for disruption, the memory space is ripe for disruption, and the CPU space is being disrupted presently. For all I know, some genius has it worked out now and is going to launch a new startup sometime this month.
Aren't you ignoring use cases where all cores read shared data but rarely write to it contentiously? You should get much higher read bandwidth and lower latency than over a network.
Sure, but my point is: why cram more and more cores into the same SoC if they can't talk to each other more efficiently than separate computers over ethernet?
This point feels like asking why any organization would seek density in computing if they can just buy more of something and spread it out. I don't know about you, but I've saved a ton of effort on design complexity by not distributing workloads when it can be avoided (but distributed computing is a solved problem).
I recognize what you are calling out: that performance will be the same on some workloads whether you distribute or not. I would just point out that less manufacturing means less e-waste; I would rather live in a world where Nvidia sells 50 million 10*0 cards than 500 million 1030 cards to create the same amount of compute. It's not just the power costs to consider (though it could be that running 500 million 1030s for their lifetime wastes so much less power that the manufacturing costs to the planet are worth it).
Your point rests on incorrect facts. On-chip fabrics are much more efficient than separate computers over Ethernet: more energy efficient and lower latency.
Not only that. On-chip you get high-precision synchronous time (all cores observe the same time), so you can use synchronous distributed algorithms that are unsuitable for Ethernet networks.
This type of hardware allows for much better solutions to some problems.
Latency! Nanoseconds versus microseconds or even milliseconds makes a huge difference.
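To make the latency point concrete, here's a hedged sketch (my own illustration, not anything from NVIDIA's materials) of the kind of fine-grained synchronization that is essentially free on-chip but would become a network round-trip per step if the cores were spread across separate boxes, e.g. the 20 Jetson Nanos mentioned upthread:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Block-wide tree reduction. Every __syncthreads() is a barrier across 256
// threads that costs on the order of nanoseconds; the equivalent barrier
// across networked machines would be a round-trip (tens of microseconds or
// more) on every iteration.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float buf[256];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();  // cheap on-chip barrier, hit every iteration
    }
    if (tid == 0) out[blockIdx.x] = buf[0];
}

int main() {
    const int n = 1 << 20;
    const int block = 256, grid = (n + block - 1) / block;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, grid * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockSum<<<grid, block>>>(in, out, n);
    cudaDeviceSynchronize();

    float total = 0.0f;  // final pass on the host for brevity
    for (int b = 0; b < grid; ++b) total += out[b];
    printf("sum = %.0f (expected %d)\n", total, n);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```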
Performance per watt isn’t so useful for a GPU. People training ML algorithms would gladly increase power consumption if they could train larger models or train models faster.
And that's exactly my point: they can't. Power does not solve contention and latency! It's over, permanently... (or at least until some photon/quantum alternative, which honestly we don't have the energy to imagine, let alone manufacture, anymore)
"Grace?"
After 13 microarchitectures given the last names of historical figures, it's really weird to use someone's first name. Interesting that Anandtech and Wikipedia are both calling it Hopper. What on Earth are the marketing bros thinking?
The GPU is Hopper, which is in line with their naming scheme up till now. The CPU is called Grace. Clearly they are planning to continue the tradition of naming their architectures after famous scientists: the CPUs will take on the first name while the GPUs will continue to use the last.
So expect a future Einstein GPU to come with a matching Albert CPU.
They also made the “Hopper” architecture to complement it.