corsix 2 years ago

The elephant in the room contrast: Apple has been shipping this for years, whereas Intel _might_ ship theirs in something later this year.

  • blinkingled 2 years ago

    > Note that these instructions are neither documented nor supported by Apple.

    What does that mean? Someone pointed out below that Apple ships the Accelerate framework as a higher-level, supported mechanism for using these instructions.

    Intel/AMD have always been good at documenting most of their stuff, so perhaps we'll see properly documented and supported instructions whenever Intel ships theirs.

    • fredoralive 2 years ago

      I think they mean that using the raw instructions directly is unsupported and undocumented; officially, you have to go through the libraries as the interface to them.

      • cyber_kinetist 2 years ago

        Though in reality nobody is stopping you from using those undocumented instructions. If Apple’s Accelerate framework can use them, so can you. (Or is it against the EULA?…) There is a slim chance that the behavior of these instructions might change with a software update (if Apple is doing any kind of microcode updates), but I kinda doubt it.

  • maven29 2 years ago

    They've had something similar off-core with GNA (the Gaussian & Neural Accelerator) since 10th-gen Ice Lake in 2019, for low-power, always-on inference use cases.

  • smoldesu 2 years ago

    Hasn't Intel been shipping scalar instructions for more than a decade?

    • colejohnson66 2 years ago

      You’re probably thinking of AVX, Advanced Vector Extensions. That’s a family of vector extensions that has been out for just over a decade (2011). This is about AMX, Advanced Matrix Extensions. AMX is, as the name implies, built around 2D INT8/BF16 matrices (not “1D” FP16/32/64 vectors). It’s not on silicon available to the public.
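      To make the vector-vs-matrix distinction concrete, here's a rough NumPy sketch of the *shapes* involved (not real intrinsics; the tile dimensions are the maximums from Intel's AMX spec as I understand it, so treat the specifics as an assumption):

      ```python
      import numpy as np

      # AVX-style: 1D vectors, elementwise fused multiply-add.
      # A 256-bit AVX register holds 8 fp32 lanes.
      a = np.arange(8, dtype=np.float32)
      b = np.full(8, 2.0, dtype=np.float32)
      c = np.ones(8, dtype=np.float32)
      vec_result = a * b + c  # one FMA across 8 lanes

      # AMX-style: 2D tiles, a whole matrix multiply-accumulate per instruction.
      # An AMX tile is up to 16 rows x 64 bytes; with int8 inputs, one tile op
      # is roughly a 16x64 by 64x16 product accumulated into an int32 16x16 tile.
      A = np.random.randint(-128, 128, size=(16, 64), dtype=np.int8)
      B = np.random.randint(-128, 128, size=(64, 16), dtype=np.int8)
      C = np.zeros((16, 16), dtype=np.int32)
      C += A.astype(np.int32) @ B.astype(np.int32)  # one tile op ~ 16*64*16 MACs
      ```

      The point is the granularity: the vector op does 8 multiply-adds, while a single tile op does on the order of 16,000.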

      • xattt 2 years ago

        Also not to be confused with VMX (AltiVec/Velocity Engine), which predates AVX by another decade or so.

        To keep the record straight, we have: 1. AMX, 2. AVX, and 3. VMX.

        • colejohnson66 2 years ago

          There’s also Intel VMX, the Virtual Machine Extensions ;) Sometimes it’s referred to as VT-x.

    • timdorr 2 years ago

      Intel's AMX specifically hasn't yet shipped, but will be coming with their new Sapphire Rapids Xeon chips this year.

      • mhh__ 2 years ago

        *Next year (potentially).

        Sapphire Rapids might well end up only shipping in Aurora and then being replaced immediately by its successor.

Someone 2 years ago

The article would have been much better for me if it had drawn conclusions about the usefulness of the two (partial) instruction sets.

What can you do easily and what’s hard?

I also expected to read something about relative performance of the two.

  • stephencanon 2 years ago

    The main obvious thing is that Intel simply doesn’t have fp32 or fp64 support. If you primarily care about ML inference, that mostly doesn’t matter to you; if you care about other types of dense computation as well, it matters a lot.

  • corsix 2 years ago

    Performance comparison is hard given that the Intel one hasn’t shipped yet.

    • dougall 2 years ago

      Yeah, I think Intel's only number so far is "2048 int8 operations/cycle/core" (as opposed to VNNI's 256): https://www.servethehome.com/wp-content/uploads/2021/09/Inte...

      Which (assuming 1 multiply-add = 2 operations) is the same int8 operations/cycle as the Apple M1's float16 operations/cycle. Intel's 16-bit operations might be the same rate, but I'd guess half? That'll almost certainly be at a higher clock-speed, and one-per-core rather than one-per-four-P-cores. (And I think Apple might have doubled their throughput in M2. As you said, performance comparison is hard.)
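      The back-of-the-envelope arithmetic, using only the quoted per-cycle figures (these are marketing numbers, not measurements):

      ```python
      # Intel's quoted AMX figure: 2048 int8 operations/cycle/core.
      intel_amx_int8_ops = 2048

      # Counting one multiply-add as 2 operations:
      intel_amx_int8_macs = intel_amx_int8_ops // 2  # 1024 MACs/cycle/core

      # AVX-512 VNNI, for comparison: 256 int8 ops/cycle.
      vnni_int8_ops = 256

      # AMX's speedup over VNNI at the same clock:
      speedup = intel_amx_int8_ops / vnni_int8_ops  # 8x

      # Apple M1 AMX is quoted at the same rate, but in float16:
      m1_amx_fp16_ops = 2048  # per cycle, shared by a cluster of 4 P-cores
      ```

      Note the caveats from above: Intel's number is per core at (presumably) server clocks, while Apple's is one AMX block shared per four-P-core cluster, so ops/cycle alone doesn't settle the comparison.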

altairprime 2 years ago

Is the Apple AMX genlut op approximately the same as =HLOOKUP()?