corsix 2 years ago

The elephant in the room contrast: Apple has been shipping this for years, whereas Intel _might_ ship theirs in something later this year.

  • blinkingled 2 years ago

    > Note that these instructions are neither documented nor supported by Apple.

    What does that mean? Someone pointed out below that Apple ships the Accelerate framework as a higher-level, supported mechanism for using these instructions.

    Intel/AMD have always been good at documenting most of their stuff, so perhaps we'll see properly documented and supported instructions whenever Intel ships theirs.

    • fredoralive 2 years ago

      I think they mean that using the raw instructions directly is unsupported and undocumented; officially, you have to go through the libraries as the interface to them.

      • cyber_kinetist 2 years ago

        Though in reality nobody is stopping you from using those undocumented instructions. If Apple’s Accelerate framework can use them, so can you. (Or is it against the EULA?…) There is a slim chance that the behavior of these instructions might change with a software update (if Apple is doing any kind of microcode updates), but I kinda doubt it.

  • maven29 2 years ago

    They've had something similar off-core with GNA (the Gaussian & Neural Accelerator) since 10th-gen Ice Lake in 2019, for low-power, always-on inference use cases.

  • smoldesu 2 years ago

    Hasn't Intel been shipping scalar instructions for more than a decade?

    • colejohnson66 2 years ago

      You’re probably thinking of AVX, Advanced Vector Extensions. That’s a family of vector extensions that has been out for just over a decade (2011). This is about AMX, Advanced Matrix Extensions. AMX is, as the name implies, built around 2D INT8/BF16 matrices (not “1D” FP16/32/64 vectors). It’s not on silicon available to the public.
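      To make the vector-vs-matrix distinction concrete, here's a rough NumPy sketch of the *shapes* involved (not real intrinsics; the tile dimensions are the maximums from Intel's AMX spec as I understand it, so treat the specifics as an assumption):

      ```python
      import numpy as np

      # AVX-style: 1D vectors, elementwise fused multiply-add.
      # A 256-bit AVX register holds 8 fp32 lanes.
      a = np.arange(8, dtype=np.float32)
      b = np.full(8, 2.0, dtype=np.float32)
      c = np.ones(8, dtype=np.float32)
      vec_result = a * b + c  # one FMA across 8 lanes

      # AMX-style: 2D tiles, a whole matrix multiply-accumulate per instruction.
      # An AMX tile is up to 16 rows x 64 bytes; with int8 inputs, one tile op
      # is roughly a 16x64 by 64x16 product accumulated into an int32 16x16 tile.
      A = np.random.randint(-128, 128, size=(16, 64), dtype=np.int8)
      B = np.random.randint(-128, 128, size=(64, 16), dtype=np.int8)
      C = np.zeros((16, 16), dtype=np.int32)
      C += A.astype(np.int32) @ B.astype(np.int32)  # one tile op ~ 16*64*16 MACs
      ```

      The point is the granularity: the vector op does 8 multiply-adds, while a single tile op does on the order of 16,000.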

      • xattt 2 years ago

        Also not to be confused with VMX (AltiVec/Velocity Engine), which predates AVX by another decade or so.

        To keep the record straight, we have: 1. AMX, 2. AVX, and 3. VMX.

        • colejohnson66 2 years ago

          There’s also Intel VMX, the Virtual Machine Extensions ;) Sometimes it’s referred to as VT-x.

    • timdorr 2 years ago

      Intel's AMX specifically hasn't yet shipped, but will be coming with their new Sapphire Rapids Xeon chips this year.

      • mhh__ 2 years ago

        *Next year (potentially).

        Sapphire Rapids might well end up only shipping in Aurora and then being replaced immediately by its successor.

Someone 2 years ago

The article would have been much better for me if it had drawn conclusions about the usefulness of the two (partial) instruction sets.

What can you do easily and what’s hard?

I also expected to read something about relative performance of the two.

  • stephencanon 2 years ago

    The main obvious thing is that Intel simply doesn’t have fp32 or fp64 support. If you primarily care about ML inference, that mostly doesn’t matter to you; if you care about other types of dense computation as well, it matters a lot.

  • corsix 2 years ago

    Performance comparison is hard given that the Intel one hasn’t shipped yet.

    • dougall 2 years ago

      Yeah, I think Intel's only number so far is "2048 int8 operations/cycle/core" (as opposed to VNNI's 256): https://www.servethehome.com/wp-content/uploads/2021/09/Inte...

      Which (assuming 1 multiply-add = 2 operations) is the same int8 operations/cycle as the Apple M1's float16 operations/cycle. Intel's 16-bit operations might be the same rate, but I'd guess half? That'll almost certainly be at a higher clock-speed, and one-per-core rather than one-per-four-P-cores. (And I think Apple might have doubled their throughput in M2. As you said, performance comparison is hard.)
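      The back-of-the-envelope arithmetic, using only the quoted per-cycle figures (these are marketing numbers, not measurements):

      ```python
      # Intel's quoted AMX figure: 2048 int8 operations/cycle/core.
      intel_amx_int8_ops = 2048

      # Counting one multiply-add as 2 operations:
      intel_amx_int8_macs = intel_amx_int8_ops // 2  # 1024 MACs/cycle/core

      # AVX-512 VNNI, for comparison: 256 int8 ops/cycle.
      vnni_int8_ops = 256

      # AMX's speedup over VNNI at the same clock:
      speedup = intel_amx_int8_ops / vnni_int8_ops  # 8x

      # Apple M1 AMX is quoted at the same rate, but in float16:
      m1_amx_fp16_ops = 2048  # per cycle, shared by a cluster of 4 P-cores
      ```

      Note the caveats from above: Intel's number is per core at (presumably) server clocks, while Apple's is one AMX block shared per four-P-core cluster, so ops/cycle alone doesn't settle the comparison.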

altairprime 2 years ago

Is the Apple AMX genlut op approximately the same as =HLOOKUP()?