cocoto 9 days ago

Serious question, why would you write the increment function differently than this simple one below?

void vector8_inc(std::vector<uint8_t> &v) { for (uint8_t &x : v) { ++x; } }

No need to deal with indices or pointers, no obfuscated code, no additional variables, only one clearly stated side effect (we don't need to update a local variable), reads like simple English. I personally don't like C++ (I prefer C) but why would you write C like code in C++?

  • Gupie 9 days ago

    You wouldn't. The article itself says:

    "What about the range-based for loop introduced in C++11?

    Good news! It works fine. It is syntactic sugar for almost exactly the iterator-based version above, with the end pointer captured once before the loop, and so it performs equivalently."

  • nkurz 9 days ago

    Seems like a fair question. At a glance, it produces the same code as Travis's preferred solution: https://godbolt.org/z/7oPhxEs5E. Anyone with more taste in C++ code able to answer? I'm more comfortable with C than C++, but this would be my preferred answer (if somehow knew in advance that it was going to produce good code).

    • BeeOnRope 9 days ago

      It's mentioned in the article (look for "What about the range-based for loop"): it works fine because it uses iterators under the covers which as local objects avoid the specific problem which occurs here.

      This article isn't really meant to highlight something problematic about vectors in particular, but with C and C++ in general: that we actually rely on aliasing-based optimization in many cases and this optimization is fragile.

      The basic operation here is simply indexing into an array stored in a struct which is quite common, even though you of course should prefer the range-based for this minimal example.

  • amag 9 days ago

    You wouldn't if all you want is to just increment all elements in a std::vector but for a more complex loop you might want the index...

noobermin 10 days ago

Sheesh, the ability to alias char to anything has some serious consequences on optimization, huh.

It's unfortunate, because what is likely an obscure case (where the user of the function call intends to alias things in the very container they are using) is a minority use case but it ruins the situation for most other code.

im3w1l 9 days ago

There are multiple possible lessons to take from this. On the one hand you could say that "aliasing is the issue, clearly things should not be just allowed to alias each other willy-nilly". Or on the other hand you could come away thinking "I always mistrusted optimizers! They are just too brittle and unpredictable! Placating them is a never ending treadmill. I'll manually vectorize key parts"

  • Twisol 9 days ago

    Both. Both is good.

684577567345 10 days ago

I could not reproduce the authors results on clang 8.0 or gcc 8.1 at O3. The uint8_t version performed roughly 6 times faster than the uint32_t version. The post was published in 2019 so perhaps something has changed since then.

https://quick-bench.com/q/93vPCntI82FHbwAMv1ODUd_FjPc

dmitrygr 10 days ago

And that is why C has the "restrict" keyword, which you should use in precisely such cases

  • Joker_vD 10 days ago

    Of course, if you accidentally pass aliased pointers as restricted pointer arguments to a function, "no diagnostic is required".

  • noobermin 10 days ago

    Which is a shame because restrict is not in C++

    • nahstra 9 days ago

      If it's not in the standard, but it's in every C++ compiler, is it really not in C++? Semantics

fartsucker69 9 days ago

Wonder how a manually written vectorized 8 bit version would compare (with some padding inserted at the end of the array to avoid extra checks). Work size/bandwidth in a small test like this shouldn't matter, prefetching/write combining should be essentially perfect and bandwidth won't show up until it's a bottleneck/we test throughput.

Besides the lacking optimization due to aliasing, could there also be some microcode iffyness be going on? Can an x86/64 cpu just increment an 1 byte value positioned at a random memory location?

MauranKilom 10 days ago

As the article notes, range-based for loops side-step this problem.

But the general problem (writing to a char* of unknown provenance might force the compiler to reload any other pointer reads) remains, of course.