Incrementing Vectors (2019)

travisdowns.github.io

48 points by luu 3 years ago

cocoto 3 years ago

Serious question, why would you write the increment function differently than this simple one below?

void vector8_inc(std::vector<uint8_t> &v) { for (uint8_t &x : v) { ++x; } }

No need to deal with indices or pointers, no obfuscated code, no additional variables, only one clearly stated side effect (we don't need to update a local variable), reads like simple English. I personally don't like C++ (I prefer C) but why would you write C like code in C++?

Gupie 3 years ago

You wouldn't. The article itself says:
"What about the range-based for loop introduced in C++11?
Good news! It works fine. It is syntactic sugar for almost exactly the iterator-based version above, with the end pointer captured once before the loop, and so it performs equivalently."
nkurz 3 years ago

Seems like a fair question. At a glance, it produces the same code as Travis's preferred solution: https://godbolt.org/z/7oPhxEs5E. Anyone with more taste in C++ code able to answer? I'm more comfortable with C than C++, but this would be my preferred answer (if somehow knew in advance that it was going to produce good code).
- BeeOnRope 3 years ago
  
  It's mentioned in the article (look for "What about the range-based for loop"): it works fine because it uses iterators under the covers which as local objects avoid the specific problem which occurs here.
  This article isn't really meant to highlight something problematic about vectors in particular, but with C and C++ in general: that we actually rely on aliasing-based optimization in many cases and this optimization is fragile.
  The basic operation here is simply indexing into an array stored in a struct which is quite common, even though you of course should prefer the range-based for this minimal example.
amag 3 years ago

You wouldn't if all you want is to just increment all elements in a std::vector but for a more complex loop you might want the index...

noobermin 3 years ago

Sheesh, the ability to alias char to anything has some serious consequences on optimization, huh.

It's unfortunate, because what is likely an obscure case (where the user of the function call intends to alias things in the very container they are using) is a minority use case but it ruins the situation for most other code.

im3w1l 3 years ago

There are multiple possible lessons to take from this. On the one hand you could say that "aliasing is the issue, clearly things should not be just allowed to alias each other willy-nilly". Or on the other hand you could come away thinking "I always mistrusted optimizers! They are just too brittle and unpredictable! Placating them is a never ending treadmill. I'll manually vectorize key parts"

Twisol 3 years ago

Both. Both is good.

684577567345 3 years ago

I could not reproduce the authors results on clang 8.0 or gcc 8.1 at O3. The uint8_t version performed roughly 6 times faster than the uint32_t version. The post was published in 2019 so perhaps something has changed since then.

https://quick-bench.com/q/93vPCntI82FHbwAMv1ODUd_FjPc

TinkersW 3 years ago

Your vectors are local to the function, so not the same thing, trivial for compiler to know they aren't aliased.
yarrrrrrrrrrrr 3 years ago

https://quick-bench.com/q/x9kqob0PB5FsWQ65tFgpOlpaWOA
Repro. As the other commenter said, the internal data pointers need to be of unknown provenance (e.g. non-local).

dmitrygr 3 years ago

And that is why C has the "restrict" keyword, which you should use in precisely such cases

Joker_vD 3 years ago

Of course, if you accidentally pass aliased pointers as restricted pointer arguments to a function, "no diagnostic is required".
noobermin 3 years ago

Which is a shame because restrict is not in C++
- nahstra 3 years ago
  
  If it's not in the standard, but it's in every C++ compiler, is it really not in C++? Semantics

fartsucker69 3 years ago

Wonder how a manually written vectorized 8 bit version would compare (with some padding inserted at the end of the array to avoid extra checks). Work size/bandwidth in a small test like this shouldn't matter, prefetching/write combining should be essentially perfect and bandwidth won't show up until it's a bottleneck/we test throughput.

Besides the lacking optimization due to aliasing, could there also be some microcode iffyness be going on? Can an x86/64 cpu just increment an 1 byte value positioned at a random memory location?

MauranKilom 3 years ago

As the article notes, range-based for loops side-step this problem.

But the general problem (writing to a char* of unknown provenance might force the compiler to reload any other pointer reads) remains, of course.