Stepanov's biggest blunder? The curious case of adjacent difference

73 points by signa11 5 days ago

partial_sum has the same problems (eg when T - T isn't T) and would be more generic if it took the initial accumulator as a separate parameter (and then it and a more generic adjacent_difference would be symmetrical again)

kragen 20 hours ago
I agree that that would solve the symmetry problem, and it would allow the type of the items in the generated sequence to be different from the input sequence. However, the problem is much less severe for partial_sum, because in most cases, it is possible to add two items of the relevant type.
The concept of "torsors" seems relevant here: https://math.ucr.edu/home/baez//torsors.html
> We can express this in terms of torsors as follows: energy differences lie in the group of real numbers R, but energies themselves do not: they lie in an "R-torsor".
So in the date example, we have a set T of intervals like "three days" which we can add to members of the set D of datetimes to get new elements of D, which is the torsor of T:
```
    D + T : D
```
(I always forget which one is the torsor. It's the one that doesn't have an identity, but I haven't yet said the identity of what.)
And we can subtract two D elements to get the T from one to the other:
```
    D - D : T
```
Which we define with the property
```
    d1 + (d2 - d1) = d2
```
This is sufficient to define your suggested versions of the algorithms, which are, as you say, inverses.
```
    partial_sum : (D, T[n]) -> D[n+1]
    adjacent_difference : D[n+1] -> (D, T[n])
```
That is, partial_sum is a function from a torsor element, and a sequence of n elements of the set T of deltas on that torsor set, to a sequence of n+1 torsor elements, and vice versa for adjacent_difference.
Now, we can try to define an interval addition operation:
```
    T + T : T
```
If we have two intervals t1 and t2, their sum is:
```
    t1 + t2 = ((d1 + t1) + t2) - d1
```
But nothing above guarantees that this result is independent of d1! And I suspect that a real-world example might be lurking in calendar math, where you might support intervals of "±n days" and "±m months", and an interval like "+2 months" might be realized as a varying number of days depending on month lengths, but never returned from the datetime-subtraction function.
In such a case your chosen set of intervals might not contain an element that could correctly represent "+2 months" + "+7 days". With different choices of d1, you might get a sum of "+67 days", "+66 days", "+69 days", etc.
The STL prefix_sum can't handle that, because it needs T + T to be well-defined. But your more-generic version takes a starting D element, so it can handle it fine.
However, I think it's fairly unusual to use such unruly data types with these algorithms. Torsors are defined, normally, on groups, so addition on T is guaranteed to not just exist, but have an identity and an inverse for every element, and be associative. And in such cases the STL's T[n] -> T[n] prefix_sum is in some sense "just as generic", because you can freely transform between D[n] and T[n] by adding or subtracting a predetermined "epoch date" or "freezing point".
——*——
There are some lovely wildflowers here if you have the patience for rambling off the trail a bit! The associativity of group operations gives rise to an efficient parallel algorithm for partial_sum (prefix sum) closely related to the parallel sum algorithm that Stepanov says originally inspired his work on generic algorithms when he realized that it was applicable to any monoid.
Often parallel algorithms correspond closely with incremental algorithms; that O(lg N) sum algorithm applicable to general monoids corresponds closely with an O(lg N) incremental sum update algorithm applicable to general monoids, where you sift updates up a binary tree from the changed input to the root. But for abelian groups like integer addition, there's an O(1) update algorithm: subtract the old value from the new, and add that difference to the sum.
However, the O(lg N) version is applicable to other monoids, such as max and min, which yields efficient incremental algorithms for problems like morphological dilation and erosion, where prefix sums are not applicable.

fph 2 days ago

I think it's also because C++ has no generic concept of "zero"; otherwise one could have defined the first element of adjacent_difference(v) as v(1)- zero<typeof(v)>, and it would have been type-stable.

akoboldfrying 2 days ago

I think that would fix the issue at the type level (up to a point; for unsigned types, the "correct" type for the result of a subtraction is underdetermined -- it could be any of {same type, larger signed type, same-size signed type} depending on the circumstances).
But I think the more serious niggle is the fact that that first element shows up in the output at all. OTOH, I suppose you could write a discard_first iterator adaptor that ignores the first write and increment and passes the rest through to the underlying output iterator.
plorkyeran 2 days ago

T{} is now sort of that, but it didn't exist yet when adjacent_difference was written.
- tialaramex 2 days ago
  
  Sort of. T {} is IIUC C++ "zero initialization" but this doesn't actually promise to give us the zero (additive identity) of T but instead to basically smash all T's constituent elements to zero which may not have that effect. This does what you meant for simple types like the signed integers or floats at least.
nurettin 19 hours ago

They could have added an init parameter if int() or double() isn't enough

AnotherGoodName a day ago

std::vector<bool> is the actual biggest stl blunder fwiw.

It’s a overridden specialisation of vector that only occurs when you use bool as the type. It tries to store the bools in a bit array which completely breaks various vector and container functionality.

Universally agreed as a blunder and a great example of premature optimisation. Herbs commentary on it: http://www.gotw.ca/publications/N1185.pdf

gpderetta 18 hours ago

Was vector<bool> from the original STL or is it a committee invention?
- tialaramex 14 hours ago
  
  It does look like the SGI STL from the 1990s has the std::vector<bool> mistake
  https://github.com/uatuko/sgi-stl/blob/master/lib/stl_bvecto...
  At first I thought OK there's no sign of it, and then I realised it'd be a specialisation in a separate part of the codebase and sure enough it's there, it's really obvious in hindsight that this just isn't the same type, it's all separate implementation, but hey that's a specialisation right ?
  I suppose it's possible this is a result of committee feedback and was not in the "original" STL from even before that ?
  
  Joker_vD 12 hours ago
  
  I personally think it's the influence from Pascal, it had the "SET OF" types that were conventionally implemented as bitsets, so you didn't have to muck around with manual shifting/masking yourself.
  
  AnotherGoodName 11 hours ago
  
  This overridden vector should just simply have been named std::bitset and there’d be no issues. As in having a bitset is great. The overriding of vector<bool> was not.
  In fact it taught me a good lesson when i saw it. If you are overriding a templated type with a specialisation in c++ generally instead you should just create a new type rather than do that specialisation. Changes nothing in the compiled code to do this and avoids confusion from unexpected behaviours of the specialisation.
  As in a new class named std::bitset would have exactly the same code as this specialisation, compiled to the same thing, wouldn’t break the container, vector and iterator contracts, etc. This is pretty much always true for specialisations of templated code too. Just make a new class with a new name.
  
  Joker_vD 10 hours ago
  
  You're correct, of course, but back in 1990 people were way more excited about overgeneralizations, class hierarchies and what templates could possibly do. I mean, that was the time when using << and non-stacking stream manipulators seemed to be a good design for formatted output.
  
  tialaramex 9 hours ago
  
  The C++ << stream operator is Bjarne's fault. It flies so very obviously in the face of the language's explicit instructions not to use overloading to give the same operator a different meaning, and the only reason it would be in the stdlib is that it's Bjarne's baby from the outset. He wrote the initial implementation and Bjarne's explanation for why this isn't a distinct operator appeals to Bjarne's own decision not to allow new operators in his language - basically "Because I said so".
  I'm actually sympathetic to the stated goals of overloading, providing harmonic implementations of operators for custom types. If it makes sense to add a Goose to another Goose, a good language should let me use the + operator like on integers.
  Everything beyond this I don't like, whether that's the (common) abuse of + to mean "concatenate" for strings, or the perverse use of << and >> in C++ streams. If your idea deserves an operator, give it a new operator, don't steal an existing one, that's my position.
  
  Joker_vD 8 hours ago
  
  Well, IIRC using the >>/<< operators for I/O was proposed by McIlroy (who also invented the Unix pipes), and Stroustrup liked it enough to adopt. And, well, I personally don't really mind it.
  What I do mind is the stupid design of stream manipulators. First of all, their effect is not scoped until the end of the statement, so e.g.
  out << std::hex << flags; // ... out << decimal_number;
  will print the second number as hex as well which is normally not what you want. So you either need to manually set all the properties you care about all the time:
  out << std::hex << std::setw(0) << flags; // ... out << std::dec << std::setw(0) << decimal_number;
  or store the current flags of the stream before you do any output and restore them when you're done ― and both of those things kinda ruin the whole ergonomics which already were pretty bad compared to simple "printf("%x", flags); printf("%d", decimal_number)".
  And the second, IMHO bigger, problem is that it's the std::ios_base that actually does all of the formatting and consults all of the formatting flags that are stored inside of it, and it's usually implemented in such a way that adding your own custom formatters/manipulators is basically impossible, not that you'd really want to do it anyway: making the custom formatter a wrapper around your data has minimal overhead and usually you can afford it. Adding a new formatting flag to the std::basic_iostream? Yeah, forget it.
  And of course there is also the whole host of good()/bad()/fail()/eof() states and the "convert-to-bool" operator with very questionable semantics that for some stupid reason is not just an alias for good().
  
  tialaramex 5 hours ago
  
  Stroustrup's description saya that "The idea of providing an output operator rather than a named output function was suggested by Doug McIlroy" but doesn't specify that Doug asked for this weird operator re-use which is the part I object to.
  I agree that the manipulators are awful. I don't think you could "scope" the manipulators in the way you want without changing C++ because the compiler would need to tell the stream "Hey, I'm done now" and that's not how expressions work in C++, there's no "done" step when the expression's result isn't further used.
  Because it has compile time reflection C++ 26 will probably make it practical for more C++ libraries to grant their types interesting formatting (with std::format and related modern stuff, not I/O Streams), and it will be educational to find out whether the result is a Cambrian explosion of interesting new ideas nobody can live without by 2040 or a sprawling disaster that is regretted in similar threads to this ten years from now.
  This is a trick Rust doesn't get to do - unlike std::format, Rust's formatting rules are carved in stone, your type can be formatted only in pre-determined ways, if in a few years every C++ Goose type provides {:honk} and {:flap} which every Goose user intuitively understands then too bad the Rust Goose libraries can't do that, they're reduced to abusing existing features such as {:#} (Rust's alternate display flag) and hoping that's enough. To "fix" that the Rust language itself would need to revise the entire Rust formatter feature, whereas to do it in C++ just took writing some hairy constexpr code, expert mode for sure but hardly beyond possibility for the library maintainer.

getnormality 2 days ago

For anyone wanting to go deeper, Knuth's Concrete Mathematics covers the discrete calculus topics mentioned here (and much more).

pizza 2 days ago
This is also known as "The Fundamental Theorem of Stream Calculus" in stream calculus. Using coinduction for an (infinite) stream sigma, eg
```
    sigma(0)      = head(sigma)
    sigma'        = tail(sigma)
    (a ++ sigma)' = sigma
```

wowczarek a day ago

As someone who's done very little with C++ - not enough to explore what std:: has to offer today anyway - but a lot with C and with time and frequency, looking past the quirk and the typing consequences, all I could see was "oh look, free Allan Deviation!".