bheadmaster 2 years ago

Slight nitpick - the definition of "race condition" on Wikipedia [0] is:

    [...] the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events

If we take the first example - Parallel BFS - the correctness of the output could be considered the "system's substantive behavior". Properly implemented atomic checks (as demonstrated) would still be guaranteed to produce correct output under every possible ordering of events. The system's "substantive behavior" is therefore not dependent on the sequence or timing of other uncontrollable events, and so there is no "race condition" involved.

Of course, the term "race condition" here is taken colloquially, for the sake of familiarity to the reader - the article has correctly recognized that the appropriate term for this kind of behavior is "non-determinism".

[0] https://en.wikipedia.org/wiki/Race_condition

  • chrisseaton 2 years ago

    I don't know where Wikipedia got this 'substantive behaviour' requirement from, but, for example, it explicitly isn't part of the definition in the industry's definitive reference for parallelism - Padua's Encyclopedia of Parallel Computing.

    > A race condition occurs in a parallel program execution when two or more threads access a common resource, e.g., a variable in shared memory, and the order of the accesses depends on the timing, i.e., the progress of individual threads.

    > The disposition for a race condition is in the parallel program. In different executions of the same program on the same input access events that constitute a race can occur in different order, which may but does not generally result in different program behaviors (non-determinacy).

    Sometimes you deliberately program a full-on data race (which isn't a bug by definition, as the article says) for performance reasons.

    > Data races that are not manifestations of program bugs are called benign data races.

    • ajross 2 years ago

      That definition seems to conflate "determinism" (something largely impossible to achieve in asynchronously parallel systems) with "correctness" (an abstracted property of a system that doesn't have anything to do with determinism per se). It just doesn't seem useful.

      No, in overwhelmingly common usage, programmers use the term "race condition" for a category of software bug. We mean it in the correctness sense, not the sense used in the linked article or in your reference. You'd be met with some very weird stares if you tried to explain how arbitrary SMP ordering of log entries or whatever was a "race condition".

      • bheadmaster 2 years ago

        > No, in overwhelmingly common usage, programmers use the term "race condition" as a category of software bug

        This was exactly my point.

            race condition = incorrect behavior
            non-determinism = correct behavior
        
        Language is tricky sometimes.
      • chrisseaton 2 years ago

        > That definition seems to conflate "determinism" (something largely impossible to achieve in asynchronously parallel systems)

        There are plenty of examples of entirely deterministic parallel models. Fork-join for one.

        • ajross 2 years ago

          I think you missed the point. Let me expand.

          I mean, yes, what you say is true, but it just amounts to saying "synchronization is a solvable problem". At their core, ALL synchronization paradigms (spin polling, interrupt masking, OS-managed process suspend, hardware memory barriers, weird lockless tricks like Dekker's algorithm, you name it) can be understood to be ways of enforcing "para-determinism" on environments that don't provide it.

          In SMP, things don't happen in reliable orders. They don't. They never will. They can't. But your hardware and OS platforms are smart and provide trickery so that at least you can guarantee that "some" things happen in reliable orders. And then you construct software such that correct behavior is dependent only on that subset of ordering and not everything.

          Because there is no determinism in SMP, only that which you construct.
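
          To make that concrete, here's a minimal Java sketch (my own illustration, not anyone's production code): a volatile flag is exactly one of those constructed guarantees, and you hang the rest of your correctness off it.

              // The Java memory model promises that the write to data
              // happens-before any read that observes ready == true.
              // That one reliable ordering is constructed; nothing else is.
              class Handoff {
                  int data;                 // plain field: no ordering on its own
                  volatile boolean ready;   // the constructed "subset of ordering"

                  void producer() {
                      data = 42;
                      ready = true;         // publishes data along with the flag
                  }

                  Integer consumer() {
                      return ready ? data : null;  // if non-null, it is 42, guaranteed
                  }
              }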

          • chrisseaton 2 years ago

            If you can't observe the non-determinism, then is it really non-determinism?

            Your processor executes a single thread of instructions also in a non-deterministic order, based on complex internal state. We'd never say it was non-deterministic, as you can't detect it.

            • ajross 2 years ago

              You can absolutely observe non-determinism. That's what a race condition bug is, after all. You can also observe benign nondeterminism every day in anything that records ordering. Go do a parallel make, then do it again and observe the ordering of the mtime values in the intermediate artifacts. It won't be the same.
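
              A self-contained Java version of the same experiment (a hypothetical demo in the spirit of the parallel make): run it a few times and the set of entries never changes, but their order does.

                  import java.util.Queue;
                  import java.util.concurrent.ConcurrentLinkedQueue;

                  class OrderDemo {
                      public static void main(String[] args) throws InterruptedException {
                          Queue<Integer> log = new ConcurrentLinkedQueue<>();
                          Thread[] ts = new Thread[8];
                          for (int i = 0; i < 8; i++) {
                              final int id = i;
                              ts[i] = new Thread(() -> log.add(id));
                              ts[i].start();
                          }
                          for (Thread t : ts) t.join();
                          // Same entries every run; their order varies:
                          // benign, observable non-determinism.
                          System.out.println(log);
                      }
                  }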

              And FWIW: CPU-internal instruction reordering is observable too, though the details there get complicated. x86 hides (almost!) all the complexity from you, but ARM's OOO is leakier and requires careful attention to memory barriers.

            • smallnamespace 2 years ago

              It's visible from the perspective of the software engineer who molds non-deterministic abstractions into predictable interactions for the user.

    • User23 2 years ago

      It’s worth noting that algorithms can be designed to be correct under nondeterministic execution. For example, quicksort is correct with a randomly selected pivot (see the sketch after this comment). And for associative and commutative functions, it doesn’t matter what order they’re executed in; the final result is always the same.

      Dijkstra’s guarded commands don’t specify an order for the conditional. The semantics is that the process is free to execute any one of the cases that has a true guard. Nevertheless he found them useful for developing and describing many algorithms.
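
      A quick Java sketch of the quicksort point (my own illustration): the random pivot makes the entire recursion tree differ from run to run, yet the output is the same sorted array every time.

          import java.util.concurrent.ThreadLocalRandom;

          // Execution is non-deterministic (pivot choice); the output is not.
          class RandomQuicksort {
              static void sort(int[] a, int lo, int hi) {  // sorts a[lo..hi)
                  if (hi - lo <= 1) return;
                  int p = a[ThreadLocalRandom.current().nextInt(lo, hi)];
                  int i = lo, j = hi - 1;
                  while (i <= j) {                          // Hoare partition
                      while (a[i] < p) i++;
                      while (a[j] > p) j--;
                      if (i <= j) {
                          int t = a[i]; a[i] = a[j]; a[j] = t;
                          i++; j--;
                      }
                  }
                  sort(a, lo, j + 1);
                  sort(a, i, hi);
              }
          }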

      • tonyarkles 2 years ago

        > And for associative and commutative functions it doesn’t matter what order they’re executed in the final result is always the same.

        Not directed at you specifically, but just a reminder for anyone who hasn’t been burned by it yet: floating point addition is not associative. In general, (a+b)+c != a+(b+c). It’s close, but if you’re not careful the accumulated small errors can turn into wrong answers.
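
        A short Java demonstration, with values picked to make the effect blatant rather than subtle:

            class FpAssoc {
                public static void main(String[] args) {
                    double a = 1e16, b = -1e16, c = 1.0;
                    System.out.println((a + b) + c);  // 1.0
                    System.out.println(a + (b + c));  // 0.0: c is lost rounding b + c
                }
            }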

        • tialaramex 2 years ago

          Not just floating point maths either. The built-in "integers" in your computer don't behave like the integers taught in school, but that's how we tend to think about them. This is why I argue trapping is the most acceptable choice for arithmetic problems in general-purpose languages: if the program tries to add 150 to 150 in an unsigned 8-bit integer, that's a programming error. By all means offer wrapping arithmetic - sometimes it's what the programmer actually wanted, and they benefit from being able to say so explicitly - but most often they didn't want that and would be astonished to find that 150 + 150 = 44.

          • kaba0 2 years ago

            It can overflow, but integer math is still associative and commutative, isn't it? It's just arithmetic modulo 2^n - one more than the biggest representable value, if unsigned.
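
            A small Java check (simulating unsigned 8-bit with a mask): the wrap is surprising, but both groupings still agree modulo 256.

                class WrapAssoc {
                    public static void main(String[] args) {
                        final int M = 0xFF;                    // reduce mod 256
                        System.out.println((150 + 150) & M);   // 44, as above
                        int x = 200, y = 100, z = 250;
                        int left  = (((x + y) & M) + z) & M;   // (x + y) + z
                        int right = (x + ((y + z) & M)) & M;   // x + (y + z)
                        System.out.println(left == right);     // true: 38 == 38
                    }
                }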

      • ajross 2 years ago

        > It’s worth noting that algorithms can be designed correct under nondeterministic execution.

        In fact the whole concept of "symmetric multiprocessing" demands it.

    • remram 2 years ago

      By this definition, every access to a shared resource is a race condition - even one made while properly holding a lock. It is common knowledge that you introduce locks to remove race conditions, so I would say something is definitely missing from the definition.

      • asveikau 2 years ago

        I think "remove" the race condition is the wrong word. It should be "resolve" the race condition.

        The race exists because there are multiple accesses. It is resolved when there is a protocol for deciding who proceeds and who waits for the other.

      • chrisseaton 2 years ago

        Usually you’re adding locks not to remove the race conditions but to make them not a bug.

        • remram 2 years ago

          That is a valid definition but it's not the one used by everyone I know and work with. A race condition is a fault, just like a SQL injection; it could easily have referred to a programming technique but it doesn't.

    • bheadmaster 2 years ago

      Hey, I've just downloaded PADUA (http://dx.doi.org/10.1145/2633685), and skimming through it, I can't find a single mention of the phrase "race condition".

      Is this the paper you're referring to? If not, could you please provide a reference to which PADUA you're referring to? I'd really like to read more on the subject, especially if the source is, as you claim, an industry reference.

      • chrisseaton 2 years ago
        • bheadmaster 2 years ago

          Thanks!

          If you have any relevant information, would you care to elaborate on why it is considered an industry standard and how it gained that status?

          This is the first time I'm hearing of both the author and the book (which probably says more about me), and it seems kind of odd that the definition differs from what is usually taught in class (at least in mine). Of course, it wouldn't be the first time that the popular use of a word differed from its original intended meaning, but I'm just interested in the wider historical context.

          Thanks again.

          • chrisseaton 2 years ago

            I don't know what else to do but appeal to authority - it's the definition of a word after all?

            https://cs.illinois.edu/about/people/faculty/padua

            > David Padua has served as program committee member, program chair, or general chair to more than 70 conferences and workshops. He was the Editor-in-Chief of Springer‐Verlag’s Encyclopedia of Parallel Computing and is currently a member of the editorial board of the Communications of the ACM, the Journal of Parallel and Distributed Computing, and the International Journal of Parallel Programming.

  • shwestrick 2 years ago

    Author here. I think it would be very strange to say that this code does not have a race condition. The whole point of the term is to identify circumstances where non-deterministic timing of events influences how you reason about correctness, which is exactly what we're doing here.

    • bheadmaster 2 years ago

      I've personally only heard the term "race condition" used to refer to bugs that have their source in non-deterministic execution of programs. In most cases, it refers to a specific sequence of events that the programmer did not foresee, which leads to incorrect computation.

      Using the term "race condition" in the context of correct programs would make it cover exactly the same universe of programs as the term "non-determinism". I think that the distinction, however trivial (race condition = incorrect behavior, non-determinism = correct behavior), is still useful.

      Great article, by the way. I did not mean to criticize it in any way. My "slight nitpick" about meaning of the words is really just that - a nitpick :)

      • shwestrick 2 years ago

        Ah I see. That's a fair point!

        When talking about this kind of stuff to people who are unfamiliar with, say, lock-freedom, I've found that "non-determinism" is too vague --- people start thinking about things like randomness, or user interaction, etc. In contrast, the term "race condition" seems to hit the nail on the head.

        But certainly, "race condition" also carries with it a bit of baggage ;)

      • throwawaymaths 2 years ago

        I work in the Erlang VM, and we always say stuff like "whether your output hits IO first or the logger line outputs to IO first is a race condition". Neither is substantively wrong, and you could definitely say "it's a race!", grab some popcorn, and watch the electrons do their thing.

    • kazinator 2 years ago

      If we put on a cynical hat, your page reads like "I've never heard of lock-free algorithms based on atomic operations. I therefore must have just invented it and I get to name it: how about beneficial use of race conditions?"

      • shwestrick 2 years ago

        Whoa, uhh, I mean, that's an extremely unfair and inaccurate characterization.

        • kazinator 2 years ago

          Sure, and one that one can avoid bringing on oneself by not abusing/twisting/redefining common terminology.

          • bheadmaster 2 years ago

            Eh, I personally wouldn't be so hard on someone because of language. Words don't have definite meanings on their own; it's our general agreement that makes them meaningful. And the meanings of words often overlap - a shades-of-grey kind of deal, with even more subtle differences in how individuals understand them. Language is hard.

            Moreover, polite explanations may bring enlightenment, but aggressive scolding is almost guaranteed to stop people from accepting your words, even if they're true, because in order to accept their truth, they have to accept their unpleasant implications.

            I think learning should generally be as pleasant as possible. Putting the work in is hard enough as it is.

            • kazinator 2 years ago

              > I think learning should generally be as pleasant as possible.

              If you want pleasant learning on the Internet, don't "learn by teaching" through blogs that mislead other learners, even if only in terminology use.

              People will tear that apart.

              • bheadmaster 2 years ago

                According to the Encyclopedia of Parallel Computing (mentioned in one of the sibling sub-comments of my original comment), you and I are both wrong in what we consider a "race condition". Of course, whether or not the source is an authority on the subject when the popular usage of the word refers to a different concept is questionable, but there's at least a possibility that we're both wrong.

                Regardless, that's still one more reason not to be unpleasant to people when correcting them - you might just find yourself in the wrong.

  • worewood 2 years ago

    Why not use the right term, then? "Colloquial" shouldn't be a thing with technical terms.

    Race conditions are hard enough to explain to people and misusing the term just makes it more difficult.

  • mizzao 2 years ago

    Interesting - it's almost like one could call this a "randomized algorithm" (which we know can be faster than deterministic algorithms), but the non-determinism comes from the input data and code execution rather than from an RNG.

  • squeaky-clean 2 years ago

    > This article needs additional citations for verification. (July 2010).

    The definition you quote has no linked citation on Wikipedia. Usually a good sign that you should not treat those statements as definitive. A good Wikipedia article should not state any "facts" without a direct means of verification. Otherwise it's considered "original research" and against the wiki policy for a high quality article.

    https://en.m.wikipedia.org/wiki/Wikipedia:No_original_resear...

    • bheadmaster 2 years ago

      I used Wikipedia in order to have some kind of reference, but I was fairly sure of the meaning beforehand.

      Searching the internet for "race condition definition" and taking the top few results brings several definitions that all agree in spirit with the Wikipedia one (see below).

      If you know of any more reliable source that doesn't agree with Wikipedia on the definition of "race condition", please post it here. This is an honest request - I am always grateful to those who correct my mistakes (in good faith).

          wordnik [0]:  A flaw in a system or process whereby the output or result is unexpectedly and critically dependent on the sequence or timing of other events.
      
          techtarget [1]: A race condition is an undesirable situation that occurs when a device or system attempts to perform two or more operations at the same time, but because of the nature of the device or system, the operations must be done in the proper sequence to be done correctly.
      
          techterms [2]: A race condition occurs when a software program depends on the timing of one or more processes to function correctly.
      
          javatpoint [3]: When the output of the system or program depends on the sequence or timing of other uncontrolled events, this condition is called Race Condition.
      
          techopedia [4]: A race condition is a behavior which occurs in software applications or electronic systems, such as logic systems, where the output is dependent on the timing or sequence of other uncontrollable events.
      
      
      [0] https://www.wordnik.com/words/race%20condition

      [1] https://www.techtarget.com/searchstorage/definition/race-con...

      [2] https://techterms.com/definition/race_condition

      [3] https://www.javatpoint.com/what-is-race-condition

      [4] https://www.techopedia.com/definition/10313/race-condition

      • squeaky-clean 2 years ago

        > If you know of any more reliable source that doesn't agree with Wikipedia on the definition of "race condition", please post it here.

        Other commenters have already covered that. The links you shared aren't good primary sources, and probably took their definition from Wikipedia itself, creating a circular reference issue.

        • bheadmaster 2 years ago

          Regardless, if you yourself happen to know any sources that weren't covered in other comments, please share them.

          I remember learning about race conditions in college, and they were always mentioned in the context of bugs - that's why I took Wikipedia itself for granted, as its definition fit my current understanding of the word.

          It seems to be a case of popular usage of the word differing from the original. It also raises a question of whether or not the original intended meaning is the authority, if the majority of programmers use it in a different way. Who should, in general, be the authority on the meaning of a word? But that's more of a philosophical question, I suppose.

layer8 2 years ago

> I’m not talking about data races. Data races are typically bugs, by definition.

One notable exception is the Racy Single-Check Idiom: http://javaagile.blogspot.com/2013/05/the-racy-single-check-...

It is particularly suitable for lazy initialization in code that is typically (but not necessarily) executed single-threaded, and is famously used in Java’s String.hashCode() implementation.
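
For readers who haven't seen the idiom, a minimal Java sketch (a hypothetical class; String.hashCode works along the same lines): the cache field is racy on purpose, and the race is benign because every thread that recomputes derives the same value from immutable state.

    class CachedName {
        private final String name;  // immutable input
        private int hash;           // 0 = "not computed yet"; deliberately non-volatile

        CachedName(String name) { this.name = name; }

        @Override public int hashCode() {
            int h = hash;             // one racy read into a local (the "single check")
            if (h == 0) {             // may be a stale 0 another thread already filled
                h = name.hashCode();  // recompute: same immutable input, same answer
                hash = h;             // racy write; worst case is redundant work
            }
            return h;
        }
    }

Note the single read of the field into a local: checking the field twice could observe two different values under the race.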

  • shwestrick 2 years ago

    That's a nice example. It seems that data races in Java don't "catch fire"; is that correct? The catch-fire problem is pretty bad for languages like C/C++, which have undefined behavior for data races, and in this sense data races are "bugs by definition" in those languages.

    • kaba0 2 years ago

      Java’s primitives and references are guaranteed to be “tear-free”, which guarantees no “out-of-thin-air” values. So a field initialized to 1 and concurrently written by several threads to 2 and 3 can only ever be observed as 1, 2, or 3 - no other value. Is that what you mean by not catching fire?

      • shwestrick 2 years ago

        Perfect. Yes, that's exactly right -- if the language semantics is able to guarantee a set of possible values for a data-racy read, then it doesn't catch fire.

        The catch-fire terminology comes from the analogy that, as soon as a data race occurs, the semantics of the program completely explodes, and all guarantees are lost---the program is then allowed to do literally anything. This is sometimes known as "DRF-SC or catch fire": either the program is data-race-free (and therefore its executions are sequentially consistent), or the program has undefined behavior.

        Infamously, the C memory model has the catch-fire problem. And therefore, any language which relies on the C memory model can catch fire. As of today, I believe this includes C/C++, Swift, Rust, and probably a few others.

      • moonchild 2 years ago

        > Java’s primitives and references are guaranteed to be “tear-free”, which guarantees no “out-of-thin-air” values

        Tear-free does not imply no out-of-thin-air. But, afaik, the java memory model protects from both tearing and oota.

        • kaba0 2 years ago

          Yeah, you are right, I was not clear in my wording.

      • oconnor663 2 years ago

        I think catching fire here refers to the general "this is UB and all bets are off" situation. For example, if you're data racing against an int on x86-64 Linux, you might reason that the `mov` instruction on that platform can't possibly produce torn reads. And you'd be right, as far as that goes. But the C/C++/Rust compiler still considers those data races UB and may still do horrible things like just deleting big chunks of your code if it manages to prove that that's what you've done.

heydenberk 2 years ago

I appreciate a provocative title, but I think the practical lesson in almost all cases is the inverse: fixing race conditions can introduce performance bottlenecks.

  • shwestrick 2 years ago

    Author here. It all depends on what the goal is. If performance is the goal, then perhaps race conditions can be considered acceptable, if the gains are significant enough.

    I would hope that the primary takeaway from this post is that race conditions are not necessarily bugs. Race conditions are not necessarily something that need to be "fixed".

touisteur 2 years ago

Tell me about the non-guaranteed order of operations in GPU reductions and floating point results changing slightly between two runs. Yes, it's useful and you get the goddamn FP32 TFLOPS, but damn, it makes testing, validating, and qualifying systems harder. And yes, I know one shouldn't rely on equality tests, but not knowing the actual order of FP operations makes numerical analysis of the actual error harder (you just have to take the worst case of every reduction, ugh).

EDIT: and don't get me started on tensor cores and clever tricks to have them do 'fp32-alike' accuracy. Yes, wonderful magic, but how do you reason about these new objects without a whole new slew of tools?

  • mgaunard 2 years ago

    There is nothing wrong with testing for equality so long as your computation is exact or correctly-rounded.

    • nextaccountic 2 years ago

      There is if the algorithm contains race conditions that cause non-deterministic output. The submitted article goes above and beyond to guarantee that the code always outputs the same answer even though it has race conditions. But that's sometimes not possible, or, if it is possible, it's too much of a hassle, so it's rarely done.

      For example, this project https://github.com/EmbarkStudios/texture-synthesis generates textures and if you run the same code with the same input various times, the results will be slightly different. Here https://github.com/EmbarkStudios/texture-synthesis#notes it says: "When using multiple threads for generation, the output image is not guaranteed to be deterministic with the same inputs. To have 100% determinism, you must use a thread count of one"

      • mgaunard 2 years ago

        I gave conditions upon which testing for equality is correct.

        Of course if the result is non-deterministic it doesn't satisfy those conditions.

        • nextaccountic 2 years ago

          Unfortunately it may be quite hard to certify that your program qualifies, especially if it's a high-performance program that can't be single-threaded. An innocuous-looking commit can completely undermine this property.

          Doubly so if you must guarantee determinism across multiple platforms! IEEE 754-2008 helps but it defines cross-platform determinism for just a subset of operations. Compilers can also sometimes botch your FP code (there's a number of gotchas - for example, if any library anywhere in your program uses -ffast-math, it may infect the whole program https://stackoverflow.com/questions/68938175/what-happens-if...)

          • mgaunard 2 years ago

            Achieving exact or correctly-rounded results is much more work than that.

            You can look at the crlibm papers for example.

    • touisteur 2 years ago

      Yes, I know. For purely sequential code, that's the actual practice (though sometimes golden tests are generated through Matlab or Python in double precision, and then every divergence becomes a game of whack-a-mole). And don't get me started on x87 80-bit extended precision suddenly compiled to SSE, i.e. actual IEEE 754... We have integrated some of the FP static and dynamic analysis tools into our CI/CD pipeline for new code, but ugh...

      Anyway, as time passes I veer away from equality and think about the actual necessary accuracy, and I wish there were a way to set it as a spec for proof (SPARK/Ada, or a higher-level DSL that can be lowered to proper accuracy analysis tools...).

      I wish I could also specify 'no NaNs please' as a postcondition. Need to check in with the SPARK team and get an introduction article going...

      • mgaunard 2 years ago

        There are simple tools that tell you how many of your floating-point digits are just propagated rounding errors.

    • chrisseaton 2 years ago

      I think some people have gotten the mistaken idea that floating point arithmetic is inherently somehow non-deterministic. It is of course entirely deterministic, and if you do the same FP operations in the same order you will get the same result.

      • touisteur 2 years ago

        I was thinking and talking specifically about GPUs and reductions (i.e. multiple cores interacting), where you don't always know the exact order of operations.

        And also, tell that to the people who went from a compiler using x87 instructions to one using SSE instructions and got different results between two binaries built from the same code. Yes, the exact same sequence of FP instructions should always give the same results. And that's also supposing you're not loading some library that sets ugly fast-math flags (see the recent yak-shaving session by @moyix).

      • nextaccountic 2 years ago

        You get the same result if you run on the same architecture with the same instructions (or on machines that implement IEEE 754-2008 [*], the first standard to guarantee cross-platform determinism for a subset of floating point operations - which means no SIMD!! =/), and if you don't have non-determinism introduced by thread interleaving and race conditions (unless you very carefully account for that, like the article submitted in this thread).

        I wish we had a language that guaranteed deterministic results while still properly using all available hardware resources (SIMD, all CPU cores, and offloading some code to a GPU if available), even if it had some overhead. Doing this manually is ridiculously difficult if you want to write high-performance software, especially if you use GPUs.

        [*] See https://stackoverflow.com/questions/42181795/is-ieee-754-200... - the amazing Rapier physics engine https://rapier.rs/ leverages IEEE 754-2008 to have a cross-platform deterministic mode that will run physics exactly the same way in every supported platform https://rapier.rs/docs/user_guides/rust/determinism/ - but this means taking a huge performance hit: you can't use SIMD and you must run the physics on a single thread.
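
        For what it's worth, you can get part of the way there by hand. A Java fork/join sketch (my own illustration): because the split points and the combination order depend only on array indices, every run performs the same FP additions in the same tree, so the result is bit-identical regardless of thread scheduling.

            import java.util.concurrent.ForkJoinPool;
            import java.util.concurrent.RecursiveTask;

            // Deterministic parallel sum: the reduction tree is fixed by the
            // indices, not by thread timing, so the FP result never varies.
            class DetSum extends RecursiveTask<Double> {
                private final double[] a;
                private final int lo, hi;

                DetSum(double[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

                @Override protected Double compute() {
                    if (hi - lo <= 1024) {          // fixed sequential cutoff
                        double s = 0.0;
                        for (int i = lo; i < hi; i++) s += a[i];
                        return s;
                    }
                    int mid = (lo + hi) >>> 1;      // split depends only on indices
                    DetSum left = new DetSum(a, lo, mid);
                    left.fork();
                    double r = new DetSum(a, mid, hi).compute();
                    double l = left.join();
                    return l + r;                   // fixed combination order
                }
            }
            // usage: new ForkJoinPool().invoke(new DetSum(data, 0, data.length))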

        • mgaunard 2 years ago

          Typical technobabble from someone who doesn't really understand floating-point.

          Floating-point is not associative. Reordering operations yields different results, so no compiler will do so, unless you specifically disable standards conformance.

          The use of SIMD, which is just a type of instruction-level parallelism, has no effect on the result of floating-point operations, unless of course you reorder your operations so that they may be parallelized.

          What does affect the result of floating-point operations is when rounding happens and at what precision. If we're talking about C, the compiler is allowed to run intermediate operations at higher precision than that mandated by the type. This is merely so that it can use x87, whose extended-precision registers are 80 bits wide, and only round when it spills to memory and needs to store a 64-bit or 32-bit value. Compilers have flags to disable that behaviour, and it doesn't apply when the SSE unit is used instead of x87. Using SSE for floating-point doesn't necessarily mean using SIMD; most of the instructions have scalar variants.

          Another example is FMA, which might be substituted for any multiply+add operations.

          In practice if your code breaks with this it just means it was incorrect in the first place.
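
          Java, for what it's worth, never substitutes FMA behind your back, but you can opt in explicitly with Math.fma (JDK 9+) and observe the single-rounding difference:

              class FmaDemo {
                  public static void main(String[] args) {
                      double x = 1.0 + 0x1p-27;             // 1 + 2^-27, exactly representable
                      double naive = x * x - 1.0;           // product rounds first: 2^-54 term lost
                      double fused = Math.fma(x, x, -1.0);  // one rounding at the end keeps it
                      System.out.println(naive == fused);   // false
                  }
              }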

          • raphlinus 2 years ago

            The actual rules are very complicated. C allows greater precision for intermediate results but compilers are sometimes careful to stick to IEEE rounding. [1] contains a good general overview, and [2] talks about FMA in particular. And in [3] I've set up a Godbolt example to play with. By default -O3 gives you FMA, but -O or -O3 with -ffp-contract=off don't. So you absolutely can get different results depending on optimization levels.

            [1]: https://randomascii.wordpress.com/2012/03/21/intermediate-fl...

            [2]: https://kristerw.github.io/2021/11/09/fp-contract/

            [3]: https://godbolt.org/z/eTz8o6b3P

            • mgaunard 2 years ago

              The rule is very simple, I'm not seeing anything in what you say suggesting that it isn't?

              • raphlinus 2 years ago

                Perhaps the rule in the standard is simple - the compiler can arbitrarily round to finer precision than IEEE - but in practice it's complicated: the same code can behave quite differently depending on what chip it's compiled for, the level of optimization, and other factors. If you want to control it, i.e. model it as something other than nondeterminism, figuring out the right combination of compiler flags and so on is tricky.

                I'll also point out that FMA is relatively new, so it's pretty easy to write code that works fine when compiled with the default x86_64/SSE2 target but breaks when compiled for a more recent target CPU.

                • mgaunard 2 years ago

                  All you're doing is listing simple consequences of the rule.

                  The compiler may use increased precision for intermediate computations. That means sometimes it will, sometimes it won't. If you understand the basics of the situations where it will do so, you can see it depends on register allocation, which of course not only depends on optimization level, but also can change anytime you change anything at all in the source code.

          • kaba0 2 years ago

            You do realize programming languages don't necessarily expose the architecture's native floating point operations, but are free to define any semantics they want? You know, a language could have number types that work like they do in math - symbolic math tools do exactly that.

            Also, that kind of language is absolutely not warranted.

hegelstoleit 2 years ago

If you're just doing BFS, why do you care who the parent is? Why not just choose the parent to be the predecessor? I.e. if you visit 4 from 1, then 1 is the parent. Why do you need to check a list of potential parents?

  • shwestrick 2 years ago

    When visiting vertices in parallel, there might be multiple potential parents that all attempt to visit the same vertex simultaneously. So, we need a way of picking which parent "wins".
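
    In code, the standard trick is a compare-and-swap on the parent slot. A minimal Java sketch (my own illustration, not the article's actual code): whichever thread's CAS lands first claims the vertex, and since every candidate is a genuine frontier parent, any winner yields a correct BFS tree.

        import java.util.concurrent.atomic.AtomicIntegerArray;

        class Visited {
            final AtomicIntegerArray parent;  // parent.get(v) == -1 means "unvisited"

            Visited(int numVertices) {
                parent = new AtomicIntegerArray(numVertices);
                for (int v = 0; v < numVertices; v++) parent.set(v, -1);
            }

            // Called concurrently for each frontier edge u -> v. Exactly one
            // caller wins the CAS; which one is a race, but every possible
            // winner produces a valid BFS tree.
            boolean tryVisit(int u, int v) {
                return parent.compareAndSet(v, -1, u);
            }
        }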