Ask HN: How is it possible to get -0.0 in a sum?

10 points by gus_massa 2 days ago

I'm looking for corner cases where the result is -0.0. As far as I know, the only way to get -0.0 in a sum is

  (-0.0) + (-0.0)
Does someone know any other case in IEEE 754?

Bonus question: What happens in subtractions? I only know

  (-0.0) - (+0.0)
Is there any other case?
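
For anyone who wants to check the zero cases directly, here is a minimal C sketch, assuming the default round-to-nearest mode. The helper names (`is_neg_zero`, `add_is_neg_zero`, `sub_is_neg_zero`) are made up for this illustration, and `volatile` is only there to discourage compile-time folding.

```c
#include <math.h>
#include <stdbool.h>

/* True if x is a zero with the sign bit set, i.e. -0.0. */
bool is_neg_zero(double x) {
    return x == 0.0 && signbit(x);
}

/* True if a + b evaluates to -0.0 in the current rounding mode. */
bool add_is_neg_zero(double a, double b) {
    volatile double va = a, vb = b;   /* discourage constant folding */
    return is_neg_zero(va + vb);
}

/* True if a - b evaluates to -0.0 in the current rounding mode. */
bool sub_is_neg_zero(double a, double b) {
    volatile double va = a, vb = b;
    return is_neg_zero(va - vb);
}
```

Calling these with the four sign combinations of zero covers every zero-operand case; a nonzero pair that cancels exactly, such as 1.0 and -1.0, can be passed in the same way.
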
sparkie 2 days ago

It depends on the FP rounding mode. If rounding mode is FE_TOWARDZERO/FE_UPWARD/FE_TONEAREST then the case you gave is the only one I'm aware of. If rounding mode is FE_DOWNWARD (towards negative infinity) then other calculations that result in a zero will give a -0.0.

Here's an example of -1.0f + 1.0f resulting in -0.0: https://godbolt.org/z/5qvqsdh9P

  • gus_massa 13 hours ago

    Thanks! [Sorry for the delay.]

    ---

    FYI: For more context, I'm trying to send a PR to Chez Scheme (and indirectly to Racket) https://github.com/cisco/ChezScheme/pull/959 to reduce expressions like

      (+ 1.0 (length L))  ;  ==>  (+ 1.0 (fixnum->flonum (length L)))
    
    where the "fixnums" are small integers and "flonums" are double.

    It's fine, unless you have the case

      (+ -0.0 (length L))  ;  =wrong=>  (+ -0.0 (fixnum->flonum (length L)))
    
    because if the length is 0, it gets transformed into 0.0 instead of -0.0.
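
    To make the mismatch concrete, here is a small C model of the two behaviours. It is only a sketch: it assumes that adding an exact 0 leaves the flonum unchanged (which is what the -0.0 result above implies), and the function names are invented for this example rather than taken from Chez Scheme.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* The rewritten form: convert the fixnum first, then add as doubles. */
double add_after_conversion(double x, int64_t n) {
    return x + (double)n;
}

/* Assumed reference behaviour: an exact 0 leaves x untouched. */
double add_exact_fixnum(double x, int64_t n) {
    return n == 0 ? x : x + (double)n;
}

/* True if the rewrite changes the value or the sign of the result. */
bool rewrite_changes_result(double x, int64_t n) {
    double a = add_after_conversion(x, n);
    double b = add_exact_fixnum(x, n);
    if (isnan(a) && isnan(b)) {
        return false;                /* treat NaN results as equal */
    }
    return a != b || (signbit(a) != 0) != (signbit(b) != 0);
}
```

    In this model the only input where the two disagree is x = -0.0 with n = 0.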

    There are a few corner cases, in particular because it's possible to have

       (+ 1.0 x (length L))
    
    and I really want to avoid the runtime check of (length L) == 0 if possible.

    So I took a look, asked there, and now your opinion confirms what I got so far. My C is not very good, so it's nice to have an example of how the rounding directions are used. Luckily Chez Scheme only uses the default rounding and it's probably correct to cut a few corners. I'll keep looking for a few days in case there is some surprise.

    • sparkie an hour ago

      I'm not sure you can avoid the check, but you could perhaps avoid a branch with some clever trick - maybe using AVX-512 mask instructions.

      A recent AVX-512 extension has a `vfixupimm` instruction[1] which can adjust special floating point values. You could use this to adjust all zeroes to -0 but leave any non-zeroes untouched. It isn't very obvious how to use though.

          vfixupimmsd dst, src, fixup, flag
      
       * The `flag` is for error reporting - we can set it to zero to ignore errors.
      
       * `dst` and `src` are floating point values - they can be the same register.
      
       * The instruction first checks `src` and turns any denormals into zero if the MXCSR.DAZ flag is set.
      
       * It then categorizes `src` as one of {QNAN, SNAN, ZERO, ONE, NEG_INF, POS_INF, NEG_VALUE, POS_VALUE}
      
       * `fixup` is an array of 8 nybbles (a 32-bit int) and is looked up based on the categorization of `src` {QNAN = 0 ... POS_VALUE = 7}
      
       * The nybbles denote which value to place into `dst`:
      
          0x0 : dst
          0x1 : src (with denormals as zero if DAZ flag is set)
          0x2 : QNaN(src)
          0x3 : QNAN_Indefinite
          0x4 : -INF
          0x5 : +INF
          0x6 : src < 0 ? -INF : +INF
          0x7 : -0
          0x8 : +0
          0x9 : -1
          0xA : +1
          0xB : 1/2
          0xC : 90.0
          0xD : PI/2
          0xE : MAX_FLOAT
          0xF : -MAX_FLOAT
      
      So to turn all zeroes into negative zeroes, you want to set the nybble for categorization ZERO (bits 11..8) to 0x7 (-0) in `fixup`. This would mean you want `fixup` to be equal to `0x00000700`. So usage would be:

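          #include <immintrin.h>  // _mm_fixupimm_sd needs AVX-512F (e.g. -mavx512f)
          #include <stdint.h>

          double fixnum_to_flonum(int64_t fixnum) {
              __m128d result = { (double)fixnum };
              // 0x700: ZERO category -> -0; every other category keeps dst
              return _mm_cvtsd_f64(_mm_fixupimm_sd(result, result, (__m128i){ 0x700 }, 0));
          }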
      
      Or in assembly (SYSV calling convention)

          fixnum_to_flonum:
              vxorpd          xmm0, xmm0, xmm0
              mov             edx, 0x700
              vcvtsi2sd       xmm0, xmm0, rdi
              vmovq           xmm1, rdx
              vfixupimmsd     xmm0, xmm0, xmm1, 0
              ret
      
      This function would turn an int64_t into a double and any zeroes into -0.0, with no branches. It can be extended to operate on 8 doubles at a time (__m512d).

      Unfortunately I'm unable to test properly as Compiler Explorer is giving SIGILL - presumably because it's running on some instance which doesn't support this instruction. My own machine is also too old.
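
      In the meantime, the lookup described above can be modelled in plain C to sanity-check the `0x700` table on any machine. This is only a sketch of the description in this comment, not a verified emulation of the hardware: it ignores the DAZ handling and the `flag` operand, and all names in it (`classify_model`, `fixup_model`, `fixnum_to_flonum_model`, the `TOK_` constants) are made up.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Category indices in the order listed above (QNAN = 0 ... POS_VALUE = 7). */
enum { TOK_QNAN, TOK_SNAN, TOK_ZERO, TOK_ONE,
       TOK_NEG_INF, TOK_POS_INF, TOK_NEG_VALUE, TOK_POS_VALUE };

uint64_t bits_of(double x) { uint64_t b; memcpy(&b, &x, sizeof b); return b; }
double from_bits(uint64_t b) { double x; memcpy(&x, &b, sizeof x); return x; }

/* Categorize src (denormals are NOT flushed in this model). */
int classify_model(double src) {
    if (isnan(src)) return (bits_of(src) >> 51) & 1 ? TOK_QNAN : TOK_SNAN;
    if (src == 0.0) return TOK_ZERO;      /* matches +0.0 and -0.0 */
    if (src == 1.0) return TOK_ONE;       /* +1.0 only */
    if (isinf(src)) return src < 0 ? TOK_NEG_INF : TOK_POS_INF;
    return src < 0 ? TOK_NEG_VALUE : TOK_POS_VALUE;
}

/* Pick the nybble for src's category and return the value it denotes. */
double fixup_model(double dst, double src, uint32_t fixup) {
    switch ((fixup >> (4 * classify_model(src))) & 0xF) {
    case 0x0: return dst;
    case 0x1: return src;
    case 0x2: return from_bits(bits_of(src) | (1ULL << 51)); /* quiet a NaN */
    case 0x3: return from_bits(0xFFF8000000000000ULL);       /* QNaN indefinite */
    case 0x4: return -INFINITY;
    case 0x5: return INFINITY;
    case 0x6: return src < 0 ? -INFINITY : INFINITY;
    case 0x7: return -0.0;
    case 0x8: return 0.0;
    case 0x9: return -1.0;
    case 0xA: return 1.0;
    case 0xB: return 0.5;
    case 0xC: return 90.0;
    case 0xD: return 1.5707963267948966;                     /* PI/2 */
    case 0xE: return DBL_MAX;
    default:  return -DBL_MAX;                               /* 0xF */
    }
}

/* Same shape as the intrinsic version: dst and src are the same value. */
double fixnum_to_flonum_model(int64_t fixnum) {
    double result = (double)fixnum;
    return fixup_model(result, result, 0x00000700u);
}
```

      In this model a fixnum of 0 comes back as -0.0 and every other input comes back as the plain conversion, which is the behaviour the real instruction is expected to have with this table.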

      Given how recent this instruction is, it's probably not viable for your use-case, but maybe something to consider in future when it's more widely available.

      For now just stick with a branch version or figure out some other way to make it branchless - though I can't think of one which would be any faster than a branch.
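
      One portable candidate, for what it's worth: OR the sign bit into the converted value when the fixnum is 0. Under the default rounding mode x + (-0.0) gives back x with its sign intact, so mapping 0 to -0.0 before the add keeps (+ -0.0 0) at -0.0. The sketch below is untested for speed, the names are invented, and whether the compiler really emits branch-free code for the comparison is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Converts a fixnum to a double, mapping 0 to -0.0 by OR-ing in the sign bit. */
double fixnum_to_flonum_signed_zero(int64_t fixnum) {
    double d = (double)fixnum;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    bits |= (uint64_t)(fixnum == 0) << 63;  /* sign bit only when fixnum is 0 */
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Flonum + fixnum where an exact 0 does not disturb the sign of x. */
double add_flonum_fixnum(double x, int64_t n) {
    return x + fixnum_to_flonum_signed_zero(n);
}
```

      Non-zero fixnums pass through unchanged because the OR only fires when `fixnum == 0`.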

      [1]:https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

gethly a day ago

I would guess that, because of how floats work in binary computers, you have something like -0.0000000000000000000000000000000000001 and when you round it you end up with -0.0. The same goes for positive values; you're just not used to writing the + sign before every number, so seeing the minus feels strange.

  • dcminter 14 hours ago

    You're answering a question that OP did not ask.

kazinator 2 days ago

What happens if we take the smallest (as in closest to zero) negative subnormal and add it to itself?
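
A quick way to try this, assuming IEEE 754 doubles with subnormals enabled (no flush-to-zero or `-ffast-math`): the sum of two doubles in the subnormal range is exactly representable, so the result should be twice the subnormal rather than any kind of zero. The function names below are made up for this sketch.

```c
/* Smallest-magnitude negative subnormal double: -2^-1074. */
double smallest_negative_subnormal(void) {
    return -0x1p-1074;
}

/* Adds the smallest negative subnormal to itself at run time. */
double doubled_smallest_negative_subnormal(void) {
    volatile double tiny = smallest_negative_subnormal();
    return tiny + tiny;   /* expected to be -2^-1073, not -0.0 */
}
```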