fweimer 19 hours ago

GCC does this in match.pd. Here's the add_zero example, I think:

    /* Simplifications of operations with one constant operand and
       simplifications to constants or single values.  */
    
    (for op (plus pointer_plus minus bit_ior bit_xor)
      (simplify
        (op @0 integer_zerop)
        (non_lvalue @0)))
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/match.pd

The floating-point case can be more complicated because of signed zeros and signaling NaNs; see fold_real_zero_addition_p.
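A quick Python sketch (my own illustration, not from the GCC source) of why signed zeros block the rewrite: `x + 0.0 -> x` flips the sign of the result when `x` is `-0.0`, while `x - 0.0 -> x` remains sign-preserving:

```python
import math

# x + 0.0 -> x is NOT valid for signed zeros: -0.0 + 0.0 rounds to +0.0.
x = -0.0
assert math.copysign(1.0, x) == -1.0       # x is negative zero
assert math.copysign(1.0, x + 0.0) == 1.0  # but x + 0.0 is positive zero

# x - 0.0 -> x is fine for signed zeros: -0.0 - 0.0 == -0.0.
assert math.copysign(1.0, x - 0.0) == -1.0
```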

JonChesterfield 4 hours ago

A cost model would give a proof of termination: require the output of each transform to be cheaper than the input. Confluence is less obvious.
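A minimal sketch of that idea (hypothetical, not the article's DSL; rules apply at the root only, for brevity): because every accepted rewrite strictly lowers a non-negative integer cost, the loop can take only finitely many steps:

```python
def cost(t):
    # Cost = number of nodes in the term tree.
    return 1 + sum(cost(c) for c in t[1:]) if isinstance(t, tuple) else 1

def add_zero(t):
    # add(x, 0) -> x
    if isinstance(t, tuple) and t[0] == "add" and t[2] == 0:
        return t[1]
    return None

def optimize(term, rules):
    # Accept a rewrite only if it strictly lowers the cost; since cost is
    # a non-negative integer, this loop must terminate.
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(term)
            if new is not None and cost(new) < cost(term):
                term, changed = new, True
    return term

print(optimize(("add", ("add", "x", 0), 0), [add_zero]))  # -> x
```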

  • mattalex 23 minutes ago

    Once you have strong normalization you can just check local confluence and use Newman's lemma to get full confluence. That should be pretty easy: just build all n^2 pairs of rules and run them to termination (which you have proven before). If all those pairs are confluent, so is the full rewriting system.
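    A toy version of that check (my own sketch, assuming root-level rules and a `normalize` whose termination was proven separately): apply each applicable rule one step, normalize each result, and verify they all meet:

```python
def normalize(term, rules):
    # Rewrite to a normal form; assumes termination holds.
    while True:
        for rule in rules:
            new = rule(term)
            if new is not None:
                term = new
                break
        else:
            return term

def locally_confluent(term, rules):
    # Take one step with every applicable rule, then normalize each result:
    # by Newman's lemma, if all such peaks join, the system is confluent.
    steps = [r(term) for r in rules if r(term) is not None]
    return len({normalize(s, rules) for s in steps}) <= 1

add_zero_r = lambda t: t[1] if isinstance(t, tuple) and t[0] == "add" and t[2] == 0 else None
add_zero_l = lambda t: t[2] if isinstance(t, tuple) and t[0] == "add" and t[1] == 0 else None

print(locally_confluent(("add", 0, 0), [add_zero_r, add_zero_l]))  # True
```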

hoten 13 hours ago

The before/after delta is very satisfying and clearly easier to work with. Very nice.

How does the DSL handle side effects? I'm assuming any term here could be a compound expression, so what part of the system prevents "0*callfn()" from reducing to "0"?

  • bastawhiz 13 hours ago

    I believe this operates on values in something akin to registers, not expression trees. By the time you get to a place where one of these rules is relevant, the side effect has already taken place. This just prevents the return value of such a function call from being operated on, as in your example.

    • cfbolztereick 8 hours ago

      Yeah, that's exactly right. It's operating on an SSA-based intermediate representation. So if an operation gets removed, the arguments are still being computed. The earlier operations that produce those arguments can also be removed by dead code elimination (which runs later), but indeed only if they have no side effects.

      I suppose the notation of the rule heads makes it look like there actually are expression trees, which is maybe sort of confusing.
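      A tiny sketch of that behavior (hypothetical trace format, not PyPy's actual IR): rewriting mul(0, v) to the constant 0 replaces the *uses* of the result, but never deletes the call instruction that produced v:

```python
def simplify(trace):
    # Replace mul-by-zero results with the constant 0; never touch the
    # instructions that produced the arguments (they may have side effects).
    out, replace = [], {}
    for name, op, *args in trace:
        args = [replace.get(a, a) for a in args]
        if op == "mul" and 0 in args:
            replace[name] = 0   # all later uses of this result become 0
            continue
        out.append((name, op, *args))
    return out

trace = [
    ("v1", "call", "callfn"),   # side effect: must stay in the trace
    ("v2", "mul", 0, "v1"),
    ("v3", "add", "v2", 5),
]
print(simplify(trace))
# -> [('v1', 'call', 'callfn'), ('v3', 'add', 0, 5)]
```

A later dead-code-elimination pass could then drop `v1` only if the call were known to be side-effect free.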

mjcohen 14 hours ago

Interesting to see which optimizations are never used.

westurner 18 hours ago

math.fma (fused multiply-add) is in Python 3.13. Are there already rules to transform expressions to math.fma?

And does Z3 verification indicate differences in output due to minimizing float-rounding error?

https://docs.python.org/3/library/math.html#math.fma :

> math.fma(x, y, z)

> Fused multiply-add operation. Return (x * y) + z, computed as though with infinite precision and range followed by a single round to the float format. This operation often provides better accuracy than the direct expression (x * y) + z.

> This function follows the specification of the fusedMultiplyAdd operation described in the IEEE 754 standard.
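For a concrete feel of the bitwise difference (my own example; math.fma requires Python 3.13+): the single rounding preserves a low-order term that the two-rounding expression loses entirely:

```python
import math

a = 1.0 + 2**-52           # smallest double greater than 1.0
b = a
c = -(1.0 + 2**-51)

# a*b = 1 + 2**-51 + 2**-104 exactly; rounding the product drops the
# 2**-104 term, so the two-step expression cancels to zero...
print((a * b) + c)         # 0.0

# ...while fma rounds only once and keeps it.
if hasattr(math, "fma"):   # Python 3.13+
    print(math.fma(a, b, c))   # 2**-104
```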

  • cfbolztereick 17 hours ago

    I only support integer operations in the DSL so far.

    (but yes, turning the Python expression x*y+z into an fma call would not be a legal optimization for the JIT anyway. And Z3 would rightfully complain: the results must be bitwise identical before and after optimization)