pjc50 2 hours ago

The fun bit is right at the start when the author notices that the compiler spots this and optimizes it away.

We didn't get into the deeper question of benchmarking it vs. a three-register swap, because I suspect the latter would be handled entirely by register renaming and end up being faster due to not requiring allocation of an ALU unit. Difficult to benchmark that because in order for it to make a difference, you'd need to surround it with other arithmetic instructions.

A meta question is why this persists. It has the right qualities for a "party trick": slightly esoteric piece of knowledge, not actually that hard to understand when you do know about it, but unintuitive enough that most people don't spontaneously reinvent it.

See also: https://en.wikipedia.org/wiki/Fast_inverse_square_root , which requires a bit more maths.

The other classic use of XOR - cursor overdrawing - has also long since gone away. It used to be possible to easily draw a cursor on a monochrome display by XORing it in, then to move it you simply XOR it again, restoring the original image.

danhau 2 hours ago

> Are there other XOR tricks?

Yes, error correction.

You have some packets of data a, b, c. Add one additional packet z that is computed as z = a ^ b ^ c. Now whenever one of a, b or c gets corrupted or lost, it can be reconstructed by computing the XOR of all the others.

So if b is lost: b = a ^ c ^ z. This works for any packet, but only one. If multiple are lost, this will fail.

There are way better error correction algorithms, but I like the simplicity of this one.

  • nsteel 1 hour ago

    Also to cheaply (area) create multi-port RAMs.

    • pjc50 1 hour ago

      How does that work?

      • nsteel 5 minutes ago

        It's similar to RAID schemes but instead of drive failure it's port unavailability. There's a reference at [1] or an FPGA-centric one at [2], but it applies to anywhere where dual/single-port rams are readily available but anything more exotic isn't.

          [1] Achieving Multi-Port Memory Performance on Single-Port Memory with Coding Techniques - https://arxiv.org/abs/2001.09599
          [2] https://people.csail.mit.edu/ml/pubs/fpga12_xor.pdf
  • Terr_ 1 hour ago

    See also: RAID levels that use one disk for parity. Three disks is simplest, but technically you can do more if you trust that only one will go bad at a time.

    A few months ago, I had a rare occasion of trying to explain them to a relative who had just bought a fancy NAS and wanted help setting it up.

  • amelius 1 hour ago

    XOR is also great for storing copyrighted works without liability.

    a = the bits of some song or movie

    b = pure noise

    Store c = a^b.

    Give b to a friend. Throw away a.

    Now both you and your friend have a bit vector of pure noise. Together you can produce the copyrighted work. But nobody is liable.

    • lelanthran 57 minutes ago

      That's called encryption using a one time pad.

      • amelius 39 minutes ago

        That's one way of looking at it. Note that you can easily extend it to multiple people holding multiple keys, so that a=b^c^d^e, etc.

        • lelanthran 8 minutes ago

          My point was that encrypting a copyrighted work does not remove the copyright from it; the result is still copyright and not legally redistributable.

          • amelius 3 minutes ago

            Maybe, but b and c are indistinguishable from pure noise.

    • pjc50 52 minutes ago

      Whether people are liable is a question for the courts, and I suspect they simply look through the tech and ask "do you end up with a copy of the work?"

      (unless you're an AI company, in which case you can copy the whole internet just fine)

      • amelius 20 minutes ago

        But that's an unsolvable question. Just like when A is caught on camera stealing a diamond, but A turns out to have an identical twin B. So the prosecutor can't do anything.

    • meindnoch 3 minutes ago

      Pretty sure the law doesn't care about this XOR trick.

gobdovan 3 hours ago

The XOR trick is only cool in its undefined-behavior form:

a^=b^=a^=b;

Which allegedly saves you 0.5 seconds of typing in competitive programming competitions from 20 years ago and is known to work reliably (on MinGW under Windows XP).

Bonus authenticity: use `a^=a` to zero a register in a single x86 instruction (and makes a real difference for compiler toolchains 30+ years old).

For real now, a very useful application of XOR is its relation to the Nim game [0], which comes in very handy if you need to save your village from an ancient disgruntled Chinese emperor.

[0] https://en.wikipedia.org/wiki/Nim

  • MathMonkeyMan 2 hours ago

    `xor eax, eax` is less code when assembled. That's why compilers generate it.

    • gobdovan 2 hours ago

      ..and reliably generate it for decades, but you can still manually do it in C for bonus stylistic points!

  • ginko 2 hours ago

    >Bonus authenticity: use `a^=a` to zero a register in a single x86 instruction (and makes a real difference for compiler toolchains 30+ years old).

    Modern compilers will still use xor for zeroing registers on their own.

    For instance: https://godbolt.org/z/n5n35xjqx

    Variable a(register esi) is first initialized to 42 with mov and then cleared to zero using xor.

    • gobdovan 2 hours ago

      Yes, the whole comment is tongue in cheek for pointless manual optimizations.

  • leni536 1 hour ago

    > The XOR trick is only cool in its undefined-behavior form:

    > a^=b^=a^=b;

    I believe this is defined in C++ since C++17, but still undefined in C.

mmozeiko 4 hours ago

xor swap trick was useful in older simd (sse1/sse2) when based on some condition you want to swap values or not:

  tmp = (a ^ b) & mask
  a ^= tmp
  b ^= tmp

If mask = 0xfff...fff then a/b will be swapped, otherwise if mask = 0 then they'll remain the same.

  • CJefferson 4 hours ago

    Oh, that is cool, I’ve never seen that. I might add that to an extended version of the post sometime, I’ll be sure to credit you.

  • jesse__ 50 minutes ago

    That's hella cute

  • koolala 9 minutes ago

    Could this still be the ideal way for vectors of Ints in WebGL2?

fuglede_ 2 hours ago

The XOR swap trick also features in the compilation/synthesis of quantum algorithms, where the XOR instruction (in the form of a CNOT gate) is fundamental in many architectures, and where native swapping need not be available.

One extension that I ran into, and which I think forms a nice problem is the following:

Just like the XOR swap trick can be used to swap to variables (and let's just say that they're bools), it can be extended to implement any permutation of the variables: suppose that the permutation is written as a composition of n transpositions (i.e., swaps of pairs), and that is the minimal number of transpositions that let's you do that. Each transposition can be implemented by 3 XORs, by the XOR swap trick for pairs, and so the full permutation can be implemented by 3n XORs. Now here's the question: Is it possible to come up with a way of doing it with less than 3n, or can we find a permutation that has a shortcut through XOR-land (not allowing any other kinds of instructions)? In other words, is XOR-swapping XOR-optimal?

I'm not going to spoil it, but only last year a paper was published in the quantum information literature that contains an answer [0]. I ended up making a little game where you get to play around with XOR-optimizing not only permutations, but general linear reversible circuits. [1]

[0] https://link.springer.com/article/10.1007/s11128-025-04831-5

[1] https://swapple.fuglede.dk

praptak 3 hours ago

Xor swap trick has perfect profile for underhanded C contests. It generally works until a specific condition triggers its failure. The condition is "the arguments are aliases", so for example XOR_SWAP(a[i], a[j]) when i=j.

  • imploded_sub 51 minutes ago

    TFA mentions this as an issue for the trick, whereas you rightly point out it's an evil feature instead.

YasuoTanaka 23 minutes ago

Used to do this in assembly back when registers actually mattered. Today it mostly hurts readability more than anything.

torginus 1 hour ago

I would remark that modern CPUs don't use physical registers, so swapping should be just a register rename op, and this kind of bithacking only applies to old machines.

The article makes the same point as well at the end:

It is the kind of technique which might have been occasionally useful in the 1980s, but now is only useful for cute interview questions and as a curiosity.

Panzerschrek 2 hours ago

Such tricks were maybe useful 40 years ago while writing assembly code manually or while using a dumb compiler with no optimizations. But nowadays such tricks are near to useless. All useful ones (like optimizing division by 2 via bitshift ) are already implemented as compiler optimizations. Others shouldn't be used in order to avoid making optimizer's job harder.

  • Tade0 2 hours ago

    Back when I was in college in the late 00s we were advised to not attempt to optimise using assembly unless we really found a bottleneck the compiler missed.

    • pjc50 2 hours ago

      Correct. There are still some situations that benefit from it, but only using the extended instruction sets that compilers can't/won't generate. Even then you should at least try writing the C code and seeing if it will auto-vectorize. Even C# can auto-vectorize some cases now.

      Things like "add with saturation" and the special AES instructions.

      • jesse__ 46 minutes ago

        In my experience, relying on the compiler to auto vectorize your code is a path fraught with peril.

        It'll break eventually. If it matters, write the simd yourself. It'll probably be 2-50x better than the compiler anyways.

DeathArrow 2 hours ago

30 years ago, when I wanted to 0 a register in assembly I used something like xor ah, ah because it was a bit more performant.

  • icelusxl 2 hours ago

    It still is. The CPU's register renamer can detect these instructions to not have data dependencies and can zero the register itself. It doesn't send the instruction to the execution engine meaning they use no execution resources and have zero latency.

  • pjc50 2 hours ago

    IIRC that's because that instruction is one byte while a "load immediate" would have to express 0 as one or more bytes.

    (See also the wacky way in which ARM "load immediate" works)

stinkbeetle 3 hours ago

You can use this same property of xor to make a double-linked list using just one pointer per item, which is xor of the previous and next item addresses!

  • Panzerschrek 2 hours ago

    Except such "optimization" may affect performance on CPUs with speculative execution/prediction. But it should be measured.

ranger_danger 5 hours ago

> given a list where every value appears exactly twice except one, XOR all the values together and the duplicates cancel out, leaving the unique element

For some reason this reminds me of the Fourier transform. I wonder if it can be performed with XOR tricks and no complicated arithmetic?

  • fluoridation 4 hours ago

    There's a more general formulation, which is that every value but one must appear even numbers of times, and the one must appear some odd number of times.

    • twiceaday 3 hours ago

      How about every value appears 0 mod n times except one which appears 1 mod n times? :)

      Solution: xor is just addition mod 2. Write the numbers in base n and do digit-wise addition mod n (ie without carry). Very intuitive way to see the xor trick.