TheRealPomax 15 days ago

This is pretty damn cool, but unfortunately you failed the interview as you accepted the challenge to use x86 assembler, but solved the problem using a different programming language from the one we asked you to use. We'll keep your resume on file, and if there are any openings in the future we encourage you to apply for those.

  • penguin_booze 15 days ago

    The feedback we received indicates that you used the backspace key twice during coding. We expect higher precision than that.

    Also, we'd like to remind you that you can only re-apply after our cool-off period, which is 25 years.

  • teknas 15 days ago

    (I am the post author). I wrote the original version of this post using x86, but leetcode won't accept it. So I had to improvise...

cgh 15 days ago

> I will be using x64 assembly with the AT&T syntax as it is objectively superior than the Intel syntax.

This made me laugh because it must be a reference to this:

> I contend that the AT&T syntax is harmful and bad, and should never be used, for any reason, under any circumstances, by anyone.

  • badrabbit 15 days ago

    AT&T really is annoying but it feels like the big vs little endian debate. Fairly easy to convert between the two as well.

    • sedatk 15 days ago

      For those who don't know, that's why big and little endian were called that, because the debate was so frivolous. It's a reference to the book Gulliver's Travels by Jonathan Swift in which an island folk was split about from which end you should crack a boiled egg. (I'm a big endian for example).

      • dekhn 15 days ago

        Wait, hasn't everybody learned about inside-out endian? The highest order bit is in the middle, then proceeds along a hilbert path to the edges

        • sedatk 15 days ago

          No, but that's how I peel an egg.

      • userbinator 15 days ago

        Try to find (or even better, write) a big-endian arbitrary-precision arithmetic implementation if you're convinced that the difference is frivolous. You'll easily see why one is sometimes called "logical endian" and the other "backwards endian".

        • comex 15 days ago

          Or try to read numbers in a hex editor, and quickly obtain the opposite conclusion. :)

      • quelsolaar 15 days ago

        I used to be big endian. Then i changed my mind. Have a look at the following:

        -123 -12 -1

        Here are three numbers. If we read them from left to right, they all begin with 1, but we cant tell what the 1 means, in one case it means 100, in one it means 10 and in one it means 1. We need to parse the entire number to know what the first number means. If we parse from right to left, each number always means the same thing. the first is how many once, the second how many 10s and so on.

        So it makes sense to store the smallest first. In a big endian architecture, if i want to read an 8, 16, 32 or 64 bit word each byte will always mean the same, if we pad out the type with zeros. So little endian is right, and Arabic numerals are wrong.

        • rags2riches 15 days ago

          Arabic numerals are little endian when writing right-to-left. Like Arabic is written.

          • mananaysiempre 15 days ago

            Indic scripts are LTR, and so stayed the big-endian numerals when they were borrowed by medieval Islamic mathematicians. Numerals in modern spoken Arabic are big endian (mostly, with an exception for the teens that also exists in English).

        • quietbritishjim 15 days ago

          It looks like the comment you're replying to was talking about eggs. I'm little endian by the way.

      • nmilo 15 days ago

        Genuinely curious, what are any advantages of big endian other than "it's how we write numbers in base 10?"

        • patrec 15 days ago

          Sorting (assuming your numbers are unsigned and same width, you can sort them as byte strings).

          Dispatch (if only the topmost n-bits (or bytes) are needed to make a decision, that's all you need to read).

        • chx 15 days ago

          Signs and strings

      • classified 15 days ago

        I'm firmly in the little-endian camp for eggs, but big-endian for CPUs.

        • monocasa 15 days ago

          Watch as every pitchfork gets pointed at you when you talk about middle endian.

          Which is a real thing. There are systems that would store, say, a 32-bit word as two 16-bit words that were big endian relative to each other, but little endian within the 16-bit word.

          • account42 15 days ago

            Yes, but which order are the bits stored in each byte?

        • sedatk 15 days ago

          Found my archnemesis.

          • nurettin 15 days ago

            Do you even care about CPUs anymore?

            • classified 15 days ago

              Well, boiled eggs definitely taste better.

    • efficax 15 days ago

      except now that everything depends on the internet, and words that go over networks are big endian, it seems insane to throw away millions and millions of cpu cycles every year converting them to little endian to be processed by our little endian cpus. sure, it's a single cpu instruction, but between every computer in the world, almost all of them being little endian arm or intel, that's billions and billions and billions of instructions wasted.

  • tux3 15 days ago

    >I will be using x64 assembly with the AT&T syntax as it is objectively superior than the Intel syntax.

    Them's fighting words.

    • EarlKing 15 days ago

      I immediately stopped reading the minute I read that in the text. I can't take anything they say seriously after reading that.

danbruc 15 days ago

Assume RAX points to the root node and nodes just contain two child pointers and everything is aligned and whatever.

  cmp rax, 0
  jnz swap:
  push rax
  mov rax, [rax]
  call invert
  mov rbx,[rsp]
  xchg rax, [rbx+8]
  mov [rbx], rax
  call invert
  pop rax
The last time I used an assembler was before x86-64 was invented, I am not even sure I ever used one in protected mode. But that seems a totally reasonable whiteboard interview question. Written in notepad, might not assemble. Might even be totally incorrect and I am posting it so that the internet generates the warning and error messages.

EDIT: After reading the article now, that seems rather inefficient to me, to use local variables on the stack for everything. And why is the function returning a node if it is mutating the tree in place?

  • adrian_b 15 days ago

    The variables on the stack are the most efficient after registers, so you are right that a local variable should be kept into a register if possible, otherwise in the stack, and only then in other places (e.g. if it is too large for the stack).

    However, when writing in assembly one must pay attention that at least RBX, RBP and R12 through R15 must be preserved by any function (on Windows also RDI and RSI must be preserved).

    So in your code you should not use RBX, but a volatile register, e.g. RDX or RCX. If you would insist on using RBX, it would have to be saved and restored.

    • userbinator 15 days ago

      However, when writing in assembly one must pay attention that at least RBX, RBP and R12 through R15 must be preserved by any function

      Only if you're calling external code that assumes that. The power of Asm largely comes from not needing to follow arbitrary conventions in your own code. The boundaries where you interface to external code are the only constraints.

    • roperj 15 days ago

      > The variables on the stack are the most efficient after registers

      Why are variables on the stack more efficient than other memory accesses?

      • MobiusHorizons 15 days ago

        The top of the stack should generally be already present in the cache, so stack memory would be faster than heap memory where objects aren’t necessary as close together or accessed as frequently.

      • menaerus 15 days ago

        I am not familiar with any other reason besides the fact that stack-pointer arithmetics are done through a dedicated HW, called stack-engine, which sits in the CPU frontend. This effectively means that stack-pointer uOps are going to be get executed quicker because they do not have to go all the way through the CPU backend processing. This also saves some of the bandwidth on the CPU backend side allowing other uOps to be processed sooner.

      • xgkickt 15 days ago

        Register renaming. Stack references are more likely to be in the register file.

    • danbruc 15 days ago

      I know none of the calling conventions in any detail anymore and just used the registers in alphabetical order. Totally expected that this would violate something.

  • pyinstallwoes 15 days ago

    Did you write Forth at all?

    • danbruc 15 days ago

      No. Why do you ask?

      EDIT: As Wikipedia describes it as a stack-oriented language, because of my comment about putting everything onto the stack?

ummonk 15 days ago

This doesn't actually invert a Merkle tree though, since you have to recompute the hashes (except the leaf hashes) when you invert a Merkle tree. Gonna be a no-hire evaluation from me dawg.

  • hinkley 15 days ago

    The tweet this is based on is a joke. To invert a Merkle tree would mean to invert cause and effect. I’m pretty sure the tweet author is implying they want you to find a hash key collision for each node. Hope you have a couple spare universes in your pockets because this is gonna take a while.

love2read 15 days ago

I love seeing people solve leetcode challenges in asm, are there any more blogposts like this?

  • chrisseaton 15 days ago

    The hard bit of solving them is usually the algorithm though - when you know that you can code it in anything.

    • kjeetgill 15 days ago

      I'm not well versed in assembly, so learning assembly first would be the hard part!

      I'm with GP, it's fun seeing how solutions differ between languages as a way to peek into other language communities I don't spend as much time in.

wheelerof4te 15 days ago

Oooh. So that's why people invented C.

aftbit 15 days ago

What does it mean to "invert" a binary tree?

  • dekhn 15 days ago

    Inverting a binary tree is easy; express the tree as a matrix (laplacian), invert the matrix, then convert that back to a tree. What the canonical question is asking is not inversion.

    Since too many people were memorizing inversion, I switched to asking how to evert a binary tree. This leads naturally into a discussion of the 1:1 relationship between complex numbers and cohomology sets, I figure if somebody can get that right, they can be a junior programmer on the team.

  • tom_ 15 days ago

    "Flip" or "mirror" is probably a better term. It seems the goal is to swap left and right:

    • LanceH 15 days ago

      I don't get why this one is the meme. Just because it's recursion? Because it's (nearly) pointless? There are so many other algorithms I find more difficult/more tedious.

      • plasticchris 15 days ago

        I think it’s a meme because the home brew author tweeted that he was rejected by Google for failing to invert a binary tree.

    • kristov 15 days ago

      Is there a practical reason to do this in a real-world program?

      • saagarjha 15 days ago

        Sure, it’s the binary tree equivalent of reversing an array.

        • Tainnor 14 days ago

          For large enough trees it's probably more efficient to instead just switch from preorder to postorder traversal instead of changing the whole tree.

devit 15 days ago

That solution is terrible, with a bad algorithm that requires O(tree_height) space (the optimal one involves temporarily using left/right pointers as a parent pointer so that you only need constant space) and lacking any sort of assembly optimization, being worse than what a compiler would produce (e.g. it's a real mystery how the author managed to decide that local_right should be spilled on the stack).

Definitely not what you want to submit to someone testing your programming skills.

  • Kranar 15 days ago

    You're alluding to using the Morris traversal algorithm which can traverse a binary tree in O(1) space, but Morris traversal is actually much much slower than using a stack, especially as is used by this algorithm. Doing a Morris traversal requires at a minimum twice the number of operations as using a stack, and due to its cache unfriendly nature will in practice be closer to 4x slower.

    You typically only use Morris traversal on exceptionally large trees, and by large I mean when working with data that lives on a disk. It's definitely the exception, not the norm.

fear91 15 days ago

You can use the 32bit xor to reset the register. Also TEST REG,REG might be better for checking if it’s zero.

  • badrabbit 15 days ago

    Is 32 bit cheaper? Still 1 cycle.

    • nixpulvis 15 days ago

      I was thinking of this from the perspective of CPU pipeline pressure, but in reality it seems prosessors are indeed smart enough to avoid burdoning the ALUs execution with these kinds of special cases.

      Read more here

      > [...] these zeroing instructions extremely efficient, with a throughput of four zeroing instructons per clock cycle.

      Also, the xor instruction takes up the smallest amount of .text space (right?).

      • scheme271 15 days ago

        The xor reg, reg is also special cased because it's a quick way for compilers to reinitialize a register and indicate that future uses of that register don't depend on any previous operations. It helps the cpu to parallelize any future instructions that use that register since the cpu knows that those instructions don't depend on anything that happens before the the xor.

        • Someone 15 days ago

          It’s not special cased because it's a quick way for compilers to reinitialize a register and indicate that future uses of that register don't depend on any previous operations, it’s special cased to give compilers a quick way to reinitialize a register and indicate that future uses of that register don't depend on any previous operations.

 is almost ten years old and thus likely dated, but still may be educational.

          • nixpulvis 15 days ago

            Exactly! I was almost expecting to find a special instruction for this, but then again, why waste the opcodes!?

    • fear91 15 days ago

      Yes, it’s 1 cycle but it’s longer to decode and occupies more of l1i cache. It’s not all about execution cycles.

pixel_tracing 15 days ago

So anyone have good recommendations on learning assembly? I followed the link to the instructions ABI but can’t really read on my device.

  • trelane 15 days ago

    For learning assembly, usually you learn the syntax for your assembler. The rest of it (the majority of it) is then learning what instructions are available on your platform.

    I liked as an intro to both. I'd already had a class on computer architecture that did assembly before that, though.

    Once you get going with that, you can download and read the Intel or AMD programmers manuals. Of course, this assumes x86_64.

scotty79 14 days ago

Wait, so this whole mystical inverting is just swapping left and right children of all nodes?

raxxorraxor 15 days ago

x32 converter for GNU/Linux:

  sed -i 's/r//g' invert_tree.asm > invet_tee.asm
  • teknas 15 days ago

    It's not that easy, you also have to account for the pointer sizes (4 bytes instead of 8). Also alignment requirements are different on x32.

    • badrabbit 15 days ago

      It's not only that either. OP used rdi because of calling convention, in x32 you would need to arguments on the stack instead and reference them +ebp instead of using registers if you want non-asm code to call your function.

  • junon 15 days ago

    x16, you mean :D

  • account42 15 days ago

    x32 is still amd64. Perhaps you meant x86 but see the other comments for that.

bugfix-66 15 days ago

Being able to read and understand x86-64 assembly (or PTX/SASS for an Nvidia GPU) is much more important than being able to write it. In practice, even when you're writing assembly, you're looking at reference assembly generated by a compiler from C code you wrote.

Similarly, the reality is that as a professional programmer you spend no time doing work like leetcode.

Instead, you spend a lot of time understanding and slightly modifying (fixing or enhancing/extending) code.

With the rise of language model code completion systems (e.g., Microsoft Copilot) even more time will be spent inspecting and understanding code to find problems.

With these facts in mind, I have been building a new form of leetcode:

Most puzzles are interesting algorithms that you will learn useful techniques from, so it's never a waste of time to think about them. And even though the bugs are all quite trivial, I can see it's very challenging for many people.

It's about half-way ready to launch, needing 30 more puzzles. I am working my way through Knuth's The Art of Programming Volume 4B and today I'll see if Algorithm X (Dancing Links Exact Cover Backtracking) can be made to fit for Bugs 38 and 39 (or whether it's too complicated).