danlark 10 months ago

You can see hashing optimizations as well https://www.deepmind.com/blog/alphadev-discovers-faster-sort..., https://github.com/abseil/abseil-cpp/commit/74eee2aff683cc7d...

I was one of the members who did the expert review of what was done, both in sorting and hashing. Overall it's more about assembly, finding missed compiler optimizations, and balancing between correctness and distribution (in hashing in particular).

It was not revolutionary in the sense that it didn't find completely new approaches; rather, it converged to something incomprehensible for humans but relatively good for performance, which proves the point that optimal programs are very inhuman.

Note that removing instructions in sorting does not always lead to better performance; for example, instructions can run in parallel and the effect can be less pronounced. Benchmarks can lie, and the compiler could do something differently when recompiling the sort3 function that was changed.

For hashing it was even funnier: very small strings, up to 64 bits, already used 3 instructions (add a constant -> 64x64 multiply -> xor of upper/lower halves). For bigger ones the question becomes more complicated; that's why 9-16 was a better spot and it simplified from 2 multiplications to just one and a rotation. Distribution on real workloads was good, it almost passed SMHasher, and we decided it was good enough to try out in prod. We did not roll back, as you can see from abseil :)
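
To make the shape concrete, here is a minimal sketch of that kind of 9-16 byte mix: one rotation plus a single 64x64 multiply instead of two multiplies. The constants, seeding and load scheme are illustrative, not the actual abseil code.

    // Illustrative only: lo and hi are the first and last 8 bytes of the
    // input (they overlap for lengths < 16). Constants are placeholders.
    #include <cstdint>

    static inline uint64_t rotr64(uint64_t v, int s) {
      return (v >> s) | (v << (64 - s));
    }

    uint64_t mix_9_to_16(uint64_t lo, uint64_t hi, uint64_t seed, uint64_t mul) {
      // Before: two 64x64 multiplications mixing lo and hi.
      // After: rotate one half instead of multiplying it...
      lo = rotr64(lo, 53);
      uint64_t m = (lo ^ seed) * mul;   // ...so only one multiplication remains
      return m ^ hi;
    }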

But even given all that, it was fascinating to watch how this system was searching and was able to find that particular programs can be further simplified. Kudos to everyone involved, it's a great incremental change that can bring more results in the future.

  • CalChris 10 months ago

    This sounds like a more intelligent version of superoptimization. The original Massalin superoptimizer, even for its time, also produced surprising results, similar to AlphaDev's "incomprehensible for humans" output. You see the same thing in computer chess, in what Agadmator calls disgusting engine lines.

    https://courses.cs.washington.edu/courses/cse501/15sp/papers...

  • thomasahle 10 months ago

    I'm disappointed that the hashing is just based on training on microbenchmarks and SMHasher, rather than designing a fast _provably_ universal hash. Suites like SMHasher are never complete; they just try to catch the most common weaknesses. If you train on the test cases you'll only get an algorithm that passes the tests, but people can always find a set of values on which you will do badly.

    • eklitzke 10 months ago

      I think you're confusing proving that the hash function is collision resistant with the other goal which is hashing speed. If you really need a collision resistant hash you need to use a cryptographic hash function, but outside of cryptographic applications that is rarely the requirement. And (huge caveat, this isn't my domain expertise) I'm not sure what security properties are really "proven" about existing cryptographic hash functions; AFAIK they are considered secure because we don't know how to break them, not because of some fundamental mathematical property about them.

      For the other 99.999% of hashing applications there is a balance between collision resistance and hashing latency. For example, in a hash table (probably the most common use for a non-cryptographic hash function) there is a cost incurred by hash collisions because lookups on keys with collisions may have to do extra probing. On the other hand, every hash table lookup requires doing at least one hash operation, regardless of whether or not it collides. So it may make sense to have a slightly worse hash function (in the sense that it is more likely to have collisions with pathological inputs) if it has slightly lower latency. The only way to really know what is faster for a real world application is to have some kind of benchmark to train against as a loss function.

      • thomasahle 10 months ago

        > I think you're confusing proving that the hash function is collision resistant with the other goal which is hashing speed. If you really need a collision resistant hash you need to use a cryptographic hash function.

        I wish this misconception would die. There is a great theory of algorithmic probabilistic hash functions, completely distinct from cryptographic hash functions. If you are designing a hash table, or a different algorithm using a hash function, you nearly always want the former kind.

        The idea is that `Pr[h(x) = h(y)]` is small _no matter the inputs x and y_. Here the probability is over the random seed of h. Lots of good hash functions, like UMASH (https://engineering.backtrace.io/2020-08-24-umash-fast-enoug...), have this guarantee. Other fast hash functions, like MurmurHash, don't.

        When a function doesn't have this guarantee, it means I can find sets of values x1, x2, ... that will likely collide under _any_ or most seeds! Sure, if your inputs are basically random, this probably won't happen, but people can still use this to DDoS your hash table, or whatever you are coding.

        Notice again, this has nothing to do with cryptography. It is all about probabilistic guarantees. You can't just test the hash function on a fixed number of inputs and say it's good, since you may just have moved the "bad set" to somewhere else.

        In this day and age there are super fast algorithmic hash functions with guaranteed low expected collisions. It's just silly to use one that you can break so easily.
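
        A minimal sketch of the kind of guarantee I mean, using classic Carter-Wegman multiply-mod hashing (illustrative only, and much simpler than something like UMASH): for any fixed pair x != y, the collision probability over the random choice of seed (a, b) is about 1/m.

            // Universal hashing: h_{a,b}(x) = ((a*x + b) mod p) mod m,
            // with p a large prime and (a, b) drawn at random per table.
            // Keys are assumed to be smaller than p.
            #include <cstdint>
            #include <random>

            struct UniversalHash {
              static constexpr uint64_t p = (1ULL << 61) - 1;  // Mersenne prime
              uint64_t a, b, m;

              UniversalHash(uint64_t buckets, std::mt19937_64& rng)
                  : a(rng() % (p - 1) + 1), b(rng() % p), m(buckets) {}

              uint64_t operator()(uint64_t x) const {
                __uint128_t t = (__uint128_t)a * x + b;  // exact intermediate product
                return (uint64_t)(t % p) % m;
              }
            };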

        • Dylan16807 10 months ago

          > The idea is that `Pr[h(x) = h(y)]` is small _no matter the inputs x and y_.

          That sounds like such a function is strongly collision resistant, which means it's also second preimage resistant. And that gets you most of the way to a cryptographic hash function.

          Is the only difference that it doesn't have to be first preimage resistant? Compared to cryptographic hashes, does that expand the set of viable functions a lot, to allow first preimages while still not allowing second preimages?

          > It is all about probabilistic guarantees

          So are cryptographic hash functions.

          When I search for `algorithmic probabilistic hash functions` I just get results about bloom filters.

          • thomasahle 10 months ago

            > > It is all about probabilistic guarantees

            > So are cryptographic hash functions.

            Cryptographic hash functions like MD5, SHA-2, BLAKE2, etc. are deterministic functions, so it doesn't really make sense to talk about Pr[h(x)=h(y)]. Either they collide or not.

            It's muddied a bit by the fact that cryptographers also use universal hashing (or probabilistic hashing, or what I called algorithmic hashing) for stuff like UMACs, https://en.m.wikipedia.org/wiki/UMAC#NH_and_the_RFC_UMAC , but they often have a lot of extra considerations on top of just collision resistance.

            Some algorithms also need stronger probabilistic guarantees than just collision resistance (see e.g. https://en.m.wikipedia.org/wiki/K-independent_hashing#Indepe... ). These properties are usually too hard to test for with an experimental testing suite like SMHasher, but if your hash function doesn't have them, people will be able to find inputs that break your algorithm.

            • Dylan16807 10 months ago

              > Cryptographic hash functions like MD5, SHA-2, BLAKE2, etc. are deterministic functions, so it doesn't really make sense to talk about Pr[h(x)=h(y)]. Either they collide or not.

              Eh, that's how I usually see collision resistance described. The probability is based on generating fresh inputs with any method you want/the most effective attack method available.

              But I wouldn't say the hash you linked is nondeterministic just because it has a seed. You can seed MD5, SHA-2, and BLAKE2 by tossing bytes in as a prefix. It'll prevent the same attacks and you can give it the same analysis.

              So I'm still not sure in what sense a hash like this is facing different requirements than a cryptographic hash.

              • thomasahle 10 months ago

                > You can seed MD5, SHA-2, and BLAKE2 by tossing bytes in as a prefix. It'll prevent the same attacks and you can give it the same analysis.

                I'm curious if you can link to such an analysis. These functions are notoriously much harder to analyze than simple functions like "h(x) = ax+b mod p" which is all you need for the probabilistic guarantee.

                But even if you could analyze this, you would just end up with a universal hash function that's way slower than you need, because you didn't pick the right tool for the job.

                • Dylan16807 10 months ago

                  By definition, if they're secure then they should meet the requirements, right?

                  > But even if you could analyze this, you would just end up with a universal hash function that's way slower than you need, because you didn't pick the right tool for the job.

                  I understand that; I'm just trying to figure out how a universal hash is easier to construct. But as you've gone through the descriptions here, I think I understand how the necessary collision resistance is much, much simpler, and there seems to be an assumption that the output of the hash will not be available to the attacker.

              • versteegen 10 months ago

                > But I wouldn't say the hash you linked is nondeterministic just because it has a seed. You can seed MD5, SHA-2, and BLAKE2 by tossing bytes in as a prefix.

                Yes, but the point is that hash functions used for hash tables are much, much faster than these cryptographic ones.

      • tga_d 10 months ago

        >If you really need a collision resistant hash you need to use a cryptographic hash function, but outside of cryptographic applications that is rarely the requirement.

        There are reasons to use (strongly) collision resistant hashes outside of cryptographic settings. E.g., the default Rust hash function, used in hash maps and sets, has strong collision resistance, because otherwise you could open up applications to DoS attacks (the attacker uses lots of inserts with collisions to kill performance of accesses and further inserts at those buckets).[0]

        >I'm not sure what security properties are really "proven" about existing cryptographic hash functions; AFAIK they are considered secure because we don't know how to break them, not because of some fundamental mathematical property about them.

        There are provably secure hash functions[1] (typically using the same sort of primitives as public key crypto), but they're generally only used when certain properties need to be composed, and are often less secure than the non-provable ones in practice anyway. This is pretty similar to the state of symmetric vs. asymmetric cryptography in general: primitives like RSA, DH, etc. have much stronger proofs than AES, but algorithms built using AES for security are generally viewed as a lot less likely to be broken any time soon than algorithms built using typical asymmetric primitives for security, even ignoring things like quantum advantage.

        [0] https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

        [1] https://en.wikipedia.org/wiki/Security_of_cryptographic_hash...

      • Someone 10 months ago

        > I'm not sure what security properties are really "proven" about existing cryptographic hash functions

        AFAIK, we don’t even know whether trapdoor functions exist.

        https://en.wikipedia.org/wiki/Trapdoor_function:

        “As of 2004, the best known trapdoor function (family) candidates are the RSA and Rabin families of functions”

        Also note that the ‘examples’ section starts with:

        “In the following two examples, we always assume it is difficult to factorize a large composite number (see Integer factorization).”

        • aidenn0 10 months ago

          IIRC, if P=NP then trapdoor functions do not exist, so proving that one existed would be a huge deal even outside of cryptography.

          • Dylan16807 10 months ago

            Only if your definitions of "easy" and "hard" are based entirely on complexity classes.

            If you show me a setup where "easy" is n^3 and "hard" is n^15 I will happily call that a trapdoor function.

    • jacquesm 10 months ago

      Indeed, and this has been the case for quite a while now. You can always improve on some general algorithm by taking advantage of knowledge of the data, but that never generalizes and usually leads to worse performance on other data and/or new pathological cases that make the results unusable.

      It's an instance of overfitting.

      • matteoraso 10 months ago

        >Indeed, and this has been the case for quite a while now. You can always improve on some general algorithm by taking advantage of knowledge of the data, but that never generalizes and usually leads to worse performance on other data and/or new pathological cases that make the results unusable.

        Deepmind did the exact same thing with AlphaTensor. While they do some genuinely incredible things, there's always a massive caveat that the media ignores. Still, I think it's great that they figured out a way to search a massive space where most of the solutions are wrong, and with only 16 TPUs running for 2 days max. Hopefully this can be repurposed into a more useful program, like one that finds proofs for theorems.

      • bee_rider 10 months ago

        Ship the optimization framework in with the application, sample from the user data, and optimize for that? It isn’t overfitting if you overfit on the data you care about, right?

        • jacquesm 10 months ago

          Data tends to change over time, and once a hash function is in use you can't really replace it easily without a lot of overhead, possibly quite a bit more overhead than what you saved in the first place. There are some examples of this in the sorting arena too, such as 'Timsort'; personally I haven't found any that gave a substantial boost, but probably there are some cases where they do. Unless sorting or hashing (and lookup) are the main bottleneck for an application I would spend my time on other aspects of it.

        • scns 10 months ago

          Sounds like the JVM's recompilation of hot paths to me.

  • hajile 10 months ago

    To what extent is this simply working around the weirdness of x86? Do these improvements apply to something like MIPS, ARM64, or RISC-V that have inherently simpler ISAs?

    • danlark 10 months ago

      In this particular case they were universal, but in the paper it's said the optimizations were done on x86. One of the ideas was to use LLVM IR, but the intuition was that an optimizer over an optimizer was unlikely to work properly.

      • ndesaulniers 10 months ago

        > the intuition was that an optimizer over an optimizer was unlikely to work properly.

        Wut?

        • mindwok 10 months ago

          My guess: Using LLVM IR would mean that the LLVM optimiser might have made the results more noisy or hard to understand when it was compiled to actually execute.

  • moonchild 10 months ago

    > 9-16 was a better spot and it simplified from 2 multiplications to just one and a rotation

    I'm very confused as to why rotation was at all useful. Xoring with a random-ish constant makes sense, because the constant has high entropy and is likely to decorrelate bits from the input (also can use a different constant per hash table). But rotating by a constant—and a fixed one at that—seems like it just accounts for expected input distribution. Especially (assuming this is intended for text) if shifting by a value >8 makes a significant difference (vs shifting by the same value mod 8), it smells like serious overfit. Could be useful for something like a perfect hash, but seems problematic and prone to issues as a general hash.

    Edit: to make my objection clearer: the hash simply replaces lo with rotr(lo, 53). If rotr(lo, 53) performs significantly better than rotr(lo, 53 mod 8), then that implies the following. I can take a set of strings, and I can apply the same permutation to the characters of all of the strings in the set, and this will significantly affect the quality of the hash function. That seems like an undesirable property, even for a non-cryptographic hash.

    • manwe150 10 months ago

      Does that end up moving high bits into the low bits? That could possibly be very helpful for all sets of strings, since the equality test will start with the first byte, so it could be better to have the rotr on the hash so that the hash is less affected by the first byte (and more affected by later bytes). Just hypothetically speaking, that is where the implication could break down, since it doesn't consider the non-uniform cost of the equality.

      • moonchild 10 months ago

        > the equality test will start with the first byte

        I would expect the equality test to compare at least a full word at a time, just as the hash hashes at least a full word at a time.

        • manwe150 10 months ago

          I didn’t look at their implementation, but in general, strings don’t have to be aligned so you can only peek one byte at a time looking for the end, besides not wanting to annoy valgrind and other similar UB detection tools by reading past the end of the string.

          • moonchild 10 months ago

            Strawman; nul-terminated strings are horribly slow for nearly every application. Hence I assume (especially given that they are using c++) that they are using length-accompanied strings rather than nul-terminated ones.

  • 13of40 10 months ago

    > something incomprehensible for humans

    This might be buggy whip talk, but I wonder if you could take the same system and apply it to smaller problems (e.g. computing an 8-bit hash) so the novel techniques could be identified and used by humans.

    • tyingq 10 months ago

      One of the examples from another comment[1] here was:

      "They found that in a sorting network handling 3 inputs, the AI found a way to save an instruction by reducing a "min(A, B, C)" operation to just "min(A, B)" by taking advantage of the fact that previous operations guaranteed that B <= C."

      Which isn't incomprehensible at all.

      [1] https://news.ycombinator.com/item?id=36229068

  • cabalamat 10 months ago

    > proves the point that optimal programs are very inhuman

    Maybe there should be an AI that produces optimally-readable/understandable programs? That's what I would want if I was adding the output to a codebase.

    • whatever1 10 months ago

      Optimal routing of delivery vehicles for UPS/FedEx etc. is also incomprehensible to the drivers, so the planners often intentionally generate suboptimal solutions.

      A suboptimal solution that gets implemented is better than an optimal one that doesn't.

      • cabalamat 10 months ago

        > Optimal routing of delivery vehicles for UPS/FedEx etc. is also incomprehensible to the drivers, so the planners often intentionally generate suboptimal solutions.

        Really? That's the first time I've heard that (and I've worked on vehicle routing).

        Normally, the driver would get the next address they have to visit and would use satnav to work out how to get there. They don't need to "comprehend" the overall route.

        • whatever1 10 months ago

          Packages might have delivery time windows attached to them, so the optimal solution calls for multiple visits in the same neighborhood by the same van. This is bs from a driver’s perspective.

    • m3kw9 10 months ago

      It's not that it's written to be obfuscated; the routine/algorithm is just hard to understand even if they commented every line. Likely some recursive trick is involved, and those are always hard to follow.

    • make3 10 months ago

      That's what LLMs can do with RL from human (or AI) readability feedback, plus instruction tuning and prompting. We will 100% see this if GPT-4 doesn't already do it.

      • ska 10 months ago

        I wouldn't classify any of the output I've seen so far as "optimally readable/understandable".

        Some of it looks pretty OK, especially where it overlaps with well-established approaches.

        • chaxor 10 months ago

          It can do well with optimization and readability *if you ask it specifically for those things*. Especially if you have a particular paradigm and algorithm in mind (you obviously already should anyway).

          This is why these systems are helpful in programming: they allow developers to think more about the design paradigms, and algorithmic solutions, rather than the fine grained code syntax and typing.

          My hope (not prediction unfortunately, but *hope*) is that these systems will make people *better* programmers. This could happen by alleviating the requirement of typing out the code in a particular way, and allowing more time to really try out or think carefully about the correct solution for how to build their programs (e.g. multiprocessing, producer-consumers, distributing data with IPFS, faster algorithms, etc.).

          • ska 10 months ago

            > It can do well with optimization and readability if you ask it specifically for those things

            My experience so far (including GPT-4) is that this isn't really true, even when focusing on those aspects. I'm cautiously optimistic this will get better.

  • brundolf 10 months ago

    Thought: optimizing JIT compilers like V8 already observe code behavior and use that to choose how to optimize things as they go. I wonder if one day V8 will incorporate some ML to help study and optimize the running code

  • gumby 10 months ago

    Your description makes the approach sound like applying ca 1980s simulated annealing, also a form of gradient descent. Am I missing something?

    • ska 10 months ago

      This doesn't really sound like SA, which is a principled approach to avoid the local min problem in descent algorithms and end up with a true global optimization. The general result is clean and impractical, the approximations hard to analyze.

  • thesz 10 months ago

    So it works as a superoptimizer of sorts.

  • ks84 10 months ago

    I haven’t read the paper, but it sounds like this is a sort of similar approach to genetic algorithms? Create lots of agents with random changes and pick the ones that produce the most promising results? Is my impression wrong?

  • sytelus 10 months ago

    If more compute is spent than before, but in parallel to reduce latency, then wouldn't this increase power? Shouldn't latency and power both be objectives?

  • lukeplato 10 months ago

    > optimal programs are very inhuman

    sounds revolutionary

fwlr 10 months ago

Some very cool improvements found in already highly optimized algorithms.

They found that in a sorting network handling 3 inputs, the AI found a way to save an instruction by reducing a "min(A, B, C)" operation to just "min(A, B)" by taking advantage of the fact that previous operations guaranteed that B <= C. They also found that in a sorting network handling 4 inputs, when it is part of a "sort 8" network, there are similar guarantees (D >= min(A, C) in this case) that can be taken advantage of to remove an instruction as well. The compiler may not be able to compete with hand-written assembly, but it seems an AI can hand-write assembly code that's even better in some cases.
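
A hedged C++ rendering of that first trick (illustrative, not AlphaDev's actual assembly): in a 3-element network, the earlier compare/exchange steps already guarantee b <= c, so the final minimum only needs to look at a and b.

    #include <algorithm>

    void sort3(int& a, int& b, int& c) {
      // Stage 1: compare/exchange (b, c), so b <= c afterwards.
      int lo = std::min(b, c), hi = std::max(b, c);
      b = lo; c = hi;
      // Stage 2: compare/exchange (a, c), so c is the maximum.
      lo = std::min(a, c); hi = std::max(a, c);
      a = lo; c = hi;
      // Final stage: the smallest element. Naively this is min(a, min(b, c)),
      // but stage 1 already guarantees b <= c, so min(a, b) suffices.
      lo = std::min(a, b); hi = std::max(a, b);
      a = lo; b = hi;
    }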

Another improvement is also very cool. They discuss VarSort4, which takes a list that may be 2, 3, or 4 elements long, and sorts it. The existing algorithm is

    if (len = 4) { 
        sort4
    }
    elif (len = 3) {
        sort3
    }
    elif (len = 2) {
        sort2
    }

The AI found an algorithm that looks totally different:

    if (len = 2) {
        sort2
    }
    else {
        sort3
        if (len = 4) {
            sortspecial
        }
    }
It's pretty wild! It immediately runs Sort3 on something that may be either 3 elements or 4 elements long, and only afterwards does it check to see how long it really is. If it's 3 elements long, we're done; otherwise, run Sort4 - but because (having already run Sort3) you know the first three elements are sorted, you can use a special and much simpler algorithm to simply put the 4th element in the right spot.
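
A hedged sketch of what that last step amounts to (illustrative C++, not the generated code): with the first three elements already sorted, placing the fourth is a single insertion rather than a full Sort4.

    // Illustrative only: insert v[3] into the already-sorted prefix v[0..2].
    void insert_fourth(int v[4]) {
      int x = v[3];
      int i = 2;
      while (i >= 0 && v[i] > x) {  // shift larger elements one slot right
        v[i + 1] = v[i];
        --i;
      }
      v[i + 1] = x;
    }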

Very cool. Improving on core LLVM sorting algorithms that have already been heavily hand-optimized by the best in the world is definitely a "it's 1997 and Deep Blue defeats the World Champion in chess" kind of feeling.

  • ComputerGuru 10 months ago

    Interesting that it took AI to pull this off. I think mutation testing would have discovered the first (by virtue of checking which code isn't necessary globally), though likely not the second (which needs a new branch, unless that branch was already there?).

    • fkarg 10 months ago

      As I understand it, it kind of _did that_, just in an extremely guided way, which is why it produced results in a reasonable amount of time (probably).

  • thomasahle 10 months ago

    I'm surprised they only report results up to sort5. It seems at that level, you could just iterate through all possible programs. AI-generated code seems more interesting once you get to 10+ values, where classical methods break down.

    • cypherpunks01 10 months ago

      I believe that's because above 5 elements, fixed sorting networks are no longer used. Introsort takes over and dispatches to insertion sort, quicksort, or heap sort as appropriate.

      Divide and conquer strategies are used for larger sorts, and the smaller arrays could include the fixed lengths 3, 4, 5.
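
      Roughly, the dispatch looks like the sketch below (illustrative, not libc++'s actual code): tiny ranges go to insertion sort or the fixed networks, overly deep recursion falls back to heapsort, and everything else is quicksort partitioning.

          #include <algorithm>
          #include <iterator>

          template <class It>
          void introsort(It first, It last, int depth_limit) {
            auto n = std::distance(first, last);
            if (n < 2) return;
            if (n <= 16) {                      // small range: insertion sort
              for (It i = std::next(first); i != last; ++i)
                for (It j = i; j != first && *j < *std::prev(j); --j)
                  std::iter_swap(j, std::prev(j));
              return;
            }
            if (depth_limit == 0) {             // recursion too deep: heapsort
              std::make_heap(first, last);
              std::sort_heap(first, last);
              return;
            }
            auto pivot = *std::next(first, n / 2);
            It mid = std::partition(first, last,
                                    [&](const auto& x) { return x < pivot; });
            introsort(first, mid, depth_limit - 1);
            introsort(mid, last, depth_limit - 1);
          }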

    • bitshiftfaced 10 months ago

      The paper notes that they brute forced all solutions for sort3 to verify that it was optimal. It said that this wasn't possible for 4+.

  • dools 10 months ago

    “ The compiler may not be able to compete with hand-written assembly, but it seems an AI can hand-write assembly code that's even better in some cases.”

    This made me think “imagine if AI was the compiler”, that is to say you went from C or whatever to assembly via AI directly so it was “hand writing” the equivalent assembly for your instructions instead of using generic compilation.

    We might find everything runs much faster.

    • riwsky 10 months ago

      Everything except the compiler, that is

      • dools 10 months ago

        Well, it's not hard to imagine a "quick compile" option that uses a traditional compiler, plus an optimised compilation option that you use when shipping a production build or something, while AI catches up in terms of speed.

orlp 10 months ago

> AlphaDev uncovered new sorting algorithms that led to improvements in the LLVM libc++ sorting library that were up to 70% faster for shorter sequences and about 1.7% faster for sequences exceeding 250,000 elements.

As someone that knows a thing or two about sorting... bullshit. No new algorithms were uncovered, and the work here did not lead to the claimed improvements.

They found a sequence of assembly that saves... one MOV. That's it. And it's not even novel, it's simply an unrolled insertion sort on three elements. That their patch for libc++ is 70% faster for small inputs is only due to the library not having an efficient implementation with a *branchless* sorting network beforehand. Those are not novel either, they already exist, made by humans.
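
For reference, a branchless sorting network for three elements looks roughly like this (a simplified sketch; with optimizations enabled the ternaries typically compile to conditional moves, so there are no data-dependent branches):

    // Illustrative compare/exchange built from conditional selects.
    static inline void cswap(int& x, int& y) {
      int lo = x < y ? x : y;
      int hi = x < y ? y : x;
      x = lo;
      y = hi;
    }

    void sort3_branchless(int& a, int& b, int& c) {
      cswap(b, c);
      cswap(a, c);
      cswap(a, b);
    }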

> By open sourcing our new sorting algorithms in the main C++ library, millions of developers and companies around the world now use it on AI applications across industries from cloud computing and online shopping to supply chain management. This is the first change to this part of the sorting library in over a decade and the first time an algorithm designed through reinforcement learning has been added to this library. We see this as an important stepping stone for using AI to optimise the world’s code, one algorithm at a time.

I'm happy for the researchers that the reinforcement learning approach worked, and that it gave good code. But the paper and surrounding press release is self-aggrandizing in both its results and impact. That this is the first change to 'this part' of the sorting routine in a decade is also just completely cherry-picked. For example, I would say that my 2014 report and (ignored patch of) the fact that the libc++ sorting routine was QUADRATIC (https://bugs.llvm.org/show_bug.cgi?id=20837) finally being fixed late 2021 https://reviews.llvm.org/D113413 is quite the notable change. If anything it shows that there wasn't a particularly active development schedule on the libc++ sorting routine the past decade.

  • blazespin 10 months ago

    Yeah, this type of grifting is very pro forma for researchers. I used to stress about it, but I've realized it just sort of goes with the territory.

    It's also worth noting that the paper is blindingly obvious and everyone started doing this a long time ago but didn't want to tip their cards.

    And that's the real contribution here - Google is tipping their cards. We now have a rough baseline to compare our results against.

    • aldousd666 10 months ago

      Well, I feel grifted too: being a developer, yet not up on the state of the art in algorithms (at that level), I was inclined to buy their BS.

    • ryao 10 months ago

      It is likely meant to help secure further research funds.

  • saiojd 10 months ago

    I feel like your take is overly cynical. The fact that humans can do the same thing by hand is not really the point. The contribution is that their method derived this improvement *automatically*, which is where the impact lies. No one cares all that much if a human can make a sorting routine 2% faster, but if a program can do it, it suddenly becomes interesting (since it suggests that a similar approach can be applied to many other routines).

    • orlp 10 months ago

      I am not cynical about the research itself, I am critical of claims such as "new sorting algorithm uncovered", "up to 70% faster", or "first change in a decade". The research is good. The achieved results are massively inflated.

      What they achieved: automatically generated good code.

      What they claim: automatically generated code that is revolutionary and an improvement on the state of the art.

      And as another commenter noted, superoptimizers are also already a thing: https://en.wikipedia.org/wiki/Superoptimization

      There's also automatic searching being done on faster sorting networks that actually recently produced better than state of the art sorting networks: https://github.com/bertdobbelaere/SorterHunter

      • saiojd 10 months ago

        While I agree that the claims are hyperbolic, I think you are approaching this paper from the point of view of someone who knows a lot about sorting. Because of this, it's normal that the claims of these guys, who probably don't know much about it, are grating for you.

        But, at its core, this is really a RL paper. The objective is to see how far a generic approach can work while understanding as little as possible about the actual domain. After AlphaGo exceeded expectations, the question becomes: "What else can RL do, and can it do anything actually useful?", and this paper seems to suggest that it can optimize code pretty well! I'm really not sure they are self-aggrandizing in terms of impact. The impact of an approach like this could potentially be very large (although I'm not saying that it actually is, I don't know enough).

        • bloppe 10 months ago

          Sorting is not a niche topic. Anybody who majored in CS (which is a ton of people these days) will read the abstract and most of the paper thinking "Wow, I can't believe they discovered something better than O(N log N)" because that's usually what people mean when they say "better sorting algorithm". What they discovered here is effectively a new compiler optimization. They should present it as such instead of calling it a new sorting algorithm.

          But ya, discovering a new compiler optimization automatically is kinda cool.

          • saiojd 10 months ago

            I see your point. I just went over the abstract again and I totally agree.

          • elcomet 10 months ago

            I hope people majoring in CS will not think that, as they learned that n log n is the theoretically optimal complexity for comparison-based sorting. They will rather think that they found an algorithm with better constants in front of n log n.

          • recursive 10 months ago

            70% better than O(N log N) is still O(N log N).

            • ummonk 10 months ago

              It's 70% better than O(1). The algorithms it found are for sort3, sort4, and sort5, which were poorly optimized in LLVM's libc++.

              • pclmulqdq 10 months ago

                It may have also laundered those from open-source code that wanted to optimize sort3, sort4, and sort5.

          • qumeric 10 months ago

            Tbh, people who majored in CS are supposed to know that it was proven long ago that better than O(N log N) comparison-based sorting (in the general case) is impossible.

          • Dylan16807 10 months ago

            > because that's usually what people mean when they say "better sorting algorithm"

            Is it really? I've heard of a few "better sorting algorithms" and it's never meant that in my experience.

            • cubefox 10 months ago

              When people say "sorting algorithm" they mean something like bubble sort or merge sort.

        • rep_movsd 10 months ago

          Anyone who does any basic algorithmic CS stuff would have been exposed to sorting algorithms, their variations, sorting networks and so on.

          There are already superoptimizers that use genetic algorithms to find optimal code sequences for small, easily verifiable tasks. That is also a form of reinforcement learning, in a way.

        • bamboozled 10 months ago

          When one has a big fuck off hammer, everything becomes a nail. Seems to apply to ML too.

      • touisteur 10 months ago

        I somewhat agree; I'd be far more impressed by something that could find optimal or even just better sorting (or selection) networks for sizes higher than 17 (last time I looked at the SOTA).

        • orlp 10 months ago

          Please check my edit right as you commented :)

          • touisteur 10 months ago

            Oh very, very cool thanks a bunch.

            Edit: compiling the hunter code Right Away and hopefully in some weeks I'll have better networks. Selection networks are even harder to find optimizers for, hopefully one can hack this new thing to get some.

      • runsWphotons 10 months ago

        how automatically generated was the code that wrote the code

    • x86x87 10 months ago

      I don't read it as cynical. It's fair game to call bullshit on bullshit. If an approach already exists and is known, then "insert your favorite AI here" does not discover anything.

    • smeagull 10 months ago

      A thing that is also not novel. People have done search for optimisations for at least the past decade.

      • saiojd 10 months ago

        Sure, but the whole point is to reduce this kind of search to RL, which is a very general framework. Their paper shows that such a generic approach can solve a very specific problem, and solve it well. But, their paper is about improving RL, not about improving sorting.

        • Zacharias030 10 months ago

          Sometimes one has to wonder how RL can be both so generic and still qualify for publication in Nature again and again and again ;)

    • mtlmtlmtlmtl 10 months ago

      Automatic tuning and optimisation of code is not new.

  • wsdookadr 10 months ago

    What's surprising is that anyone would've expected an AI to come up with a brand-new algorithm with better complexity than pre-existing human-made solutions. How could it possibly come up with something better when it doesn't even understand how the original authors of qsort/mergesort/etc came up with their own?

    Sure, it's great PR for the company, but.. the results just aren't there.

    • congoe 10 months ago

      How could AlphaZero possibly play better chess than humans when it doesn’t even understand the history of chess theory?

      RL doesn’t stop at human levels

      • Maken 10 months ago

        Because the entire history of chess theory is really a set of heuristics to optimize a tree search.

        • lupire 10 months ago

          So is computer science.

          • wsdookadr 10 months ago

            You make it sound so simple. Why don't we let an AI try to come up with all of CS on its own? I doubt it would/could.

      • wsdookadr 10 months ago

        Even if AlphaZero does play better chess, there's absolutely zero it can do in terms of explaining why it played that way. AlphaZero is zero in terms of explainability. Humans have to explain to themselves and to others what they do; this is key in understanding what's happening, in communicating what's happening, in human decision-making, and in deciding between what works and what doesn't and how well or how badly it works.

        Returning to the original DeepMind press release: it misinforms the public about the alleged progress. In fact, no fundamental progress was made; DeepMind did not come up with an entirely new sorting algorithm, and the improvement was marginal at best.

        I maintain my opinion that AlphaDev does not understand any of the existing sorting algorithms at all.

        Even if AI comes up with a marginal improvement to something, it's incapable of explaining what it has done. Humans (unless they're a politician or a dictator) always have to explain their decisions and how they got there; they have to argue for their decisions and their thought process.

        • elcomet 10 months ago

          It cannot explain because (1) it is not necessary to become good and (2) it wasn't explicitly trained to explain.

          But it's reasonable to imagine a later model trained to explain things. The issue is that some positions might not be explainable, as they require too much branching and a lot of edge cases, so the explanation would not be understandable by a human.

          • wsdookadr 10 months ago

            It's unreasonable to give up on explanations and deem something "not understandable" when we've been doing this thing for 3000+ years called mathematics, where it's exactly explainability that we seek and the removal of doubt. The only other entities that we know of who can't communicate or explain what they're doing are animals.

            • elcomet 10 months ago

              Can you explain your tastes? Why you prefer an apple to an orange for instance? Not really.

              Can you explain how you had the intuition for a certain idea? No; you can explain why it works, but not how the intuition came.

              • wsdookadr 10 months ago

                This isn't a question of taste. The topic can't be trivialized to a choice between apples and oranges. I actually reject your entire last message.

                • elcomet 10 months ago

                  My point is that most of our actions are intuitive and cannot be explained. Maybe this is similar to System 1 vs System 2.

                  • wsdookadr 10 months ago

                    It's fine if you want to refer to Kahneman's classification [1] of instinctual and thorough thinking. Explainability is a separate topic. Also, when the amounts of energy and compute used are as high as they are, the results, the return on investment, really aren't that high. Hopefully there are better days ahead.

                    [1] https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow

  • tlringer 10 months ago

    Every DeepMind press release is like this.

    • hugopenmir 10 months ago

      You know people at Google, tell Demis about this.

  • TaupeRanger 10 months ago

    This is DeepMind's modus operandi. Every press release is just utterly hyperbolic nonsense that doesn't withstand the slightest scrutiny. AlphaGo, AlphaFold, AlphaDev... they've done literally nothing to improve the human condition, and may have actually made things worse. Go players have DECREASED their Elo after playing AlphaGo (or just quit the game altogether). I would be embarrassed to be associated with DeepMind.

    • chaxor 10 months ago

      This is unbelievably wrong. Deepmind has probably the best academic group in RL. The difference between Deepmind and OpenAI is that Deepmind favors academic endeavours and novelty much more, while completely ignoring any commercialization or products, whereas OpenAI is the stark opposite: they almost entirely focus on products first, and typically their academic endeavours are just slight modifications of other work, scaled up.

      Don't get me wrong, Sutskever (et al.) has done incredibly good work previously, but when it comes to the products, they're much more engineering and marketing polish than scientific endeavours.

      Botvinick's work on meta-RL, for example, is an interesting direction that Deepmind has shown, one that few other companies, interested only in engineering, would venture towards.

      • mtlmtlmtlmtl 10 months ago

        That's the thing with Deepmind though. They almost never actually end up advancing things because A) they don't release weights and B) they don't usually develop their ideas into useful tools themselves, forcing others to redo all their work.

        So yeah, it's essentially a PoC PR stunt factory. Just look at AlphaZero. They make a huge deal about a suspiciously set up match against Stockfish. Supposedly revolutionising computer chess. But the problem is the computer chess community had to redo all of the work, including all the training to build Leela Chess Zero. Due to lack of Google-sized datacentres the training took years to catch up to the weights in AlphaZero. Same thing with AlphaGo, same thing with transformers.

        Now, in AI, usually getting a proof of concept is the easy part. Developing that into an idea that actually works in real world situations is usually the hardest part. I completely reject your idea that somehow the work by OpenAI is less worthy of recognition. I think that's just nonsense.

        And surely, Google created Deepmind to actually make them product ideas, not create new competitors, which is what has happened.

        • qumeric 10 months ago

          > Now, in AI, usually getting a proof of concept is the easy part. Developing that into an idea that actually works in real world situations is usually the hardest part.

          I disagree. Of course there is a lot of engineering involved and it's also very important but it's much easier to rebuild things based on published research than develop novel ideas.

    • rcxdude 10 months ago

      I think this is a bit far in the other direction. Deepmind's stuff is often deeply impressive, they just have a tendency to exaggerate on top of that.

      • jprd 10 months ago

        +1

        There must be a "demonstrate (1) DeepMind #Win per <interval>" requirement somewhere that gets the once-over from the marketing dept. to meet some MBOs.

    • 29athrowaway 10 months ago

      Around the time of the AlphaGo challenge and afterwards...

      1) you could see increased activity in go clubs and online go servers

      2) the analysis of the games published by Deepmind has resulted in interesting "discoveries" (or rediscoveries) and changes to what is considered joseki.

      3) many people started analyzing their kifus using AI, to find fluctuations in estimated win rate across moves.

      So I disagree entirely

    • blazespin 10 months ago

      It's a bit surprising how poorly DeepMind has lived up to their hype. But they're an OK lab, maybe a bit overly vain.

      • TaupeRanger 10 months ago

        It's probably very fun to be at DeepMind, I just don't think I'd want to be a part of the cringey hype machine.

        • blazespin 10 months ago

          I bet it really sucks, tbh. They did all this over promising and now the only way they can deliver is by grifting like this. That sounds really stressful to me.

          • lurknot 10 months ago

            ya bet it sucks making $500-800k/year comp getting access to the best hardware and google datasets

            • blazespin 10 months ago

              I've seen people who get promoted above their band, they are not happy campers.

              • blitzar 10 months ago

                Oh god, promoted as well, I for one am glad they are bearing the burden for me.

              • jimsimmons 10 months ago

                Why not. It’s not like they have deadlines

                • 29athrowaway 10 months ago

                  They still have to publish something in major journals and have presence in major conferences.

                  • jprd 10 months ago

                    a.k.a. - they still care enough to have some semblance of shame. There's the rub.

                    • 29athrowaway 10 months ago

                      At least they've produced tangible value, unlike money pits like the Human Brain Project, which has delivered close to nothing over a decade despite billions of dollars in investment.

    • vosper 10 months ago

      > Go players have DECREASED their ELO after playing AlphaGo (or just quit the game altogether)

      Can you explain this for someone unfamiliar with the game?

      • dfan 10 months ago

        I am familiar with the game and I cannot explain it. Go is not typically rated with the Elo system, and the quality of top-level human play has increased since 2016.

        • nahstra 10 months ago

          I also play some, and yeah that's incorrect. Also, there was recently a bunch of hype about an adversarial strategy that could beat AIs w/o deep readout (i.e. only using the 'policy network' nonlinear ML stuff and not enough actual calculation). Here's a vid of an amateur doing it.

          https://www.youtube.com/watch?v=H4DvCj4ySKM

          EDIT:

          Also, if you want to get into it, Michael Redmond's Go TV on YouTube has an amazing beginner playlist; watch some of that, then maybe blacktoplay.com, and if you like it, play :)

      • schaefer 10 months ago

        Lee Sedol retired in 2019 following his 2016 defeat by AlphaGo in a 5-game match. At the start of the match most people were confident an AI could never defeat a top human player at Go. By the end of the match, watching (arguably) world champ Sedol suffer lost game after lost game, the story had changed dramatically. Sedol fans were championing his single win against the unstoppable AI.

        We (hacker news) discussed Lee Sedol's retirement here: [1]

        To active Go players at the time, AlphaGo and AlphaZero really were as shocking as the debut of ChatGPT was recently.

        [1]: https://news.ycombinator.com/item?id=21649495

        • fastball 10 months ago

          This is correct history, but not the point TaupeRanger was trying to make (I believe).

          I think their assertion is that the release of AlphaGo has actually made human Go players worse at the game, contrasted with chess, where most agree that the introduction of superhuman chess engines has elevated the (human) state of play.

          But I don't think there is actually much evidence for that. I'm sure the introduction of AlphaGo did take the wind out of some players' sails, who thought of themselves as superior to our best computers, but for everyone else it seems to have elevated the overall level of play just as the chess engines have done.

          • TaupeRanger 10 months ago

            I don't think that AlphaGo has made players worse. My point is that there's no evidence that anything USEFUL or IMPORTANT has come from a system that has gotten so much hype (and cost ungodly amounts of money). If players aren't getting better (there's no evidence they are) or are quitting the game after playing, it's simply a net negative, along with DeepMind's other ventures.

            • _hark 10 months ago

              Sorry, but this is just incorrect. Go players have gotten stronger over time overall [0][1], and AI discovered many new ideas that all top pros have incorporated into their game-play (idk how to give a source for this, it's just very well known in the Go community that the style of play changed drastically in response to AlphaGo, and absolutely everyone trains with AI these days).

              [0]: "The sudden overall increase in agreement in 2016 also reinforces the belief that the introduction of powerful AI opponents has boosted the skills of professional players." https://ai.facebook.com/blog/open-sourcing-new-elf-opengo-bo...

              [1]: https://www.goratings.org/en/

              • TaupeRanger 10 months ago

                It's not incorrect. The fact that Go players have gotten stronger over 300 years of recorded data does not in any way show that AlphaGo has made players better. The fact that players are suddenly memorizing AI moves in 2016 and beyond also does not mean they're getting better. This system does not measure how good the players are. It measures how much they copy AI moves (which is rather convenient, since the article is written by AI researchers). The phrase you quoted is so hilariously worded that I initially thought it might be satire. Indeed it does "reinforce the belief" that AI has boosted the skills of players - apparently the researchers themselves are not immune to this "reinforcement of belief"!

            • krisoft 10 months ago

              > My point is that there's no evidence that anything USEFUL or IMPORTANT has come from a system that has gotten so much hype

              Geez. We are talking about pebbles on a wooden plank. They are not even colourful!

              Go is a super cool game, but it is just that: a game. We are not talking about curing cancer, or solving world hunger, or reversing climate change here. So by its very formulation a Go-playing AI can be cool, or interesting, or promising. But could it really be useful/important in all caps? It sounds like you have too high expectations here.

              • hugopenmir 10 months ago

                IT TAKES TIME to solve HUGE CHALLENGES.

        • jart 10 months ago

          The story with Lee Sedol gets even sadder when you look at his rankings chart: https://www.goratings.org/en/players/5.html He immediately lost his heart for the game even though he officially kept playing another three years.

          • schaefer 10 months ago

            I thought I knew the story. But I'd never seen this graph. Thank you for breaking my heart all over again, jart.

      • scotty79 10 months ago

        Maybe they just lost, so their Elo went down? Or do losses against AI not count?

        • rcxdude 10 months ago

          Either way, in Elo the rating change depends on the rating of your opponent, so a loss against an AI with an accurate (very high) rating is not going to cost you much unless you're one of the best players in the world (heck, in chess, Magnus Carlsen's rating would still go down by essentially nothing from losing against Stockfish).
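
          A rough sketch of the arithmetic, assuming the standard (unclamped) Elo update with K = 10; the ratings below are illustrative placeholders, not official numbers:

              // Expected score E = 1 / (1 + 10^((Rb - Ra) / 400)),
              // rating change = K * (S - E), with S = 0 for a loss.
              #include <cmath>
              #include <cstdio>

              int main() {
                double Ra = 2850;  // roughly a top human player (placeholder)
                double Rb = 3500;  // roughly a top engine (placeholder)
                double K = 10, S = 0;
                double E = 1.0 / (1.0 + std::pow(10.0, (Rb - Ra) / 400.0));
                // E ~= 0.023, so the loss costs only about 0.2 rating points.
                std::printf("expected %.3f, change %.2f\n", E, K * (S - E));
              }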

    • chpatrick 10 months ago

      Massive improvements in protein folding do nothing to improve the human condition? What?

      • TaupeRanger 10 months ago

        Name one improvement. Just one thing that has ACTUALLY helped real life humans and been a net positive since AlphaFold's inception 5 years ago.

        • crocowhile 10 months ago

          I am a neuroscientist, not affiliated with Deepmind. I can't speak for the other AlphaThings, but AlphaFold dramatically changed the way the biomedicine field deals with protein structures, shortening the gap between hypothesis and experiments by months if not years. You have no idea what you're talking about.

          • TaupeRanger 10 months ago

            Name one real human being that has benefited from this "shortened gap", aside from the CVs and H-Index of the researchers themselves?

        • bglazer 10 months ago

          We're talking about biomedical science here. Things move slowly because the domain is exceptionally complex and human lives are in the balance.

          AlphaFold catapulted protein structure prediction forward, and it's hard to overstate how important understanding protein structure is in modern drug development

          As an example of how this will be used to help actual people, here's a paper that uses AlphaFold to identify the parts of cancer-associated proteins that interact with each other.

          https://onlinelibrary.wiley.com/doi/full/10.1002/pro.4479

          The obvious next step is to develop drugs that disrupt these interactions and thereby disrupt cancer. But, it's going to take years, maybe decades before any drug resulting from this research is in actual patients.

          There are dozens of other papers like this.

          • TaupeRanger 10 months ago

            And yet, even after 5 years there's no sign of any real, meaningful drugs, even just Phase I trials. In 5 - 10 years (which will be 10 to 15 years after AlphaFold was released) I am willing to bet real money that there will be zero drugs discovered by AlphaFold that meet the following criteria:

            1) The drug couldn't have been discovered without AlphaFold
            2) It has been proven to reduce all-cause mortality (the thing real patients actually care about) in a randomized controlled clinical trial BETTER than the prior standard of care (or significantly more cheaply, or with significantly reduced side effects)

            • hugopenmir 10 months ago

              Send them your resume or ideas about how to do it faster and better, it could help !

              • TaupeRanger 10 months ago

                There is no way to do it faster or better with these techniques. It is a waste of money - that's the entire point. My advice would be: stop wasting time, money, and human brainpower. Go off and try entirely new approaches to AI that might actually work!

                • chpatrick 10 months ago

                  This is such a bizarre take. AlphaFold is faster and better - but it still takes years to develop anything in the life sciences.

                  It's like pointing to special relativity in 1905 and saying it'll never be useful for anything.

      • girvo 10 months ago

        I think it's fair to say that it (where "it" is defined as "DeepMind's contribution to the protein folding problem") hasn't yet given us massive improvements to the human condition.

        It might, and in fact I think it probably will. But it hasn't yet.

        • bawolff 10 months ago

          When the biggest criticism of deep mind is it hasn't literally saved the world yet, i think that is pretty telling about how impressive it really is.

          • TaupeRanger 10 months ago

            That isn't the criticism at all. The criticism is that it hasn't done ANYTHING, and has probably been a net negative since human brainpower and energy costs are being spent on (so far) useless technology for 5 years. It's not that it hasn't saved the world, it's that it's worse than useless.

      • bamboozled 10 months ago

        To be fair, I have read that while impressive, it still has little practical application?

    • malux85 10 months ago

      I thought the same thing; it smacks of desperation at the moment, where any tiny win is exaggerated.

      It's not hard to see why: with the emergence (ha) of OpenAI, Midjourney and all of this generative modelling, what has DeepMind done? I imagine the execs at Google are asking them some very probing questions about their mediocre performance over the last 5 years.

      • chaxor 10 months ago

        Deepmind has done quite an enormous amount actually, but it's been in academia, not in the commercial product sphere. Just because something is not on a little web page available to average Joes does not mean there isn't value in it. For example, Deepmind's work towards estimating quantum properties of materials via density functional theory may not be the best toy for your grandma to play around with, but it certainly does move academia way further ahead of where it once was.

        • malux85 10 months ago

          I run atomictessellator.com and have been working on many different implementations of density functional theory for the last 10 years, as well as working closely for multiple years with professors at Stanford University and Oxford University on using advanced, non-static geometry data structures for density functional theory. This is a subject I know a LOT about, so I'm glad you brought it up.

          DeepMind's work on density functional theory was complete rubbish, and everyone in computational chemistry knows it. They simply modelled static geometry and overfit their data. We wanted this methodology to work, since computing DFT is expensive, and we did multiple months of rigorous work, but the reality of the situation is that a bunch of machine learning engineers with a glancing amount of chemistry knowledge made approximations that were way too naive and announced it as a huge success in their typical fashion.

          What they then count on is people not having enough knowledge of DFT / quantum property prediction to question their work, making claims like “it certainly does move academia way further ahead” - which is total rubbish. In what way? Why aren’t these models being used in ab initio simulators now? The answer to that is simple: they are not revolutionary; in fact, they are not even useful.

          • Nimitz14 10 months ago

            Love that you know enough to call him out haha

        • TaupeRanger 10 months ago

          Yes, people often mention the number of scientific citations that mention AlphaFold as "proof" of its value. Unfortunately, padding researcher CVs is not a net positive for humanity, and so "moving academia further ahead" (by what metric?) is not necessarily a desirable or worthwhile goal if your definition of "ahead" is sufficiently warped. Perhaps, one day, the first real human being will be helped by medicine that couldn't have been found/created without AlphaFold. Unless a scientific endeavor is actually useful to humanity, who cares if it "moves ahead"?

        • dr_dshiv 10 months ago

          How so? Legitimately interested.

      • gusennan 10 months ago

        They solved an open challenge problem, Protein Structure Prediction, with AlphaFold, which has been nothing short of revolutionary in the structural biology and biochemistry fields. I do scientific research in these fields and the capabilities AlphaFold provides are used now everywhere.

        • TaupeRanger 10 months ago

          Yes, many research papers have been written, and many CVs have added lines which include the word "AlphaFold". But has the human condition been improved one iota from the "discovery"? Has anything real actually happened? Not at all. Only "maybes" and "possibilities" after more than 5 years of work. "Revolutionary" at padding researcher CVs indeed.

          • hugopenmir 10 months ago

            Man, with all respect, why the “hate” for the good guys working at DeepMind? Everybody loves and respects Demis Hassabis, he is truly a genius. He really wants the best for the world/humanity and that takes a ton of time, so let's wait and see.

        • bamboozled 10 months ago

          Curious what this research has actually improved in a practical sense? I’m asking for a friend…

    • gdy 10 months ago

      "may have actually made things worse. Go players..."

      Go players are using AI to get better at Go.

      • ipaddr 10 months ago

        That's like wearing a baseball glove on your dominant hand and taking it off to throw the ball. It's easier, but at some point you need to relearn how to play to make it to the next level.

      • TaupeRanger 10 months ago

        That claim is often made, and never substantiated.

        • digging 10 months ago

          It's made here in response to a claim that Go players are made worse by practicing with AlphaGo, which is also unsubstantiated.

          • TaupeRanger 10 months ago

            Wrong. The claim was never made that players got worse, only that their ratings dropped, which is empirically true. After playing AlphaGo, for example, Ke Jie dropped in the rankings and was quickly taken out of 1st place overall. The overall point though, is that AlphaGo produced nothing of value for humans, since there's also no evidence that players have improved since AlphaGo's creation. Factoring in the immense cost and human brainpower wasted on creating a superhuman perfect-information-game-playing program, and it's easily a net negative for humanity.

            • gdy 10 months ago

              "there's also no evidence that players have improved since AlphaGo's creation"

              Read this for example. The author is Korean pro.

              "The upside is that we sometimes see a player who was somewhat past his prime suddenly climb back to the top, having trained with AI more intensely. There are a growing number of young and new pros who demonstrate surprising strength. This change gives hope to all pros who dream to become number one, and also makes competitions more interesting to fans as well." [0]

              There are serious downsides too.

              Also [1]

              [0] https://hajinlee.medium.com/impact-of-go-ai-on-the-professio...

              [1] https://www.newscientist.com/article/2364137-humans-have-imp...

            • digging 10 months ago

              > Factoring in the immense cost and human brainpower wasted on creating a superhuman perfect-information-game-playing program, and it's easily a net negative for humanity.

              Hear me out - what if we learned something about creating AI by creating a new AI?

        • fastball 10 months ago

          Assuming a total lack of evidence on either side, I think your assertion is the counter-intuitive one and therefore has the greater burden of proof. Why would Go be any different than Chess?

          A rising tide floats all ships – if the best in the world becomes better, others can look at the best and learn from it. What difference does it make if the best player is an AI or a human? The better moves and strategies are still better moves and strategies.

        • Shorel 10 months ago

          Not AlphaGo, but there are newer neural networks tailored not to crush players, but to teach and explain their playing style, such as Lizzie with Leela Zero.

        • gdy 10 months ago

          Just ask any professional Go player.

          • TaupeRanger 10 months ago

            Anecdotes are not data. Look at the game statistics. There is zero evidence that players are playing at a higher level since the inception of AlphaGo.

  • dundarious 10 months ago

    Yes, I believe what they're doing already exists in the literature as "supercompilation", though good to see its application under any name.

  • choppaface 10 months ago

    A decade ago Google had people with breadth and context who could have adjusted the framing of the result of this work (and maybe even re-targeted it to something useful). Today however Google is a mix of hyper-narrow expert ICs and leaders who lack domain expertise. Paper in Nature? Sure let's take it!

  • scotty79 10 months ago

    Shouldn't libc++ be kinda good? Since it's the standard and all? Why isn't/wasn't it?

  • usmansk 10 months ago

    Can anyone refute the claim of orlp that it is an unrolled insertion sort?

  • usmansk 10 months ago

    The claim "faster sorting algorithm" is wrong. They have to show the time-complexity. They have to prove that their algorithm is faster than linear sorting algorithms. Otherwise, they have to accept that their claim is wrong.

  • as4296 10 months ago

    Lol I was about to say that would be incredibly crazy if they found a new sorting algorithm. My time complexity in USACO bout to go crazy.

    • dataflow 10 months ago

      I get your sentiment but note that discovering a new algorithm doesn't have to imply a better time complexity. Bubble Sort and Insertion Sort have the same time complexity but are different algorithms.

  • neximo64 10 months ago

    What you're saying is a certain perspective which seeks to look at it from first principles.

    To the brutalist, the algorithm change is faster, and that's all that matters. A human didn't previously come up with the optimisation. You might as well say a computer running a sorting algorithm is bullshit vs a person doing it, because the difference is just that a bloated chip does it instead, and that's it.

bgirard 10 months ago

The title implies it found an entirely new algorithmic approach to sorting (like quick sort) which would have been a fantastic discovery. But it feels a lot like micro-optimizing the control flow and codegen.

  • fwlr 10 months ago

    I'm not sure there is a new algorithmic approach to sorting like you're thinking of. From a high level you can divide sorting algorithms into roughly two camps, "those that use divide-and-conquer", and "those that do not". The divide-and-conquer (split into smaller sub-lists, sort the sub-lists, and merge them while preserving sort order) algorithms are better. From this perspective, algorithms that we think of as quite distinct like quicksort and mergesort are not that different - quicksort just does the "divide" step of divide-and-conquer in a smarter way.

    In the end, no matter what divide-and-conquer sorting algorithm you pick, you will be doing lots of "small sorts". And it is those small sorts that DeepMind has optimized here.
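
    To make the "small sorts" point concrete, here's a minimal sketch (nothing from libc++; the threshold of 16 and the middle-element pivot are arbitrary illustrative choices) of a divide-and-conquer sort that hands every sub-range below a cutoff to a tiny fixed sort; those base cases are what got optimized here:

        #include <algorithm>   // std::partition
        #include <utility>     // std::swap

        // Tiny "small sort": a plain insertion sort for short ranges.
        void small_sort(int* first, int* last) {
            for (int* i = first + 1; i < last; ++i)
                for (int* j = i; j > first && j[-1] > j[0]; --j)
                    std::swap(j[-1], j[0]);
        }

        // Divide-and-conquer driver: three-way partition around a middle pivot,
        // recurse/loop on the big parts, fall back to small_sort below 16 elements.
        void quicksort(int* first, int* last) {
            while (last - first > 16) {
                int pivot = first[(last - first) / 2];
                int* mid1 = std::partition(first, last,
                                           [pivot](int x) { return x < pivot; });
                int* mid2 = std::partition(mid1, last,
                                           [pivot](int x) { return x == pivot; });
                quicksort(first, mid1);   // keys smaller than the pivot
                first = mid2;             // keys larger, handled by the loop
            }
            small_sort(first, last);      // the "small sort" base case
        }

    The three-way partition is only there to keep the sketch correct on inputs with many equal keys; real library code is far more careful about pivot selection and thresholds.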

    • ozr 10 months ago

      > I'm not sure there is a new algorithmic approach to sorting like you're thinking of. From a high level you can divide sorting algorithms into roughly two camps, "those that use divide-and-conquer", and "those that do not". ...

      I think the sci-fi-but-possibly-real hope is that for sorting (among other things), we may have the perspective that there isn't any new algorithmic approach available, but an AI finds one for us.

      • fwlr 10 months ago

        That would be awesome! Obviously it’s hard to imagine what that would look like (since a necessary part of it is “the AI comes up with something we couldn’t imagine”), but here’s one potential idea, based on these DeepMind discoveries being “exploiting guarantees of previous steps” and “you can use simpler sorts when the first part of the list is sorted”: the AI might be able to find some way to perform a cheap-but-weak “divide” step and a cheap-but-weak “merge” step, such that the guarantees from each step happen to interact in a way that produces fully correct sorting.

    • bgirard 10 months ago

      There are lots of accepted sorting algorithms [1]. I'm sure we can come up with novel new algorithms, even if they're not optimal. Like Wikipedia mentions, they all fall within some small number of higher-level categories (eg. Partitioning, Merging, Selection, Insertion). I'm still not convinced that the optimizations presented in the article amount to the discovery of NEW sorting algorithms rather than merely optimizations of existing ones.

      [1] https://en.wikipedia.org/wiki/Sorting_algorithm

    • pxeger1 10 months ago

      Indeed, it's easy to prove that any sorting algorithm that works by comparing elements (unlike e.g. radix sort) requires Ω(n log n) time in the worst case.
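
      For reference, a sketch of that standard counting argument: a comparison sort has to distinguish all n! input orderings, and each comparison has at most two outcomes, so a decision tree of depth d must satisfy

          2^d \ge n!  \;\Rightarrow\;  d \ge \log_2 n! = \sum_{k=1}^{n} \log_2 k \ge \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \log n)

      (The last inequality roughly keeps only the largest n/2 terms of the sum, each at least log2(n/2).)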

  • GravityLabs 10 months ago

    Still a fantastic discovery, because now there are more powerful automated algorithm improvement discovery pipelines. I wonder what will happen when those improvements are added to the processing that found the improvements. :D

    • bgirard 10 months ago

      Yes, I'm not discounting the results. I think it's a very interesting approach. I just think the language is important here. If you ask a CS class to turn in their own quick sort implementations, you have n implementations of 1 algorithm, not n new algorithms.

  • ozr 10 months ago

    I think this is still super interesting. It's something humans are unable to do, or at least something very few humans can do. I very much like the pattern of writing basic logic myself and then using a coding model to optimize it. It's effectively what we do with compilers already, this just makes it better.

  • nurettin 10 months ago

    My first intuition, knowing the limitations of generating first-order logic sequences, was that they must have somehow restricted the sorting sequence to a number of elements.

finitestateuni 10 months ago

The most interesting part of this paper to me is that they let the agent guess how efficient its own solutions were and only had the model experimentally verify its guesses in 0.002% of cases. This allowed the model to search much faster than another program that didn't guess and had to run every program.

  • tux3 10 months ago

    This is the same sort of "more guesses faster beats smarter guesses slower" that made afl-fuzz by far the best at exploring large search spaces in program fuzzing

    Fast search often beats accurate search. Sometimes adding clever heuristics or more complex scoring "works" but slows down the search enough that it's an overall loss. Another kind of a bitter lesson, perhaps

    • lqr 10 months ago

      But why isn't the proposed method an instance of smart guessing? It reduces oracle complexity with heuristics. The heuristic is "build a machine learning model of the objective function and use it to fake oracle queries most of the time."

  • blackbear_ 10 months ago

    This is actually quite a common way to optimize things in several disciplines. You essentially fit a surrogate model (keyword if you want to look up more) to whatever you want to optimize, then use the model to guide the procedure, making sure that the model is correct every once in a while.
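
    A minimal sketch of that loop (hypothetical names, nothing from the paper): score every candidate with the cheap surrogate, but pay for the expensive ground-truth measurement on only a handful per round:

        #include <algorithm>
        #include <functional>
        #include <vector>

        struct Candidate { int id; /* e.g. an instruction sequence */ };
        using Score = std::function<double(const Candidate&)>;

        // Surrogate-guided search: cheap_score is the fast model of the objective,
        // measure is the slow, noisy ground-truth benchmark.
        Candidate surrogate_search(std::vector<Candidate> pool, Score cheap_score,
                                   Score measure, int rounds, int verify_per_round) {
            Candidate best = pool.front();          // assumes a non-empty pool
            double best_true = measure(best);
            for (int r = 0; r < rounds; ++r) {
                // Rank everything by the surrogate's estimate...
                std::sort(pool.begin(), pool.end(),
                          [&](const Candidate& a, const Candidate& b) {
                              return cheap_score(a) < cheap_score(b);
                          });
                // ...but only verify the most promising few for real.
                for (int i = 0; i < verify_per_round && i < (int)pool.size(); ++i) {
                    double t = measure(pool[i]);
                    if (t < best_true) { best_true = t; best = pool[i]; }
                }
                // A real system would refit the surrogate on these measurements
                // and propose new candidates here.
            }
            return best;
        }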

  • dsign 10 months ago

    I've been wondering about a similar approach for biomolecular simulations, where exact computations are still a hard bottleneck. I wonder if something like this could give us a few orders of magnitude more speed.

  • Zamicol 10 months ago

    That sounds like intuition.

    • Filligree 10 months ago

      Intuition is the only thing we've figured out how to automate. Reason turns out to be higher hanging fruit.

      • the_other 10 months ago

        Like humans’ “slow” and “fast” thinking, then?

        • Filligree 10 months ago

          There's likely a connection. Either way, I like to describe AIs like ChatGPT / diffusion models, etc. as operating 100% on intuition. It gives people a better intuition of their weaknesses...

          For GPT you can kind of prompt it to do chain-of-thought reasoning, but it doesn't work very well; not if you compare it to what humans do.

          Once again it seems like what we thought was hard, is easy; what we thought was easy and computer-like turns out to be hard.

        • fragmede 10 months ago

          If you tell ChatGPT "show your work", you get better answers

VikingCoder 10 months ago

Dumb question -

Where are the machine learning de-compilers?

Given an executable program, give me human-readable code that generates the executable exactly. With machine learning guessed function, type, variable names...?

I get that it's a very hard problem. But... Does it just not work? Or have people not done it yet? Or have I missed it?

  • sl-1 10 months ago

    That is a good question!

    Training datasets should be pretty trivial for this, all open source software that is buildable on the internet could provide training sets (source code & compiled code).

    But I guess it would have to be trained specifically for every architecture.

  • skeaker 10 months ago

    Personally I think this would be very exciting. Currently there are a ton of projects to decompile older games (see the Mario 64 and Ocarina of Time de-compilations and subsequent native PC ports as an example), but those projects are huge and take whole teams years to finish. If you could simply point an AI at a project like that and have usable source code in a reasonable amount of time, it would be huge for those scenes. You would have a sort of defacto-open source moment for just about all software. It feels like the stuff of far off sci-fi.

  • pitaj 10 months ago

    This actually doesn't sound that difficult, given we can produce a practically infinite training dataset using a compiler with existing programs.

    What's more interesting to me would be a model that can automatically port a program between languages, such as converting a C program to a fairly idiomatic Rust program.

  • sundarurfriend 10 months ago

    I don't understand WebAssembly much other than superficially, but if wasm is going to be a new, harder level obfuscation for websites, making them harder to customize locally, then I hope ML/AI can counteract that and preserve the Web as a user-customizable space.

  • VMG 10 months ago

    You can already do that to some extent with ChatGPT. Paste in the assembly and it gives pretty good results

  • xwdv 10 months ago

    This is the killer app for me with AI. A way to get a non-copyrighted code for any program you already have binary code for.

    • ealexhudson 10 months ago

      That sounds obviously transformative and on any non-trivial scale I can't see why copyright would be avoided.

      • reaperman 10 months ago

        I think you mean "non-transformative", although in this context I can see there's a bit of an ambiguity in how people would use the word "transform" to mean one thing w.r.t copyright and the "opposite" for code (really, orthogonal, but boolean evaluation would yield the opposite true/false).

        In copyright, "transformative" refers to modifying or adapting a copyrighted work in a way that creates a new and original expression; resulting in a new work with a different purpose or meaning.

        In terms of code, you'd "just be transforming" the assembly code to a systems language of your choice.

cypherpunks01 10 months ago

This was educational because I learned about Sorting Networks.

I didn't hear of them in CS undergrad algorithms, perhaps because they can be thought of as a special case or optimization, rather than fundamental and generic variable-length sort algos.

It's a simple concept, and forms the basis of all the sort optimizations described here.

https://en.wikipedia.org/wiki/Sorting_network

jonbaer 10 months ago

"AlphaDev uncovered new sorting algorithms that led to improvements in the LLVM libc++ sorting library that were up to 70% faster for shorter sequences and about 1.7% faster for sequences exceeding 250,000 elements."

  • dheera 10 months ago

    > up to 70% faster

    So O(0.3(N log N))? That's still O(N log N)

    • caturopath 10 months ago

      They're not going to find a general sorting algorithm faster than O(n log n), but that doesn't mean all O(n log n) sorts are created equal.

      • ralfn 10 months ago

        If the thing you have to sort is within a known domain you can definitely beat O(n log n). Just expect crazy memory usage.

        And it's only in theory that this isn't the case. Nobody owns a real Turing machine with infinite tape and truly infinite numbers. It doesn't exist in reality.

        You can always divide time by multiplying space with the same factor.
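
        For a concrete example of that trade: if the keys are known to lie in [0, k), counting sort runs in O(n + k) time with O(k) extra memory, beating the comparison bound precisely because the domain is bounded. A minimal sketch:

            #include <cstddef>
            #include <vector>

            // Counting sort: O(n + k) time, O(k) extra space, keys assumed in [0, k).
            std::vector<int> counting_sort(const std::vector<int>& in, int k) {
                std::vector<std::size_t> count(k, 0);
                for (int x : in) ++count[x];        // histogram of key values
                std::vector<int> out;
                out.reserve(in.size());
                for (int v = 0; v < k; ++v)         // emit each key count[v] times
                    out.insert(out.end(), count[v], v);
                return out;
            }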

        • xdavidliu 10 months ago

          by "general sort" your parent comment means "comparison sort", and the claim (which has been proved, see CLRS intro to algorithms) is that you cannot do it in better than O(n log n) comparisons.

        • yunohn 10 months ago

          Parent said: > not going to find a general sorting algorithm

          You said: > you have to sort is within a known domain you can definitely beat

          Not sure why you framed your response this way?

          • ralfn 10 months ago

            Sorry I didn't phrase my point well enough.

            Every sorting on a computer existing in reality is within a limited domain.

            The general sorting problem is an artificial problem for theoretical computers with infinite memory.

            It's a philosophical problem, not an engineering one.

      • iopq 10 months ago

        That's not true because Thorup's algorithm is O(n log log n)

        • xdavidliu 10 months ago

          any sort that only uses comparisons is provably not better than n log n. Hence whatever Thorup is, it doesn't work for arbitrary input, and must assume something further, like "all elements under 1000".

          • utopcell 10 months ago

            I think the parent's argument is that, since they only evaluated their algorithms on sorting arrays of 32-bit or 64-bit integers, it is fair game to compare against integer sorting algorithms for which we have better complexities than O(n lg n).

            • reaperman 10 months ago

              This seems like a very valid point in support of iopq's clarification, in the context of what might still be discovered among theoretically possible new algorithms, though it doesn't change that dheera set the bar much, much higher than most people would agree with.

              Thank you.

            • xdavidliu 10 months ago

              but the number of elements in the array is also a 32 or 64 bit integer, so you still cannot do better than O(n log n) even with fancy radix sort or whatever

              • utopcell 10 months ago

                Not with radix sort, but you can do better when your elements are integers, even if there are a lot of them, i.e. even when n ~ 2^32.

                • xdavidliu 10 months ago

                  interesting; got an example or link? What exact asymptotic form do you mean by "better" here?

                  • utopcell 10 months ago

                    There are many such algorithms. For example, [1] is expected linear time, deterministic O(nlglgn) time and linear space.

                    Note that the proposed ML (micro-)algorithms rely heavily on the fact that they are sorting 32-bit or 64-bit integers because they are using conditional moves. A conditional move avoids a branch by converting the instruction to a no-op when the condition doesn't hold. This is fine when moving 4 or 8 bytes because it can be done in a single clock cycle, but you can't apply the same technique when sorting larger objects.

                    [1] https://dl.acm.org/doi/pdf/10.1145/225058.225173
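
                    As a rough illustration of why the fixed key width matters (a sketch, not the AlphaDev code): a compare-and-swap written with min/max on register-sized integers typically compiles down to conditional moves rather than branches, and those are the building blocks of these tiny sorting networks:

                        #include <algorithm>
                        #include <cstdint>

                        // Branchless compare-and-swap: with optimization on,
                        // min/max on a register-sized integer lowers to cmov.
                        inline void cswap(int64_t& a, int64_t& b) {
                            int64_t lo = std::min(a, b);
                            int64_t hi = std::max(a, b);
                            a = lo;
                            b = hi;
                        }

                        // A 3-element sorting network: three compare-and-swaps.
                        inline void sort3(int64_t& a, int64_t& b, int64_t& c) {
                            cswap(a, b);
                            cswap(b, c);
                            cswap(a, b);
                        }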

          • graycat 10 months ago

            The Gleason bound is n log(n) and says that no sorting algorithm based on comparing pairs of keys can be faster. Heap sort meets the Gleason bound so is the fastest possible in this context. Actually the usual versions of quick sort are slower. If the keys are not too long, radix sort is O(n) and faster. All this has been well known for decades. I explained a little more in another post in this thread.

            • guyomes 10 months ago

              > If the keys are not too long, radix sort is O(n) and faster.

              More precisely, if the key length is w, then radix sort is O(w n) operations. In particular, if the n elements are distinct integers for example, w is greater than log(n).
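
              A minimal LSD radix sort over 32-bit keys makes that O(w n) concrete: w/8 = 4 passes, each a counting sort on one byte, and no key comparisons at all (a sketch):

                  #include <array>
                  #include <cstddef>
                  #include <cstdint>
                  #include <vector>

                  // LSD radix sort on unsigned 32-bit keys: four O(n) passes,
                  // one per byte, so O((w/8) * n) overall with w = 32.
                  void radix_sort(std::vector<uint32_t>& v) {
                      std::vector<uint32_t> buf(v.size());
                      for (int shift = 0; shift < 32; shift += 8) {
                          std::array<std::size_t, 257> count{};
                          for (uint32_t x : v) ++count[((x >> shift) & 0xFF) + 1];
                          for (int i = 0; i < 256; ++i) count[i + 1] += count[i];
                          for (uint32_t x : v) buf[count[(x >> shift) & 0xFF]++] = x;
                          v.swap(buf);   // even number of passes: result ends in v
                      }
                  }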

            • CyberDildonics 10 months ago

              I think you are getting your information mixed up. Here is a comparison that shows quicksort running in 1/10th the time of heapsort.

              https://johnysswlab.com/why-is-quicksort-faster-than-heapsor...

              • graycat 10 months ago

                Versions of quick sort differ in how the partitions are determined. A guess is that some of the versions are not worst case O(n log n) for sorting n keys. In that case, for sufficiently large n, on a normal computer, any version of heap sort will beat that version of quick sort in number of comparisons, time in seconds, Joules of energy to run the computer, etc.

                It is this point that has much of the academic computer science community saying that no sorting algorithm based on comparing keys two at a time can beat heap sort.

                Sure, we don't know how big n has to be. In practice, in an actual case, the n might have to be too big for current computers.

                Sure, in practice, for some value of n, on some list of n keys, some version of heap sort, and some version of quick sort, the quick sort might run 10 times faster in seconds of elapsed time than heap sort.

                • CyberDildonics 10 months ago

                  I'm not completely sure what you are saying, but do you actually think a heap sort is in general faster than a quicksort or mergesort? You realize that the worst case of a quick sort is easily avoided right? The only way it happens is if you have an already sorted array and you pick your pivots from the minimum or maximum values on every single partition.

                  It is this point that has much of the academic computer science community saying that no sorting algorithm based on comparing keys two at a time can beat heap sort.

                  I think you're the only one saying that. Where did you get this idea? I just showed you a quicksort being 10 times faster than a heapsort. You can try this for yourself.

                  Sure, we don't know how big n has to be. In practice, in an actual case, the n might have to be too big for current computers.

                  This is never going to be true. Quicksort and other sorts exploit locality much better. Partitioning an array gradually makes the sorting more local. That's going to work better relatively to a heap sort as data gets bigger, not worse.

                  Sure, in practice, for some value of n, on some list of n keys, some version of heap sort, and some version of quick sort, the quick sort might run 10 times faster in seconds of elapsed time than heap sort.

                  No, on any modern computer other sorts are going to beat a heapsort. It isn't exotic, it is how it works. Even if you have to scan every element first to get a distribution of your elements to choose your pivot, that is still going to be O(n log n) and it will still beat a heap sort.

                  Heap sorts are not the fastest way to sort. They can be consistent and they can split the time between inserting and extracting elements.

                  • graycat 10 months ago

                    > I'm not completely sure what you are saying, but do you actually think a heap sort is in general faster than a quicksort or merge sort

                    "in general" is not very specific.

                    The main issue is the big-O expression. Again, if win in the big-O expression, then, in the assumptions of the context, for sorting n keys for all sufficiently large n will win.

                    Sure, maybe the heap sort O( n log(n) ) is wrong. In that case, maybe could beat heap sort.

                    How could O( n log(n) ) be wrong? Well for large enough n, could encounter virtual memory paging that would make the cost of a comparison grow with n and an algorithm that had better locality of reference and less paging might win. So, for such a context, would have to redo the big-O derivation.

                    Merge sort is based on comparing keys two at a time, and that was the assumption in Gleason's argument. So, Gleason's argument should also apply to merge sort. Some people prefer heap sort or quick sort because they are in place and merge sort is not.

                    I wrote and you quoted:

                    It is this point that has much of the academic computer science community saying that no sorting algorithm based on comparing keys two at a time can beat heap sort.

                    > I think you're the only one saying that.

                    Also Gleason, Knuth, and much of the academic computer science community.

                    > Where did you get this idea? I just showed you a quicksort being 10 times faster than a heap sort.

                    Your "10 times" is not for all values of number of keys n. The crucial issue is the big-O expression, and your "10 times" says next to nothing about the big-O expression.

                    Again, the big-O expression is crucial because winning in the big-O expression means winning for all sufficiently large n, and that is, sadly, the only really meaningful means we have of comparing "in general" two pieces of code.

                    Instead, sure, if in a particular application have an upper bound on n, say, always less than 10,000,000, then use the code that is fastest from 1 to 10,000,000 or in expectation on the probability distribution of n from 1 to 10,000,000 in the particular application. Or, maybe in some application don't care about the average execution time but definitely always want execution time faster than some given T milliseconds for all n from 1 to 10,000,000. Okay, pick the code that best achieves this.

                    Gleason's work didn't cure cancer, exceed the speed of light, say what is in the center of a black hole, say what happened before the big bang or will happen after the heat death of the universe, .... Instead, Gleason stated and proved a theorem in applied math. The theorem shows what is the best possible given the assumptions. Similarly for what Knuth said about heap sort and Gleason's theorem. A lot of progress in pure/applied math and science is like that -- not everything but just a specific result given some specific assumptions.

                    Or, assume that Joe has a new computer, 1000 cores, 10 GHz clock, 500 GB first level cache, and writes some really careful code in assembler that makes full use of the 1000 processors, to sort 10 billion 32 bit numbers via bubble sort. Then it runs in O(n^2), that is, for some constant k_Joe runs in

                    k_Joe n^2 = k_Joe (10^10)^2

                    milliseconds.

                    Along comes Bill with a PC based on a 64 bit single core processor with a 2 GHz clock. Bill's code uses heap sort. Then Bill's code runs in O( n log(n) ) or for some constant k_Bill runs in

                    k_Bill (n log(n) ) =

                    k_Bill (10^10 log(10^10) )

                    milliseconds.

                    And Bill is a bit sloppy and uses an interpretive language so the constant k_Bill is large.

                    Then the ratio of running times is

                    (k_Joe / k_Bill) (n / log(n) )

                    Well, assume for simplicity and without loss of generality that the log has base 10. Then

                    log(n) = log(10^10) = 10

                    so that

                    n / (log(n)) = 10^10 / 10

                    = 1,000,000,000

                    Well, let's account for Joe's 1000 cores to get a ratio of 1,000,000 and Joe's 5 times faster clock speed to get 200,000. Say the interpretative language Bill used is 10 times slower than compiled C for a ratio of

                    20,000

                    and say that Joe's hand coding in assembler is 2 times faster than C code for a ratio of

                    10,000.

                    So, still Bill's code runs

                    10,000

                    times faster than Joe's.

                    That's an example of, the algorithm that wins in big-O wins for all sufficiently large n, even considering coding in assembler with 1000 cores and a 10 GHz clock.

                    Sure, for n = 100, maybe Joe's code wins -- I'll let you find the ratio.

                    That's what the computer science community noticed some decades ago and, thus, settled on big-O as the way to compare algorithms and code. Again, heap sort meets the Gleason bound in worst case. Since in the context the Gleason bound says that O( n log(n) ) is the best can do, the best quick sort can do is tie. And if the quick sort code does not make the partitions carefully, quick sort won't be O( n log(n) ) in worst case and for sufficiently large n will lose by whatever ratio want, 1,000,000:1 if you want.

                    Look, guys, all this is just from, say, the first week of a ugrad course in computer science.

                    Arguing with the Gleason bound -- about like arguing with the Pythagorean theorem.

                    Don't argue with me. Instead, send a question to a comp/sci prof at your local university giving him the URL of this post.

                    • CyberDildonics 10 months ago

                      Also Gleason, Knuth, and much of the academic computer science community.

                      Prove it, show me who is saying 'nothing can beat a heapsort'.

                      says next to nothing about the big-O expression.

                      Why do you think heap sort is the only n log n sort? Link something that proves what you say.

                      settled on big-O as the way to compare algorithms and code

                      Time determines speed, that's what this thread is about. I already linked you a benchmark of 32 million floats. Link me something that actually backs up what you are saying.

                      Arguing with the Gleason bound -- about like arguing with the Pythagorean theorem.

                      Link me where you got these ideas from. I'll link you something, a google search on 'gleason bound' - https://www.google.com/search?q=%22Gleason+bound%22

                      The only thing that comes up is your comment. I've shown you actual results, you keep saying 'maybe possibly in some scenario I can't show, this other one wins, so it is the fastest'. You are hallucinating a reality that you can't demonstrate.

                      • graycat 10 months ago

                        All your issues have been responded to thoroughly.

                        Here you are embarrassing yourself.

                        • CyberDildonics 10 months ago

                          I guess this is the "I already told you" part of the conversation, but you didn't link a single thing.

                          All you did was repeat your claim over and over. No benchmarks, no link to 'gleason bound' and nothing but hallucinating hypothetical scenarios and declaring that somehow they would back up your claims that go against literally all benchmarks and computer science knowledge. If I'm wrong, show me some links.

              • xdavidliu 10 months ago

                we're talking about asymptotics (i.e. the exponent). Things like "1/10 the time" are utterly meaningless here.

                • CyberDildonics 10 months ago

                  Who is talking about that? They said 'faster' not less algorithmic complexity. 1/10th the time is much faster.

                  • graycat 10 months ago

                    > 1/10th the time is much faster.

                    No, absolutely no, it's not in any meaningful or useful sense "faster" as in which algorithm is "faster". You utterly fail to get this point.

                    • CyberDildonics 10 months ago

                      If something runs in less time, it's faster. If something runs in 1/10th the time as something else it is multiple orders of magnitude faster. I don't think I've ever seen someone try to redefine 'faster' before.

                      You keep trying to equate naive algorithmic complexity and speed, but you haven't even linked anything that shows heapsort is better than other sorts like quicksort at that. You haven't actually linked anything to back up any of what you're saying, even the irrelevant offtopic claims.

            • graycat 10 months ago

              Come on, guys:

              Early in my career, I had a really good career going. I paid a lot of attention to writing fast code.

              Some of that career was in a Navy lab, and some of the people there wrote fast code by going down to the assembly language and checking each instruction, load, store, etc.

              At times that career bumped into some math -- 0-1 integer linear programming, even ordinary linear programming, optimization, e.g., BFGS as elsewhere in this thread, the fast Fourier transform, power spectral estimation, optimal control, stochastic optimal control, classic linear statistics, non-parametric statistics, ill-conditioned matrices, on and on. So, to get a better background in the math, I put my career on hold and went for a Ph.D. in pure/applied math.

              In my first semester the faculty wanted me to take their first ugrad computing course. Heck, I'd already taught such a course at/for Georgetown U. But I took the course anyway.

              Then in the course, the issue of fast code came up. Soon, by some of the computer science faculty interested in computational complexity, I got slapped around like a butterfly in a hurricane.

              Yup, one way, commonly the way first seen, to write fast code is to check each machine instruction, pay attention to caches, locality of reference, etc.

              But another way to write fast code is to back off, basically forget about the individual instructions, etc. and take as the criterion number of comparisons of pairs of keys. Right, just f'get about all those other details of the hardware, just what the compiler did with do-while and if-then-else, etc. That's what was catching on, strongly, in computer science at the time.

              Sooo, broadly that's two quite different ways to look at how to write fast code.

              The Gleason bound? That's in one of the D. Knuth volumes The Art of Computer Programming. That was A. Gleason, a math prof at Harvard with a spectacular career -- before his Ph.D., solved one of D. Hilbert's famous problems intended to keep mathematicians occupied for the 20th century, was made a Harvard Fellow, joined the math faculty, and never bothered with a Ph.D.

              Gleason started with, for any given positive integer n, we will be sorting n keys. Sooooo, how big of a problem is that? Well (from my memory and not looking up my copy of Knuth on a shelf just behind me), assume the keys are distinct, that is, no ties. Then the problem is sorting all n! permutations of the n distinct keys. Then, ... Gleason argued from just counting the permutations and assuming that the sorting was by comparing pairs of keys, that on average could not sort in fewer than O(n log n) such comparisons. So, Gleason just counts comparisons and ignores number of parallel processors, number of levels of cache memories, the details of the instruction set of the processor(s), .... Then, as I recall, Knuth continues on and argues that heap sort achieves the Gleason bound both on average and worst case. Sooooo, in that context, heap sort is the fastest possible.

              Right: The class could have had a contest, who can write code for the fastest sort on a certain list of, say, 10,000 names. Some people use quick sort, heap sort, radix sort, shell sort, bubble sort, ....

              No telling who will win. Even if several students use heap sort, no telling.

              So what CAN we tell? As n grows, even some really inefficient coding, maybe even in an interpretive language, will totally blow away like that butterfly in a hurricane ANY coding of bubble sort. And as I recall, it's possible for quick sort to lose on some permutations unless the partitions are selected carefully -- that is, the worst case performance of some versions of quick sort can fail to achieve the Gleason bound and run slower than even a very inefficient coding of heap sort.

              That is, if want to take Gleason's approach, just count comparisons of pairs of keys and look at the big-O results, can't beat heap sort.

              A short answer is, if win in the big-O comparison, then, no matter how sloppy the coding, for all sufficiently large n, still will win no matter how measure speed. In short, that's the reason people took big-O very seriously.

              Yes, there is more to computational complexity than I've outlined here, and as I've already mentioned, in some contexts there can be more to sorting.

              Still, again, in short, in simple terms, in a very important sense, Gleason was right, can't beat the Gleason bound, and can't beat heap sort.

              I just wanted to improve my career. I never wanted to be a college professor, teacher, researcher, etc., yet for various reasons I've done all those things.

              Now "to improve my career", I want to be a successful entrepreneur. My startup? It might flop, and I can't be sure it won't. But it might be worth $100 billion, and I can't say it won't. Here I've made a little contribution to the teaching of computing, coding, and computer science of sorting. But I'm no college professor. Back to my startup. A Chaired Professor of Applied Math? I don't want to occupy one. If my startup is really successful, maybe I'll fund one.

              • CyberDildonics 10 months ago

                This wall of text is very bizarre. First, I don't know where you got "gleason bound" from, but if you search for it on google, your comment in this thread is the only thing that comes up.

                Second, your "alternative speed" measures are a hallucination.

                Sooo, broadly that's two quite different ways to look at how to write fast code.

                No there isn't. The one that takes 1/10th the time of the other one is faster. You going off on tangents and making up terms to try to say that a heap sort is the fastest sort (of all the strange things to argue) is nonsense.

                • graycat 10 months ago

                  > First, I don't know where you got "gleason bound" from,

                  For the answer and posted in this thread, I wrote:

                  The Gleason bound? That's in one of the D. Knuth volumes The Art of Computer Programming.

                  > Second, your "alternative speed" measures are a hallucination.

                  No. Instead, I wrote in this thread:

                  A short answer is, if win in the big-O comparison, then, no matter how sloppy the coding, for all sufficiently large n, still will win no matter how measure speed. In short, that's the reason people took big-O very seriously.

                  If you want to argue against heap sort, then you need to argue that in counting comparisons the big-O expression for heap sort is wrong and loses out to some other sorting algorithm.

                  The Gleason bound assumes that each comparison costs the same. So you may want to argue that for n keys, as n grows the issues of caches, locality of reference, parallel processors, etc. mean that the cost of each comparison grows so that in the big-O competition heap sort can be beat.

                  I'll let someone else calculate the big-O expressions again considering locality of reference, etc.

                  • CyberDildonics 10 months ago

                    The Gleason bound?

                    Instead of repeating yourself, can you link to some actual information?

                    still will win no matter how measure speed

                    Speed is measured with time. You can keep saying algorithmic complexity is speed, but that will never make it reality.

                    If you want to argue against heap sort, then you need to argue

                    That's not how it works. Other sorts take a fraction of the time. I showed you this already.

                    I'll let someone else calculate the big-O expressions again considering locality of reference, etc.

                    This was never about algorithmic complexity, that's something that you hallucinated. Not only that, but you do realize that other sorts have the same complexity as heap sort right? There a lot of ways to sort with n log n.

                    You are trying to argue something that isn't real to make a point that no one cares about and has nothing to do with this thread.

                    • graycat 10 months ago

                      > You are trying to argue something that isn't real to make a point that no one cares about and has nothing to do with this thread.

                      In a word, you are wrong.

                      I've been very, very, very clear again, over again, once again, yet again, and you just fail to get it, a simple lesson often just in the first week of an easy, first college course in computer science.

                      > Other sorts take a fraction of the time. I showed you this already.

                      Nope. You showed no such thing. Your evidence is meaningless. Heck, even bubble sort could beat heap sort or quick sort under some circumstances.

                      So, again, sit down, pay attention, listen up: What matters for any measurement of performance in comparing code is the big-O expression. Read this again, again, again, write it on the blackboard 1000 times after school, repeat it to yourself before each meal, going to sleep, waking up. You just pass this off as computational complexity irrelevant to execution time. Here you are just wrong, totally, badly wrong. You seem not to understand this. For any measurement, time, Watts, Joules, comparisons, cycles, any measurement, in the reasonable context, what matters is the big-O expression.

                      > There a lot of ways to sort with n log n.

                      Well, merge sort can. Maybe some versions of quick sort can. Okay, there are some ties. I never said there are no ties. But, in the context, can't beat O( n log(n) ) -- the Gleason bound shows this. I've said this over and over and over and over. So, in the context, can't beat heap sort. What you saw in some two pieces of code on 1000 keys is just irrelevant to a meaningful comparison of performance.

                      > The Gleason bound?

                      > Instead of repeating yourself, can you link to some actual information?

                      I gave the information: First in the context heap sort, merge sort, maybe quick sort run in O( n log(n) ) in comparisons and also, in this context, inescapably, in time, cycles, Watts, Joules, whatever. The "faster" is not for n = 1000 but for all sufficiently large n. For n = 1000, anything can happen. Second the Gleason bound says that, in the context, can't sort faster than this. So that's why it's call a "bound", a lower bound on how fast can sort. Third, I gave the reference, D. Knuth's famous book.

                      The Gleason bound is one of the nicer, most powerful, most useful, most important pieces of work in all of computer science, computer programming, sorting, and computing for any and all purposes, in particular for practical performance, and you just fail to get it.

                      You have some problems, some blocks in understanding. You just do not want to learn something new to you. You deeply resent this. Your problem is not about computers or applied math but emotional. For your emotional problems, nothing in computing, computer science, or my writing can help you.

                      • shrimp_emoji 10 months ago

                        > Instead of repeating yourself, can you link to some actual information?

                        > I gave the reference, D. Knuth's famous book.

                        I just Ctrl+F'd "Gleason" in The Art of Computer Programming Vol 1, Vol 2, Vol 3, Vol 4A, and Vol 4B, with no hits in any of the 5 books.

                        I even looked in the glossaries. There's lots of last names -- Glaisher, Glassey, Gnedenko -- and no "Gleason".

                        I'm tempted to side with this iteration of CyberD's brutal takedowns on this one. :D

                        ---- EDIT ----

                        WAIT: I found it in the glossary of Vol 3!

                        "Gleason, Andrew Mattei, 193, 648."

                        For this one, case sensitivity got me when I searched "gleason"!

                        The most relevant bit here seems to be page 193, discussing ways to minimize the average number of comparisons:

                        ```

                        The minimum possible average number of comparisons, obtained by dividing by N, is never less than lg N and never more than lg N + 0.0861. [This result was first obtained by A. Gleason in an internal IBM memorandum (1956).]

                        ```

                        "Gleason" is only mentioned in Vol 3.

                        "Gleason bound" is not used in Vol 3, which must be why it doesn't pop up on Google.

                        CyberD: now on the backfoot

                        graycat's startup: in talks for VC funding

                        • CyberDildonics 10 months ago

                          That's great that you found actual information, but that doesn't seem to back up this person's bizarre claims that 'nothing beats heapsort'.

                      • CyberDildonics 10 months ago

                        In a word, you are wrong.

                        Prove it, show me something.

                        Your evidence is meaningless.

                        I showed you benchmarks with source code. You showed me nothing.

                        Heck, even bubble sort could beat heap sort or quick sort under some circumstances.

                        It isn't going to beat them on 32 million floats, which was what that benchmark showed. And are you now mixing up actual execution time with your other bizarre claims where 'speed' and 'faster' for some reason don't mean less time?

                        Okay, there are some ties. I never said there are no ties.

                        You did actually, now you're back peddling hard. Also these don't tie, they are faster because of locality.

                        Third, I gave the reference, D. Knuth's famous book.

                        Link something then, link any trace of what you are saying.

                        The Gleason bound is one of the nicer, most powerful, most useful, most important pieces of work in all of computer science,

                        Then why is there no evidence that it exists? Link me literally anything you can.

                        You have some problems, some blocks in understanding.

                        No, I have evidence and links that back up what I'm saying. You keep repeating the same things with no evidence. Link me literally anything you can find that reinforces your claims.

                        For your emotional problems, nothing in computing, computer science, or my writing can help you.

                        This is pure projection.

                      • bigbillheck 10 months ago

                        > You have some problems, some blocks in understanding. You just do not want to learn something new to you. You deeply resent this. Your problem is not about computers or applied math but emotional. For your emotional problems, nothing in computing, computer science, or my writing can help you.

                        Every accusation, as they say, is a confession.

    • dekhn 10 months ago

      In the real world, we care about runtime as much as, if not more than, computational complexity.

      • boomanaiden154 10 months ago

        Definitely more. There are lots of things that might be more optimal in terms of raw complexity but end up having abysmal performance due to things like cache locality. Anything with pointer chasing usually kills performance.

        • xdavidliu 10 months ago

          for example, matrix multiplication has all sorts of algorithms with exponent significantly less than 3, but they are nearly all curiosities. In practice, the straightforward N^3 algo is used nearly everywhere
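
          For context, the straightforward algorithm is just three nested loops; with the i-k-j order below, the inner loop streams through both matrices contiguously, which is a big part of why it wins in practice over the asymptotically fancier methods (a sketch):

              #include <vector>

              // Naive O(N^3) multiply of row-major n x n matrices, C += A * B.
              // The i-k-j order makes the inner loop walk B and C sequentially,
              // so it is cache-friendly despite the "worse" exponent.
              void matmul(const std::vector<double>& A, const std::vector<double>& B,
                          std::vector<double>& C, int n) {   // C must start zeroed
                  for (int i = 0; i < n; ++i)
                      for (int k = 0; k < n; ++k) {
                          double aik = A[i * n + k];
                          for (int j = 0; j < n; ++j)
                              C[i * n + j] += aik * B[k * n + j];
                      }
              }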

          • dekhn 10 months ago

            I had a long discussion with Rob Pike once that ended with me saying "you know, I don't think people actually use Strassen on supercomputers" (we were discussing how somebody had managed to show an N*2.37 algorithm or something like that and "it was a new world's record").

            • xdavidliu 10 months ago

              nit: ^, not *. ** for Python, but never *

              • dekhn 10 months ago

                Unfortunately the Hacker News editor converts a double asterisk to a single one, which is what caused the problem.

                I never use ^ for "to the power of" due to its use in C for bitwise XOR.

greenflag 10 months ago

Does anyone have high level guidance on when (deep) RL is worth pursuing for optimization (e.g. optimizing algorithm design) rather than other approaches (e.g genetic)?

  • AndrewKemendo 10 months ago

    Less of a scale problem than a type problem usually in my experience.

    My rule of thumb is when it's easy to specify a reward function but there are infinite ways to traverse the action space - versus having a constrained state and action space (small-n solution traversal pathways) and only a few possible paths to traverse.

  • jeffbee 10 months ago

    Start with a planet-scale computer that makes the marginal cost of RL be nearly zero, and at the same time spend a lot of money on hashing and sorting so the micro-optimization pays off.

skellington 10 months ago

So clickbait to claim a faster algorithm when all it did was find a faster machine optimization for a particular CPU. It's fun/cool, but not the leap they seem to be representing.

mmaunder 10 months ago

In 2023 we saw the first glimpse of machines becoming better at programming than humans when an AI created a new sorting algorithm that was faster than anything humans had yet created.

usgroup 10 months ago

“ The fixed sort solutions for sort 3, sort 4 and sort 5 discovered by AlphaDev have been integrated into the standard sort function in the LLVM standard C++ library3.”

Not exactly what the title would lead you to believe.

whywhywhydude 10 months ago

It is astounding how something as well studied as sorting still has opportunities for further improvements!

  • zzbn00 10 months ago

    It is not sorting per se that was improved here, but sorting (particularly of short sequences) on modern CPUs, where the real complexity lies in predicting what will run quickly on those CPUs.

    Doing an empirical algorithm search to find which algorithms fit well on modern CPUs/memory systems is pretty common, see e.g. FFTW, ATLAS, https://halide-lang.org/

  • dist-epoch 10 months ago

    What's well studied is theoretical algorithmic sorting.

    For practical sorting, for a particular CPU architecture, there is still plenty of low hanging fruit:

    > Today we're sharing open source code that can sort arrays of numbers about ten times as fast as the C++ std::sort, and outperforms state of the art architecture-specific algorithms, while being portable across all modern CPU architectures

    https://opensource.googleblog.com/2022/06/Vectorized%20and%2...

  • jackmott42 10 months ago

    Part of it is because hardware properties are always changing. The instructions available, the relative speed of CPU to memory and the various caches, how big and numerous and fast the various caches are, etc etc.

    • GravityLabs 10 months ago

      I am curious why things can't just get better on a base that doesn't change, until the base changes because the improvements with the new base are just that much better...

      Or is that why hardware properties change so much?

      • zerodensity 10 months ago

        For a time that is what happened. Every year you would get a new CPU that would be faster at pretty much everything than the version the year before. But then we hit the clock-speed wall, and the only way of making faster CPUs was to add complexity to their internals. So branch prediction, micro code pipelining, larger caches, SIMD instructions, more CPU cores, etc. were the result.

        So nowadays a new CPU might not be better at everything than the previous version, but it will most likely have more cache and some internal improvements to pipelining/concurrency.

        Given this, for newer versions it can be useful to add instructions to take advantage of extra pipelining, or to use a different instruction that happens to be faster now.

hexomancer 10 months ago

> The confidence intervals are represented as latency ± (lower, upper), in which latency corresponds to the fifth percentile of latency measurements across 100 different machines. Lower and upper refer to the bounds of the 95% confidence interval for this percentile.

Does anybody know why they chose the fifth percentile? I thought we should always choose the fastest time when measuring performance.

  • ajuc 10 months ago

    Usually you discard extreme values to reduce noise, and in fact they wrote that's why they did it:

    > We then take the fifth percentile as our final measurement, because we assume that most noise sources are one-sided (for example, cache misses, pre-emptions and so on). During training we process the measurements across ten machines for computational efficiency.

    > I though we should always choose the fastest time when measuring performance.

    Depends. For games you usually do something similar to what they did - exclude a small percentage of the worst results to reduce the influence of noise and then optimize the worst scenario to make the game run consistently and smoothly.
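
    Concretely, taking a low percentile instead of the minimum or the mean is a one-liner over the samples (a sketch; the index convention for the percentile is one of several reasonable choices):

        #include <algorithm>
        #include <cstddef>
        #include <vector>

        // p-th percentile of a non-empty set of latency samples, e.g. p = 0.05.
        // Low percentiles discard mostly one-sided noise (cache misses,
        // preemptions) without trusting a single luckiest-ever run like min does.
        // Takes the samples by value so the caller's data is left untouched.
        double percentile(std::vector<double> samples, double p) {
            std::size_t idx = static_cast<std::size_t>(p * (samples.size() - 1));
            std::nth_element(samples.begin(), samples.begin() + idx, samples.end());
            return samples[idx];
        }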

    • janwas 10 months ago

      One possibility which seems not so well known is that clocks with per-core state might not be perfectly synchronized. If your initial measurement is taken on core0 and the thread then migrates to core1, the end measurement could even be 'before' the initial one.

      Then there are manufacturing differences between cores that affect e.g. their leakage current and thus the (turbo) frequency at which they can run.

      So the measurement noise is indeed not one-sided, that is to say: measurements are not always overestimates. Thus a trimmed mean on both sides is a good idea, and pinning threads to a core when measuring is also helpful.

  • gcr 10 months ago

    Because they want to make sure that the sorting algorithm works well for all possible workloads, not just the most preferable ones.

    If we measured sorting algorithms by the fastest measurement, we might conclude that BubbleSort is the fastest possible sort algorithm on some inputs. (Bubblesorting an already-sorted list makes at most one comparison per list element)
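
    To make that concrete, here's a small sketch (mine, not from the paper) showing that bubble sort with the usual early-exit flag finishes already-sorted input in a single pass:

        // Counts comparisons made by bubble sort with an early-exit flag.
        #include <cstdio>
        #include <utility>
        #include <vector>

        size_t bubble_sort_counting(std::vector<int>& v) {
            size_t comparisons = 0;
            bool swapped = true;
            while (swapped) {
                swapped = false;
                for (size_t i = 1; i < v.size(); ++i) {
                    ++comparisons;
                    if (v[i - 1] > v[i]) {
                        std::swap(v[i - 1], v[i]);
                        swapped = true;
                    }
                }
            }
            return comparisons;
        }

        int main() {
            std::vector<int> sorted = {1, 2, 3, 4, 5};
            // One pass, 4 comparisons: a "fastest measurement" here says
            // nothing about how bubble sort behaves on unsorted input.
            std::printf("comparisons: %zu\n", bubble_sort_counting(sorted));
        }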

    • hexomancer 10 months ago

      I don't think that's what they meant (or I have misunderstood). Running the same algorithm on the same input still has variations because of OS/CPU idiosyncrasies. When measuring performance we usually run the algorithm on the same input multiple times and report the fastest performance.

jeffbee 10 months ago

The sorting seems cool but I am more interested in the varint decoder. Does anyone know where the source code of that landed, if it did?

mgaunard 10 months ago

Articles were making bold claims, but it's just optimizing libc++ and abseil's implementations. There are already known implementations faster than either of these in the right scenarios.

As always, speed depends on the use case. What's optimal for a sorting algorithm depends on the size of the data, its distribution, and also how it fits into the rest of the program and how that program fits into what else the machine is doing.

For example, you can make things faster by using more hardware resources, but that would penalize the rest of your program that doesn't have access to them anymore. I've frequently seen cases where a function was made faster in a microbenchmark by using all of the registers available, but then real programs ended up slower because the rest of the code had to spill to call it.

thraidh 10 months ago

Can someone explain why the sort3 function produced by AlphaDev is correct?

The instruction that was removed was: `P=min(A,C)`, which means that `P=A` at that point.

The next instructions:

    cmp S Q
    cmovg Q P
with `S=min(A,C)` and `Q=B` can be translated into

    if S>Q: P=Q
or

    if min(A,C)>B: P=B
which means: if B is the smallest, then `P=B`. Otherwise P stays as before, meaning `P=A` for AlphaDev and `P=min(A,C)` for the original.

So, the end result of sorting A,B,C=3,2,1 would be 3,2,3 for AlphaDev's code.

I can't believe I'm the first one to notice, so I'm probably wrong, but I cannot see where.
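
To show the reading above concretely, here is a minimal C++ rendering of those two instructions (the variable names just mirror the labels A, B, C, S, Q, P used above; it is only a sketch of my interpretation, not the paper's code):

    #include <algorithm>
    #include <cstdio>

    int main() {
        int A = 3, B = 2, C = 1;     // the example input above
        int S = std::min(A, C);      // S = min(A, C)
        int Q = B;                   // Q = B
        int P = A;                   // AlphaDev variant: P = A, since P = min(A, C) was removed
        if (S > Q) P = Q;            // cmp S Q ; cmovg Q P
        std::printf("P = %d\n", P);  // prints 3 here, which is how I get the 3,2,3 result
    }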

  • thraidh 10 months ago

    Found my mistake. I wrongly assumed the algorithms shown in the paper were the original sorting algorithm and the one improved by AlphaDev.

    Apparently it just shows some algorithm that was modified and resulted in a different one, neither being a sorting algorithm, but the original still being better.

    The text praises AlphaDev by saying it makes “moves” that look like a mistake but are actually brilliant. After that passage, code is shown that does not corroborate that statement and just illustrates that AlphaDev can make changes to code.

Upvoter33 10 months ago

The only thing that's really weird here is that this is published in Nature. It should be sent to a systems or database conference. Nature publishing minor CS results is weird.

EGreg 10 months ago

Almost all ML is basically searching through a (latent) space of possible answers and indexing the results, so it outperforms most known methods. Like when AlphaZero, using MCTS, beat Stockfish, etc.

I would be interested to know whether ML can be used to reverse hashes, such as to more quickly find numbers that satisfy the Bitcoin hash challenge. Maybe it can even get P and NP closer together!

Are there theoretical results on SHA-256 or something that preclude finding an ML algorithm that, with a billion parameters, can speed up the search for a Bitcoin Hash input?

lumb63 10 months ago

This is really cool. I’ll be interested to see if the team can produce useful and provably hard cryptographic hash functions with this tech. The other interesting application this inspires is using the same approach to optimize the optimization algorithms used by compilers. Perhaps we can all benefit from optimized optimizers.

  • boomanaiden154 10 months ago

    There’s already been quite a bit of work done on replacing compiler heuristics with ML models. Google has productionized it with MLGO and there have been quite a few papers/experiments on the topic.

gfd 10 months ago

Can someone summarize how it achieves "provable correctness"?

dontupvoteme 10 months ago

From "It can't Code" to publishing articles in Nature.

hamilyon2 10 months ago

I want this for SQL compilation and query-plan selection. So much effort is wasted on suboptimal database performance and on trying to improve it. I think even sorting pales in comparison.

Cold_Miserable 10 months ago

They haven't even found an optimal sort for a YMM register of 8 or a ZMM register of 16 integers? Just 5 elements? That's completely and utterly useless.

sebstefan 10 months ago

If they went for an integration into LLVM, I wonder why they tuned their model on x86 instead of on LLVM IR, the intermediate representation the compiler uses.

ayakang31415 10 months ago

Shouldn't the paper be named faster sorting algorithm “optimized” using deep RL instead, since no new sorting algorithm is introduced at all?

Havoc 10 months ago

I like the high-impact dynamic of this thinking.

If that could be done, say, OS-wide, I could see it having an actual impact on global energy usage.

graycat 10 months ago

"Faster sorting"?

Heap sort achieves the Gleason bound and, thus, is the fastest possible sort by comparing pairs of keys.

For keys not too long, radix sort can be faster.

These facts have been very well known for decades.

There might be more to do in sorting for particular records, keys, computer architectures, and concerns about caches and locality of reference, but otherwise, thankfully, sorting is a well-solved problem.
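
For reference, the plainest LSD radix sort over 32-bit keys looks something like this (an illustrative sketch only, nothing tuned):

    #include <array>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Stable LSD radix sort: one counting-sort pass per byte of the key.
    void radix_sort(std::vector<uint32_t>& v) {
        std::vector<uint32_t> buf(v.size());
        for (int shift = 0; shift < 32; shift += 8) {
            std::array<size_t, 257> count{};
            for (uint32_t x : v) ++count[((x >> shift) & 0xFF) + 1];
            for (size_t i = 1; i < count.size(); ++i) count[i] += count[i - 1];
            for (uint32_t x : v) buf[count[(x >> shift) & 0xFF]++] = x;
            v.swap(buf);
        }
    }

    int main() {
        std::vector<uint32_t> v = {42, 7, 100000, 3, 7, 99};
        radix_sort(v);
        for (uint32_t x : v) std::printf("%u ", x);  // 3 7 7 42 99 100000
        std::printf("\n");
    }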

  • CaptainNegative 10 months ago

    Heap sort’s worst- and average-case running times are both provably within a constant factor of optimal (in both running time and number of comparisons), but it’s either likely or outright provably not actually optimal in any of those four categories.

  • CyberDildonics 10 months ago

    Heap sort gets beat by sorting algorithms with better locality. You can't be the fastest while skipping through memory for every operation.

    • janwas 10 months ago

      Agreed. And I am not aware of a heapsort that can take advantage of SIMD, so our not very well-tuned heapsort is about 20 times slower in practice than our VQSort (vectorized quicksort).

reocha 10 months ago

Pretty impressive, I'm looking forward to optimizations in other commonly used algorithms.

sossunday 10 months ago

Wow, that's really cool - I wonder what's next for CPU optimizations

lngnmn2 10 months ago

Not general or universal. Only for pre-trained data and with abysmal worst cases.

  • caturopath 10 months ago

    I'm not positive what you mean here. Are you saying the discovered algorithm isn't actually good? Didn't LLVM accept it?

alexnewman 10 months ago

A cool way to describe this is: a faster compiler, discovered using deep RL.

air7 10 months ago

Can anyone explain how this worked? As per the paper (which I only skimmed): "A single incorrect instruction in the AssemblyGame can potentially invalidate the entire algorithm, making exploration in this space of games incredibly challenging."

What did it do if it didn't have a useful partial score function? How did it avoid brute force?

  • blackbear_ 10 months ago

    This paragraph:

    > To better estimate latency, we implemented a dual value function setup, whereby AlphaDev has two value function heads: one predicting algorithm correctness and the second predicting algorithm latency. The latency head is used to directly predict the latency of a given program by using the program’s actual computed latency as a Monte Carlo target for AlphaDev during training. This dual-head approach achieved substantially better results than the vanilla, single head value function setup when optimizing for real latency.

    Briefly, they use a neural network to predict whether a given sequence of instructions is correct and how fast it is. They then use this network to guide program generation via Monte Carlo tree search [1]. It is this search procedure that keeps track of the partial scores at each node.

    [1] https://en.wikipedia.org/wiki/Monte_Carlo_tree_search
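
    Here's a very rough sketch of the dual-head idea (the names and the scoring below are my own invention, not from the paper): one estimate for correctness, one for latency, combined into a single score that the search can use to rank candidate programs:

        #include <cstdio>
        #include <vector>

        // Stand-ins for the two value-function heads. In AlphaDev these are
        // outputs of a learned network evaluating a partial assembly program;
        // here they are dummy functions of a toy "program" (a list of opcodes).
        struct DualValueHeads {
            double correctness(const std::vector<int>& program) const {
                return program.empty() ? 0.0 : 0.9;  // dummy estimate in [0, 1]
            }
            double latency(const std::vector<int>& program) const {
                return static_cast<double>(program.size());  // dummy: fewer instructions = faster
            }
        };

        // Score used to rank search nodes: reward predicted correctness,
        // penalize predicted latency.
        double node_score(const DualValueHeads& heads, const std::vector<int>& program) {
            return heads.correctness(program) - 0.05 * heads.latency(program);
        }

        int main() {
            DualValueHeads heads;
            std::vector<int> shorter = {1, 2, 3};
            std::vector<int> longer = {1, 2, 3, 4, 5};
            std::printf("short: %.3f  long: %.3f\n",
                        node_score(heads, shorter), node_score(heads, longer));
        }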

keiuseki 10 months ago

Try making BERT better. So far it is much worse than ChatGPT.

GreedClarifies 10 months ago

DeepMind shouldn’t be part of a for-profit entity. There, I said it.

They are clearly focused on moving technology forward and helping humanity. That’s great. However, pulling down $1M+ salaries (for L6+ developers) and using hundreds of millions or billions in Borg resources while adding nothing to the bottom line is not in the interest of Google. Not to mention the negative effects on productivity as other Googlers attempt to replicate the “publish everything to support my brand” strategy of DeepMind.

I know Google is not a normal company in that Larry and Sergey have complete control of it, but sooner or later they have to realize that DeepMind needs to be spun off, and further that Demis is entirely the wrong person to run Google’s AI. He doesn’t care a whit about Google; it is only a vessel to fund his research.

Products. A company is about products and customers, not research papers. Academia and non-profits are about papers.

  • benlivengood 10 months ago

    I think Google's biggest net positives in the world at this point are Waymo and DeepMind. Way better than spending money on new chat apps.

    Google’s history is that a very lucrative bottom line has allowed for very beneficial research with absolutely no hardship on anyone. Supremely better than lining shareholder pockets; shareholders can’t think enough quarters ahead for long-term human benefit. Salaries are just the way to attract and retain quality folks.

    • GreedClarifies 10 months ago

      The billions who use their search every day would likely disagree. The billions who use YouTube every day would likely disagree. The billions who use Android would likely disagree. The hundreds of millions who use Gmail or G Suite would disagree.

      Products.

      • casey2 10 months ago

        Products are at best an optimization of yesterday’s technology; in tech you realistically can’t peek into the future without spending billions on R&D. Having smart people work on any of the things you listed is a (further) waste of their time.

  • sidibe 10 months ago

    Both of these contribute a lot to Google’s infra by giving it new challenges. It’s not a one-way street: Google wouldn’t know what to build towards in its data center/hardware pipeline without these really ambitious projects with huge requirements.