a_wild_dandan 12 days ago

Ah, I remember reading this paper! Essentially, they created synthetic data by solving search problems with A*. Trained on this data, transformers unsurprisingly learned to solve these problems. They then improved their synthetic data by solving each problem many times with A* and keeping only the run with the fewest search steps. Transformers learned to be competitive with this improved search heuristic too!
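
Roughly, each training sequence is one A* run serialized into tokens: every time the search adds a state to or removes a state from the search tree, that becomes a token, and the optimal plan is appended at the end. Here's a toy sketch of generating such a trace (my own guess at the shape of it, not the paper's exact tokenization):

    import heapq

    def astar_trace(grid, start, goal):
        """Run A* on a 0/1 grid (0 = free cell) and return its search
        dynamics as a token sequence, with the optimal plan at the end."""
        rows, cols = len(grid), len(grid[0])
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan distance
        frontier = [(h(start), 0, start)]                        # (f, g, node)
        came_from, best_g, trace = {start: None}, {start: 0}, []
        while frontier:
            f, g, node = heapq.heappop(frontier)
            if g > best_g[node]:
                continue                                         # stale heap entry
            trace.append(f"close {node[0]} {node[1]} c{g} h{h(node)}")
            if node == goal:
                break
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (node[0] + dr, node[1] + dc)
                if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                    ng = g + 1
                    if ng < best_g.get(nxt, float("inf")):
                        best_g[nxt], came_from[nxt] = ng, node
                        trace.append(f"create {nxt[0]} {nxt[1]} c{ng} h{h(nxt)}")
                        heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
        plan, node = [], goal                                    # walk parents back to start
        while node is not None:
            plan.append(node)
            node = came_from[node]
        trace.append("plan " + " ".join(f"{r},{c}" for r, c in reversed(plan)))
        return trace

    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print("\n".join(astar_trace(grid, (0, 0), (2, 0))))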

Pretty remarkable stuff. Given the obsession over chat bots, folk often miss how revolutionary transformer sequence modeling is to...well, any sequence application that isn't a chat bot. Looking solely at speeding up scientific simulations by ~10x, it's a watershed moment for humanity. When you include the vast space of other applications, we're in for one wild ride, y'all.

  • electricships 12 days ago

    After AlphaZero, people tried to train various combinatorial solvers

    IIRC, they were unable to exceed the SotA of SAT solvers (which admittedly is the result of decades of research)

    maybe we just needed bigger networks, or transformers, or just more training compute

    • lou1306 11 days ago

      SAT appears to be rather impervious to statistical methods because it is very non-linear and non-local (besides, of course, being NP-complete). Flip one single literal anywhere and poof!, your formula might go from SAT to UNSAT, or vice versa, or you might enable a ton of simplifications, etc. So it might be very hard to train a network on such a riotous state space.

      Additionally, in practice SAT solvers are also supposed to provide witnesses (i.e., a satisfying assignment or an unsatisfiability core) to back their verdicts, and that may be an even taller order for ML.
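
      A tiny brute-force sketch of both points (toy code of mine, nothing like a real CDCL solver): flipping a single literal's sign flips the verdict, and the witness for a SAT verdict is just the satisfying assignment.

          from itertools import product

          def solve(clauses, n_vars):
              """Brute-force SAT: return a satisfying assignment {var: bool},
              or None if UNSAT. Literal +i means x_i, -i means NOT x_i."""
              for bits in product([False, True], repeat=n_vars):
                  assign = {i + 1: bits[i] for i in range(n_vars)}
                  if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
                      return assign          # the witness backing a SAT verdict
              return None

          # (x1 or x2) & (~x1 or x2) & (x1 or ~x2) & (~x1 or ~x2): UNSAT
          print(solve([[1, 2], [-1, 2], [1, -2], [-1, -2]], 2))   # None

          # Flip one literal in the first clause (x1 -> ~x1) and it becomes SAT
          print(solve([[-1, 2], [-1, 2], [1, -2], [-1, -2]], 2))  # {1: False, 2: False}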

      • math_dandy 11 days ago

        Since all of these combinatorial optimization problems are, in full generality, as difficult as possible in the complexity-theoretic sense, there is little prospect of AI/ML helping to solve these problems in general. (?)

        But people hope that some subclass of "real-life" instances of one of these optimization problems has enough structure that AI/ML may be able to learn useful heuristics for it. (This is not my field. This is just my take-away from reading some AI/ML-for-CO papers.)

    • davidguetta 11 days ago

      Mm, there's one purely based on position analysis and not tree search that plays at GM level in chess. From DeepMind.

    • reindeergmz 11 days ago

      Seems kind of clear to me we're driving towards as little statefulness as possible: compressed symbolic logic that's able to compute as many states as possible. Less resource use on RAM and storage as available chip cycles continue to grow.

      Human biology is a min/max engine, evolved over the years toward "enough" food, shelter… and we see daily that there is no static state to reality. Our social behavior of trying to preserve state is some cognitively dissonant "War is Peace" insanity.

      Math is a compressed form of communication. "Add two apples to the set of all apples" is literally more glyphs than the same statement in usual math syntax. Paper and written literacy were hard to come by back in the day, but verbal literacy was cheap and ubiquitous.

      The economy is built on physical statistical measures. Such a framework to keep TP and food on shelves is the minimal state needed to keep people from rioting. The economy has nothing to do with historical narrative and political memes at this point.

      We could look further into computing in other substrates, like mycelia, biochemically shaping structures to represent certain states.

      Computing is a concept not coupled to contemporary ignorant business machines. We made a social monolith around computing by propping up big tech.

  • unraveller 12 days ago

    What are those other applications? So now that everyone has SotA warehouse scheduling in their pocket, no one can be expected to do anything or go anywhere inefficiently?

  • jadbox 12 days ago

    Remarkable indeed, I've been toying with many smaller, unorthodox use-cases for LLMs and continue to be surprised.

    • smcin 12 days ago

      I want to see an optimal (2P) speedrun with Sonic the Hedgehog and Tails, as a fun proof. There are two players working together, and it has more degrees of freedom than a Sokoban puzzle.

    • jasonjmcghee 12 days ago

      Can you say more? Where can I read about some of these alternate use cases?

      • ebri 12 days ago

        yes I want to learn about this too! Hadn't thought about it like that.

    • littlestymaar 11 days ago

      It uses transformers, but it's not an LLM, though.

yeldarb 12 days ago

> While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard A∗ search. Searchformer is an encoder-decoder Transformer model trained to predict the search dynamics of A∗. This model is then fine-tuned via expert iterations to perform fewer search steps than A∗ search while still generating an optimal plan. In our training method, A∗'s search dynamics are expressed as a token sequence outlining when task states are added and removed into the search tree during symbolic planning. In our ablation studies on maze navigation, we find that Searchformer significantly outperforms baselines that predict the optimal plan directly with a 5-10× smaller model size and a 10× smaller training dataset. We also demonstrate how Searchformer scales to larger and more complex decision making tasks like Sokoban with improved percentage of solved tasks and shortened search dynamics.

Neat; TIL about Sokoban puzzles. I remember playing Chip's Challenge on Windows 3.1 when I was a kid which had a lot of levels like that.

  • passion__desire 12 days ago

    Jonathan Blow is working on a "Sokoban"-inspired puzzle game. I am looking forward to its release.

  • smcin 12 days ago

    Sokoban puzzles have a discrete set of actions, typically 8 (move U/D/L/R ; drag U/D/L/R), on a discrete grid.

    • airstrike 11 days ago

      more like push than drag tbf
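
      For the record, a single push looks roughly like this (my own toy sketch, not anything from the paper):

          def push(player, boxes, walls, direction):
              """Try one Sokoban move; push a box if it's in the way.
              Returns (new_player, new_boxes) or None if the move is illegal."""
              dr, dc = direction                       # e.g. (0, 1) = right
              step = (player[0] + dr, player[1] + dc)
              if step in walls:
                  return None
              if step in boxes:                        # box in the way: try to push it
                  beyond = (step[0] + dr, step[1] + dc)
                  if beyond in walls or beyond in boxes:
                      return None                      # box is blocked
                  boxes = (boxes - {step}) | {beyond}
              return step, boxes

          # player at (1, 1), one box at (1, 2), a wall at (1, 4); push the box right
          print(push((1, 1), {(1, 2)}, {(1, 4)}, (0, 1)))   # ((1, 2), {(1, 3)})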

nextaccountic 12 days ago

Due to the no free lunch theorem [0], any search algorithm that makes some problems faster will necessarily make other problems slower. What does the worst case for an algorithm like this look like?

I think that part of the appeal of A* to me is that I can readily visualize why the algorithm fails on some pathological inputs.

[0] https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_op...

  • kevindamm 12 days ago

    Well, it's still using A* and state exploration to validate the generated plans. But it's generating plans nondeterministically, so it's possible it could run for an unbounded number of steps.

    Why didn't they give more details about the Sokoban puzzles being solved? The standard benchmark is the 90 puzzles in XSokoban, and there's a Large Test Suite with a few thousand more, but many of those haven't been solved by any search or planning system. Even the 90 puzzles in XSokoban were only all solved a few years ago (2020?) by Festival (using FESS, a feature-space search on top of the usual state-space / domain-space search). Before that, it had been about 20 years of only 57 or so levels solved via search.

    I see that they measure trace lengths, but I would really have liked to see the number of XSokoban levels solved and how many states were examined, to line up with existing research.

  • blamestross 12 days ago

    Well, you have to run the AI, which is significantly more costly than just running A*.

    Also, the AI has a finite size, which means it has to be scaled up for bigger graphs. I doubt it would work at all for non-Euclidean 2D graphs.

    The exciting thing is that they made an AI do a toy problem, not that it was efficient.

    But people are dumb and tech companies have brain control over them. "AI good and solve all problems"

    It reminds me of the "slime mold solves mazes" thing. It was interesting and there were even things to learn from it, but it wasn't magic and couldn't give us better general tools.

  • reaperman 12 days ago

    The "No free lunch" theorem does have some prerequisites for it to actually apply.

    > There is no free lunch in search if and only if the distribution on objective functions is invariant under permutation of the space of candidate solutions.
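
    Spelled out in symbols (my own paraphrase of that condition, not a quote):

        \[
          \Big(\, \mathbb{E}_{f\sim P}\big[\operatorname{perf}(a,f)\big]
              = \mathbb{E}_{f\sim P}\big[\operatorname{perf}(b,f)\big]
              \ \text{for all search algorithms } a, b \,\Big)
          \iff
          P(f) = P(f\circ\sigma)
          \ \text{for every permutation } \sigma \text{ of the candidate-solution space } X .
        \]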

  • IanCal 12 days ago

    NFL theorem is frankly silly.

    It boils down to "you can't predict random numbers".

    I'm sure it's useful to have it formalized, but it has no real impact on real-world situations, because most problems we care about have some kind of structure we can exploit.

    • nextaccountic 12 days ago

      Well, when I asked "What does the worst case for an algorithm like this look like?", I fully expected the answer to be "in practice this never happens" (that is, there are bad cases, but since we aren't drawing inputs at random, in practice we never hit them).

      But how likely is that?

    • woopsn 11 days ago

      Yes, significantly. But take, for example, the stock market, which is obviously of interest (to OAI in particular) but which cannot, over the medium to long term, be extrapolated accurately from historical data.