The author is looking for positions which are difficult for low rated players and easier for high rated players.
Poor man’s version of this which requires no training would be to evaluate positions at low depth and high depth and select positions where the best move switches.
Training neural nets to model behavior at different levels is also possible but high rated players are inherently more difficult to model.
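For the curious, that depth-switch filter is only a few lines with python-chess and a local Stockfish binary (both assumptions on my part, as are the depths, which are arbitrary knobs):

    import chess
    import chess.engine

    def best_move_switches(fen, shallow=4, deep=20, engine_path="stockfish"):
        # Flag positions where the engine's best move changes between depths.
        board = chess.Board(fen)
        with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
            lo = engine.analyse(board, chess.engine.Limit(depth=shallow))
            hi = engine.analyse(board, chess.engine.Limit(depth=deep))
        return lo["pv"][0] != hi["pv"][0]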
> Poor man’s version of this which requires no training would be to evaluate positions at low depth and high depth and select positions where the best move switches.
I've looked at the ~same problem in go instead of chess quite a bit. It turns out that that strategy does remarkably poorly, mostly because weak players don't just have low-depth thinking, they also do random weird shit and miss entire patterns. Eg in go if you run a strong go engine at _zero_ thinking (just take the model's best guess first move), it's already damn close to pro level.
Getting them to play anything like a beginner, or even an intermediate player, is functionally impossible. You _may_ be able to trick one into playing as weak as a beginner if you really try, but it would feel _nothing_ like one.
> Training neural nets to model behavior at different levels is also possible but high rated players are inherently more difficult to model.
Maybe more difficult to model (not sure tbh, but granted for the moment), but it's _far_ easier to generate training data for strong players via reinforcement learning approaches.
> Maybe more difficult to model (not sure tbh, but granted for the moment),
Two reasons. One, strong players are performing tree search, so you need to model their tree search process rather than a simple depth zero prediction. And two, there are far fewer high rated players so there is far less high quality training data. A lot of high rated games are also bullet, which aren’t as useful.
> but it's _far_ easier to generate training data for strong players via reinforcement learning approaches.
Can you say more about this?
In go, I remember that there were rare positions/puzzles where old (pre-mcts) algorithms did better than humans, and that was artificial positions where there were a lot of complex recaptures in the same area. Which is rare in go, but common in chess.
I do think that positions where the best move at depth N is a terrible move at N+1 are hard, especially if it isn't just recaptures on the same square. Chess engines used to have special heuristics to avoid such "horizon effects", don't know if they still do.
> high rated players are inherently more difficult to model
yes and no. there's a bit of a bathtub curve here where below ~1000 elo, lower ratings are harder to predict because their moves are closer to random.
I don't even think "high rated players are inherently more difficult to model" is correct; the opposite is more likely to be correct. Top players, those rated above 2700, tend to play the engine moves the majority of the time, while lower rated players tend to stray from the engine lines.
https://arxiv.org/pdf/2006.01855
If you check figure 4, SF at depth 15 matches human moves about 47% of the time, which is lower than their NN. So quantitatively, SF isn't amazing at move matching, and lower than the best NN in its relevant rating range (1900 -> 54%).
Qualitatively, SF is pretty unsatisfying as a model of a pro human because a lot of its moves are counterintuitive.
top players are hard to model just because there aren't many of them. 1200-2200 has tons of players and they all make various degrees of sensible moves most of the time, so you can plausibly train a NN to sample from those distributions. for 2500+ you have far fewer players so the likely moves depend a lot more on the styles and blindspots of specific players.
Seems like an excellent definition, mainly because I have no idea how to measure it otherwise. "Complexity is the time needed to solve a problem", why not.
Well, there are a lot of shortcuts built into engines like stockfish that don't perfectly mirror how beginners vs advanced players think.
For example, certain types of piece forks are easy for newer players to spot (knight forking a king+rook on the back rank) and others are harder (bishop forking a king and a pawn in the late game when nobody is on the back rank), but stockfish is going to find both just as easily.
So if you want to get deep into complexity you do need a model of different players and what types of advantages are easy for them to see.
It could be done with e.g. the Maia engine, which was trained on Lichess games and plays like humans at different skill levels – https://www.maiachess.com/ – but I do not know if it supports different ponder depths. Edit: it seems it does not – https://github.com/CSSLab/maia-chess#:~:text=a%20nodes%20lim...
> evaluate positions at low depth and high depth and select positions where the best move switches.
Won’t that bias in favour of common sacrifices in the endgame? The issue with using depth is that humans don’t struggle with depth uniformly. Humans can calculate to a lot more depth with a sequence of obvious moves (recaptures, pawn races) than they can in more complex situations (closed positions in the midgame).
Yeah, this is called the horizon effect: if I take your knight with my queen and stop calculating I’m up a knight, when in reality the queen will be taken on the next turn.
One chess engine I saw long ago used a small depth (maybe even 1!), but made an exception for consecutive captures to avoid the biggest case of this issue (and for pawn promotion as well, iirc).
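That capture-only extension is essentially quiescence search. A minimal sketch with python-chess, assuming naive material counting as the leaf evaluation (the piece values and depth are illustrative):

    import chess

    VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
              chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

    def material(board):
        # Material balance from the side to move's point of view.
        score = 0
        for piece_type, value in VALUES.items():
            score += value * len(board.pieces(piece_type, board.turn))
            score -= value * len(board.pieces(piece_type, not board.turn))
        return score

    def quiesce(board):
        # "Stand pat", but keep searching while captures are available.
        best = material(board)
        for move in list(board.legal_moves):
            if board.is_capture(move):
                board.push(move)
                best = max(best, -quiesce(board))
                board.pop()
        return best

    def search(board, depth=1):
        # Fixed shallow depth, then resolve capture sequences with quiesce().
        if depth == 0:
            return quiesce(board)
        best = -float("inf")
        for move in list(board.legal_moves):
            board.push(move)
            best = max(best, -search(board, depth - 1))
            board.pop()
        return best if best != -float("inf") else material(board)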
Seems to me that if it is common, it is better known, so the AI knows it better, so it is less complex.
I had this idea of drilling games against an engine with a set depth evaluation, since beating a depth-1 engine should teach simpler concepts than a depth-4 one.
I vibe coded this into a browser app, but the evaluation is slow around depth 5: https://camjohnson26.github.io/chess-trainer/
This is a thing you can do with Stockfish + SCID, is it not?
Yes, though in the latest versions it is just a drop-down where one can choose between a few fixed values, and one has to make the move for the engine.
Cute Chess is a pretty GUI where one can do this too, by setting the ply parameter. Castling is done by dragging the king onto the rook!
https://github.com/cutechess/cutechess
This reminds me of this nice video: https://www.youtube.com/watch?v=YGLNyHd2w10
Basically, the state space of the game can produce some intuition on why certain games are hard, demonstrated as a clustering of various states that are only linked to the winning state by a very small number of edges. So a player can easily "get lost" in the maze.
That video became an instant classic for me when it came out, very well made and explained, and very much influencing my thoughts as of late. Related to the article, I've also previously spent a few months working on chess line/complexity visualization and explanation as a side project. The main outcome of this was https://www.schachzeit.com/en/openings/sicilian-defense. I haven't touched it beyond keeping it barely running for most of 2025, so a strong breeze would surely knock it over, but the experience of making it did teach me a lot about chess and computerized chess.
The way the table at the bottom of these pages works is by running stockfish on not only the current position, but on the position after every possible legal move the first player could make. This increases the amount of computation needed by one or two orders of magnitude, as stockfish often does not explore every possible legal move so deeply. I was only doing it for lichess' openings lines, and it cost maybe $200 worth of computer processing time (can't recall the exact numbers).
As far as I know (and I'm only someone who has spent a couple dozen hours with stockfish; a stockfish dev would know more), stockfish does not allow easy access to the hash table that contains all the evaluations. I could not figure out how to get it to print out its hash tables at all in order to get at the values deeper down in the tree.
I've considered forking stockfish and trying to get those values to be accessible somehow, in order to try and make visualizations like that video, and maybe someday I will. For now though, I think that's the technical piece missing, because otherwise one is forced to recompute every single node of the graph at the desired depth rather than running it once and getting back the values that have already been computed. Maybe someday, when I have more time, I'll take a shot at it, but not anytime soon I don't think.
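Roughly, the per-move table computation described above looks like this with python-chess (the engine path and depth are my assumptions, not the site's actual code):

    import chess
    import chess.engine

    def evaluate_all_moves(fen, depth=18, engine_path="stockfish"):
        # Evaluate the position after each legal move with its own search.
        board = chess.Board(fen)
        results = {}
        with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
            for move in list(board.legal_moves):
                san = board.san(move)
                board.push(move)
                info = engine.analyse(board, chess.engine.Limit(depth=depth))
                # Score relative to the side that just moved, i.e. negated
                # from the child position's side-to-move perspective.
                results[san] = info["score"].pov(not board.turn)
                board.pop()
        return results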
Stockfish is clearly the standard nowadays, but if you simply want to score all moves to depth n where n is smallish: there used to be a lot of chess engines around, and coding one's own was often a project people would take on. Not so much now, it seems.
I recently was trying to build an AI assistant to help with various chess things, which took the shape of an MCP server: https://github.com/shelajev/mcp-stockfish
It builds as a docker image which has stockfish and maia (maiachess.com) together with different weights, so it can simulate lower-level players.
It was a fun exercise. I tried a bunch of local models with this MCP server, which isn't particularly optimized but also doesn't seem that bad. The results were quite disappointing: the models would often invent chess-related reasoning and mess up answering questions, even when you'd expect them to rely on the tools and have true evaluations available.
It was also fun to say things like: fetch a random game by username 'X' from lichess, analyze it, and find positions which are good puzzles for a player rated N.
and see it figure out the algorithm of tool calls:
- fetch the game
- feed the moves to stockfish
- find moves where evaluation changed sharply
- feed it to maia at strength around N and to stockfish
- if these disagree, it's probably a good puzzle.
I don't think I ever got to a working setup like that, even with managed cloud models. Various small issues: timeouts on the MCP calls, general unreliability, etc. Then I lost interest and abandoned the idea.
I should try again after seeing this thread.
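For reference, a sketch of that pipeline done directly in python-chess rather than via MCP; the Maia-via-lc0 command, weights file, thresholds, and limits are all assumptions:

    import chess
    import chess.engine

    def puzzle_candidates(moves, swing_cp=150, sf_path="stockfish",
                          maia_cmd=("lc0", "--weights=maia-1500.pb.gz")):
        # Replay a game; flag positions right before sharp eval swings where
        # a Maia-level model disagrees with Stockfish's best move.
        board = chess.Board()
        found = []
        with chess.engine.SimpleEngine.popen_uci(sf_path) as sf, \
             chess.engine.SimpleEngine.popen_uci(list(maia_cmd)) as maia:
            prev_cp, prev_fen = None, None
            for move in moves:
                info = sf.analyse(board, chess.engine.Limit(depth=16))
                cp = info["score"].white().score(mate_score=10000)
                if prev_cp is not None and abs(cp - prev_cp) >= swing_cp:
                    b = chess.Board(prev_fen)
                    sf_best = sf.play(b, chess.engine.Limit(depth=16)).move
                    maia_move = maia.play(b, chess.engine.Limit(nodes=1)).move
                    if maia_move != sf_best:
                        found.append(prev_fen)
                prev_cp, prev_fen = cp, board.fen()
                board.push(move)
        return found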
It is shockingly difficult to use LLMs in chess study. I don't need it to be a better (or worse) Stockfish; an LLM should be great at taking a FEN or multiple lines from Stockfish via MCP or tool call and explaining why positions are evaluated the way they are, the typical plans in the position (drawing from pretraining knowledge of a vast archive of games), and how to guide a human studying these positions.
I suspect that the large amount of chess pretraining data is not well synchronized with positions, because in books and articles the text is typically accompanied by pictures of the positions, NOT FENs / PGNs. So the training on the text is decoupled from the representation of the position.
Regarding your tool-call thing with stockfish/maia: I made a tool like this for myself called Blunder Sniper. It iteratively feeds in positions I'm likely to get given my openings, recursively querying the lichess DB, and finds the first point in each chain of top-80%-played moves where opponents in the rating range play a blunder as the most common move.
It was a fun way to build an alternative to the engine-based preparation that many strong players use, which is something like Nibbler + lc0 with high contempt values to find higher-variance lines rather than game-theory-optimal ones.
Some day I'll expand on the gpt-chess articles [0] that I found super interesting, fine-tune models... well, I keep telling myself that, anyway...
[0]: https://dynomight.net/more-chess/
I've been quite taken with Go these last few months and I wish the western world had more skilled players. The Asian servers don't do much (if any) i18n; they don't need to, since there aren't many players outside Asia.
I've been fortunate that one of my high school friends is a 4 dan EGF player and that he has taken lots of time to play with me and teach me. But I can see other beginners struggling with the basics because they're only playing other low-skilled players.
I wish the western world paid Go more attention, it's a beautiful game with a really nice balance.
Some things that may be of interest. First relevant to the posted article:
A site that has used neural nets to find go moves that good players would probably make but that weaker players (of varying ranks) probably would not: https://neuralnetgoproblems.com/ (code available on github)
https://ai-sensei.com/challenge (behind login wall, and in future possibly a pay wall) is a similar idea, but the difficulty of evaluating the position is determined by how users of the site perform in practice.
And more generally, but relevant to your comment:
Players can play humans at an appropriate rank on OGS https://online-go.com/ (not as popular as the Eastern servers, but probably popular enough) -- or against calibrated rank "human-like" AI players by painfully setting up the right katago models themselves, or by paying for a subscription on ai-sensei.com
A go education site that's currently largely by and pitched at Westerners: https://gomagic.org/ for leveling up from the basics.
And a lot of books are now available easily and electronically in English (and some in German): https://gobooks.com/ --- I'd recommend "graded go problems for beginners", "tesuji", and "attack and defense".
Some good sites aren't (fully) available in English, like https://www.101weiqi.com/ -- but there are chrome and firefox extensions to translate just enough of it to make it usable.
[To help search engines: go is also known as weiqi and baduk]
Thanks for the first link, it's a really fun game :)
I've been playing lots on OGS (with my friend in particular), under my pseudonym erelde
> What is complexity in chess?
If there is only one possible move then arguably the complexity of the position is zero (not counting when there are no more possible moves, which means the game is over).
But complexity can't be measured only by the number of possible moves. If there is a mate in 1, but lots of possible moves, then the complexity of the position is also very low for a player with any familiarity with the game, and potentially moderately high for someone who has never played any game (but knows the rules).
Yet it should be possible to measure "absolute complexity", i.e., complexity that doesn't depend on the experience of the player.
So, complexity is a function of the size of the tree, and the minimal number of moves between the current position and winning. (Because chess is entirely deterministic, that distance always exists. We are (currently) unable to measure it in most cases, but we can be sure it does exist.)
It should be possible to estimate the size of the tree heuristically, even without enumerating all possible moves. But then I'm not sure where to go from there...
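One way to estimate tree size without enumerating it is Knuth's method: average, over random root-to-leaf walks, the running product of branching factors. A sketch with python-chess (the depth and sample count are arbitrary):

    import random
    import chess

    def estimate_tree_size(fen, depth, samples=1000):
        # Monte Carlo estimate of the node count of the game tree: along each
        # random path, accumulate 1 + b1 + b1*b2 + ... (Knuth's estimator).
        total = 0.0
        for _ in range(samples):
            board = chess.Board(fen)
            product, size = 1.0, 1.0
            for _ in range(depth):
                moves = list(board.legal_moves)
                if not moves:
                    break
                product *= len(moves)
                size += product
                board.push(random.choice(moves))
            total += size
        return total / samples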
> "If there is a mate in 1, but lots of possible moves, then complexity of the position is also very low for a player with any familiarity with the game"
This is one reason I think the concept of complexity will probably never be formalized. Because even when a position is mate in 1, the complexity varies wildly. For instance, here [1] is a game between Carlsen and Nakamura where Carlsen, perhaps the strongest player of all time, failed to see a mate in 1; and for that matter Nakamura, not exactly a weak player himself, help-mated himself by playing one of the only moves that allowed Carlsen a mate in 1! And both players still had tons of time on their clocks.
But to understand how they both missed it you need to go back in the position and see that both saw black's queen was defending against the mate, and Nakamura's blunder interfered with that protection in a rather unusual way. So you can't even assess it on a positional basis, since it's dependent upon what was played before, let alone all of the poorly defined terms I'm using like 'rather unusual.'
I agree that complexity can vary widely even with mate in 1. Lots of mate in 1 chess puzzles to prove that.
But in the case of this game, I wouldn't say that they missed the mate in 1 due to the complexity of the position or the previous moves. They both just had the same blind spot; it happens every so often.
Yeah, but Qh7# was literally the main idea in the position. It wasn't some surprising idea. But I think it's logically explainable: when you're playing the game, each move you're considering what changed with your opponent's move, and after black plays h6 your mind naturally considers the pawn rather than the square. So it's easier to miss that h7 has become weaker.
If Magnus, or probably any player over ~2300, saw this position in a vacuum, they'd all literally instantly see Qh7#. It's basically impossible to miss, except for this weird nuance!
> ...between the current position and winning. (Because chess is entirely deterministic, that distance always exists. We are (currently) unable to measure it in most cases, but we can be sure it does exist.)
Can we?
Just because it's deterministic doesn't mean it always leads to a win - there may be strategies that always lead to a draw, we may just not know about them. Tic-tac-toe is deterministic and we know it always leads to a draw if played correctly - we know about it because it's a much simpler game (much smaller tree of possibilities).
Indeed, in Stockfish, a 100-centipawn advantage for a player is calibrated to mean "a 50% chance of a win (as opposed to a draw or a loss) from the current position" [0].
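For illustration, such calibrations are usually logistic curves from centipawns to expected win probability; the slope below is a made-up placeholder (the real Stockfish WDL model fits its parameters from self-play data), and only the "+100cp maps to 50%" anchor comes from the comment above:

    import math

    def win_probability(cp, slope=0.0037, anchor_cp=100):
        # Logistic mapping pinned so that +100cp gives a 50% win chance.
        return 1.0 / (1.0 + math.exp(-slope * (cp - anchor_cp)))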
In endgame tablebases, the distinction can get interesting, since one player may in principle have a forced mate, but it would take so many moves that the opponent could force a draw by the 50-move rule.
This reminds me of a little project [1] I had fun with, to try to compute the sharpness of a position and/or evaluate whole lines. Indeed, for the notion of sharpness (which I believe is different from complexity), although it is easy to intuit, I've never found a satisfactory implementation.
As a player I like to think of sharpness as a measure of the potential consequences of a miscalculation. In a main line dragon, the consequence is often getting checkmated in the near future, so maximally sharp. In a quiet positional struggle, the consequence might be something as minor as the opponent getting a strong knight, or ending up with a weak pawn.
Whereas complexity is a measure of how far ahead I can reasonably expect to calculate. This is something non-players often misunderstand, which is why they like to ask me how many moves ahead I can see. It depends on the position.
And I agree, these concepts are orthogonal. Positions can be sharp, complex, both, or neither. A pawn endgame is typically very sharp; the slightest mistake can lead to the opponent queening and checkmating. But it's relatively low in complexity, because you can calculate far ahead using ideas like counting and geometric patterns (square of the pawn, zone of the pawn, distant opposition, etc.) to abstract over long lines of play. On the opposite side, something like a main line closed Ruy Lopez is very complex (every piece still on the board), but not especially sharp (closed position, both kings are safe; it's more of a struggle for slight positional edges).
Something like a King's Indian or Benoni will be both sharp and complex. Whereas an equal rook endgame is neither (it's quite hard to lose a rook endgame; there always seems to be a way to save a draw).
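One cheap proxy for sharpness (not the linked project's actual method, just a sketch of my own): the centipawn gap between the engine's best and second-best moves, which is large exactly when a miscalculation is heavily punished. Assumes python-chess and Stockfish:

    import chess
    import chess.engine

    def sharpness_proxy(fen, depth=18, engine_path="stockfish"):
        # Gap between best and second-best move, in centipawns.
        board = chess.Board(fen)
        with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
            infos = engine.analyse(board, chess.engine.Limit(depth=depth),
                                   multipv=2)
        if len(infos) < 2:
            return float("inf")  # only one legal move, nothing to compare
        def cp(i):
            return infos[i]["score"].pov(board.turn).score(mate_score=10000)
        return cp(0) - cp(1)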
There are quite specific mechanics to difficult positions in chess that should be mentioned. A difficult position occurs when on the surface it seems that there are many viable moves to be analyzed. Think of being a kid and wondering which pawn to move first. Then you learn opening theory and it gets a little easier.
But you can still encounter a position where you could take multiple pieces, all leading to different trades or some forced moves. Another scenario is when you're in a closed position and there are many, many micro-adjustments you could make. An AI might know that you should move the a-pawn and you might not even be considering it.
Basically, positions where GothamChess starts speaking with a Russian accent to channel the Russian grandmasters.
Amethyst-Cat apparently erased their github account, but the internet archive still has one snapshot [1]. The pdf link from there redirected to the davidtpeng account, but that github user no longer has a ChessComplexity repo and the internet archive only has a broken snapshot [2].
I always thought that there's some stuff that looks hard in chess, but is actually not so difficult. Things like sacrifice lines. Those feel complicated, but the calculation eventually becomes pretty deterministic.
The things that don't look hard to beginners (because there's no immediate danger) are usually questions around activity and positional advantages. Should you get rid of your opponent's central knight, but give them your long-range bishop? Should you take with the pawn in the center and explode the position or bring the rook behind it first?
Those are the truly hard things in chess. Maybe this changes if you're a high-level player, but I'm not.
> Those are the truly hard things in chess. Maybe this changes if you're a high-level player, but I'm not.
They remain pretty difficult. Ben Finegold said it best when doing a lecture on middlegame technique; a novice complained that it's hard to figure out what to do in a positional battle and where to put all the pieces: "I got bad news for you - that's hard for us, too."
Almost all amateur chess games are decided by a mistake at six ply or less. So to emulate lower rated players just occasionally flip the evaluation after a depth of 4-6 ply.
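A hedged sketch of that emulation: rank the legal moves with a shallow search and occasionally play the worst one. The probability, depth, and "worst move" choice are all made-up knobs; python-chess and a running UCI engine are assumed:

    import random
    import chess
    import chess.engine

    def weakened_move(board, engine, p=0.15, depth=6):
        # Rank all legal moves with a shallow multipv search; usually play
        # the best, but with probability p play the worst (a "blunder").
        n = board.legal_moves.count()
        infos = engine.analyse(board, chess.engine.Limit(depth=depth),
                               multipv=n)
        ranked = [info["pv"][0] for info in infos]
        return ranked[-1] if random.random() < p else ranked[0]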
Wayback spotted some of it, but after bouncing through its 302 capture one ends up here, but yes, the actual repo is still gone and the "in page" snapshot doesn't render the actual pdf:
Also, after some url surgery it seems Wayback did grab the .tar.gz of a release <https://web.archive.org/web/20201226113237/https://codeload....> but sadly it does not include the pdf, only some .png and a helpfully named "Software/blank", and also an "asset" of ChessOracle1.2.zip containing the .h5 weights but no pdf to be seen
> Here, most bots would play a very strong 11. c7d5 NxPch. Ha! Why drag things out? Instead, I played 11. e3e4 P-K4, guessing that black---with only two minutes left on his clock---might grab the king pawn, and suffer an immediate checkmate. Black did not. I was so enraged that I shouted "110111001110111101000", which I admit was unsportsmanlike conduct.
We should be aiming to solve chess, but we are not even trying.
If the complexity of chess is finite this is possible.
Chess AI have become so good that maybe there is no more progress to be made.
We must abandon the concept of "valuation".
Here is the criterion for solving chess.
A chess engine is a constant-time (or bounded-time, and memory-bounded) function f which takes any position and returns either 1 ("win for the side to move"), 0 ("draw"), or -1 ("loss for the side to move").
A chess engine solves chess if we can't find any violation of the Bellman equation:
f(gamestate) = max over legal moves of ( -f(gamestate + move) ), if there are legal moves,
or f(gamestate) = result, if there are no legal moves.
This function f can be encoded either by a neural network or by some code, but it must be computable in constant (or bounded) time.
The whole problem of solving chess is then reduced to mining the gamestate space for counterexamples.
Like in math, you can conjecture that your chess engine has solved the game, and it stands until someone else finds a violation of your engine (either it doesn't compute in bounded time, or the Bellman equation is violated).
To verify a chess engine you can just sample a random gamestate, or a known-to-be-difficult gamestate, and check that the equation holds; that's at most (max number of legal moves + 1) function evaluations.
You can try applying this concept to simple games like tic-tac-toe, or try to compress endgame tables with a neural network.
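As a toy end-to-end version of this: solve tic-tac-toe exactly with negamax, then verify the Bellman equation at every reachable state. A sketch of my own, not anyone's published code:

    from functools import lru_cache

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def winner(board):
        # board is a tuple of 9 cells in {'X', 'O', ' '}.
        for a, b, c in LINES:
            if board[a] != ' ' and board[a] == board[b] == board[c]:
                return board[a]
        return None

    @lru_cache(maxsize=None)
    def f(board, player):
        # Game value for the side to move: 1 win, 0 draw, -1 loss.
        if winner(board):
            return -1  # the previous player just completed a line
        moves = [i for i, s in enumerate(board) if s == ' ']
        if not moves:
            return 0
        nxt = 'O' if player == 'X' else 'X'
        return max(-f(board[:i] + (player,) + board[i + 1:], nxt)
                   for i in moves)

    def consistent(board=(' ',) * 9, player='X'):
        # Check the Bellman equation at this state and all reachable ones.
        if winner(board) or ' ' not in board:
            return True
        nxt = 'O' if player == 'X' else 'X'
        children = [board[:i] + (player,) + board[i + 1:]
                    for i, s in enumerate(board) if s == ' ']
        if f(board, player) != max(-f(c, nxt) for c in children):
            return False
        return all(consistent(c, nxt) for c in children)

    print(consistent())  # True: an exact solver is consistent by construction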
We can find such a candidate function by training a neural network to minimize the expected number of violations of the Bellman equation over various datasets of gamestates. Once we "grok" it to zero, we are done. To soften the problem you can train an auxiliary continuous function g which outputs the probability of 1, 0, or -1 (with a softmax) and add a discount factor like in reinforcement learning, but the final result is the discrete argmax of it (like in "deterministic policy gradient") with no discount factor.
Once you have this finite-time oracle, playing chess is just wandering along the graph of drawable positions to push your adversary into a position where his engine has a violation of the Bellman equation, aka a mis-evaluation of the position, where if he goes into it you will get into the winnable positions, where you stay until the end. (Not all violations can be exploited, as the adversary can sometimes avoid going into "uncertain" positions.)
A simpler (less strong) chess strategy may avoid going into "uncertain" positions as long as the position is unreachable because previous legal positions have a preferred choice (4-state logic with win, draw, lose, uncertain). Or an ordered list of moves. These make the problem easier but complexify the corresponding Bellman equation.
So the game becomes once again exploring the space of "drawable" game positions in order to find positions from which the adversary can't escape going into an uncertain (and losing) position. Aka playing the adversary and not the board, which is a harder problem unless the adversary can play perfectly. Aka playing for the win.
This comment and your other comments are simply wrong and full of nonsense. Endgame table generators are pure solvers ... given enough time, they can solve chess from the initial position. But the amount of time is longer than the time until the heat death of the universe, and to record the best-play game tree--without which a solution isn't useful--would take more material than there is in the universe.
>This comment and your other comments are simply wrong and full of nonsense
That's because you are not understanding me.
My conviction is that the complexity of chess is not as high as we think, and that there exists a neural network of less than 10B float32 weights which can encode perfect play.
Neural network evaluations are now used as heuristics to evaluate positions, and even without tree searching they play very well; but they are usually very small and complemented by a few high-level features as input. A bigger network, well fine-tuned, can probably reach perfect play without the need for tree searching.
A thought exercise is trying to compress an endgame table with a neural network and seeing how big it needs to be in order to reach perfect play. The thing is: you don't need to train it on all games from the endgame table before it converges to perfect play.
You can know how close you are to the optimal by counting the number of Bellman equation violations (or by not observing any).
You can even train it by having it reference the previously trained oracle of endgame-table chess. You solve chess with a neural network when there are only 2 pieces. Then you solve chess for 3 pieces, possibly using the 2-piece oracle. Then you solve chess for 4 pieces using the 3-piece oracle, and so on... until you reach chess for 32 pieces.
Adding pieces only increases complexity up to a point.
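A sketch of the data side of that tablebase-compression experiment: python-chess can label random low-piece positions with exact WDL values from Syzygy files (the ./syzygy path and the KQ-vs-K choice are my assumptions), giving a supervised dataset for whatever network you like:

    import random
    import chess
    import chess.syzygy

    def random_kqk(rng=random.Random(0)):
        # Random legal KQ-vs-K position, white to move.
        while True:
            wk, wq, bk = rng.sample(chess.SQUARES, 3)
            board = chess.Board(None)
            board.set_piece_at(wk, chess.Piece(chess.KING, chess.WHITE))
            board.set_piece_at(wq, chess.Piece(chess.QUEEN, chess.WHITE))
            board.set_piece_at(bk, chess.Piece(chess.KING, chess.BLACK))
            if board.is_valid():
                return board

    # Label positions with exact win/draw/loss values for supervised training.
    with chess.syzygy.open_tablebase("./syzygy") as tb:  # path is an assumption
        data = [(b.fen(), tb.probe_wdl(b))  # WDL from side to move, in -2..2
                for b in (random_kqk() for _ in range(10000))]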
> the complexity of chess is not as high as we think, and that there exists a neural network of less than 10B float32 weights which can encode perfect play.
That is certainly not the mainstream view, so you will have to support your conjecture by some evidence if you would like to convince people that you are right (or demonstrate it empirically end-to-end).
My conviction is based on the interesting phenomenon of game tree collapsing.
When you get better at the game, the number of "candidate" moves you need to explore goes down. For example, when you are naive at chess you need to explore all legal moves, and then all legal responses to these moves. The number of nodes in this tree grows exponentially, so the tree depth is limited to something small.
But when your list of "candidate" moves is reduced to size 1, the number of nodes in the tree is equal to its depth. You can "unroll the line" very cheaply. (That's the concept of lines in chess; in go, that's the concept of "ladders".)
When your candidate list is of size 2, the game tree is of size 2^(depth+1).
You can have a fractional candidate list size, by only sometimes needing to explore down the tree, or only sometimes needing to explore 2 candidates.
The complexity grows as (fractional candidate list size) ^ (depth + 1), with the FCLS being between 1 and 2. A phase transition occurs which allows you to go much deeper in the game tree, which simplifies things and makes them tractable (because you then reach endgame tables or an equivalence class number; aka you don't necessarily know the result yet, you just know it will be the same result as for all these other types of games).
So your function is allowed to have a smaller inner function, called recursively to explore the game tree of candidate moves within a computational budget, to help make a decision. The outer function resolves the equivalence numbers to their "win"/"draw"/"lose" values, always in the same way, and takes the appropriate maximum (remember, your network's only goal is to be always consistent).
The better your network gets at managing its computational budget, the deeper it can go. You can think of it as a version of alpha-beta pruning with improving heuristics.
I do understand you, but you and your "conviction" are wrong. Apparently you aren't even familiar with AlphaZero.
> Adding pieces only increases complexity up to a point.
As you said, until you reach 32 pieces. What you vastly underestimate is how much complexity is added at each level. You're like the king in the fable who agreed to give the vizier a small amount of grain: 1 grain for the first square, 2 grains for the second square, 4 grains for the third square, etc. The king thought he was getting a bargain.
> The thing is: you don't need to train it on all games from the endgame table before it converges to perfect play.
But you do, because there is no algorithmic simplification, at all. Strong chess players understand that while there are common patterns throughout chess, their application is highly specific to the position. That's why we have endgame tables, which are used to solve positions that pattern matching doesn't solve. You can get excellent play out of an NN, but that's not the same as solving it. And the absence of Bellman violations is necessary, but not sufficient ... you can't use it to prove that you've solved chess. The fact is that it is impossible within pragmatic limits to prove that chess has been solved. But so what? Programs like AlphaZero and Stockfish already play well enough for any purpose.
Anyway, you're free to go implement this ... good luck. I won't respond further.
>the absence of Bellman violations is necessary, but not sufficient
It is sufficient, though. All chess games end in a finite number of moves. If you are consistent, aka you have zero violations, then, thinking backward (dynamic programming), you can "color" the final positions correctly. And you can color correctly all positions one turn before the end, because you are consistent. Then two turns before the end... Then, recursively, you have correctly colored all chess positions.
You are missing the complexity collapse which can occur in games, like for example the game of Nim, where a simple function can predict the outcome. When you have 15 sticks and you can remove any 3, naively one would think that there are 2^15 game states and "15 choose 3" legal moves per turn, but in fact there are equivalence classes, which means the game state is reduced to 15 different states only.
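To make the collapse concrete, take the subtraction-game reading of that example (remove 1 to 3 sticks from a pile of 15, taking the last stick wins; the exact rules in the comment are ambiguous, so this reading is an assumption). Brute force agrees everywhere with a one-line rule:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def wins(n):
        # True if the side to move wins with n sticks left (take 1-3,
        # taking the last stick wins).
        return any(not wins(n - k) for k in (1, 2, 3) if k <= n)

    for n in range(16):
        assert wins(n) == (n % 4 != 0)  # the whole game collapses to one rule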
Modulo the prism of a trained neural network, game states get grouped into equivalence classes, and the same phenomenon occurs in chess, which allows simplification by high-level rules like "white wins because white wins the pawn race".
I am more familiar with Stockfish than AlphaZero or Leela Chess Zero, but Stockfish demonstrates that well-engineered features can bring down the neural network size a lot. In particular, counting usually poses a problem for neural networks, and counts like the number of moves before the 50-move rule, or the number of moves before a pawn race resolves, are edge cases that can be simplified away (DTZ and DTM).
Also, these engines are trying to compress the evaluation function, which is a lot more information than just whether the position is a win, draw, or loss, aka just the frontier.
> It is sufficient, though. All chess games end in a finite number of moves.
Again, the issue is the size of that number, not that it is finite.
> You are missing the complexity collapse which can occur in games, like for example the game of Nim
All you have is ad hominems ... "you don't understand", "you are missing" ...
I'm not missing the intellectual dishonesty of obviously irrelevant examples like Nim, Rubik's cube, or cryptanalysis that have clear mathematical regularities completely lacking from chess.
Now stop insulting me and my intelligence. Again, if you want to go implement this, or write a journal paper, then have at it. But this "we" should have solved chess already, and "we" should be aiming to solve chess, but "we" are not even trying is arrogant trolling.
> My conviction is that the complexity of chess is not as high as we think, and that there exists a neural network of less than 10B float32 weights which can encode perfect play
Given the number of possible games (numbers above 10^80), you would need EXTREME sparsity to encode it in less than 10B / 10^10 params. Sounds information-theoretically impossible to me ¯\_(ツ)_/¯
Leela Chess Zero has hundreds of millions to a few billion parameters, AFAIK.
The argument about the game being finite and thus solvable is misguided IMO.
AES encryption is also finite, and you can enumerate all possible combinations, but not before the end of time..
> We should be aiming to solve chess, but we are not even trying.
We know exactly how to solve chess; we have known for decades. The function f is called min-max, and it can be optimized with things like alpha-beta pruning. Given that chess is a bounded game, this is a bounded-time and bounded-space algorithm. The definition of `f` that you gave can actually be quite directly encoded in Haskell and executed (though it will miss some obvious optimizations).
The problem is that this algorithm seems to be close to optimal, and it would still take a few thousand years of computation time to actually run it to solve chess (or was it decades, or millions of years? not really that relevant).
Now, of course, no one has actually proved that this is the optimal algorithm, so for all we know, there exists a much simpler `f` that could take milliseconds on a pocket calculator. But this seems unlikely given the nature of the problem, and either way, it's just not that interesting for most people to put the kind of deep mathematical research into it that it would take.
Solving chess is not really a very interesting problem as pure mathematics. The whole interest was in beating human players with human-like strategies, which has thoroughly been achieved. The most interesting thing that remains is challenging humans that like chess at their own level of gameplay - since ultimately the only true purpose of chess is to entertain humans, and a machine that plays perfectly is actually completely unfun to play.
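The direct encoding mentioned a few paragraphs up looks like this in Python with python-chess (rather than Haskell). It terminates in principle under the 75-move and repetition rules, but is only actually runnable at or near terminal positions, such as this already-checkmated one:

    import chess

    def f(board):
        # 1 if the side to move wins with best play, 0 draw, -1 loss.
        outcome = board.outcome(claim_draw=True)
        if outcome is not None:
            if outcome.winner is None:
                return 0
            return 1 if outcome.winner == board.turn else -1
        best = -1
        for move in list(board.legal_moves):
            board.push(move)
            best = max(best, -f(board))
            board.pop()
            if best == 1:
                break  # a forced win needs no further search
        return best

    # Fool's mate: white to move and already checkmated, so f is immediate.
    fools_mate = "rnb1kbnr/pppp1ppp/8/4p3/6Pq/5P2/PPPPP2P/RNBQKBNR w KQkq - 1 3"
    print(f(chess.Board(fools_mate)))  # -1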
To be fair, if you take your argument from the last paragraph, i.e. that the function of chess as a game is to entertain, your earlier argument re: min-max doesn't really stand, does it? I think you're right that chess is probably quite interesting in terms of abstract maths, like surely there are ways to represent the pawns (pawn structures?) as well as the pieces (knights, bishops, etc.) in terms of some supersymmetry. However, it doesn't seem like much progress has been made in this area academically since the 20th century. It may be helpful to tap into AlphaFold and related results for interpretability! Stockfish has incorporated some probabilistic programming (neural network-based), but it's comparatively small-scale, and behind the SOTA of the bleeding-edge Transformer architectures (in terms of interpretability, nonetheless!)
Surely, if we can't get supersymmetries in some complex forms, we could get ahead with modern interpretability and RL techniques. Given the appropriate knowledge representation, by combining self-play with known playing sequences and behaviours, forcing the model into known lines, and perhaps partitioning by player style so there's an incentive for the model to learn style features, it should be possible for it to learn what we refer to as the essence of the game, i.e. archetypal human playing styles, comfortably. Using insights learned from interpretability, it should be possible to further influence the model during inference.
If they were to get to that point, we could say that chess would be solved...
> I think you're right that chess is probably quite interesting in terms of abstract maths
They said it's not interesting.
> like surely there are ways to represent the pawns (pawn structures?) as well as the pieces (knights, bishops, etc.) in terms of some supersymmetry.
No, absolutely not. The confusion between pawns and pawn structures highlights how completely off base this is. The attack vectors of the various pieces are easily represented, but there's no generalization to "knight structures" or "bishop structures", and "supersymmetry" is completely irrelevant to chess.
> If they were to get to that point, we could say that chess would be solved...
No, solving a game has a specific meaning and that's not it.
I wouldn't discount symmetries in chess on account of algebraic topology existing.
Once in a while, new maths is produced in one area, and somebody else picks it up to apply in a completely different domain. I'm a bit familiar with chess engines and their source code. The board representation is just cache-friendly data structures for an 8x8 board, with heuristics on top. This is only a statement about our understanding, heuristic or probabilistic (see AlphaZero); it doesn't mean a more fundamental, symmetrical structure doesn't exist. Rubik's cube was famously solved by fundamental insights from group theory. Now, chess is probably, but not definitely, a radically harder problem in terms of computational complexity, not least because there's an adversary and all of game logic applies. However, we see it all the time in cryptanalysis: new insights from maths people have broken some very cool constructions out of not much but some bad priors.
Pure min-max search is inherently suboptimal if your goal is understanding the game. AlphaZero, and to a lesser extent Leela et al., have shown this, and indeed the players incorporated all these ideas shortly thereafter. Of course, old tricks no longer provide an advantage now that they're known, but then again, that doesn't mean better interpretations in commentary, player training, etc. are not possible. None of the existing engines, heuristic, probabilistic, heuristic and probabilistic, are so far (a) bringing new maths to help apply new maths to chess positions, (b) bringing new game-representations that would lend better to interpretability during inference.
To truly solve chess, a given engine must deliver both (a) and (b).
To get +EV from some intricate line, analysed as deep as is reasonable in the allotted time, does not bring you any closer to understanding the game. You could tell that the line works, of course, but that would only be possible on account of you already having found the line in the first place! However, for the line to be possible, and profitable, something in the structure of your prior position has to allow it. Or otherwise predict it.
I didn't ... the word used was "supersymmetry". And likening chess to a Rubik's cube or cryptanalysis is just silly ... silly enough that I'm going to stop commenting. But:
> None of the existing engines, heuristic, probabilistic, heuristic and probabilistic, are so far (a) bringing new maths to help apply new maths to chess positions, (b) bringing new game-representations that would lend better to interpretability during inference.
Sigh. The people developing chess engines are not idiots, and are strongly motivated to find the most effective algorithms.
> We should have solved chess already.
> We should be aiming to solve chess, but we are not even trying.
We are trying and we should not expect to solve it because the search space is massive. There are more legal games of chess than atoms in the universe.
True, but we can at the very least teach a computer to prune what are effectively unreachable positions, or openings that won't happen in basic play, let alone high-level or GM play.
For example, I don't think I've ever seen or will ever see:
1. a3 a6
2. h3 h6
The computer should be given the logic with which to solve chess by telling it right off the bat not to search through certain lines or openings, because they are effectively useless.
You think no one has done that? Aside from explicitly encoding book lines, NN weights developed over millions of games steer chess engines away from useless lines.
But if you don't explore a line, even if you believe it's effectively useless, then you haven't solved chess.
P.S.
> I don't think I've ever seen or will ever see: 1. a3 a6 2. h3 h6
I have. Children and other beginners often do play like this. And even at GM levels we've seen nonsense like the Bongcloud and, marginally less nonsensically, the Cow opening.
>But if you don't explore a line, even if you believe it's effectively useless, then you haven't solved chess.
That's actually a good point because ideas that modern GMs found to be useless (in particular, locking a wing down via aggressive early wing play) have actually found a new home in AlphaZero's play.
What might be completely dumb as an early move (say, aggressively pushing the f-pawn from the start) might prove to be an incredible opening, but humans are too stupid to be able to execute it perfectly.
This is precisely the point I am trying to make:
A candidate solution function can take a very small finite space: much smaller than the number of gamestates.
And we can invalidate a candidate solution by finding a single counterexample.
Current chess engines can't be invalidated quickly: they are not trying to solve chess. They are not falsifiable. They are not attempting to precisely define the frontier, which is where I think the remaining efforts should be made.
We are just trying to encode the frontier, like we would with a Mandelbrot fractal.
Proving that a solution is valid is harder than finding a solution. Here I am suggesting we find the solution first.
Proving can also be done without exploring all legal chess positions. For example, you can use branch and bound to cull vast amounts of the state space once you have your function. You just have to partition the state space and prove on each partition that the function is constant; for example, when one side has 8 queens and the other has only a king with freedom of movement, the position is winnable and you don't have to check every gamestate.
I am stating that there is a good chance that by searching for such a function we will stumble upon one which has no counterexample, precisely because the complexity of chess is not infinite (an unproven conjecture: non-Turing-completeness of adversarial chess). We should be looking for it.
Who is currently trying? You make it sound like people think it is impossible and hence would not try.
>There are more legal games of chess than atoms in the universe.
This only matters if you are doing a pure brute force. Also comparing exponential numbers to a count is an unfair comparison where exponents easily win.
> If the complexity of chess is finite this is possible.
This is like saying the distance to Mars is finite so it should be possible to just go and live there. It's not theoretically impossible, but at the time it is practically impossible. There are real engineering challenges between here and there that have not yet been solved and from the looks of it, it will be at least several more decades before we get there.
In your example, you gloss over the construction of a "finite time oracle" by just saying "just train it until it is perfect". If humanity knew how to do that we could solve a whole other (more interesting) set of problems, but we don't.
Yes, that's exactly the problem with the current approach based on a "valuation" function.
They are not trying to aim for perfection, and therefore cannot make progress anymore.
To progress you must precisely define the frontier: an evaluation of 0.1 is not resolved to one of "white win", "draw", or "white lose", which it theoretically must be. They are not "committing" to anything.
To train such a network to perfection you must avoid training it only on the "average" game state; you must also train on hard-mined samples, the game states which define the frontier.
Find a candidate, find a violation, add it to the dataset of training examples, retrain to perfection on the growing dataset (or on a generator of hard positions) to find a new candidate, and loop.
So what makes you think it is possible to precisely define such a frontier? And why should such a thing, if it is possible at all, be 1. doable by humans and 2. doable with the amount of energy and computing power available to us within the coming couple of decades?
I've got to ask, do you play much chess? Because this post reads like you don't understand much about chess.
The issue with "solving" chess isn't that there isn't an objectively best move in every position. The issue is that calculating that move is functionally impossible for most positions. That's because chess gets exponentially more complicated the more pieces there are on the board. For example, there are around 26 million positions with 5 or fewer pieces, and over 3.7 billion positions with 6 or fewer pieces.
And most of those positions are distinct. This isn't like a Rubik's cube, where there are a lot of functionally identical positions. Any experienced chess player can tell you that a single piece placement can be the difference between a winning position, and a losing position.
And this complexity is what I love about chess! I love the fact that I can enter positions that no one has ever encountered before just by playing some chess online. I love the fact that deep analysis is possible, but that the sheer size of the possibility space means we can never truly solve chess. Chess strikes a perfect balance of complexity. Any simpler, and evaluating the best move would be too easy. Any more complicated, and evaluation becomes so difficult that it's hardly worth trying.
Which isn't to say that we can't build computers that are very good at chess. A person hasn't beaten a top computer in decades. Magnus Carlsen is probably the greatest chess player to have ever lived, and you can run software on your phone that could easily beat him. But there's a wide gulf between "can beat every human alive" and "plays objectively perfect chess."
The author is looking for positions which are difficult for low rated players and easier for high rated players.
Poor man’s version of this which requires no training would be to evaluate positions at low depth and high depth and select positions where the best move switches.
Training neural nets to model behavior at different levels is also possible but high rated players are inherently more difficult to model.
> Poor man’s version of this which requires no training would be to evaluate positions at low depth and high depth and select positions where the best move switches.
I've looked at the ~same problem in go instead of chess quite a bit. It turns out that that strategy does remarkably poorly, mostly because weak players don't just have low-depth thinking, they also do random weird shit and miss entire patterns. Eg in go if you run a strong go engine at _zero_ thinking (just take the model's best guess first move), it's already damn close to pro level.
Getting them to play anything like a beginner, or even intermediate player is functionally impossible. You _may_ be able to trick one into playing as weak as a beginner player if you really try, but it would feel _nothing_ like one.
> Training neural nets to model behavior at different levels is also possible but high rated players are inherently more difficult to model.
Maybe more difficult to model (not sure tbh, but granted for the moment), but it's _far_ easier to generate training data for strong players via reinforcement learning approaches.
> Maybe more difficult to model (not sure tbh, but granted for the moment),
Two reasons. One, strong players are performing tree search, so you need to model their tree search process rather than a simple depth zero prediction. And two, there are far fewer high rated players so there is far less high quality training data. A lot of high rated games are also bullet, which aren’t as useful.
> but it's _far_ easier to generate training data for strong players via reinforcement learning approaches.
Can you say more about this?
In go, I remember that there were rare positions/puzzles where old (pre-mcts) algorithms did better than humans, and that was artificial positions where there were a lot of complex recaptures in the same area. Which is rare in go, but common in chess.
I do think that positions where the best move at depth N is a terrible move at N+1 are hard, especially if it isn't just recaptures on the same square. Chess engines used to have special heuristics to avoid such "horizon effects", don't know if they still do.
> high rated players are inherently more difficult to model
yes and no. there's a bit of a bathtub curve here where below ~1000 elo lower ratings are harder to predict because their moves are closer to random.
I don't even think "high rated players are inherently more difficult to model" is correct, and the opposite is more likely to be correct. Top player, more than 2700 players, tend to play the engine moves the majority of the time, while lower rated players tend to stray from the engine lines.
https://arxiv.org/pdf/2006.01855
If you check figure 4, SF at depth 15 matches about 47% of the time which is lower than their NN. So quantitatively, SF isn’t amazing at move matching, and lower than the best NN at its relevant rating range (1900 -> 54%).
Qualitatively, SF is pretty unsatisfying as a model of a pro human because a lot of its moves are counterintuitive.
top players are hard to model just because there aren't many of them. 1200-2200 has tons of players and they all make various degrees of sensible moves most of the time, so you can plausibly train a NN to sample from those distributions. for 2500+ you have far fewer players so the likely moves depend a lot more on the styles and blindspots of specific players.
Seems like an excellent definition, mainly because I have no idea how to measure it otherwise. "Complexity is the time needed to solve a problem", why not.
Well, there are a lot of shortcuts built into engines like stockfish that don't perfectly mirror how beginners vs advanced players think.
For example, certain types of piece forks are easy for newer players to spot (knight forking a king+rook on the back rank) and others are harder (bishop forking a king and a pawn in the late game when nobody is on the back rank), but stockfish is going to find both just as easily.
So if you want to get deep into complexity you do need a model of different players and what types of advantages are easy to see for them.
It could be done with e.g. the Maia engine, which was trained on Lichess games and plays like humans on different skill levels – https://www.maiachess.com/ – but I do not know if it supports different ponder depths. Edit: it seems it does not – https://github.com/CSSLab/maia-chess#:~:text=a%20nodes%20lim...
evaluate positions at low depth and high depth and select positions where the best move switches.
Won’t that bias in favour of common sacrifices in the endgame? The issue with using depth is that humans don’t struggle with depth uniformly. Humans can calculate to a lot more depth with a sequence of obvious moves (recaptures, pawn races) than they can in more complex situations (closed positions in the midgame).
Yeah, this is called the horizon effect: if I take your knight with my queen and stop calculating I’m up a knight, when in reality the queen will be taken on the next turn.
One chess engine I saw long ago used a small depth (maybe even 1!) but with the exception of consecutive captures to avoid the biggest case for this issue (pawn promotion as well iirc)
Seems to me that if it is common, it is better known, so the AI knows it better, so it is less komplex.
I had this idea of drilling games against an engine with a set depth evaluation, since beating a depth 1 engine should teach simpler concepts than level 4.
I vibe coded this into a browser app, but the evaluation is slow around depth 5: https://camjohnson26.github.io/chess-trainer/
This is a thing you can do with Stockfish + SCID, is it not?
Yes, though in the latest versions it is just a drop-down where one can choose between a few fixed values, and one has to make the move for the engine.
Cute chess is a pretty GUI where one can do this too by setting the ply parameter. Castling is done by dragging the king on the rook!
https://github.com/cutechess/cutechess
This reminds me of this nice video : https://www.youtube.com/watch?v=YGLNyHd2w10
basically, the state space of the game can produce some intuition on why certain games are hard, and is demonstrated as a clustering of various states that are only linked to the winning state by a very small number of edges. So a player can easily "get lost" in the maze.
That video has become an instant classic to me when it came out, very well made and explained. Very much influencing my thoughts as of late. Related to the article, I've also spent a few months before working on chess line/complexity visualization and explanation as a side project. The main outcome of this was https://www.schachzeit.com/en/openings/sicilian-defense. I haven't touched it beyond keeping it barely running for most of 2025, so a strong breeze would surely knock it over, but the experience of making it did teach me a lot about chess and computerized chess.
The way the table at the bottom of these pages works is by running stockfish on not only the current position, but on the position after every possible legal move the first player could make. This increases the amount of computation needed by one or two orders of magnitude, as stockfish often does not explore every possible legal move so deeply. I was only doing it for lichess' openings lines, and it cost maybe $200 worth of computer processing time (can't recall the exact numbers).
As far as I know (and I'm only someone who has spent a couple dozen hours with stockfish, a stockfish dev would know more) stockfish does not allow you easy access to the hash table that contains all the evaluations. I could not figure out how to get it to print out it's hash tables at all in order to get at the values deeper down in the complexity.
I've considered forking stockfish and trying to get those values to be accessible somehow, in order to try and make visualizations like that video, and maybe someday I will. For now though, I think that's the technical piece missing, because otherwise one is forced to recompute every single node of the graph at the desired depth rather than running it once and getting back the values that have already been computed. Maybe someday, when I have more time, I'll take a shot at it, but not anytime soon I don't think.
Stockfish is clearly the standard nowadays but if you simply want to score all moves to n depth where n is smallish; there used to be a lot of chess engines around, and coding one’s own was often a project people would take on. Not so much now it seems.
I recently was trying to build an AI assistant to help with various chess things, which took shape of an MCP server: https://github.com/shelajev/mcp-stockfish
it builds as a docker image which has stockfish and maia (maiachess.com) together with different weights so it can simulate lower-level players.
It was a fun exercise, I tried a bunch of local models with this MCP server, which isn't particularly optimized, but also doesn't seem that bad. And the results were quite disappointing, they often would invent chess related reasoning and mess up answering questions, even if you'd expect them to rely on the tools and have true evaluation available.
It was also fun to say things: fetch a random game by username 'X' from lichess, analyze it and find positions which are good puzzles for a player rated N.
and see it figure out the algorithm of tool calls: - fetch the game - feed the moves to stockfish - find moves where evaluation changed sharply - feed it to maia at strength around N and to stockfish - if these disagree, it's probably a good puzzle.
I don't think I got to have a working setup like that even with managed cloud models. Various small issues, like timeouts on the MCP calls, general unreliability, etc. Then lost interest and abandoned the idea.
I should try again after seeing this thread
It is shockingly difficult to use LLMs in chess study. I don't need it to be a better (or worse) Stockfish; an LLM should be great at taking a FEN or multiple lines from Stockfish via MCP or tool call and explain why positions are evaluated the way they are, typical plans in the position (drawing from pretraining knowledge of a vast archive of games), and how to explain to a human to study these positions.
I suspect that the large amount of chess pretraining data is not well synchronized with positions, because in books and articles the text is typically accompanied by pictures of the positions, NOT FENs / PGNs. So the training on the text is decoupled from the representation of the position.
Regarding your tool-call setup with stockfish/maia: I made a tool like this for myself called Blunder Sniper. It iteratively feeds in positions I'm likely to reach given my openings, recursively queries the lichess DB over the top 80% of played moves, and finds the first node in each chain where opponents in my rating range play a blunder as the most common move.
It was a fun alternative to the engine-based preparation many strong players use, which is something like Nibbler + lc0 with high contempt values, to find higher-variance lines rather than game-theory-optimal ones.
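For anyone curious, the lichess half of an approach like that can be sketched against the public opening-explorer endpoint. The URL, parameters, and response fields below reflect my understanding of that API, so treat them as assumptions; the engine side is the same stockfish pattern as elsewhere in the thread.

```python
import requests
import chess
import chess.engine

EXPLORER = "https://explorer.lichess.ovh/lichess"  # public explorer, as I understand it

def most_common_moves(fen: str, ratings: str = "1600,1800") -> list:
    # Moves played from this position by players in the rating band,
    # assumed to be listed in order of popularity.
    resp = requests.get(EXPLORER, params={"fen": fen, "ratings": ratings})
    resp.raise_for_status()
    return resp.json().get("moves", [])

def is_blunder_node(fen: str, engine: chess.engine.SimpleEngine,
                    depth: int = 16, loss_cp: int = 200) -> bool:
    """Does the most common human move here lose >= loss_cp vs the best move?"""
    moves = most_common_moves(fen)
    if not moves:
        return False
    board = chess.Board(fen)
    best = engine.analyse(board, chess.engine.Limit(depth=depth))
    best_cp = best["score"].pov(board.turn).score(mate_score=100000)
    board.push_uci(moves[0]["uci"])  # the most played move in this band
    after = engine.analyse(board, chess.engine.Limit(depth=depth))
    # Negate: the score is now from the opponent's point of view.
    after_cp = -after["score"].pov(board.turn).score(mate_score=100000)
    return best_cp - after_cp >= loss_cp
```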
Some day I'll expand on the gpt-chess articles [0] that I found super interesting, fine-tune models... well, I keep telling myself that, anyway...
[0]: https://dynomight.net/more-chess/
I've been quite taken with Go these last few months, and I wish the western world had more skilled players. The Asian servers do not do much (if any) i18n; they don't need to, since there aren't many players outside Asia.
I've been fortunate that one of my high school friends is a 4 dan EGF player and that he has taken lots of time to play with me and teach me. But I can see other beginners struggling with the basics because they're only playing other low-skilled players.
I wish the western world paid Go more attention, it's a beautiful game with a really nice balance.
Some things that may be of interest. First relevant to the posted article:
A site that has used neural nets to classify go moves that good players would probably make but weaker players (of varying ranks) would probably not: https://neuralnetgoproblems.com/ (code available on github)
https://ai-sensei.com/challenge (behind login wall, and in future possibly a pay wall) is a similar idea, but the difficulty of evaluating the position is determined by how users of the site perform in practice.
And more generally, but relevant to your comment:
Players can play humans at an appropriate rank on OGS https://online-go.com/ (not as popular as the Eastern servers, but probably popular enough) -- or against calibrated-rank "human-like" AI players, either by painfully setting up the right katago models themselves or by paying for a subscription on ai-sensei.com
A go education site that's currently largely by and pitched at Westerners: https://gomagic.org/ for leveling up from the basics.
And a lot of books are now available easily and electronically in English (and some in German): https://gobooks.com/ --- I'd recommend "graded go problems for beginners", "tesuji", and "attack and defense".
Some good sites aren't (fully) available in English, like https://www.101weiqi.com/ -- but there are chrome and firefox extensions to translate just enough of it to make it usable.
[To help search engines: go is also known as weiqi and baduk]
Thanks for the first link, it's a really fun game :)
I've been playing lots on OGS (with my friend in particular), under my pseudonym erelde
> What is complexity in chess?
If there is only one possible move then arguably the complexity of the position is zero (not counting when there are no more possible moves, which means the game is over).
But complexity can't be measured only by the number of possible moves. If there is a mate in 1, but lots of possible moves, then the complexity of the position is also very low for a player with any familiarity with the game, and potentially moderately high for someone who has never played a game (but knows the rules).
Yet it should be possible to measure "absolute complexity", i.e. complexity that doesn't depend on the experience of the player.
So, complexity is a function of the size of the tree and of the minimal number of moves between the current position and winning. (Because chess is entirely deterministic, that distance always exists. We are (currently) unable to measure it in most cases, but we can be sure it does exist.)
It should be possible to estimate the size of the tree heuristically, even without enumerating all possible moves. But then I'm not sure where to go from there...
> "If there is a mate in 1, but lots of possible moves, then complexity of the position is also very low for a player with any familiarity with the game"
This is one reason I think the concept of complexity will probably never be formalized. Because even when a position is mate in 1, the complexity varies wildly. For instance here [1] is a game between Carlsen and Nakamura where Carlsen, perhaps the strongest player of all time, failed to see a mate in 1, and for that matter Nakamura, not exactly a weaker player himself, help-mated himself by playing one of the only moves that allowed Carlsen a mate in 1! And both players still had tons of time on their clock.
But to understand how they both missed it you need to go back in the position and see that both saw black's queen was defending against the mate, and Nakamura's blunder interfered with that protection in a rather unusual way. So you can't even assess it on a positional basis, since it's dependent upon what was played before, let alone all of the poorly defined terms I'm using like 'rather unusual.'
[1] - https://www.chess.com/events/2024-titled-tuesday-blitz-may-7...
I agree that complexity can vary widely even with mate in 1. Lots of mate in 1 chess puzzles to prove that.
But in the case of this game, I wouldn’t say that they missed the mate in 1 due to the complexity of the position or the previous moves. They both just had the same blind spot, happens every so often.
Yeah, but Qh7# was literally the main idea in this position. It wasn't some surprising idea. But I think it's logically explainable: when you're playing a game, each move you're considering what changed with your opponent's last move, and after black plays h6 your mind naturally considers the pawn rather than the square. So it's easier to miss that h7 has become weaker.
If Magnus, or probably any player over ~2300, saw this position in a vacuum, they'd all literally instantly see Qh7#. It's basically impossible to miss, except for this weird nuance!
> ...between the current position and winning. (Because chess is entirely deterministic, that distance always exists. We are (currently) unable to measure it in most cases, but we can be sure it does exist.)
Can we?
Just because it's deterministic doesn't mean it always leads to a win - there may be strategies that always lead to a draw, we may just not know about them. Tic-tac-toe is deterministic and we know it always leads to a draw if played correctly - we know about it because it's a much simpler game (much smaller tree of possibilities).
Indeed, in Stockfish, a 100-centipawn advantage for a player is calibrated to mean "a 50% chance of a win (as opposed to a draw or a loss) from the current position" [0].
In endgame tablebases, the distinction can get interesting, since one player may in principle have a forced mate, but it would take so many moves that the opponent could force a draw by the 50-move rule.
[0] https://official-stockfish.github.io/docs/stockfish-wiki/Sto...
This reminds me of a little project [1] I had fun with, trying to compute the sharpness of a position and/or evaluate whole lines. Indeed, the notion of sharpness (which I believe is different from complexity) is easy to intuit, but I've never found a satisfactory implementation.
[1] https://github.com/ghyatzo/stockfish-line-sharpness
As a player I like to think of sharpness as a measure of the potential consequences of a miscalculation. In a main line dragon, the consequence is often getting checkmated in the near future, so maximally sharp. In a quiet positional struggle, the consequence might be something as minor as the opponent getting a strong knight, or ending up with a weak pawn.
Whereas complexity is a measure of how far ahead I can reasonably expect to calculate. This is something non-players often misunderstand, which is why they like to ask me how many moves ahead I can see. It depends on the position.
And I agree, these concepts are orthogonal. Positions can be sharp, complex, both, or neither. A pawn endgame is typically very sharp; the slightest mistake can lead to the opponent queening and checkmating. But it's relatively low in complexity, because you can calculate far ahead using ideas like counting and geometric patterns (square of the pawn, zone of the pawn, distant opposition, etc.) to abstract over long lines of play. On the opposite side, something like a main line closed Ruy Lopez is very complex (every piece still on the board), but not especially sharp (closed position, both kings are safe; it's more of a struggle for slight positional edges).
Something like a King's Indian or Benoni will be both sharp and complex, whereas an equal rook endgame is neither (it's quite hard to lose a rook endgame; there always seems to be a way to save a draw).
> Whereas complexity is a measure of how far ahead I can reasonably expect to calculate
So complexity is something like: high branching factor after culling “obvious” branches?
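One crude way to make that concrete (my own sketch, not an established metric): ask the engine for its top lines at a shallow depth and count how many land within some epsilon of the best. The depth, epsilon, and line count below are arbitrary assumptions.

```python
import chess
import chess.engine

def effective_branching(board: chess.Board, engine: chess.engine.SimpleEngine,
                        depth: int = 12, epsilon_cp: int = 50,
                        max_lines: int = 10) -> int:
    """Count moves scoring within epsilon of the best at a shallow depth.

    Assumes a non-terminal position. A result of 1 means one move stands
    out ("obvious"); a high count means real branching to calculate through.
    """
    infos = engine.analyse(board, chess.engine.Limit(depth=depth),
                           multipv=max_lines)
    scores = [info["score"].pov(board.turn).score(mate_score=100000)
              for info in infos]
    best = max(scores)
    return sum(1 for s in scores if best - s <= epsilon_cp)
```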
Isn't sharpness just the opposite of quiescence?
https://www.chessprogramming.org/Quiescence_Search
There is a quite specific mechanic to difficult positions in chess that should be mentioned. A difficult position occurs when, on the surface, it seems there are many viable moves to be analyzed. Think of being a kid and wondering which pawn to move first. Then you learn opening theory and it gets a little easier.
But you can still encounter a position where you could take multiple pieces, all leading to different trades or forced moves. Another scenario is when you're in a closed position and there are many, many micro-adjustments you could make. An AI might know that you should move the a-pawn while you might not even be considering it.
Basically, positions where GothamChess starts speaking with a Russian accent to channel the Russian grandmasters.
Interested to read the code, but the link provided 404s
https://github.com/Amethyst-Cat/ChessComplexity/blob/master/...
Amethyst-Cat apparently erased their github account, but the internet archive still has one snapshot [1]. The pdf link from there redirected to the davidtpeng account, but that github user no longer has a ChessComplexity repo and the internet archive only has a broken snapshot [2].
[1] https://web.archive.org/web/20201226113203/https://github.co...
[2] https://web.archive.org/web/20240316015417/https://github.co...
I always thought that there's some stuff that looks hard in chess, but is actually not so difficult. Things like sacrifice lines. Those feel complicated, but the calculation eventually becomes pretty deterministic.
The things that don't look hard to beginners (because there's no immediate danger) are usually questions around activity and positional advantages. Should you get rid of your opponent's central knight, but give them your long-range bishop? Should you take with the pawn in the center and explode the position or bring the rook behind it first?
Those are the truly hard things in chess. Maybe this changes if you're a high-level player, but I'm not.
> Those are the truly hard things in chess. Maybe this changes if you're a high-level player, but I'm not.
They remain pretty difficult. Ben Finegold said it best when giving a lecture on middlegame technique. A novice complained that it's hard to figure out what to do in a positional battle and where to put all the pieces, and he replied: "I got bad news for you - that's hard for us, too."
Almost all amateur chess games are decided by a mistake at six ply or less. So to emulate lower-rated players, just occasionally flip the evaluation after a depth of 4-6 ply.
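Flipping evaluations inside the search would need engine surgery, but a cruder stand-in for the same idea is easy to sketch: occasionally trust a 4-6 ply search instead of a deep one. The probability and depths below are my own guesses.

```python
import random
import chess
import chess.engine

def weakened_move(board: chess.Board, engine: chess.engine.SimpleEngine,
                  p_mistake: float = 0.1) -> chess.Move:
    """Usually play the deep best move; sometimes trust a 4-6 ply search."""
    if random.random() < p_mistake:
        limit = chess.engine.Limit(depth=random.choice([4, 5, 6]))
    else:
        limit = chess.engine.Limit(depth=20)
    return engine.play(board, limit).move
```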
The "algorithm in the full proposal here" pointing to:
seems gone.Wayback spotted some of it, but after bouncing through its 302 capture one ends up here, but yes, the actual repo is still gone and the "in page" snapshot doesn't render the actual pdf:
https://web.archive.org/web/20220917133741/https://github.co...
Also, after some url surgery it seems Wayback did grab the .tar.gz of a release <https://web.archive.org/web/20201226113237/https://codeload....>, but sadly it does not include the pdf: only some .png files, a helpfully named "Software/blank", and an "asset" of ChessOracle1.2.zip containing the .h5 weights, with no pdf to be seen.
Check this out, it's quite a funny chess engine. Yes, funny!
Good moves against Stockfish.
https://boristrapsky.com/
> Here, most bots would play a very strong 11. c7d5 NxPch. Ha! Why drag things out? Instead, I played 11. e3e4 P-K4, guessing that black---with only two minutes left on his clock---might grab the king pawn, and suffer an immediate checkmate. Black did not. I was so enraged that I shouted "110111001110111101000", which I admit was unsportsmanlike conduct.
I was disappointed in the article, as I was primed for some 'complexity theory' type discussion, e.g. emergence.
We should have solved chess already.
We should be aiming to solve chess, but we are not even trying.
If the complexity of chess is finite, this is possible.
Chess AIs have become so good that maybe there is no more progress to be made.
We must abandon the concept of "valuation".
Here is the criterion for solving chess.
A chess engine is a constant-time (or bounded-time, and memory-bounded) function f which takes any position and returns 1, 0, or -1: win, draw, or loss from the side to move's perspective (it has to be side-to-move relative for the negamax form of the Bellman equation below to work).
A chess engine solves chess if we can't find any violation of the Bellman equation:
f(gamestate) = max over legal moves of -f(gamestate after move),  if there are legal moves
f(gamestate) = result(gamestate),                                 if there are no legal moves
This function f can be encoded either by a neural network or some code, but it must be computable in constant (or bounded) time.
The whole problem of solving chess is then reduced to mining the gamestate space for counterexamples.
Like in math, you can conjecture that your chess engine has solved the game, and the conjecture stands until someone finds a violation of your engine (either it doesn't compute in bounded time, or the Bellman equation fails somewhere).
To check a chess engine you can just sample a random gamestate, or a known-to-be-difficult gamestate, and test whether the equation holds; that costs at most (max number of legal moves + 1) evaluations of f.
You can try applying this concept to simple games like tic-tac-toe, or try to compress endgame tables with a neural network.
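Here's what that looks like for tic-tac-toe, where an exact negamax oracle is cheap; check() is the proposed falsification test, costing at most (number of legal moves + 1) evaluations of f. For a learned candidate, you would swap the exact f for the network and hunt for states where check() fails.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def result(board: tuple):
    # Terminal value from the side to move's perspective, or None if non-terminal.
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return -1        # the previous player just completed a line
    if 0 not in board:
        return 0             # board full: draw
    return None

def side_to_move(board: tuple) -> int:
    return 1 if board.count(1) == board.count(-1) else -1

@lru_cache(maxsize=None)
def f(board: tuple) -> int:
    # Exact oracle via negamax; a learned candidate would plug in here instead.
    r = result(board)
    if r is not None:
        return r
    s = side_to_move(board)
    return max(-f(board[:i] + (s,) + board[i + 1:])
               for i in range(9) if board[i] == 0)

def check(board: tuple) -> bool:
    # The Bellman test: f must agree with a one-step lookahead of itself.
    r = result(board)
    if r is not None:
        return f(board) == r
    s = side_to_move(board)
    return f(board) == max(-f(board[:i] + (s,) + board[i + 1:])
                           for i in range(9) if board[i] == 0)

empty = (0,) * 9
assert f(empty) == 0    # tic-tac-toe is a draw with best play
assert check(empty)     # no violation at the root (trivially: this f is exact)
```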
We can find such a candidate function by training a neural network to minimize the expected number of Bellman-equation violations over various datasets of gamestates. Once we "grok" it to zero, we are done. To soften the problem you can train an auxiliary continuous function g which outputs the probability of 1, 0, or -1 (with a softmax) and add a discount factor as in reinforcement learning, but the final result is the discrete argmax of it (as in "deterministic policy gradient") with no discount factor.
Once you have this finite-time oracle, playing chess is just wandering along the graph of drawable positions, pushing your adversary toward a position where his engine has a Bellman violation, i.e. a mis-evaluation; if he goes into it, you enter the winnable positions, where you stay until the end. (Not all violations can be exploited, as the adversary can sometimes avoid going into "uncertain" positions.)
A simpler (less strong) strategy may avoid entering "uncertain" positions so long as they are avoidable, because earlier positions offer a preferred choice (a 4-state logic: win, draw, lose, uncertain; or an ordered list of moves). These make the problem easier but complicate the corresponding Bellman equation.
So the game once again becomes exploring the space of "drawable" positions in order to find ones from which the adversary can't avoid entering an uncertain (and losing) position. I.e. playing the adversary and not the board, which is a harder problem, unless the adversary plays perfectly. I.e. playing for the win.
This comment and your other comments are simply wrong and full of nonsense. Endgame table generators are pure solvers ... given enough time, they can solve chess from the initial position. But the amount of time is longer than the time until the heat death of the universe, and to record the best-play game tree--without which a solution isn't useful--would take more material than there is in the universe.
>This comment and your other comments are simply wrong and full of nonsense
That's because you are not understanding me.
My conviction is that the complexity of chess is not as high as we think, and that there exists a neural network of fewer than 10B float32 weights which can encode perfect play.
Neural network evaluations are now used as heuristics to evaluate positions, and even without tree search they play very well; but these networks are usually very small and are complemented by a few high-level features as input. A bigger network, well fine-tuned, can probably reach perfect play without the need for tree search.
A thought exercise is trying to compress endgame tables with a neural network and seeing how big it needs to be to reach perfect play. The thing is: you don't need to train it on all positions from the endgame table before it converges to perfect play.
You can know how close you are to optimal by counting the number of Bellman-equation violations (or by not observing any).
You can even train it by having it reference a previously trained oracle. You solve chess with a neural network when there are only 2 pieces. Then you solve chess for 3 pieces, possibly using the 2-piece oracle. Then you solve chess for 4 pieces using the 3-piece oracle, and so on... until you reach chess for 32 pieces.
Adding pieces only increases complexity up to a point.
>the complexity of chess is not as high as we think, and that there exists a neural network of fewer than 10B float32 weights which can encode perfect play.
That is certainly not the mainstream view, so you will have to support your conjecture by some evidence if you would like to convince people that you are right (or demonstrate it empirically end-to-end).
My conviction is based on the interesting phenomenon of game tree collapsing.
When you get better at the game, the number of "candidate" moves you need to explore goes down. For example, when you are naive at chess you need to explore all legal moves, and then all legal responses to those moves. The number of nodes in this tree grows exponentially, so the search is limited to a small depth.
But when your list of "candidate" moves is reduced to size 1, the number of nodes in the tree is equal to its depth, and you can unroll the line very cheaply. (That's the concept of lines in chess; in go, the concept of "ladders".)
When your candidate list is of size 2, the game tree is of size 2^(depth+1).
You can have a fractional candidate-list size by only needing to branch some of the time: exploring one candidate in some positions and two in others.
The complexity grows as (fractional candidate list size)^(depth + 1), with the FCLS between 1 and 2. A phase transition occurs which allows you to go much deeper in the game tree, and this simplifies things and makes them tractable (because you then reach endgame tables or an equivalence class: you don't necessarily know the result yet, you just know it will be the same result as for all the other games in the class).
So your function is allowed to contain a smaller inner function, called recursively to explore the game tree of candidate moves within a computational budget, to help make a decision. The outer function resolves the equivalence classes to their "win"/"draw"/"lose" values, always in the same way, and takes the appropriate maximum (remember, your network's only goal is to be always consistent).
The better your network gets at managing its computational budget, the deeper it can go. You can think of it as a version of alpha-beta pruning with improving heuristics.
But ... but ... it's his conviction.
I do understand you, but you and your "conviction" are wrong. Apparently you aren't even familiar with AlphaZero.
> Adding pieces only increases complexity up to a point.
As you said, until you reach 32 pieces. What you vastly underestimate is how much complexity is added at each level. You're like the king in the fable who agreed to give the vizier a small amount of grain: 1 grain for the first square, 2 grains for the second square, 4 grains for the third square, etc. The king thought he was getting a bargain.
> The thing is: you don't need to train it on all positions from the endgame table before it converges to perfect play.
But you do, because there is no algorithmic simplification, at all. Strong chess players understand that while there are common patterns throughout chess, their application is highly specific to the position. That's why we have endgame tables, which are used to solve positions that pattern matching doesn't solve. You can get excellent play out of an NN, but that's not the same as solving it. And the absence of Bellman violations is necessary, but not sufficient ... you can't use it to prove that you've solved chess. The fact is that it is impossible within pragmatic limits to prove that chess has been solved. But so what? Programs like AlphaZero and Stockfish already play well enough for any purpose.
Anyway, you're free to go implement this ... good luck. I won't respond further.
>the absence of Bellman violations is necessary, but not sufficient
It is sufficient, though. All chess games end in a finite number of moves. If you are consistent, i.e. you have zero violations, then thinking backward (dynamic programming), you can correctly "color" the final positions. And because you are consistent, you can correctly color all positions one move before the end; then two moves before the end... Recursively, you have correctly colored all chess positions.
You are missing the complexity collapse which can occur in games, like for example the game of Nim, where a simple function can predict the outcome. When you have 15 sticks and you can remove any 3 of them, naively one would think there are 2^15 game states and "15 choose 3" legal moves per turn, but in fact there are equivalence classes: only the number of sticks matters, so the game state is reduced to just 15 different states.
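The parent's exact variant aside, the standard Nim-style subtraction game shows the same collapse in a few lines; rules assumed here are "take 1-3 sticks per turn, taking the last stick wins". A one-line function agrees with brute-force search over the whole tree:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def brute(n: int) -> bool:
    # True if the side to move wins with perfect play (take 1-3, last stick wins).
    return any(not brute(n - k) for k in (1, 2, 3) if k <= n)

def collapsed(n: int) -> bool:
    # The entire "solution function": only n mod 4 matters.
    return n % 4 != 0

assert all(brute(n) == collapsed(n) for n in range(60))
```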
Seen through the prism of a trained neural network, game states get grouped into equivalence classes, and the same phenomenon occurs in chess, which allows simplification by high-level rules like "white wins because white wins the pawn race".
I am more familiar with Stockfish than with AlphaZero or Leela Chess Zero, but Stockfish demonstrates that well-engineered features can bring the neural network size down a lot. In particular, counting usually poses problems for neural networks, and counts like the number of moves left before the 50-move rule, or the length of a pawn race, are edge cases that can be simplified away (DTZ and DTM).
Also, these engines are trying to compress the evaluation function, which is a lot more information than just whether the position is a win, draw, or loss, i.e. just the frontier.
> It is sufficient, though. All chess games end in a finite number of moves.
Again, the issue is the size of that number, not that it is finite.
> You are missing the complexity collapse which can occur in games, like for example the game of Nim
All you have is ad hominems ... "you don't understand", "you are missing" ...
I'm not missing the intellectual dishonesty of obviously irrelevant examples like Nim, Rubik's cube, or cryptanalysis that have clear mathematical regularities completely lacking from chess.
Now stop insulting me and my intelligence. Again, if you want to go implement this, or write a journal paper, then have at it. But this "we" should have solved chess already, and "we" should be aiming to solve chess, but "we" are not even trying is arrogant trolling.
Over and out.
> My conviction is that the complexity of chess is not as high as we think, and that there exists a neural network of fewer than 10B float32 weights which can encode perfect play
Given the number of possible games (numbers above 10^80), you would need EXTREME sparsity to encode it in fewer than 10B (10^10) params. Sounds information-theoretically impossible to me ¯\_(ツ)_/¯
Leela Chess Zero has hundreds of millions to a few billion parameters, AFAIK.
The argument about the game being finite and thus solvable is misguided IMO.
AES encryption is also finite, and you can enumerate all possible combinations, but not before the end of time...
> We should be aiming to solve chess, but we are not even trying.
We know exactly how to solve chess; we have known for decades. The function f is called minimax, and it can be optimized with things like alpha-beta pruning. Given that chess is a bounded game, this is a bounded-time and bounded-space algorithm. The definition of `f` that you gave can actually be quite directly encoded in Haskell and executed (though it will miss some obvious optimizations).
The problem is that this algorithm seems to be close to optimal, and it would still take some thousands of years of computation time to actually run it and solve chess (or was it decades, or millions of years? not really that relevant).
Now, of course, no one has actually proved that this is the optimal algorithm, so for all we know, there exists a much simpler `f` that could take milliseconds on a pocket calculator. But this seems unlikely given the nature of the problem, and either way, it's just not that interesting for most people to put the kind of deep mathematical research into it that it would take.
Solving chess is not really a very interesting problem as pure mathematics. The whole interest was in beating human players with human-like strategies, which has thoroughly been achieved. The most interesting thing that remains is challenging humans that like chess at their own level of gameplay - since ultimately the only true purpose of chess is to entertain humans, and a machine that plays perfectly is actually completely unfun to play.
To be fair, if you take your argument from the last paragraph, i.e. that the function of chess as a game is to entertain, your earlier argument re: minimax doesn't really stand, does it?

I think you're right that chess is probably quite interesting in terms of abstract maths, like surely there are ways to represent the pawns (pawn structures?) as well as the pieces (knights, bishops, etc.) in terms of some supersymmetry. However, it doesn't seem like much progress has been made in this area academically since the 20th century. It may be helpful to tap into AlphaFold and related results for interpretability! Stockfish has incorporated some probabilistic programming (neural-network-based), but it's comparatively small-scale, and behind the SOTA of the bleeding-edge Transformer architectures (in terms of interpretability, nonetheless!).

Surely, if we can't get supersymmetries in some complex form, we could get ahead with modern interpretability and RL techniques. Given the appropriate knowledge representation, by combining self-play with known playing sequences and behaviours, forcing the model into known lines, and perhaps partitioning by player style so there's an incentive for the model to learn style features, it should be possible for it to learn what we refer to as the essence of the game, i.e. archetypal human playing styles, comfortably. Using insights learned from interpretability, it should be possible to further influence the model during inference.
If they were to get to that point, we could say that chess would be solved...
> I think you're right that chess is probably quite interesting in terms of abstract maths
They said it's not interesting.
> like surely there are ways to represent the pawns (pawn structures?) as well as the pieces (knights, bishops, etc.) in terms of some supersymmetry.
No, absolutely not. The confusion between pawns and pawn structures highlights how completely off base this is. The attack vectors of the various pieces are easily represented, but there's no generalization to "knight structures" or "bishop structures", and "supersymmetry" is completely irrelevant to chess.
> If they were to get to that point, we could say that chess would be solved...
No, solving a game has a specific meaning and that's not it.
I wouldn't discount symmetries in chess on account of algebraic topology existing.
Once in a while, new maths is produced in one area, and somebody else picks it up to apply in a completely different domain. I'm a bit familiar with chess engines and their source code. The board representation is just cache-friendly data structures for an 8x8 board, with heuristics on top. This is only a statement about our current understanding, heuristic or probabilistic (see AlphaZero); it doesn't mean a more fundamental, symmetric structure doesn't exist. Rubik's cube was famously solved by fundamental insights from group theory. Now, chess is probably, but not definitely, a radically harder problem in terms of computational complexity, not least because there's an adversary and all of game logic applies. But we see it all the time in cryptanalysis: new insights from maths people break some very cool constructions out of not much but some bad priors.

Pure minimax search is inherently suboptimal if your goal is understanding the game. AlphaZero, and to a lesser extent Leela et al., have shown this, and indeed the players incorporated all these ideas shortly thereafter. Of course, old tricks no longer provide an advantage now that they're known, but that doesn't mean better interpretations in commentary, player training, etc. are not possible. None of the existing engines, heuristic, probabilistic, or both, are so far (a) bringing new maths to bear on chess positions, or (b) bringing new game representations that would lend themselves better to interpretability during inference.

To truly solve chess, a given engine would need both (a) and (b).

To get +EV from some intricate line, analysed as deep as is reasonable in the allotted time, does not bring you any closer to understanding the game. You could tell that the line works, of course, but only because you already found the line in the first place! For the line to be possible, and profitable, something in the structure of your prior position has to allow it. Or otherwise predict it.
> I wouldn't discount symmetries in chess
I didn't ... the word used was "supersymmetry". And likening chess to a Rubik's cube or cryptanalysis is just silly ... silly enough that I'm going to stop commenting. But:
> None of the existing engines, heuristic, probabilistic, or both, are so far (a) bringing new maths to bear on chess positions, or (b) bringing new game representations that would lend themselves better to interpretability during inference.
Sigh. The people developing chess engines are not idiots, and are strongly motivated to find the most effective algorithms.
> We should have solved chess already.
> We should be aiming to solve chess, but we are not even trying.
We are trying and we should not expect to solve it because the search space is massive. There are more legal games of chess than atoms in the universe.
> There are more legal games of chess than atoms in the universe
I’ll make an even stronger assertion:
There are more legal chess games in 40 moves or less, than there are atoms in the universe.
https://en.wikipedia.org/wiki/Shannon_number
https://skeptics.stackexchange.com/questions/4165/are-there-...
True, but we can at the very least teach a computer to prune what are effectively unreachable positions, or openings that won't happen even at a basic level, let alone at high-level or GM play.
For example, I don't think I've ever seen or will ever see: 1. a3 a6 2. h3 h6
The computer should be given the logic with which to solve chess by telling it right off the bat not to search through certain lines or openings, because they are effectively useless.
You think no one has done that? Aside from explicitly encoding book lines, NN weights developed over millions of games steer chess engines away from useless lines.
But if you don't explore a line, even if you believe it's effectively useless, then you haven't solved chess.
P.S.
> I don't think I've ever seen or will ever see: 1. a3 a6 2. h3 h6
I have. Children and other beginners often do play like this. And even at GM levels we've seen nonsense like the Bongcloud and, marginally less nonsensically, the Cow opening.
>But if you don't explore a line, even if you believe it's effectively useless, then you haven't solved chess.
That's actually a good point because ideas that modern GMs found to be useless (in particular, locking a wing down via aggressive early wing play) have actually found a new home in AlphaZero's play.
What might look completely dumb from an early move (say, aggressively pushing the f-pawn from the start) might prove to be an incredible opening, but humans are too stupid to be able to execute it perfectly.
This is precisely the point I am trying to make:
A candidate solution function can fit in a very small finite space, much smaller than the number of gamestates. And we can invalidate a candidate solution by finding a single counterexample.
Current chess engines can't be invalidated quickly: they are not trying to solve chess. They are not falsifiable. They are not attempting to precisely define the frontier, which is where I think the remaining effort should go.
We would just be trying to encode the frontier, like we would with a Mandelbrot fractal.
Proving that a solution is valid is harder than finding a solution. Here I am suggesting we find the solution first.
Proving can also be done without exploring all legal chess positions. For example, you can use branch and bound to cull vast amounts of state space once you have your function. You just have to partition the state space and prove on each partition that the function is constant; for example, when one side has 8 queens and the other has only a king with freedom of movement, the position is won, and you don't have to check every gamestate.
I am stating that there is a good chance that by searching for a function we will stumble upon one which has no counterexample, precisely because the complexity of chess is not infinite (an unproven conjecture: the non-Turing-completeness of adversarial chess). We should be looking for it.
>We are trying
Who is currently trying? You make it sound like people think it is impossible and hence would not try.
>There are more legal games of chess than atoms in the universe.
This only matters if you are doing a pure brute force. Also comparing exponential numbers to a count is an unfair comparison where exponents easily win.
> If the complexity of chess is finite, this is possible.
This is like saying the distance to Mars is finite so it should be possible to just go and live there. It's not theoretically impossible, but at the time it is practically impossible. There are real engineering challenges between here and there that have not yet been solved and from the looks of it, it will be at least several more decades before we get there.
In your example, you gloss over the construction of a "finite time oracle" by just saying "just train it until it is perfect". If humanity knew how to do that we could solve a whole other (more interesting) set of problems, but we don't.
>"just train it until it is perfect"
Yes, that's exactly the problem with the current approach based on a "valuation" function.
They are not trying to aim for perfection, and therefore cannot make progress anymore.
To progress you must precisely define the frontier: an evaluation of 0.1 is never resolved to one of "win", "draw", "loss", which it theoretically must be. The engines are not "committing" to anything.
To train such a network to perfection you must not train it only on "average" game states, but also mine for hard samples: game states which define the frontier.
Find a candidate, find a violation, add it to the dataset of training examples, retrain to perfection on the growing dataset (or on a generator of hard positions) to get a new candidate, and loop.
So what makes you think it is possible to precisely define such a frontier? And why should such a thing, if it is possible at all, be 1. doable by humans and 2. doable with the amount of energy and computing power available to us within the coming couple of decades?
"We should have solved Busy Beaver, if the complexity of BB(n) is finite this is possible."
I mean yeah, chess isn't THAT bad but still not directly tractable and besides, brute forcing it is boring.
I've got to ask, do you play much chess? Because this post reads like you don't understand much about chess.
The issue with "solving" chess isn't that there isn't an objectively best move in every position. The issue is that calculating that move is functionally impossible for most positions. That's because chess gets exponentially more complicated the more pieces there are on the board. For example, there are around 26 million positions with 5 or fewer pieces, and over 3.7 billion positions with 6 or fewer pieces.
And most of those positions are distinct. This isn't like a Rubik's cube, where there are a lot of functionally identical positions. Any experienced chess player can tell you that a single piece placement can be the difference between a winning position, and a losing position.
And this complexity is what I love about chess! I love the fact that I can enter positions that no one has ever encountered before just by playing some chess online. I love the fact that deep analysis is possible, but that the sheer size of the possibility space means we can never truly solve chess. Chess strikes a perfect balance of complexity. Any simpler, and evaluating the best move would be too easy. Any more complicated, and evaluation becomes so difficult that it's hardly worth trying.
Which isn't to say that we can't build computers that are very good at chess. A person hasn't beaten a top computer in decades. Magnus Carlsen is probably the greatest chess player to have ever lived, and you can run software on your phone that could easily beat him. But there's a wide gulf between "can beat every human alive" and "plays objectively perfect chess."