Writing a chess program was an important challenge for the "first generation of hackers" working on early machines like the TX-0 and PDP-1 in the '50s and '60s, as described in "Hackers: Heroes of the Computer Revolution" by Steven Levy. I highly recommend this book; I think a lot of people here will recognize their own passion and interests in the stories described there.
Also, there's an interesting implementation for the ZX-81, created over 20 years later and fitting in just 1024 bytes: https://users.ox.ac.uk/~uzdm0006/scans/1kchess/
Not even 1024 bytes. There were 1024 bytes available to the computer, but some of that was required to store what was displayed on the screen. Exactly how much depended on how much of the screen was used(!).
Different times...
Here's a PDF with the original publication: https://ia601600.us.archive.org/32/items/philosophical-magaz...
It's amazing how Shannon contributed to almost everything important to understanding computers at all levels. He contributed to information theory (thus communication, error correction and compression), digital circuits (Master's thesis), Artificial Intelligence (not exhaustive: this paper, Theseus), cryptography and even circuit complexity!
As well known as he is for information theory, I sometimes think the breadth of his contributions to the foundations of computing is not fully appreciated.
You say AI, but even more precisely: I believe his ideas (the probabilistic modeling of word sequences) form the core basis of modern LLMs, which is extremely impressive and deep.
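To make the connection concrete: in "A Mathematical Theory of Communication" Shannon builds word-by-word approximations of English by sampling each next word conditioned on the previous one. Here's a toy sketch of that idea (my own illustration, not Shannon's construction verbatim); squint a bit and an LLM is this lookup table replaced by a neural network with a much longer context:

    import random
    from collections import defaultdict

    # Toy second-order (bigram) word model in the spirit of Shannon's
    # approximations of English; illustration only.
    def train_bigrams(text):
        words = text.split()
        successors = defaultdict(list)
        for prev, nxt in zip(words, words[1:]):
            successors[prev].append(nxt)        # empirical P(next | prev) as a bag of samples
        return successors

    def generate(successors, start, length=20):
        out = [start]
        for _ in range(length):
            choices = successors.get(out[-1])
            if not choices:
                break
            out.append(random.choice(choices))  # sample the next word given the current one
        return " ".join(out)

    corpus = "the cat sat on the mat and the dog sat on the log"
    print(generate(train_bigrams(corpus), "the"))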
For more on how modern engines work (although the core idea of alpha-beta pruning [1] is still used) you can check out the chess programming wiki [2].
1: https://www.chessprogramming.org/Alpha-Beta
2: https://www.chessprogramming.org/Main_Page
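For anyone who hasn't seen it, the whole trick fits in a few lines. A minimal negamax-with-alpha-beta sketch over a toy game tree (my own illustration, not engine code):

    # Negamax with alpha-beta pruning over a toy game tree (illustration only).
    # A node is either a number (leaf score, from the point of view of the side
    # to move at that leaf) or a list of child nodes.
    def alphabeta(node, alpha=float("-inf"), beta=float("inf")):
        if isinstance(node, (int, float)):
            return node                               # leaf: static evaluation
        best = float("-inf")
        for child in node:
            score = -alphabeta(child, -beta, -alpha)  # opponent to move: negate
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:                         # opponent already has a better line,
                break                                 # so the remaining children are pruned
        return best

    # Depth-2 example: the 9 leaf is never examined, because once the 2 is seen
    # the second branch can no longer beat the first branch's value of 3.
    print(alphabeta([[3, 5], [2, 9]]))                # -> 3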
I read "Chess Skill in Man and Machine" when I was in college.
https://www.amazon.com/Chess-machine-monographs-computer-sci...
It wasn't so much that I was interested in chess, but the book really opened my eyes to how to write computer programs. For example, a 64-bit vector was used for the board, and for each piece type at each location another 64-bit vector specified the legal moves.
I know all this sounds routine today, but I had only the most rudimentary programming skills at the time, and this stuff was fascinating.
I finally gave up on playing chess because instead of thinking about my next move, I'd think about writing a program to calculate the next move. Hence I played pretty badly.
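For anyone curious what that representation looks like in practice, here's a rough bitboard sketch (my own, not the book's code): one 64-bit integer per set of pieces, plus a precomputed 64-bit attack mask per square.

    # Rough bitboard sketch (illustration only): bit i of each integer
    # corresponds to square i, with a1 = 0 ... h8 = 63.
    def square(file, rank):                 # file, rank in 0..7
        return rank * 8 + file

    def knight_attacks(sq):
        # Precompute the 64-bit attack mask for a knight on square sq.
        f, r = sq % 8, sq // 8
        mask = 0
        for df, dr in [(1, 2), (2, 1), (2, -1), (1, -2),
                       (-1, -2), (-2, -1), (-2, 1), (-1, 2)]:
            nf, nr = f + df, r + dr
            if 0 <= nf < 8 and 0 <= nr < 8:
                mask |= 1 << square(nf, nr)
        return mask

    KNIGHT_ATTACKS = [knight_attacks(sq) for sq in range(64)]

    white_knights = (1 << square(1, 0)) | (1 << square(6, 0))   # knights on b1 and g1
    white_pieces  = white_knights                               # plus the rest, omitted here
    # Pseudo-legal knight moves from b1: the attack mask minus our own pieces.
    b1_moves = KNIGHT_ATTACKS[square(1, 0)] & ~white_pieces
    print(bin(b1_moves))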
So far as I can tell, the A+B strategy described therein is exactly the principle that Stockfish -- and more or less every other chess engine -- was built on prior to AlphaZero, after which the top engines moved to NN derived evaluation criteria for what is still a pruning tree search.
AlphaZero/LC0 uses a very different kind of search, though, which is not based on pruning to the same extent. (I guess you must count LC0 as a top engine?)
Stockfish uses a neural net optimized for CPUs, called NNUE, for static evaluation. It hasn't used a handcrafted heuristic evaluator for some time now; in fact, that code has been removed from the code base.
I have no idea why you bring that up? NNUE has no bearing on the search.
You might be thinking about how the evaluation is trained (MCTS) rather than how it’s applied in a chess game?
From https://lczero.org/dev/wiki/technical-explanation-of-leela-c...:

Leela uses PUCT (Predictor + Upper Confidence Bound tree search). We evaluate new nodes by doing a playout: start from the root node (the current position), pick a move to explore, and repeat down the tree until we reach a game position that has not been examined yet (or a position that ends the game, called a terminal node). We expand the tree with that new position (assuming a non-terminal node) and use the neural network to create a first estimate of the value for the position as well as the policy for continuing moves. In Leela, a policy for a node is a list of moves and a probability for each move. The probability specifies the odds that an automatic player executing the policy will make that move. After this node is added to the tree, we back up that new value to all nodes visited during this playout. This slowly improves the value estimation of different paths through the game tree.

When a move is actually played on the board, the chosen move is made the new root of the tree. The old root and the other children of that root node are erased.

This is the same search specified by the AGZ paper, PUCT (Predictor + Upper Confidence Bound tree search). Many people call this MCTS (Monte-Carlo Tree Search), because it is very similar to the search algorithm the Go programs started using in 2006. But the PUCT used in AGZ and Lc0 replaces rollouts (sampling playouts to a terminal game state) with a neural network that estimates what a rollout would do.
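To make the playout loop above concrete, here is a heavily simplified sketch of one PUCT iteration (my own illustration, not Lc0's code; `net(position)` returning a value plus a move-to-prior dictionary, and `position.play(move)`, are hypothetical stand-ins for the real network and board API):

    import math

    # One simplified PUCT playout (illustration of the description above).
    class Node:
        def __init__(self, prior):
            self.prior, self.visits, self.value_sum = prior, 0, 0.0
            self.children = {}                      # move -> Node

        def q(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def select(node, c_puct=1.5):
        # Pick the child maximising Q + U, where U is the prior-weighted exploration bonus.
        total = sum(ch.visits for ch in node.children.values())
        return max(node.children.items(),
                   key=lambda mc: mc[1].q() + c_puct * mc[1].prior
                                  * math.sqrt(total) / (1 + mc[1].visits))

    def playout(root, position, net):
        path, node = [root], root
        while node.children:                        # walk down until an unexpanded position
            move, node = select(node)
            position = position.play(move)          # hypothetical board API
            path.append(node)
        value, policy = net(position)               # expand: first value estimate + move priors
        for move, prior in policy.items():          # (terminal positions omitted for brevity)
            node.children[move] = Node(prior)
        for n in reversed(path):                    # back the value up the visited path,
            n.visits += 1                           # flipping sign as the side to move alternates
            n.value_sum += value
            value = -value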
What do you mean by A+B?
Alpha-beta pruning I believe. https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning
A little past discussion, a long time ago:
Programming a Computer for Playing Chess (1949) - https://news.ycombinator.com/item?id=9231010 - March 2015 (1 comment)
Programming a Computer for Playing Chess (1950) [pdf] - https://news.ycombinator.com/item?id=9107788 - Feb 2015 (6 comments)
It's incredible how Shannon's papers read like they were written this decade.
That's why I always come back to HN. You don't expect anything and find something golden. Compare that to the rest of social media.
Reading these very old papers is always quite entertaining: you can feel that they use words like "routine" or "procedure" not yet as technical terms but as ordinary (if infrequently used) English words.
I actually learn a lot from old papers. For me it is easier to really understand something when I read the original paper, because a very smart person explains why they had the idea.
Like the Einstein papers of 1905, Shannon's paper on entropy, or Codd's paper on relational databases. For me they were far more understandable than any secondary literature I had read before.
Spaniard here. The same. Rutina, subrutina, procedimiento... the same Latin/Romance terms.
Nowadays a lot of these terms are hardly used anywhere but in papers.
Fun fact: Barbara Liskov's PhD thesis, 18 years later, developed the "killer heuristic", a minimax optimisation useful in, amongst other things, playing chess.
I don’t understand his evaluation function. Why do we need the 200x(K-K’) term? It should always be zero.
From the article: "We will be chiefly concerned with the middle game and will not consider the end game at all."
The effect of the checkmate rule, as opposed to simply capturing the king, is to allow the possibility of stalemate. But stalemate is rare in the middle game, where there are usually many legal moves available, so it's simpler to pretend the goal is just to capture the king.
It is, however, important to note that allowing kings to be taken can result in both kings being traded, which the scoring function would consider neutral. This possibility does not occur in normal games, so a search with that modification may produce the wrong value.
Games would also need to be stopped when a king is taken to make the search approximately correct.
Remember that this is for evaluating future states of the board. A state of the board where the opposing king has been taken (or, technically, just checkmated) is a win and therefore highly desirable. In that case K' is zero, even though this can't happen in the middle of a game. By having such an evaluation function, the computer is encouraged to make moves that reach this state.
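To make that concrete, here's a rough sketch of the material part of Shannon's f(P), with the paper's pawn-structure and mobility terms omitted; `counts` is just a hypothetical dictionary of piece counts, with primed keys for Black:

    # Material terms of Shannon's evaluation, roughly
    # f(P) = 200(K - K') + 9(Q - Q') + 5(R - R') + 3(B - B' + N - N') + (P - P') + ...
    # (pawn structure and mobility omitted). Illustration only.
    WEIGHTS = {"K": 200, "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

    def material_eval(counts):
        return sum(w * (counts.get(p, 0) - counts.get(p + "'", 0))
                   for p, w in WEIGHTS.items())

    # Normal middle-game position: both kings on the board, so the 200-point
    # term contributes nothing.
    print(material_eval({"K": 1, "K'": 1, "Q": 1, "Q'": 1, "P": 8, "P'": 8}))   # 0

    # A looked-ahead position in which Black's king has been "captured"
    # scores +200, which is what makes lines leading there overwhelmingly attractive.
    print(material_eval({"K": 1, "K'": 0, "Q": 1, "Q'": 1, "P": 8, "P'": 8}))   # 200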
How many homemade chess engines did this paper give rise to... X-D Shannon we miss you