Who discovered grokking and why is the name hard to find?
Apologies if this is old news to everyone, but perhaps the hive mind knows the answer. I was watching the YouTube video "The most complex model we actually understand" by Welch Labs and heard the story about the researcher who left a model training when going on vacation; the model then learned to generalize after thousands of training steps. But when I try to look up the discoverer's name, it does not seem to have been made public, which seems a shabby way to treat someone. What's the real story?
https://arxiv.org/abs/2201.02177 This paper is not hard to find; it's the first result when you search for "grokking" on https://scholar.google.com
Yes, I did find that paper, but I could not find which of the 5 authors it was, or whether it was someone not listed as an author. The word 'vacation' does not appear in the paper: https://arxiv.org/pdf/2201.02177