xg15 2 days ago

(2021), but still very interesting. The "post-overfitting" training strategy is especially unexpected.

esafak a day ago

The low sample efficiency of RL is well explained.