points by hodgehog11 6 days ago

> We need more of these.

> Available data is 0 for most things.

I would argue that we need an effective alternative to benchmarks entirely, given how hard good ones are to obtain in scientific disciplines. Classical statistics has gotten very far by extracting a lot from limited datasets, and train-test splits are absolutely unnecessary there.
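To make the classical-statistics point concrete, here's a minimal sketch (the dataset and values are purely illustrative assumptions, not from the thread): a t-based confidence interval for a mean uses every one of a handful of measurements to quantify uncertainty, with no held-out test split at all.

```python
import statistics

# Hypothetical small dataset: 10 measurements (values are illustrative).
data = [4.8, 5.1, 5.0, 4.9, 5.3, 4.7, 5.2, 5.0, 4.9, 5.1]

n = len(data)
mean = statistics.mean(data)
# Standard error of the mean from the sample standard deviation.
sem = statistics.stdev(data) / n ** 0.5

# 95% CI via the t distribution (critical value t_{0.975, df=9} ~= 2.262).
# All 10 points contribute to both the estimate and its uncertainty --
# nothing is held out for a "test set".
t_crit = 2.262
ci = (mean - t_crit * sem, mean + t_crit * sem)
print(f"mean = {mean:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With only 10 points, splitting off a test set would discard a third of the data; the classical interval instead spends all of it on one well-calibrated uncertainty estimate.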

bobmarleybiceps 6 days ago

I kind of dislike the benchmarkification of AI-for-science stuff, tbh. I've encountered a LOT of benchmark datasets that just aren't good... In many cases they're fine and necessary, but IMO the standard for legit "success" in a lot of ML-for-science applications should basically be: "can this model be used to make real scientific or engineering insights that would have been very difficult or impossible without the proposed idea?"

Even if this is a super high bar, I think more ML-for-science papers should strive to be truly interdisciplinary and include an actual scientific advance... Not just "we modify X and get some improvement on a benchmark dataset that may or may not be representative of the problems scientists actually encounter." The ultimate goal of "ML for science" is science, not really improving ML methods, imo.