leereeves a year ago

> Training your neural net only fiddles with the parameters like a and b. It doesn't do anything about the shape of the function. It doesn't change sine into multiplication etc.

It definitely can. The output will always be piecewise linear (with ReLU), but the overall shape can change completely.
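A minimal sketch of that point (architecture and parameter values are my own choices, not from the thread): a one-hidden-layer ReLU net is always piecewise linear, but changing only the parameters moves the kinks and slopes, so the same architecture can trace out very different shapes.

```python
import numpy as np

def relu_net(x, w1, b1, w2, b2):
    # hidden layer: ReLU(w1 * x + b1), then a linear readout
    h = np.maximum(0.0, np.outer(x, w1) + b1)
    return h @ w2 + b2

x = np.linspace(-2, 2, 5)

# Two parameter settings, same architecture:
y_a = relu_net(x, np.array([1.0, -1.0]), np.array([0.0, 0.0]),
               np.array([1.0, 1.0]), 0.0)   # relu(x) + relu(-x) = |x|
y_b = relu_net(x, np.array([1.0, 1.0]), np.array([0.0, -1.0]),
               np.array([1.0, -1.0]), 0.0)  # relu(x) - relu(x-1): a ramp that flattens
```

Both outputs are piecewise linear, but one is a V and the other a clipped ramp, with no change to the function's structure, only its numbers.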

ziofill a year ago

You can fit any data with enough parameters. What’s tricky is to constrain a model so that it approximates the ground truth well where there are no data points. If a family of functions is extremely flexible and can fit all kinds of data very efficiently I would argue it makes it harder for those functions to have correct values out of distribution.

  • leereeves a year ago

    Definitely. That's a fundamental observation called the bias-variance tradeoff. More flexible models are prone to overfitting, hitting each training point exactly with wild gyrations in between.
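A hedged illustration of that tradeoff (data and degrees are mine, not from the thread): fit noisy samples of a line with a degree-9 polynomial and it hits every training point almost exactly, while gyrating between them far more than a simple degree-1 fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.1, size=10)  # noisy samples of y = 2x

# Flexible model: degree-9 polynomial through 10 points (interpolation).
flexible = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)
# Simple model: a straight line.
simple = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)

# The flexible model nearly interpolates the training data...
print(np.max(np.abs(flexible(x_train) - y_train)))
# ...but compare both against the true line off the training grid.
x_test = np.linspace(0.05, 0.95, 50)
print(np.max(np.abs(flexible(x_test) - 2 * x_test)))
print(np.max(np.abs(simple(x_test) - 2 * x_test)))
```

The degree-9 fit's training error is essentially zero; its off-grid error reflects the "wild gyrations" between the points.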

    Big AI minimizes that problem by using more data. So much data that the model often only sees each data point once and overfitting is unlikely.

    • ziofill a year ago

But even keeping the data constant, adding more and more parameters is a strategy that works, so what gives? Are the functions somehow getting regularized during training, so that effectively you could get away with fewer parameters, and it's just that we don't have the right model yet?

eru a year ago

Sorry, when I said 'shape' of the function, I meant the shape of the abstract syntax tree (or something like that).

Not the shape of its graph when you draw it.

  • refulgentis a year ago

More directly than my first attempt: you're continuing the error here. The naïve approach of "it's approximating some function" both maps to reality and makes accurate predictions. The more we couple ourselves to "no no no, it's modeling a precise function", the more we end up wrong, both on how it works in theory and in practice.

    • eru a year ago

      Huh? Who says anything about 'precise functions'? And what's a precise function in the first place?

      I am saying that training (at least for conventional neural nets) only fiddles with some parameters. But it does not change the shape of the network, no new nodes nor different connections. (Which is almost equivalent to saying training doesn't change the abstract syntax tree, if you were to write the network out as a procedure in, say, Python.)

      The geometric shape you get when you print out the function changes, yes.
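A toy sketch of that distinction (the function and gradient are my own hand-worked example): written out as a Python procedure, the network's structure, the same operations in the same order, never changes; gradient descent only updates the numbers a and b.

```python
def net(x, a, b):
    # The "abstract syntax tree" of the network: this line is fixed.
    return max(0.0, a * x + b)

# One hand-computed gradient-descent step on the loss (net(x) - y)^2.
a, b, lr = 1.0, 0.0, 0.1
x, y = 2.0, 5.0

pred = net(x, a, b)            # 2.0: ReLU is active here
grad_a = 2 * (pred - y) * x    # d(loss)/da through the active ReLU
grad_b = 2 * (pred - y)        # d(loss)/db

a, b = a - lr * grad_a, b - lr * grad_b
# a and b changed; the body of net() did not.
```

Training moves the graph of the function around (the geometric shape), while the procedure itself, the AST, stays fixed.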