abhgh 4 days ago

This is the definitive reference on the topic! I also have some notes, if you want something concise that doesn't ignore the math [1].

[1] https://blog.quipu-strands.com/bayesopt_1_key_ideas_GPs#gaus...

  • C-x_C-f 4 days ago

    These are very cool, thanks. Do you know what kinds of jobs are more likely to require Gaussian process expertise? I have experience using GPs for surrogate modeling and will be on the job market soon.

    Also a resource I enjoyed is the book by Bobby Gramacy [0] which, among other things, spends a good bit on local GP approximation [1] (and has fun exercises).

    [0] https://bobby.gramacy.com/surrogates/surrogates.pdf

    [1] https://arxiv.org/abs/1303.0383

    • abhgh 4 days ago

      Aside from secondmind [1] I don't know of any companies (only because I haven't looked)... But if I had to look for places with a strong research culture on GPs (I don't know if you're) I would find relevant papers on arXiv and Google Scholar, and see if any of them come from industry labs. If I had to take a guess on Bayesian tools at work, maybe the industries to look at would be advertising and healthcare. I would also look out for places that hire econometricians.

      Also thank you for the book recommendation!

      [1] https://www.secondmind.ai/

timdellinger 2 days ago

My take is that the Rasmussen book isn't especially approachable, and that this book has actually held back the wider adoption of GPs in the world.

The book has been seen as the authoritative source on the topic, so people were hesitant to write anything else. At the same time, the book borders on impenetrable.

heinrichhartman 2 days ago

Why would you learn Gaussian Processes today? Is there any application where they are still leading and have not been superseded by Deep NNets?

  • hodgehog11 2 days ago

    I would argue there are more applications overall where Gaussian processes are superior, as most scientific applications have smaller data sets. Not everything has enough data to take advantage of feature learning in NNs. GPs are generally reliable, interpretable, and provide excellent uncertainty estimates for free. They can be made multiscale, achieving higher precision as function approximators than most other methods. Plus, they can exhibit reversion to the prior when you need that.
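
    To make the "uncertainty for free" and "reversion to the prior" points concrete, here is a minimal NumPy sketch of exact GP regression. The RBF kernel, the unit lengthscale, the zero prior mean, and the toy sine data are all assumptions chosen for illustration, not anything prescribed above.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    # Cholesky factorisation is the usual numerically stable route.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# Toy data (assumed): three noiseless samples of sin(x).
x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)

# Query at a training point, between training points, and far away.
x_test = np.array([0.0, 0.5, 10.0])
mean, var = gp_posterior(x_train, y_train, x_test)
```

    The variance is close to zero at the observed input, grows between observations, and at x = 10 the prediction falls back to the prior (mean 0, variance 1), since the kernel sees no nearby data there.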

    Another use is emulating the outputs of an agent-based model for sensitivity analysis.

  • roadside_picnic 2 days ago

    Basically they're incredibly useful for any situation where you have "medium" data: not enough to properly train a NN (which are very data hungry in practice), but enough that a more traditional approach wouldn't really exploit all the information in it.

    GPs essentially allow you to get a lot of the power of a NN while also being able to encode a bunch of domain knowledge you have (which is necessary when you don't have enough data for the model to effectively learn that domain knowledge). On top of that, you get variance estimates which are very important for things like forecasting.
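
    One common way domain knowledge gets encoded is through kernel composition. The sketch below is a hypothetical example, assuming monthly data with a known 12-step seasonality plus a slow trend; the function names and every lengthscale are made up for illustration.

```python
import numpy as np

def rbf(x1, x2, lengthscale):
    """Smooth, non-periodic similarity."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale**2)

def periodic(x1, x2, period, lengthscale):
    """Similarity that repeats exactly every `period` units."""
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale**2)

def seasonal_kernel(x1, x2, period=12.0):
    """Assumed prior: a seasonal pattern that drifts slowly, plus a smooth trend.

    Product = "periodic, but the shape may change over time".
    Sum     = "independent additive components".
    """
    locally_periodic = periodic(x1, x2, period, 1.0) * rbf(x1, x2, 60.0)
    slow_trend = 0.5 * rbf(x1, x2, 30.0)
    return locally_periodic + slow_trend

months = np.arange(48, dtype=float)
K = seasonal_kernel(months, months)

# Covariance between month 0 and the same month a year later,
# versus month 0 and the opposite season six months later.
cov_same_season = K[0, 12]
cov_opposite_season = K[0, 6]
```

    Under this prior, points a full period apart are more strongly correlated than points half a period apart, which is exactly the seasonal assumption the model would otherwise need a lot of data to learn.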

    The only real drawback to GPs is that they absolutely do not fit into the "fit/predict" paradigm. Properly building a scalable GP takes a deeper understanding of the model than most other models require. The mathematical foundations required to really understand what's happening when you train a sparse GP greatly exceed what is required to understand a NN, and on top of that there is a fair amount of practical insight into kernel development that is required as well. But the payoff is fantastic.

    It's worth recognizing that, once you realize that "attention" is really just kernel smoothing, transformers are essentially learning sophisticated stacked kernels, so they ultimately share a lot in common with GPs.
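
    The attention/kernel-smoothing link can be checked numerically. This is a sketch under stated assumptions: single-head dot-product attention with no learned projections, keys normalised to equal length, and a Gaussian kernel whose squared bandwidth is set to sqrt(d). Under those assumptions the query-norm and key-norm terms cancel in the normalisation, and softmax attention coincides with a Nadaraya-Watson smoother.

```python
import numpy as np

def softmax_attention(queries, keys, values):
    """Scaled dot-product attention (single head, no projections)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

def nadaraya_watson(queries, keys, values, bandwidth_sq):
    """Kernel smoother with a Gaussian kernel on query-key distance."""
    sq_dist = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    log_k = -0.5 * sq_dist / bandwidth_sq
    log_k -= log_k.max(axis=-1, keepdims=True)  # numerical stability
    k = np.exp(log_k)
    return (k / k.sum(axis=-1, keepdims=True)) @ values

rng = np.random.default_rng(0)
d = 8
queries = rng.normal(size=(4, d))
keys = rng.normal(size=(6, d))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)  # assumption: equal-norm keys
values = rng.normal(size=(6, 3))

attn_out = softmax_attention(queries, keys, values)
nw_out = nadaraya_watson(queries, keys, values, bandwidth_sq=np.sqrt(d))
```

    Without the equal-norm assumption the two differ by a per-key weighting, so the correspondence is with an exponential dot-product kernel rather than a strictly Gaussian one.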

  • cjbgkagh 2 days ago

    AFAIK state of the art is still a mix of new DNN and old school techniques. Things like parameter efficiency, data efficiency, runtime performance, and understandability would factor into the decision making process.

  • timdellinger 2 days ago

    Bayesian optimization of, say, hyperparameters is the canonical modern usage in my view, and there are other similar optimization problems where it's the preferred approach.
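
    For readers who haven't seen it, a Bayesian optimization loop can be sketched in a few lines. Everything here is an assumption for illustration: a 1-D search space rescaled to [0, 1], a toy quadratic standing in for an expensive training run, a fixed RBF kernel with the observed mean as a constant prior mean, and a lower-confidence-bound acquisition (expected improvement is the other common choice).

```python
import numpy as np

def rbf(a, b, lengthscale=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-6):
    """GP posterior mean and standard deviation on a grid."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_grid)
    y_mean = y_obs.mean()
    mean = y_mean + K_s.T @ np.linalg.solve(K, y_obs - y_mean)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_validation_loss(x):
    """Hypothetical stand-in for an expensive training run; minimum at x = 0.3."""
    return (x - 0.3) ** 2

def bayes_opt(objective, n_iter=10, kappa=2.0):
    x_grid = np.linspace(0.0, 1.0, 201)
    x_obs = np.array([0.0, 0.5, 1.0])  # initial design
    y_obs = objective(x_obs)
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, x_grid)
        # Lower confidence bound: favour low predicted loss or high uncertainty.
        x_next = x_grid[np.argmin(mean - kappa * std)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = np.argmin(y_obs)
    return x_obs[best], y_obs[best]

best_x, best_loss = bayes_opt(toy_validation_loss)
```

    The GP's uncertainty is what drives exploration here: the acquisition function trades off "predicted to be good" against "not yet measured", which is why the approach suits objectives where each evaluation is costly.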

  • xpe 2 days ago

    To reduce the risk of being a lemming. It is in everyone's interests for some people not to follow the herd / join the plague of locusts.

  • ysaatchi 2 days ago

    you can combine deep NNets with GPs, e.g. here https://arxiv.org/abs/1511.02222

    So it isn't a matter of which is better. If you ever need to imbue your deep nets with good confidence estimates, it is definitely worth checking out.

memming 2 days ago

Stationary GPs are just stochastic linear dynamical systems. (Not just the Matern covariance kernel)
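
The simplest instance of this correspondence can be verified numerically. The sketch below covers only the Matérn-1/2 (exponential, Ornstein-Uhlenbeck) kernel on a regular time grid, where the equivalent system is a scalar AR(1) recursion; the parameter values are arbitrary, and other kernels need higher-dimensional state, which is not shown here.

```python
import numpy as np

lengthscale, variance, dt, n = 2.0, 1.5, 0.1, 50
t = dt * np.arange(n)

# GP view: exponential (Matern-1/2) covariance between all pairs of times.
K_gp = variance * np.exp(-np.abs(t[:, None] - t[None, :]) / lengthscale)

# Dynamical-system view:
#   x[0]   ~ N(0, variance)
#   x[k+1] = a * x[k] + w[k],   w[k] ~ N(0, q)
a = np.exp(-dt / lengthscale)
q = variance * (1.0 - a**2)  # keeps the process stationary

# Write the trajectory as x = M @ eps with eps ~ N(0, I);
# the implied covariance is then M @ M.T.
M = np.zeros((n, n))
M[:, 0] = np.sqrt(variance) * a ** np.arange(n)   # effect of the initial state
for j in range(1, n):
    M[j:, j] = np.sqrt(q) * a ** np.arange(n - j)  # effect of the noise at step j
K_ss = M @ M.T
```

The two covariance matrices agree, and the recursive form is what lets Kalman filtering do inference in time linear in the number of observations instead of the cubic cost of a dense solve.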