Tbh, I'm really not sure why something like this should take 15 seconds to compute. That's roughly a few trillion floating point ops for a problem that has been solvable for decades. I have trouble imagining any reasonable model for mapping price -> sales needing that much compute.
Also, fwiw, I really wouldn't expect a client's slider clicks to follow a normal distribution. A normal distribution arises when you sum a large number of independent random variables with finite expectation and variance, or, alternatively, when you're modelling a process whose expectation and variance are known but whose higher-order moments are not. If anything, I'd expect people to play with the slider more around the extremes, like the start and end.
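Here is a quick way to see the difference this comment describes. A single uniform draw (one decision made anywhere on the range) is flat. The sum of many small independent draws is close to normal. The sketch below uses only the standard library, and the choice of 12 uniform terms is arbitrary. For a normal distribution about 68% of samples fall within one standard deviation of the mean. For a single uniform it is about 58% (1/sqrt(3)).

```python
import random
import statistics


def sum_of_uniforms(n_terms: int, n_samples: int, rng: random.Random) -> list[float]:
    """Each sample is the sum of n_terms independent Uniform(0, 1) draws."""
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]


def fraction_within_one_sd(xs: list[float]) -> float:
    """Share of samples within one standard deviation of the sample mean."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(abs(x - mu) <= sd for x in xs) / len(xs)


rng = random.Random(0)
one_decision = sum_of_uniforms(1, 20_000, rng)   # flat: no bell shape at all
many_effects = sum_of_uniforms(12, 20_000, rng)  # sum of small effects: near-normal
print(round(fraction_within_one_sd(one_decision), 3))  # roughly 0.58
print(round(fraction_within_one_sd(many_effects), 3))  # roughly 0.68
```

A bell curve describes the second case. Nothing about one person dragging a slider forces the first case to look like it.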
Yeah, this is like if you asked your friend how much more productive they think coffee makes them, and they replied with a four hundred thousand degree polynomial over the milliliter. Reality is never that predictable; reasonable people can recognize when they have run out of significant digits. Something has gone severely wrong in your data-modeling, your invocation of the model, or both. If it actually is doing compute for 15s, then to the extent that this works, it is wrapping a vastly simpler function, which I would suggest you graph and use going forward instead. It will save you the runtime, its outputs will be more reliable, and you will get actual insights.
Also very curious about what kind of model this is and how it could (so far as it sounds) take 100% of the hardware for 15 seconds per request.
My first instinct would be to do one or two in the middle, then both of the extremes. The assumption that it would be a normal distribution is so strange to me in this situation.
This whole article reads like it was written by someone with no ability to step back for a second and think of other much easier solutions. They just go all in on the first thing they think of even when it is not effective at all.
Just increase the increment size, or if you really want 1c increments you could precompute every 5c or so and then just do linear interpolation between them.
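Here is a minimal sketch of "precompute every 5c and linearly interpolate." It assumes prices are handled in integer cents and that the model returns a single number per price. `fake_model` is a hypothetical stand-in for the 15-second call, and the $1.00 to $3.00 range is made up for illustration.

```python
from bisect import bisect_right
from typing import Callable


def build_cache(model: Callable[[int], float], lo_cents: int, hi_cents: int,
                step_cents: int = 5) -> tuple[list[int], list[float]]:
    """Call the expensive model only every step_cents, always including both endpoints."""
    xs = list(range(lo_cents, hi_cents + 1, step_cents))
    if xs[-1] != hi_cents:
        xs.append(hi_cents)
    return xs, [model(x) for x in xs]


def interpolate(xs: list[int], ys: list[float], x: float) -> float:
    """Linearly interpolate between the two cached points that bracket x."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("price outside the cached range")
    i = bisect_right(xs, x)
    if i == len(xs):  # x is exactly the last cached point
        return ys[-1]
    x0, x1 = xs[i - 1], xs[i]
    t = (x - x0) / (x1 - x0)
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def fake_model(price_cents: int) -> float:
    # Hypothetical stand-in for the 15-second model call: a smooth demand curve.
    return 10_000 / (price_cents + 50)


xs, ys = build_cache(fake_model, 100, 300)
print(len(xs))                   # 41 model calls instead of 201 at 1c steps
print(interpolate(xs, ys, 123))  # close to fake_model(123)
```

Every slider position still resolves to a 1c price. Only the cached grid is coarse, and each lookup is a binary search plus one multiply.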
Yeah dude, seriously…
Linear interpolation on small intervals is, like, a model of a model. And that’s exactly what differentiable functions look like locally, anyway. If you want to be fancier, sample the model and fit some polynomials to interpolate between those samples.
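One way to "fit some polynomials between samples" is local quadratic (Lagrange) interpolation through the three nearest samples. This is a swapped-in illustration, not a spline. The curve stays continuous but can have small kinks where the three-point window shifts. If SciPy is available, `scipy.interpolate.CubicSpline` is the smoother off-the-shelf option. The sample grid and demand values in the usage lines are hypothetical.

```python
from bisect import bisect_left


def quadratic_interpolate(xs: list[float], ys: list[float], x: float) -> float:
    """Evaluate the quadratic through three neighbouring samples around x.

    xs must be sorted and distinct. This is local (Lagrange) interpolation,
    not a spline: the curve can have small kinks where the window shifts.
    """
    if len(xs) < 3:
        raise ValueError("need at least three samples")
    i = bisect_left(xs, x)
    j = min(max(i - 1, 0), len(xs) - 3)  # window is xs[j], xs[j+1], xs[j+2]
    total = 0.0
    for k in range(j, j + 3):
        basis = 1.0  # Lagrange basis polynomial for sample k, evaluated at x
        for m in range(j, j + 3):
            if m != k:
                basis *= (x - xs[m]) / (xs[k] - xs[m])
        total += ys[k] * basis
    return total


prices = list(range(100, 301, 5))             # hypothetical 5c sample grid
demand = [10_000 / (p + 50) for p in prices]  # stand-in for cached model output
print(quadratic_interpolate(prices, demand, 123))
```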
If they were really time constrained they could precompute things sparsely first (for a demo the next day) and then progressively refine the values in between.
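Progressive refinement mostly comes down to the order in which you visit slider stops. The sketch below visits both endpoints, then the midpoint, then the midpoints of each half, and so on. Wherever you stop, the cached points are spread evenly across the range, and the extremes are covered first. The helper name `coarse_to_fine` and the 1c grid from $1.00 to $3.00 are illustrative assumptions.

```python
from collections import deque


def coarse_to_fine(n: int) -> list[int]:
    """Order indices 0..n-1 so that every prefix is spread evenly over the range.

    Endpoints come first, then the midpoint, then midpoints of each half, etc.
    """
    if n <= 0:
        return []
    if n == 1:
        return [0]
    order = [0, n - 1]
    queue = deque([(0, n - 1)])  # intervals whose interior is not yet scheduled
    while queue:
        lo, hi = queue.popleft()
        if hi - lo < 2:
            continue  # no interior index left in this interval
        mid = (lo + hi) // 2
        order.append(mid)
        queue.append((lo, mid))
        queue.append((mid, hi))
    return order


prices = list(range(100, 301))  # hypothetical slider stops in cents
schedule = [prices[i] for i in coarse_to_fine(len(prices))]
print(schedule[:5])  # [100, 300, 200, 150, 250]
```

You would feed `schedule` to the slow model one price at a time. Until it finishes, lookups fall back to the nearest or interpolated cached value.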
Why did this trend on HN?
This is one of those junior engineer moments where they technically did perceive a problem and solve it, but you wish they had just come and asked for some advice first.
Don’t dangle the man - enrich him with your advice!
Well, the details in the article are sparse, but given what we are told, it seems highly likely that instead of using their ML model directly, they could use it to fit a regression or a piecewise polynomial (e.g. a linear interpolation or spline) over its output. Then the user input isn't driving the ML model; it is simply an input to a polynomial, which is a trivial calculation for a modern computer.
Then they wouldn’t even need to cache anything and the result would be instantaneous with no real loss of accuracy.
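Here is a minimal version of "use the ML model to fit a regression," assuming a straight line is adequate over the price range. The steps are to sample a handful of prices from the slow model, fit y = m*x + b by ordinary least squares, and check the residuals before trusting the fit. The sample prices and the linear stand-in output are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def max_abs_residual(xs: list[float], ys: list[float], m: float, b: float) -> float:
    """Largest gap between the fitted line and the sampled model output."""
    return max(abs(y - (m * x + b)) for x, y in zip(xs, ys))


sample_prices = [100, 150, 200, 250, 300]             # five slow model calls
sample_units = [1000 - 3 * p for p in sample_prices]  # hypothetical model output
m, b = fit_line(sample_prices, sample_units)
print(m, b, max_abs_residual(sample_prices, sample_units, m, b))
print(m * 123 + b)  # answering a slider event is now one multiply and one add
```

If the residuals come back large, that is the signal to move to a spline or to a different functional form.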
What is the actual model that takes 15 seconds to compute?
If I understand the setting, you are estimating the demand curve for a given price... And there are only 40 such curves to compute.
Surely each curve is fit with only a few parameters, probably fewer than five. (I think for small price ranges the demand curve is usually approximated as something trivial like y=mx+b or y=a/(x+b)+c)
Why does evaluating the model at a particular price need to take milliseconds, let alone 15 seconds?
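To make the "a few parameters" point concrete, here is a sketch that fits the y = a/(x+b)+c form mentioned above. For a fixed b the model is linear in z = 1/(x+b), so a and c come from a least-squares line, and b is chosen by trying a grid of candidates. The grid search is a simple illustrative choice, not a standard routine; `scipy.optimize.curve_fit` is the usual tool when SciPy is available. The sample data is synthetic.

```python
def fit_hyperbola(xs: list[float], ys: list[float],
                  b_candidates: list[float]) -> tuple[float, float, float]:
    """Fit y = a / (x + b) + c.

    For a fixed b the model is linear in z = 1 / (x + b), so a and c come from an
    ordinary least-squares line fit; b is chosen by trying each candidate.
    """
    best = None
    for b in b_candidates:
        if any(x + b <= 0 for x in xs):
            continue  # skip b values that would divide by zero or cross the pole
        zs = [1.0 / (x + b) for x in xs]
        n = len(zs)
        mz, my = sum(zs) / n, sum(ys) / n
        szz = sum((z - mz) ** 2 for z in zs)
        if szz == 0:
            continue
        a = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / szz
        c = my - a * mz
        sse = sum((a * z + c - y) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("no usable candidate for b")
    _, a, b, c = best
    return a, b, c


# Hypothetical samples: a handful of expensive model calls at a few prices (cents).
prices = list(range(100, 301, 20))
units = [5000 / (p + 40) + 10 for p in prices]  # stand-in for real model output
a, b, c = fit_hyperbola(prices, units, b_candidates=list(range(1, 201)))
print(a, b, c)
print(a / (123 + b) + c)  # evaluating the fitted curve is a few arithmetic ops
```

Once the three numbers are known, evaluating any price takes microseconds.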
This is the right question. The article reads like an answer to an XY problem: instead of focusing on the actual issue, the author triples down on polishing a turd.
It's odd that OP didn't seem to consider applying the nearest cached value for any given slider stop.
The Gaussian frequency was a cool idea, however.
Also, I would speculate that projected sales would likely be a continuous function in most cases, so I'm curious why they didn't try fitting a function based on initial results.
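The nearest-cached-value idea amounts to a binary search over the sorted cache keys. In this sketch, prices are integer cents and ties go to the lower price. Both of those choices are assumptions, as are the cache contents.

```python
from bisect import bisect_left


def nearest_key(sorted_keys: list[int], x: int) -> int:
    """Return the cached key closest to x; ties go to the lower key."""
    if not sorted_keys:
        raise ValueError("cache is empty")
    i = bisect_left(sorted_keys, x)
    if i == 0:
        return sorted_keys[0]
    if i == len(sorted_keys):
        return sorted_keys[-1]
    lo, hi = sorted_keys[i - 1], sorted_keys[i]
    return lo if x - lo <= hi - x else hi


# Hypothetical cache of precomputed results keyed by price in cents.
cache = {100: "forecast@100", 150: "forecast@150", 200: "forecast@200"}
keys = sorted(cache)  # sort once, reuse for every slider event
print(cache[nearest_key(keys, 137)])  # forecast@150
```

This works for any model output, including a whole JSON forecast, because nothing is averaged.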
OP here,
Ah, good point. To be honest, interpolation didn't even cross my mind.
The model output wasn't just one number, it was a messy JSON with a 12-week forecast. Trying to average two of those felt like a whole other task, and with the deadline, my brain was just stuck on how to pick the right numbers to cache.
But yeah, it's a really great idea. Will definitely keep it in mind for the next demo.
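Interpolating a structured forecast is less work than it sounds if both cached forecasts share the same shape. You walk the two JSON trees together, interpolate numbers linearly, and require everything else (strings, flags, keys, list lengths) to match. The forecast shape below is hypothetical, since the actual JSON wasn't shown. Because the blend is linear, any totals in the JSON stay consistent with the interpolated weekly values. The approach still assumes each field varies smoothly with price.

```python
from typing import Any


def blend(a: Any, b: Any, t: float) -> Any:
    """Linearly interpolate two JSON-like values of the same shape: a at t=0, b at t=1."""
    if isinstance(a, bool) or isinstance(b, bool):
        if a != b:
            raise ValueError(f"cannot interpolate flags {a!r} and {b!r}")
        return a
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a if a == b else a + t * (b - a)  # keeps ids like "week": 3 intact
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            raise ValueError("lists differ in length")
        return [blend(x, y, t) for x, y in zip(a, b)]
    if isinstance(a, dict) and isinstance(b, dict):
        if a.keys() != b.keys():
            raise ValueError("objects have different keys")
        return {k: blend(a[k], b[k], t) for k in a}
    if a == b:
        return a  # strings, nulls, etc. must simply agree
    raise ValueError(f"cannot interpolate {a!r} and {b!r}")


def forecast_at(price: float, p0: float, f0: Any, p1: float, f1: Any) -> Any:
    """Estimate the forecast at price from cached forecasts at p0 < price < p1."""
    t = (price - p0) / (p1 - p0)
    return blend(f0, f1, t)


# Hypothetical shape; the real model's JSON (12 weekly entries) was not shown.
low = {"currency": "USD", "weeks": [{"week": 1, "units": 100.0}, {"week": 2, "units": 80.0}]}
high = {"currency": "USD", "weeks": [{"week": 1, "units": 60.0}, {"week": 2, "units": 40.0}]}
print(forecast_at(1.25, 1.20, low, 1.40, high))
```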
I would have just had the slider move in increments of $0.10 or $0.05.
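Some back-of-the-envelope arithmetic for the increment suggestion, using the 15 seconds per call quoted in the thread. The $1.00 to $5.00 slider range is a made-up example, since the real range wasn't given.

```python
import math

SECONDS_PER_CALL = 15  # per-request figure quoted in the thread


def stops(lo_cents: int, hi_cents: int, step_cents: int) -> int:
    """Number of slider positions from lo to hi inclusive at the given step."""
    return (hi_cents - lo_cents) // step_cents + 1


def wall_clock_hours(n_calls: int, workers: int = 1) -> float:
    """Batches of `workers` parallel calls, each batch taking SECONDS_PER_CALL."""
    return math.ceil(n_calls / workers) * SECONDS_PER_CALL / 3600


# Hypothetical slider range of $1.00 to $5.00 for one product.
for step in (1, 5, 10):
    n = stops(100, 500, step)
    print(f"{step}c steps: {n} calls, {wall_clock_hours(n):.2f} h")

# For scale: 7 days of back-to-back 15-second calls.
print(7 * 24 * 3600 // SECONDS_PER_CALL)  # 40320
```

Going from 1c to 10c steps cuts the call count by roughly 10x. The cost grows linearly with the number of products.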
If I was a user, I might be confused why the increment is not consistent across the whole range.
Maybe I'm just confused, but why even use an ML model for this? It's all just calc, right?
I have this problem all the time, and I can usually run the calcs through a simple multithreaded process pool or queue. While each calc may still take 15s, I run a dozen or more at a time. This lets me refresh the cache in a reasonable amount of time. It doesn’t address the calc speed of the underlying service, though, which is obviously another potential opportunity.
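Here is a minimal sketch of the "dozen at a time" approach using the standard library's `concurrent.futures`. It assumes the model sits behind a service, so each call mostly waits from the caller's point of view, and threads are enough. For a CPU-bound model running in-process, `ProcessPoolExecutor` is the drop-in alternative. Either way, it only helps if the hardware has idle capacity. `call_model` is a hypothetical stand-in that sleeps instead of computing.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_model(price_cents: int) -> float:
    # Hypothetical stand-in for one slow model request; it sleeps instead of computing.
    time.sleep(0.1)
    return 10_000 / (price_cents + 50)


def precompute(prices: list[int], max_workers: int = 12) -> dict[int, float]:
    """Fill a price -> result cache, running up to max_workers requests at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs.
        return dict(zip(prices, pool.map(call_model, prices)))


start = time.perf_counter()
cache = precompute(list(range(100, 112)))
elapsed = time.perf_counter() - start
print(len(cache), round(elapsed, 2))  # well under the 1.2 s a sequential loop needs
```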
So parallel precomputation wasn't an option?
I „solved” it by ignoring a part of the problem?
Please don’t take this personally, but IMHO that’s not solving the problem. That’s coming up with a workaround.
Unless I'm misunderstanding, seems like the approach taken here (normal distribution and Monte Carlo simulation) might achieve similar?
https://filiph.github.io/unsure/
"How I scammed my client in one weekend" would be more exact
I had a 7-day compute problem, 3 days to solve it, and no extra hardware. Here's what worked.
It's cool that you got it working in time for the demo, but I think your reasoning is unsound.
>I remembered this from my engineering days at the College of Engineering, Trivandrum. It’s called the Normal Distribution, or Gaussian distribution. It’s based on the idea that data near the mean (the average) is more frequent than data far from the mean.
There are a lot of non-normal distributions where that's the case. The normal distribution is a specific thing that arises from summing together lots of small random variables.
It's not a good model of people moving sliders on a UI: a person's decision to set the value to e.g. 0.8 is really one discrete thing, not a sum of a bunch of independent micro-decisions. There's no physical/statistical law preventing someone from grabbing the slider and thrusting all the way to the left or the right, and in fact people do this all the time on UIs. The client can move the slider however he pleases ...
So I think you just got lucky that the client didn't do that. Don't rely on it not happening in the future!
(You could also imagine fitting a normal distribution to user behaviour, but it turns out the standard deviation is just really large. That would be technically defensible but also useless for your situation, since there would be substantial probability at the min/max values of the finite range. It would be close to uniform.)
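The "close to uniform" point can be checked directly. Truncate a normal to the slider's finite range, split the range into equal bins, and compare bin probabilities for a narrow and a wide standard deviation. The sketch normalises the slider to [0, 1] and, like the original approach, puts the mean at the centre. Both choices are assumptions made for illustration.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def truncated_normal_bins(lo: float, hi: float, n_bins: int,
                          mu: float, sigma: float) -> list[float]:
    """Probability of each equal-width bin for a normal truncated to [lo, hi]."""
    edges = [lo + (hi - lo) * k / n_bins for k in range(n_bins + 1)]
    mass = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
    return [(normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)) / mass
            for a, b in zip(edges, edges[1:])]


# Slider normalised to [0, 1], split into 10 bins, mean assumed at the centre.
narrow = truncated_normal_bins(0.0, 1.0, 10, mu=0.5, sigma=0.1)
wide = truncated_normal_bins(0.0, 1.0, 10, mu=0.5, sigma=2.0)
print([round(p, 3) for p in narrow])  # end bins are nearly zero
print([round(p, 3) for p in wide])    # every bin is close to 0.1
```

With a narrow sigma the extremes get essentially no cached points. With a wide sigma the "normal" schedule is practically a uniform one.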
(Also, who's to say the mean is in the middle of the slider range?)
Anyway I'm curious what the ML model was doing that took 15 seconds. Are you sure there's no way to speed it up?