> The thing is, while the odd application crash due to memory exhaustion probably doesn’t bother some, it certainly bothers me. Do we really trust that applications will reliably save necessary state at all times prior to crashing due to a malloc failure? Are we really ok with important system processes occasionally dying, with system functionality accordingly affected? Wouldn’t it be better if this didn’t happen?
With virtual memory and paging, it's really up to the user what's too taxing on their system. And it's not an either-or: I greatly value an application that saves its state reliably, consistently, and often. Sublime Text is fantastic; I don't even have to press Save and can just pull the plug on the machine. This mitigates the allocation failure case as well as so many other failures.
I'm trying to remember when I had memory limit problems. One kind was Java/Spring services I was working on that had memory 'leaks' in query template caches. There were also Linux OOMs, probably caused by trying to run too many services on one box, or by one of them processing large inputs.
I can't recall a time when I thought that if this app/service had handled allocation failure better, things would be great. To me, a crash from running out of memory during an operation was equivalent to a nice error message: the operation simply hadn't been done, and the service could resume from there. It would be like a client getting an HTTP 502 from a load-balancer and the service invisibly restarting. Maybe it's my cattle-not-pets attitude to expect failures.
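As an illustration of the "saves its state often, survives a crash" idea, here is a minimal C++17 sketch of write-to-temp-then-rename saving. This is not a claim about how Sublime Text does it; `save_atomically` and the `.tmp` suffix are made up for the example.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper: write the new contents to a sibling temp file, then
// rename it over the target. A crash mid-write leaves the old file intact.
bool save_atomically(const std::filesystem::path& target,
                     const std::string& contents) {
    std::filesystem::path tmp = target;
    tmp += ".tmp";  // assumed naming scheme, purely illustrative
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << contents;
        out.flush();
        if (!out) return false;  // write failed (disk full, I/O error, ...)
    }  // stream closed here, before the rename
    std::error_code ec;
    std::filesystem::rename(tmp, target, ec);  // replaces target if it exists
    return !ec;
}
```

Note that surviving an actual power cut would also need an fsync of the file (and its directory) before the rename, which `std::ofstream` does not do; this sketch only covers the process dying mid-save.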
> It would be like a client getting an HTTP 502 from a load-balancer and the service invisibly restarting.
If the load-balancer can't gracefully handle malloc failure, then instead of a 502 response, hundreds, thousands, or perhaps even millions of clients will simply get EOF or possibly even hang indefinitely.
Same principle applies to the kernel--if the kernel can't gracefully handle malloc failure, things can quickly become much more unpleasant than a spurious 502 somewhere.
If you've never had to worry about malloc failure, it's because other people have. Failing gracefully under pressure is difficult, so we tend to push those chores into a small number of software and hardware services. But the people writing those solutions still need the software stack beneath them to provide the ability to handle things as cleanly as possible.
Imagine writing an ACID database if a failed disk operation crashed the entire machine. Yes, you can work around it, but actual QoS would suck at scale compared to being able to isolate the fallout to the particular transaction.
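To make "isolate the fallout to the particular transaction" concrete, here is a minimal C++ sketch, assuming a request handler that needs a scratch buffer. `handle_request` and the status codes are hypothetical; the point is only that a failed `malloc` becomes one failed request rather than a dead process.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical per-request handler that needs `bytes` of scratch space.
// Returns an HTTP-style status so the caller can fail one request,
// not the whole process.
int handle_request(std::size_t bytes) {
    if (bytes == 0) return 200;  // nothing to allocate

    void* scratch = std::malloc(bytes);
    if (scratch == nullptr) {
        // Allocation refused: report back pressure for this request only.
        return 503;
    }

    // Stand-in for the real work. The volatile write just keeps the
    // compiler from optimising the allocation away in this toy example.
    static_cast<volatile unsigned char*>(scratch)[0] = 1;

    std::free(scratch);
    return 200;
}
```

This only works when `malloc` actually reports failure. With overcommit enabled, a large but plausible request can succeed here and the process can still be killed later when the pages are touched, so the 503 branch never gets a chance to run.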
Linux's overcommit absolutely has caused me and others countless hours of grief. Because overcommit blunts, obscures, and redirects memory back pressure, things tend to soft fail or time out in cascades across completely unrelated services, and you really have very little ability to control it in any meaningful manner.
I should have been clearer: if my service crashed and was located behind a load-balancer, the client would receive a 502 Bad Gateway response. That's based on the premise that I'm not writing the load-balancer, and that it's been well tested and doesn't suffer from ungraceful handling of malloc failure. So yes, it's important for the small subset of people writing things like load-balancers.
I had a tiny business web site handling a few hundred, maybe 1,000, HTTP requests per minute, either static or doing minimal PHP/MySQL work. So I rented a 512 MB cloud machine, because of a shoestring budget, and resized a few memory pools.
It worked great, until the MySQL dataset grew enough. There was a backup job running 7z and pulling the compressed DB to a storage machine. It turns out 7z was being killed by the OOM killer.
The HTTP service itself just kept going, OOM or not. Presumably because the backup ran at 3 AM and almost nobody was using the site at that time.
The most annoying thing that apps do (even freaking vectors in C++) is that they do not return memory to the system after they are done with a big task. Instead, they keep the memory just in case they need it again in the future.
No, the unused memory is not yours to manage. Return it to the OS.
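For the `std::vector` case specifically, a small sketch of what "giving it back" looks like. `clear()` destroys the elements but keeps the capacity, and `shrink_to_fit()` is only a non-binding request; swapping with an empty temporary is the usual way to actually free the buffer. `release_storage` is a made-up helper name.

```cpp
#include <vector>

// Hypothetical helper: drop both the elements and the backing buffer.
// The temporary takes ownership of v's storage and frees it when it is
// destroyed at the end of the statement.
template <typename T>
void release_storage(std::vector<T>& v) {
    std::vector<T>().swap(v);
}
```

Usage:

```cpp
std::vector<int> big(1000000);
big.clear();            // size() == 0, but the buffer is still held
release_storage(big);   // buffer handed back to the allocator
```

Whether the pages then go back to the OS is up to the allocator, not the vector. As far as I know, glibc typically serves large blocks via mmap and unmaps them on free, while smaller blocks may stay in the process heap.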
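One reason for overcommit is COW (copy-on-write).
For example, say you fork() then exec() from a process using 16GB of memory: without overcommit, the system briefly has to be able to back 32GB, even though the child replaces its address space almost immediately and never touches most of it.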
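One way to sidestep the fork()+exec() doubling is to not fork at all. The sketch below uses `posix_spawnp`, which launches a new program without first creating a full copy-on-write duplicate of the parent; as far as I know, glibc implements it with a vfork-style clone, but treat that as an assumption about the implementation. `spawn_and_wait` is a hypothetical helper.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

#include <string>
#include <vector>

extern "C" char** environ;  // the parent's environment, passed to the child

// Hypothetical helper: run a program found via PATH and return its exit
// status, or -1 if it could not be started or did not exit normally.
int spawn_and_wait(const std::vector<std::string>& args) {
    if (args.empty()) return -1;

    // Build a NULL-terminated argv that points into `args`.
    std::vector<char*> argv;
    for (const std::string& a : args) {
        argv.push_back(const_cast<char*>(a.c_str()));
    }
    argv.push_back(nullptr);

    pid_t pid = 0;
    int rc = posix_spawnp(&pid, argv[0], nullptr, nullptr, argv.data(), environ);
    if (rc != 0) return -1;  // rc is an errno-style error code

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `spawn_and_wait({"sh", "-c", "exit 7"})` returns 7. The 16GB parent never has to be accounted for twice, which removes this particular argument for overcommit, though not the others.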